Tag Archives: Case Studies

Monitoring MDM Certificates with Lab9 Pro and Zabbix

Post Syndicated from Michael Kammer original https://blog.zabbix.com/monitoring-mdm-certificates-with-lab9-pro-and-zabbix/31621/

Lab9 Pro is the B2B division of Lab9, Belgium’s leading Apple Premium Partner. With over 30 years of experience, Lab9 Pro specializes in integrating and supporting Apple systems within businesses, educational institutions, and public organizations. Beyond Apple expertise, Lab9 Pro also designs, implements, and maintains complete IT infrastructures, including networks, servers, storage, and security solutions.

The challenge

It’s practically impossible to manage Apple devices across an organization without a good MDM (Mobile Device Management) system such as Jamf. As the leading provider of Apple device management solutions, Jamf empowers organizations to deploy, manage, and secure Apple devices at scale.

Jamf is the right solution even for smaller organizations, but small and medium-sized enterprises (SMEs) often lack the resources to manage their MDM systems themselves. Offering Jamf under an MSP (managed service provider) model solves a lot of problems for these customers.

For Apple device management, the typical customer has several certificates issued by Apple, and these depend on the user agreement being approved in Apple Business Manager or Apple School Manager. Without getting too technical, each customer’s certificates need to be renewed on different dates. If the user agreement is not approved, automated device enrollment stops working.

Lab9 Pro found themselves having to check all certificates and user agreements for their MSP customers manually. This led to an unacceptably high error rate, which often caused interruptions to the MDM service.

The solution

Lab9 Pro were already using Zabbix to monitor customer environments and their own infrastructure, including storage, firewalls, switches, and more. Because Zabbix offers a wide variety of options that make it possible to monitor almost anything, it was only logical to explore whether Zabbix could also be used to monitor the MDM certificates.

The research phase

Step one was to check the availability of certificate information. Unfortunately, Apple Business Manager’s API did not help much, as it does not provide certificate details. Instead, the team at Lab9 Pro investigated the Jamf API.

Although it doesn’t directly return certificate information either, they found something even more useful: Jamf’s API provides notifications for each customer instance. These include alerts when certificates (VPP, PUSH, DEP, etc.) are about to expire (typically 10 days in advance), as well as when the Device Enrollment Program user agreement has not been approved.

Zabbix implementation

Since Lab9 Pro manages multiple MSP tenants, they created a dedicated Zabbix template. This template includes both pre-filled and empty macros:

Pre-filled macros:

• {$JAMF.AUTH.INTERVAL}: Interval for retrieving the bearer token
• {$JAMF.NOTIF.INTERVAL}: Interval for retrieving Jamf notifications
• {$JAMF.PATH.AUTH}: API path for retrieving the bearer token
• {$JAMF.PATH.NOTIFICATIONS}: API path for retrieving Jamf notifications

Empty macros:

• {$JAMF.URL}: Jamf URL
• {$JAMF.API.USER}: Jamf user account for authentication
• {$JAMF.API.PASSWORD}: Jamf password (stored as a secret value)

The team configured an item to perform an API call to retrieve the bearer token. A preprocessing rule in JavaScript stores this token in a variable. Discovery rules proved very useful for executing API calls to retrieve Jamf notifications using the bearer token. This was achieved by configuring preprocessing steps and Low-Level Discovery (LLD) macros to pass the Jamf URL and bearer token. Trigger prototypes for each certificate were also added within the same discovery rule.
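To make this flow more concrete, here is a minimal Python sketch of the three pieces: the token request, the notifications request, and the conversion of notifications into LLD rows. It is not Lab9 Pro’s template, which does this with Zabbix items, JavaScript preprocessing, and a discovery rule. It assumes that Jamf answers the token call with JSON containing a "token" field, that notifications come back as a JSON array of objects with "type" and "id" fields, and that expiry and unsigned-agreement notices can be recognized by substrings of their type. The paths /api/v1/auth/token and /api/v1/notifications used in the test are assumed values for the two path macros, and the {#NOTIF.*} macro names and type strings are illustrative.

```python
import base64
import json
import urllib.request

# Substrings that mark notifications worth a trigger. The exact Jamf type strings
# are an assumption here; check what your own instance returns.
WATCHED = ("EXPIRE", "NOT_SIGNED")


def _join(base: str, path: str) -> str:
    return base.rstrip("/") + "/" + path.lstrip("/")


def build_token_request(jamf_url: str, auth_path: str, user: str, password: str) -> urllib.request.Request:
    """POST for a bearer token; arguments mirror {$JAMF.URL}, {$JAMF.PATH.AUTH},
    {$JAMF.API.USER} and {$JAMF.API.PASSWORD}. Built here, not sent."""
    basic = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        _join(jamf_url, auth_path),
        method="POST",
        headers={"Authorization": f"Basic {basic}", "Accept": "application/json"},
    )


def extract_token(response_body: str) -> str:
    """Like the preprocessing step: keep only the token (assumes a {"token": ...} reply)."""
    return json.loads(response_body)["token"]


def notifications_request(jamf_url: str, notif_path: str, token: str) -> urllib.request.Request:
    """GET the instance notifications ({$JAMF.PATH.NOTIFICATIONS}) with the bearer token."""
    return urllib.request.Request(
        _join(jamf_url, notif_path),
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )


def notifications_to_lld(notifications: list[dict], watched: tuple[str, ...] = WATCHED) -> str:
    """One LLD row per relevant notification; Zabbix 4.2+ accepts a plain JSON array."""
    rows = []
    for note in notifications:
        ntype = str(note.get("type", ""))
        if any(word in ntype for word in watched):
            rows.append({"{#NOTIF.TYPE}": ntype, "{#NOTIF.ID}": str(note.get("id", ""))})
    return json.dumps(rows)
```

In the template, a trigger prototype per {#NOTIF.TYPE} row would then raise a problem for every expiring certificate or unsigned agreement.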

The results

Whenever a certificate is nearing expiration, a problem is automatically displayed on Lab9 Pro’s Zabbix dashboard, which is visible on TV screens placed throughout their office in order to make sure the entire team is aware of upcoming certificate renewals.

Since Lab9 Pro began monitoring MDM certificates through the Jamf API, they have experienced zero expired certificates, which in turn has allowed them to avoid situations where devices become unmanaged and require a full setup again.

Zabbix makes it possible for Lab9 Pro to keep their clients’ MDM systems operational, and to either proactively inform clients when certificates need to be renewed or handle the renewal process on their behalf.

The post Monitoring MDM Certificates with Lab9 Pro and Zabbix appeared first on Zabbix Blog.

The ATS Group and a Regional Telecom Provider

Post Syndicated from Michael Kammer original https://blog.zabbix.com/the-ats-group-and-a-regional-telecom-provider/29671/

Our Premium Partners at the ATS Group have a regional telecom provider on the West Coast of the United States as one of their key clients. The provider covers a massive geographical area on a limited budget and serves thousands of (primarily rural) customers.

The Challenge

After recent price hikes by the “big-box” monitoring vendors, the provider needed an alternative with a more stable pricing model. Simply put, their budget was shrinking while their monitoring software costs were growing.

The provider had a large stock of non-traditional IT equipment that all needed to be monitored effectively, and only one month to migrate all monitored devices and endpoints to a new solution.

On top of that, many of the provider’s legacy systems were directly related to regulatory compliance and therefore needed to be operational from day one.

The Solution

The provider set about migrating to a complete and robust Zabbix 7.0 solution designed to withstand any foreseeable issue, even the loss of an entire data center.

There were a few initial hiccups in the implementation when it came to getting PostgreSQL set up with database proxies, but the ATS Group team quickly arrived at an architecture that the provider was happy with. The clear and easy-to-follow Zabbix documentation was of particular help.

The Results

The new Zabbix solution, as implemented, was able to monitor a number of things that had previously been challenging, including:

• Doors. The provider badly needed a way to monitor doors, including entrance and exit doors as well as cabinet doors in data centers. They had long-standing compliance issues with doors sticking open, employees forgetting to close doors, and so on. Zabbix made it easy to set up custom SNMP trap handling that sends alerts when a door is left open, solving the issue.

• Weather. The provider’s services are available over a large and varied geographical area that spans multiple states. Zabbix’s ability to track upcoming weather changes across this area has been an important added bonus: the provider now receives alerts about forecast weather that can be compared against equipment tolerance levels. Personnel can then be sent to affected areas ahead of weather events, instead of being purely reactive.

• SLAs. The provider functions as an ISP offering internet access to customers in rural areas, many of whom may have no other means of accessing the world around them. As such, they not only feel a strong sense of duty to provide consistent uptime, but are also bound by a strict set of service level agreements (SLAs). Through an integration with ServiceNow, Zabbix makes it possible to track SLAs for some of the remote edge equipment involved.

In conclusion

The telecom provider in question trusts Zabbix to guarantee rural broadband access for thousands of customers over an enormous geographic area. Zabbix not only gets the job done more effectively than other monitoring solutions, it does so at a fraction of the cost.

The post The ATS Group and a Regional Telecom Provider appeared first on Zabbix Blog.

Migration to Zabbix 7.0

Post Syndicated from Rogerio Batista original https://blog.zabbix.com/migration-to-zabbix-7-0/29594/

Based in northern Brazil, TO HOST Data Centers provides regional cloud services with a focus on cloud computing, colocation, and infrastructure management. With 35 suppliers and partners and over 5,000 monitored assets, their mission is to provide innovative IT infrastructure products and services with a high level of proficiency, in order to meet the high standards required by their clients and partners. To do this, they need to monitor internal applications, data center assets, devices, and customer environments, ensuring high availability and optimal performance.

The challenge:

TO HOST’s monitoring environment consisted of a single standalone server (running the Zabbix server, frontend, and database) with the following:

  • Hosts: ~600
  • Items/metrics: ~90,000
  • Average retention period for the history table: 45–60 days
  • Average retention period for the trends table: 365 days
  • Average retention period for the events table: 365 days
  • 3 internal proxies
  • 8 client proxies
  • ~30 external active agents

TO HOST needed a clean installation of Zabbix server and Zabbix proxy version 7.0.x on separate virtual machines with an updated operating system (Oracle Linux 9), plus a migration of the existing monitoring database to the new version that preserved history and data integrity.

Their production servers were outdated, running CentOS 7 with a Zabbix installation that started at version 5.2.x and was upgraded to 6.0.x in 2022. The migration needed to retain historical data and ensure compatibility with Zabbix 7.0.x, while keeping service interruptions to a minimum.

A number of risks were anticipated and planned for. The database migration could fail due to version incompatibility, and if any data sources were not properly mapped, collection failures were a distinct possibility that would require corrections after migration.

All graphs needed to be reviewed and optimized to take advantage of the new widget models and other improvements in Zabbix 7.0. Because data sources had changed, and because the migration involved both a new operating system and a new version of the Zabbix server, version incompatibilities were also possible.

Directories containing custom scripts and images were mapped, and their files were copied to ensure their integrity. The TO HOST team also prepared for possible service interruptions during the upgrade: they stood ready to notify users about the planned maintenance and created procedures to minimize the impact.
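The post does not say how file integrity was checked. One common approach, shown here as a hedged sketch, is to compare SHA-256 checksums of each source file and its copy; the directory names in the demo and test are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 64 KiB chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src: Path, dst: Path) -> list[str]:
    """Copy every file under src into dst; return relative paths whose checksums differ."""
    mismatches = []
    for file in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = file.relative_to(src)
        target = dst / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(file, target)  # keeps permission bits (e.g. +x on scripts) and timestamps
        if sha256_of(file) != sha256_of(target):
            mismatches.append(str(rel))
    return mismatches


if __name__ == "__main__":
    # Small demo on throwaway directories.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "externalscripts"
        src.mkdir()
        (src / "check_link.sh").write_text("#!/bin/sh\necho 1\n")
        print(copy_and_verify(src, Path(tmp) / "restored"))
```

An empty list means every copied file matched its source byte for byte. Note that shutil.copy2 does not preserve file ownership, which may still need fixing on the new server.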

The solution:

Step one was to make sure that the change to Zabbix 7.0 was properly planned. A change schedule was created, and all relevant stakeholders were notified of the operation. A virtualized environment was then set up on Oracle Linux 9 to guarantee a clean installation.

Once that was done, Zabbix 7.0 was installed, keeping in mind that the database to be imported must not already exist on the new server. Next came a full backup and a clone of the database, so that its integrity could be validated before the migration. At this point, the TO HOST team stopped the data collection service, started the backup, and then started the restore.
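The backup-and-restore sequence can be written down as an ordered runbook. The sketch below only lists commands for review rather than running them. It assumes a MySQL backend (the post mentions MySQL), systemd service names, and hostnames and paths that are hypothetical, so treat it as an outline rather than TO HOST’s actual procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    host: str     # machine the command runs on
    command: str  # shell command, listed for review rather than executed here


def migration_plan(old_host: str, new_host: str, db: str = "zabbix",
                   dump: str = "/var/tmp/zabbix_backup.sql") -> list[Step]:
    """Ordered, dry-run runbook for moving a MySQL-backed Zabbix database to a new server."""
    return [
        # 1. Stop collection so no new values are written while the dump runs.
        Step(old_host, "systemctl stop zabbix-server"),
        # 2. Consistent InnoDB snapshot; --databases makes the dump include the
        #    CREATE DATABASE statement, so the restore creates the database itself.
        Step(old_host, f"mysqldump --single-transaction --databases {db} > {dump}"),
        # 3. Ship the dump to the new machine.
        Step(old_host, f"scp {dump} {new_host}:{dump}"),
        # 4. Restore into the new MySQL instance.
        Step(new_host, f"mysql < {dump}"),
        # 5. On first start, the 7.0 server upgrades the old schema automatically.
        Step(new_host, "systemctl start zabbix-server"),
    ]


if __name__ == "__main__":
    for n, step in enumerate(migration_plan("zbx-old", "zbx-new"), start=1):
        print(f"{n}. [{step.host}] {step.command}")
```

Keeping the plan as data makes it easy to review with stakeholders before the change window, and to rehearse it against the cloned database first.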

From that point, it became a simple matter of carrying out automated database versioning and data source mapping corrections. The data mapping during the Zabbix 7.0 migration involved updating the database structure to meet the new version’s requirements, such as changes to MySQL instances, fields, and storage formats.

Data mapping in the Zabbix migration process involved the following:

  • Database Version: During migration, the database structure changed to align with the requirements of Zabbix 7.0. This included a different MySQL version, as well as modifications to fields, tables, and storage formats within the database.
  • Import and Update Process: The legacy database (version 6) was exported and then imported into the new Zabbix 7.0 installation. During the process, Zabbix ran automatic update scripts to convert the old database to the new format.
  • Data Sources: Each item monitored in Zabbix is associated with a unique key (item key) that defines how data is collected and processed. No changes were identified here.
  • Tools and Validations: Mapping was validated during the import/restore process, with error logs indicating inconsistencies. Testing revealed such inconsistencies, which required running a command to update the keys that had been replicated during the migration.
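The post does not show the command used to fix the replicated keys. As a hypothetical illustration of how such duplicates can be spotted, the sketch below runs a GROUP BY ... HAVING query over a simplified items table (hostid, key_) in an in-memory SQLite database. The same query shape would apply to the real database, but the table here is a stand-in, not the actual Zabbix schema, and the sample rows are invented.

```python
import sqlite3


def find_duplicate_keys(conn: sqlite3.Connection) -> list[tuple[int, str, int]]:
    """Return (hostid, key_, count) for every item key that appears more than once on a host."""
    query = """
        SELECT hostid, key_, COUNT(*) AS n
        FROM items
        GROUP BY hostid, key_
        HAVING COUNT(*) > 1
        ORDER BY hostid, key_
    """
    return conn.execute(query).fetchall()


def demo_connection() -> sqlite3.Connection:
    """Tiny in-memory 'items' table: a simplified stand-in, not the real Zabbix schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (itemid INTEGER PRIMARY KEY, hostid INTEGER, key_ TEXT)")
    conn.executemany(
        "INSERT INTO items (hostid, key_) VALUES (?, ?)",
        [
            (10084, "system.cpu.load[all,avg1]"),
            (10084, "vfs.fs.size[/,pused]"),
            (10084, "vfs.fs.size[/,pused]"),  # replicated during a faulty import
            (10105, "agent.ping"),
        ],
    )
    return conn


if __name__ == "__main__":
    print(find_duplicate_keys(demo_connection()))
```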

Data collection services were then restarted, and all stakeholders were notified of the completion of the change.

The results:

Zabbix 7.0’s new dashboards and improved visual configuration have increased the satisfaction of internal customers and had a tangible impact on operational efficiency and overall customer satisfaction.

The implementation and management of Zabbix 7.0 has enhanced the continuous visibility and integrity of TO HOST’s IT systems, enabling real-time monitoring and alerting, facilitating proactive issue resolution, and guaranteeing optimal infrastructure performance.

Many users have noted that the asynchronous polling method used in Zabbix 7.0 significantly reduces the time taken for metric collection. This allows for faster incident detection and resolution in TO HOST’s critical environment, while the addition of multi-factor authentication and improved access controls has helped to enhance security in monitoring environments and keep cyber threats at bay.
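Why asynchronous polling shortens collection time can be shown with a small, purely conceptual asyncio sketch. This is not how Zabbix implements its pollers; the checks are simulated with sleeps and the host names are hypothetical. Waiting on checks one by one costs roughly the sum of their latencies, while keeping them all in flight costs roughly the slowest one.

```python
import asyncio
import time


async def check(host: str, latency: float) -> tuple[str, float]:
    """Stand-in for one network check that spends `latency` seconds waiting on I/O."""
    await asyncio.sleep(latency)
    return host, latency


async def poll_one_by_one(targets: dict[str, float]) -> list[tuple[str, float]]:
    # Each check waits for the previous one: total time is roughly the SUM of latencies.
    return [await check(host, lat) for host, lat in targets.items()]


async def poll_concurrently(targets: dict[str, float]) -> list[tuple[str, float]]:
    # All checks are in flight together: total time is roughly the SLOWEST single check.
    return list(await asyncio.gather(*(check(host, lat) for host, lat in targets.items())))


def timed(poller, targets: dict[str, float]) -> tuple[list[tuple[str, float]], float]:
    start = time.perf_counter()
    results = asyncio.run(poller(targets))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    targets = {f"switch-{i}": 0.1 for i in range(10)}
    _, slow = timed(poll_one_by_one, targets)
    _, fast = timed(poll_concurrently, targets)
    print(f"one by one: {slow:.2f}s, concurrent: {fast:.2f}s")
```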

TO HOST’s future plans include exploring advanced Zabbix 7.0 features and continuous performance monitoring. A roadmap is already in place to leverage the additional automation and security enhancements that Zabbix 7.0 can provide.

The post Migration to Zabbix 7.0 appeared first on Zabbix Blog.

Case Study: Monitoring Railway Infrastructure for Infrabel

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/case-study-monitoring-railway-infrastructure-for-infrabel/28035/

Infrabel is a government-owned public limited company that builds, owns, maintains, and upgrades the Belgian railway network, makes its capacity available to railway operator companies, and handles train traffic control. Headquartered in Brussels, Infrabel employs over 9,000 people and manages 3,602 kilometers of rail lines.

The challenge

Infrabel needed a monitoring solution flexible enough to cover not only infrastructure, but also OS-level metrics, data centers, service and application states, and the availability of railway infrastructure components.

The solution

To begin with, Zabbix agents are deployed on railway station screens and broadcasting systems. This is possible because these devices run Debian under the hood, which means they can be monitored at the OS level by Zabbix agents right out of the box, using our official templates.

Deployment can easily be automated with low-level discovery, autoregistration, or network discovery. Devices can be pinged from Zabbix proxies or Zabbix servers to check whether they are available. If a device is unavailable, Zabbix sends a notification, after which an engineer either restores network connectivity or replaces the hardware.

In addition, Infrabel uses Zabbix to retrieve and monitor data collected from ActiveMQ. This relies on a combination of custom Bash scripts and Zabbix sender: the required data (related to the railway infrastructure, the data center, hardware, and software) is retrieved from ActiveMQ by a Bash script, passed to Zabbix sender via a wrapper script, and sent to the Zabbix server or proxy, where it is stored, analyzed, and acted upon if required.
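The wrapper step can be sketched as follows. This is not Infrabel’s script (theirs is Bash and is not shown): it assumes the ActiveMQ values have already been fetched into a Python dict, writes them in the `<hostname> <key> <value>` input-file format read by `zabbix_sender -i`, and builds the command line. Item keys, host names, and server names are hypothetical. The quoting follows the input-file rules in the zabbix_sender manual page: entries containing whitespace are double-quoted, and embedded quotes and backslashes are escaped with a backslash.

```python
def _field(value) -> str:
    """Quote a field if it contains whitespace, a double quote, or a backslash."""
    text = str(value)
    if text and not any(ch.isspace() or ch in '"\\' for ch in text):
        return text
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def sender_line(host: str, key: str, value) -> str:
    """One '<hostname> <key> <value>' line; '-' as host means 'use the -s/config host'."""
    return " ".join(_field(f) for f in (host, key, value))


def to_input_file(metrics: dict[str, object], host: str = "-") -> str:
    """Contents of the file passed to zabbix_sender -i, one metric per line."""
    return "".join(sender_line(host, key, value) + "\n" for key, value in metrics.items())


def sender_command(server: str, input_path: str, host: str, port: int = 10051) -> list[str]:
    """argv for the wrapper to run, e.g. with subprocess.run(cmd, check=True)."""
    return ["zabbix_sender", "-z", server, "-p", str(port), "-s", host, "-i", input_path]
```

Batching all values into one input file means a single zabbix_sender run per collection cycle, instead of one process per metric.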

The results

Infrabel found that they could get the most out of Zabbix by integrating it with a third-party ticketing system they were already using. The integration itself is simple: when Zabbix generates a problem, the Zabbix API is used to retrieve the problems related to the particular set of triggers that should be forwarded to this third-party system.

These alerts are then forwarded via the API to whatever system Infrabel requires. Zabbix has a variety of integrations available right out of the box using webhooks, including Slack, Jira, Microsoft Teams, and many others. Messengers can also be used with Zabbix, but Infrabel has opted to use the Zabbix API for their custom ticketing solution.
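The API side of this integration can be sketched as a problem.get call filtered by trigger IDs. The sketch assumes Zabbix 6.4 or later, where an API token can be sent in an Authorization: Bearer header. The trigger IDs, URL, and the ticket fields in to_ticket are hypothetical, since the post does not describe Infrabel’s ticketing system.

```python
import json
import urllib.request


def problem_get_payload(trigger_ids: list[str], request_id: int = 1) -> dict:
    """JSON-RPC body asking for current problems raised by the given triggers."""
    return {
        "jsonrpc": "2.0",
        "method": "problem.get",
        "params": {
            "output": "extend",
            "objectids": trigger_ids,  # only problems created by these triggers
            "selectTags": "extend",
            "recent": False,           # unresolved problems only
            "sortfield": ["eventid"],
            "sortorder": "DESC",
        },
        "id": request_id,
    }


def api_request(api_url: str, token: str, payload: dict) -> urllib.request.Request:
    """POST request for api_jsonrpc.php with the API token in a Bearer header."""
    return urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json-rpc", "Authorization": f"Bearer {token}"},
    )


def to_ticket(problem: dict) -> dict:
    """Map one problem.get result onto a hypothetical ticketing-system record."""
    return {
        "external_id": problem["eventid"],
        "title": problem["name"],
        "severity": int(problem["severity"]),
    }
```

Sending the request is one call: json.load(urllib.request.urlopen(req))["result"] returns the list of problems, each of which can be passed to to_ticket before being posted to the ticketing system.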

In conclusion

Infrabel is the perfect example of how the flexibility of Zabbix allows it to adapt to any industry or need. The possibility to use the Zabbix API, webhooks, or a combination of both was a game-changer for Infrabel, just as it could be for any customer in any industry.

You can learn more about what we can do for customers across a variety of industries by visiting our website or requesting a demo.

The post Case Study: Monitoring Railway Infrastructure for Infrabel appeared first on Zabbix Blog.

Zabbix meets television – Clever use of Zabbix features by Wolfgang Alper / Zabbix Summit Online 2021

Post Syndicated from Wolfgang Alper original https://blog.zabbix.com/zabbix-meets-television-clever-use-of-zabbix-features-by-wolfgang-alper-zabbix-summit-online-2021/19181/

TV broadcasting infrastructures have seen many great paradigm shifts over the years. From TV to live streaming, the underlying architecture consists of many moving parts supplied by different vendors and solutions. Any problem can cause critical downtime, which is simply not acceptable. Let’s look at how Zabbix fits right into such a dynamic and ever-changing environment.

The full recording of the speech is available on the official Zabbix YouTube channel.

In this post, I will talk about how Zabbix is used at ZDF, Zweites Deutsches Fernsehen (Second German Television). I will focus on the most unique and interesting use cases, and I hope that you will be able to use this knowledge in your next project.

ZDF – Some history

Before we move on with our unique use cases, I would like to introduce you to the history of ZDF. This will help you understand the scope and the potential complexity and scale of the underlying systems and company policies.

  • In 1961, the federal states established a central non-profit television broadcaster, Zweites Deutsches Fernsehen
  • On April 1, 1963, ZDF officially went on air and reached 61 percent of television viewers
  • On the Internet, a selection of programs has been offered via live stream or video-on-demand through the ZDFmediathek since 2001
  • Since February 2013, ZDF has been broadcasting its programs around the clock as an internet live stream
  • Today, ZDF is one of the largest public broadcasters in Europe, with permanent bureaus worldwide and a presence on platforms such as YouTube and Facebook

Here we can see that over the years, ZDF has made some major leaps: from a television broadcaster reaching the majority of viewers, to an on-demand video service, to 24/7 internet live streams. ZDF has also expanded its presence across multiple digital platforms, as well as physically all over the globe.

Integrating Zabbix with an external infrastructure monitoring system

In our first use case, we will cover integrating Zabbix with an external infrastructure monitoring system. As opposed to monitoring IT metrics like hard drive space, memory usage, or CPU loads – this external system is responsible for monitoring devices like power generators, transmission stations, and other similar components. The idea was to pass the states of these components to Zabbix. This way, Zabbix would serve as a central “Umbrella” monitoring system.

In addition, the components that are monitored by the external system have states and severities, but the severities are not static and can vary depending on the monitored component. What this means is that each component could generate problems of varying severities. We had to figure out a way to assign the correct severities to each of the external components. Our approach was split into multiple steps:

  • Use the Zabbix built-in HTTP check to get LLD discovery data
    • The external monitoring system provides an API, which we can use to obtain the necessary LLD information through HTTP checks
    • Zabbix sender was used for testing, since HTTP agent items can also accept data sent by it
  • Use the Zabbix built-in HTTP check as a collector to obtain the component status metrics
  • Define item prototypes as dependent items to extract data from the collector item
  • Create “smart” trigger prototypes that respect the severity information from the LLD data

The JSON below is an example of the LLD data that we are receiving from the external monitoring systems. In addition to component names, descriptions, and categories, we are also providing the severity information. The severities that have a value of -1 are not used, while other severities are cross-checked with the status value retrieved from the returned metrics:

{
"{#NAME}": "generator-secondary",
"{#DISPLAYNAME}": "Secondary power generator",
"{#DESCRIPTION}": "Secondary emergency power generator",
"{#CATEGORY}": "Powersupply",
"{#PRIORITY.INFORMATION}": -1,
"{#PRIORITY.WARNING}": -1,
"{#PRIORITY.AVERAGE}": -1,
"{#PRIORITY.HIGH}": 1,
"{#PRIORITY.DISASTER}": 2
}

Below we can see the returned metrics: the component name and its current status. For example, a status value of 1 corresponds to {#PRIORITY.HIGH} in the LLD JSON data.

"generator-primary": {
"status": 0,
"message": "Generator is healthy."
},
"generator-secondary": {
"status": 1,
"message": "Generator is not working properly."
},

We can see that the primary generator returns status 0, which means it is healthy and there are no problems, while the secondary generator is currently not working properly (status 1) and should generate a problem with severity High.

Below we can see how the item prototypes are created for each of the components – one item prototype collects the message information, while the other collects the current status of the component. We use JSONPath preprocessing to obtain these values from our master item.
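As a rough illustration of what these dependent item prototypes do, the Python below mimics a JSONPath step of the form $["{#NAME}"].status against the master item’s value. The item keys component.status[...] and component.message[...] are hypothetical, since the actual keys are not given in the text.

```python
import json

# Master (collector) item value, shaped like the metrics returned by the external system.
MASTER_VALUE = json.dumps({
    "generator-primary": {"status": 0, "message": "Generator is healthy."},
    "generator-secondary": {"status": 1, "message": "Generator is not working properly."},
})


def item_prototypes(names: list[str]) -> dict[str, str]:
    """Expand two hypothetical item prototypes per discovered {#NAME} into key -> JSONPath."""
    expanded = {}
    for name in names:
        expanded[f"component.status[{name}]"] = f'$["{name}"].status'
        expanded[f"component.message[{name}]"] = f'$["{name}"].message'
    return expanded


def dependent_value(master_value: str, name: str, field: str):
    """What a dependent item with JSONPath $["<name>"].<field> would extract."""
    return json.loads(master_value)[name][field]
```

Because every dependent item reads the same master value, the external API is queried once per interval no matter how many components are discovered.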

As for the trigger prototypes – we have defined a trigger prototype for each of the trigger severities. The trigger prototypes will then create triggers depending on the information contained in the LLD macros for a given component.

As you can see, the trigger expressions are also quite simple – each trigger simply checks if the last received component status matches the specific trigger threshold status value.
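The logic of the five trigger prototypes fits in a few lines. This sketch assumes the Zabbix 5.4+ expression syntax and a hypothetical item key and template name. It shows which trigger fires for a given status, using the LLD row for the secondary generator from the JSON example.

```python
SEVERITIES = ("INFORMATION", "WARNING", "AVERAGE", "HIGH", "DISASTER")


def trigger_expression(template: str, severity: str) -> str:
    """Trigger prototype expression for one severity (item key is hypothetical)."""
    return f"last(/{template}/component.status[{{#NAME}}])={{#PRIORITY.{severity}}}"


def firing(lld_row: dict, last_status: int) -> list[str]:
    """Severities whose trigger fires: last status equals that severity's {#PRIORITY.*} value."""
    return [sev for sev in SEVERITIES if lld_row[f"{{#PRIORITY.{sev}}}"] == last_status]
```

A status of 0 matches no threshold, and a threshold of -1 is never matched because the external system never reports that status.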

The resulting metrics provide us both the status value and the component status message. As we can see, the triggers are also generating problems with dynamic severities.

Improving the solution with LLD overrides

The solution works, but we can do better! You might have already guessed the underlying issue with this approach: our LLD rule creates a trigger for every severity, even unused ones. The threshold for these unused triggers is -1, a status we will never receive, so they always stay in the OK state. Effectively, we have created 5 trigger definitions, while our example requires only 2.

How can we resolve this? Thankfully, Zabbix provides just the right tool for the job – LLD Overrides! We have created 5 overrides on our discovery rule – one for each severity:

In each override’s conditions, we specify that if the value of the corresponding priority LLD macro equals -1, the trigger of that severity is not discovered.
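The effect of the five overrides can be emulated in the same way. Roughly, each override would have a condition such as {#PRIORITY.HIGH} matching the regular expression ^-1$, with an operation that stops the corresponding trigger prototype from being discovered. The sketch applies that rule to the secondary generator’s LLD row; it emulates the behavior and is not Zabbix code.

```python
import re

SEVERITIES = ("INFORMATION", "WARNING", "AVERAGE", "HIGH", "DISASTER")

# One override per severity: "{#PRIORITY.<SEV>} matches ^-1$" -> do not discover that trigger.
UNUSED = re.compile(r"^-1$")


def discovered_triggers(lld_row: dict) -> list[str]:
    """Severities whose trigger prototype survives the five overrides for one LLD row."""
    kept = []
    for sev in SEVERITIES:
        # LLD macro values reach the override conditions as text, so match the string form.
        if UNUSED.match(str(lld_row[f"{{#PRIORITY.{sev}}}"])):
            continue
        kept.append(sev)
    return kept
```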

The final result looks much cleaner – now we have only two trigger definitions instead of five. 

 

This is a good example of how we can use LLD together with master items obtaining data from external APIs and also improve the LLD logic by using LLD overrides.

“Sphinx” application monitoring using Graylog REST API

For our second example, we will be monitoring the Sphinx application by using the Graylog REST API. Graylog is a log management tool that we use for log collection – it is not used for any kind of alerting. We also have an application called Sphinx, which consists of three components – a Web component, an App component, and a WCF Gateway component. Our goal here is to:

  • Use Zabbix for evaluating error messages related to Sphinx from Graylog
  • Monitor the number of errors in user-defined time intervals for different components and alert when a threshold is exceeded
  • Analyze incoming error messages and prepare them for user-friendly output, sorted by error type

The main challenges posed by this use-case are:

  • How to obtain Sphinx component information from Graylog
  • How to handle certificate problems (DH_KEY_TOO_SMALL / Diffie-Hellman key) due to an outdated version of the installed Graylog server
  • How to sort error messages that arrive in free form, without explicit error types

Collecting the data from Graylog

Since the Graylog version used in this scenario was outdated, we had to work around the certificate issues by using the Zabbix external check item type. Once again, we use master and dependent item logic: we create three master items (one for each component) to retrieve the component data. All additional information is extracted by dependent items, so as not to add load by flooding the Graylog API endpoint. The data itself is parsed and sorted with JavaScript preprocessing. Dependent item prototypes then create the items for the obtained statistics and for the data used to visualize each error type on a user-friendly dashboard.

Let’s take a look at the detailed workflow for this use case:

  • An external check that scans the Graylog stream (Sphinx App Raw)
  • A dependent item that analyzes and filters the raw data using preprocessing (Sphinx App Raw Filtered)
  • This dependent item serves as the master item for our LLD rule (Sphinx App Error LLD)
  • The same dependent item also serves as the master item for our item prototypes (Sphinx App Error count and Sphinx App Error List)

Effectively, this means that we perform only a single call to the Graylog API, and all of the heavy lifting is done by the dependent item in the middle of the workflow.
This workflow only obtains information about the App component; remember that it must also be implemented for our two other components, Web and Gateway.

In total, we have three master items, one for each Sphinx component:

They will use the following shell script to execute the REST API call to the Graylog API:

graylog2zabbix.sh[{$GRAYLOG_USERNAME},{$GRAYLOG_PASSWORD},{HOST.CONN},{$GRAYLOG_PORT},search/universal/relative?query=name%3Asphinx-app%20AND%20stage%3Aproduction%20AND%20level%3A(ERROR%20OR%20FATAL)&range=1800&limit=50&filter=streams%3A60000a8c1c09f9862279966e&fields=name%2Clevel%2Cmessage&decorate=true]
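For readers who want to reproduce this call outside Zabbix, the sketch below rebuilds the same relative-search request in Python. The HTTPS scheme, the /api/ prefix, and port 9000 in the test are assumptions about how graylog2zabbix.sh assembles the URL, since the script itself is not shown, and the sketch does not reproduce the Diffie-Hellman workaround.

```python
import base64
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def graylog_search_url(host: str, port: int, query: str, range_s: int, limit: int,
                       stream_id: str, fields: list[str]) -> str:
    """Rebuild the relative-search URL carried by the item key's last parameter."""
    params = {
        "query": query,
        "range": range_s,                  # look back this many seconds
        "limit": limit,                    # max messages per call
        "filter": f"streams:{stream_id}",
        "fields": ",".join(fields),
        "decorate": "true",
    }
    # quote_via=quote encodes spaces as %20 like the item key (the default quote_plus uses '+').
    return f"https://{host}:{port}/api/search/universal/relative?" + urlencode(params, quote_via=quote)


def basic_auth_header(user: str, password: str) -> dict[str, str]:
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}", "Accept": "application/json"}


def query_params(url: str) -> dict[str, list[str]]:
    """Decode the query string again, handy for checking what is actually sent."""
    return parse_qs(urlsplit(url).query)
```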

The data that we obtain this way is extremely hard to work with without any additional processing. It very much looks like a set of regular log entries – this complicates the execution of any kind of logic in reaction to receiving this kind of data:

For this reason, we have created a dependent item, which uses preprocessing to filter and sort this data. The dependent item preprocessing is responsible for:

  • Analyzing the error messages
  • Defining the error type
  • Sorting the raw data so that we can work with it more easily

We have defined two preprocessing steps to process this data: a JSONPath step that selects the message from the response, and a JavaScript step that does the heavy lifting. You can see the JavaScript script below. It uses regular expressions to prepare and sort the data, and in the last line the data is transformed back into JSON, so we can keep working with it down the line using JSONPath preprocessing steps in our dependent items.
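The JavaScript itself is not reproduced in the text, so here is a Python sketch of the same idea: pull the message texts out of the Graylog response, classify each one with regular expressions, and emit JSON grouped by error type. The error categories and patterns are hypothetical, and the response shape assumed by extract_messages (messages[].message.message) should be checked against your Graylog version.

```python
import json
import re

# Hypothetical error types; the real script's categories are not shown in the post.
ERROR_TYPES = [
    ("Timeout", re.compile(r"timed?\s*out", re.IGNORECASE)),
    ("Database", re.compile(r"sql|deadlock|database", re.IGNORECASE)),
    ("Authentication", re.compile(r"unauthori[sz]ed|login failed|access denied", re.IGNORECASE)),
]


def extract_messages(graylog_response: dict) -> list[str]:
    """Same idea as a JSONPath step like $.messages[*].message.message (shape assumed)."""
    return [m["message"]["message"] for m in graylog_response.get("messages", [])]


def classify(message: str) -> str:
    for name, pattern in ERROR_TYPES:
        if pattern.search(message):
            return name
    return "Other"  # free-form messages that match no known pattern


def sort_errors(messages: list[str]) -> str:
    """Group messages by error type and return JSON for downstream JSONPath steps."""
    grouped: dict[str, list[str]] = {}
    for msg in messages:
        grouped.setdefault(classify(msg), []).append(msg)
    return json.dumps(grouped, sort_keys=True)
```

The first matching pattern wins, so more specific categories should be listed before broader ones.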

Below we can see the result. The data stream has been sorted and arranged by error types, which you can see on the left-hand side. All of the logged messages are now children that belong to one of these error types.

We have also created 3 LLD rules, one for each component. These LLD rules create items for each error type of each component. To achieve this, some additional JSONPath and JavaScript preprocessing is also done on the LLD rule itself.
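The LLD step on top of the grouped JSON can be sketched in the same way: one discovery row per error type, plus the count that an error-count item prototype would store. The macro names {#COMPONENT} and {#ERROR.TYPE} are hypothetical.

```python
import json


def errors_to_lld(grouped: dict[str, list[str]], component: str) -> str:
    """One LLD row per error type seen for a component (macro names are hypothetical)."""
    rows = [{"{#COMPONENT}": component, "{#ERROR.TYPE}": etype} for etype in sorted(grouped)]
    return json.dumps(rows)


def error_count(grouped: dict[str, list[str]], error_type: str) -> int:
    """Value an error-count item prototype would store for one discovered error type."""
    return len(grouped.get(error_type, []))
```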

The end result is a dashboard that uses the collected information to display the error count per component. Attached to the graph, we can see some additional details regarding the log messages related to the detected errors.

Monitoring of TV broadcast trucks

I would like to finish up this post by talking about a completely different use case: monitoring TV broadcast trucks!

In comparison to the previous use cases – the goals and challenges here are quite unique. We are interested in a completely different set of metrics and have to utilize a different approach to obtain them. Our goals are:

  • Monitor several metrics from different systems used in the TV broadcast truck
  • Monitor the communication availability and quality between the broadcast truck and the transmitting station
  • Only monitor the broadcast truck when it is in use

One of the main challenges for this use case is avoiding false alarms. How can we avoid false positives if a broadcast truck can be put into operation at any time without notifying the monitoring team? The end goal is to monitor the truck when it’s in use and stop monitoring it when it’s not in use.

  • Each broadcast truck is represented by a host in Zabbix – this way, we can easily put it into maintenance
  • A control host is used to monitor the connection states of all broadcasting trucks
  • We decided on creating a middleware application that would be able to implement start/stop monitoring logic
    • This was achieved by switching the maintenance on/off by using the Zabbix API
  • A specific application in the broadcasting truck then tells Zabbix how long to monitor it and when to enable the maintenance for the said truck

Below we can see the truck monitoring workflow. The truck control host gets the status for each truck to decide when to start monitoring the truck. The middleware then starts/stops the monitoring of a truck by using Zabbix API to control the maintenance periods for the trucks. Once a truck is in service, it also passes the monitoring duration to the middleware, so the middleware can decide when the monitoring of a specific truck should be turned off.

Next, let’s look at the truck control workflow from the Zabbix side.

  • Each broadcast truck is represented by a single trigger on the control host
    • The trigger actions forward the information that the truck maintenance period should be disabled to the middleware
  • Middleware uses the Zabbix API to disable the maintenance for the specific truck
  • The truck is now monitored
  • The truck forwards the Monitoring duration to the middleware
  • Once the monitoring duration is over, the middleware enables the maintenance for the specific truck
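Here is a hedged sketch of the middleware’s Zabbix API side. It assumes the Zabbix 6.0+ maintenance API (hosts passed as objects with a hostid), models “not monitored” as a long one-time maintenance period, and injects the HTTP transport as a function so it can be tested offline. Class and method names are hypothetical, since the post does not describe the middleware’s code.

```python
import threading
import time
from typing import Callable

# Posts one JSON-RPC payload to the Zabbix API and returns the decoded reply.
Send = Callable[[dict], dict]


class TruckMaintenanceController:
    """Hypothetical middleware that turns truck monitoring on and off via maintenance periods."""

    def __init__(self, send: Send) -> None:
        self.send = send
        self.maintenance_ids: dict[str, str] = {}  # truck hostid -> active maintenanceid
        self._request_id = 0
        self._lock = threading.Lock()

    def _call(self, method: str, params) -> dict:
        with self._lock:
            self._request_id += 1
            payload = {"jsonrpc": "2.0", "method": method, "params": params, "id": self._request_id}
        return self.send(payload)

    def start_monitoring(self, hostid: str) -> None:
        """Truck went live (its control-host trigger fired): delete its maintenance."""
        maintenance_id = self.maintenance_ids.pop(hostid, None)
        if maintenance_id is not None:
            self._call("maintenance.delete", [maintenance_id])

    def stop_monitoring(self, hostid: str, idle_hours: int = 24 * 365) -> None:
        """Monitoring window is over: put the truck back into a long maintenance period."""
        now = int(time.time())
        seconds = idle_hours * 3600
        reply = self._call("maintenance.create", {
            "name": f"truck {hostid} idle since {now}",  # maintenance names must be unique
            "active_since": now,
            "active_till": now + seconds,
            "hosts": [{"hostid": hostid}],
            "timeperiods": [{"timeperiod_type": 0, "start_date": now, "period": seconds}],
        })
        self.maintenance_ids[hostid] = reply["result"]["maintenanceids"][0]

    def on_duration(self, hostid: str, seconds: float) -> threading.Timer:
        """Truck reported how long to monitor it: schedule the switch back to maintenance."""
        timer = threading.Timer(seconds, self.stop_monitoring, args=(hostid,))
        timer.start()
        return timer
```

In production, send would POST each payload to api_jsonrpc.php with an API token, and start_monitoring would be called by the action attached to the truck’s trigger on the control host.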

Finally, the trucks are displayed on a map that can be placed on our dashboards. The map shows whether a truck is in maintenance (not active) and whether it has any problems. This way, we can easily monitor our broadcast truck fleet.

From gathering data from external systems to performing complex data transformations with preprocessing and monitoring our whole fleet of broadcast trucks – I hope you found these use cases useful and were able to learn a thing or two about the flexibility of different Zabbix features!

The post Zabbix meets television – Clever use of Zabbix features by Wolfgang Alper / Zabbix Summit Online 2021 appeared first on Zabbix Blog.

Zabbix 5.0 – My happiness and disenchantment

Post Syndicated from Dennis Ananiev original https://blog.zabbix.com/zabbix-5-0-my-happiness-and-disenchantment/14107/

Zabbix is an open-source solution, and all features are available out of the box for free. There are no paid pro, business, or community editions. You can download Zabbix source files or packages from the official site, use them in your enterprise or your home lab, test them, and even suggest your own changes. Zabbix offers many new features in every release, and this open approach is an excellent way to interact with the community. In this post, I will share my experience with Zabbix and my opinion of the improvements made in Zabbix 5.2.

Contents

I. Pros

    1. Global view Dashboard
    2. Host configuration
    3. Discovery rules
    4. Maintenance

II. Cons

Pros

Global view Dashboard

The improvements start with the central Zabbix 5.2 dashboard, which is completely different from earlier versions. It now looks clearer and more user-friendly.

Global view Dashboard

The vertical menu can now be hidden. Because this is the Global view dashboard, we can see host availability, problems by severity level (not possible in earlier versions), and system information.

You can configure the widgets directly from the Global view dashboard. For instance, you can choose how many lines are shown in the Problems widget.

Configuring widgets in the Dashboard

In earlier versions, the Dashboard showed only 20 problems, and you could change this limit only in the Zabbix source code, which required some PHP knowledge. Now you can set the number of problems to display in the Show lines field. This is really convenient if you have an enormous infrastructure with almost 200 problems per day filling the Dashboard. In earlier versions, if the Zabbix server was down, you could not see earlier problems without opening the "Last values" menu. In addition, you can choose to display only problems of certain severity levels or to filter them by tags. For on-duty admins, the options to show operational data alongside problems and to show only unacknowledged problems are especially useful.

This is convenient for Zabbix engineers and admins, because admins sometimes monitor only certain parts of the infrastructure: some servers, databases, or the middleware layer. In this case, you can filter by Host groups or Tags for the relevant layers and then click Apply.
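The same kind of filtering is also available outside the dashboard through the Zabbix API method `problem.get`. This is a minimal sketch that assumes the 5.x request format, with the session token in the `auth` field. The function name `problems_request` and the tag `layer=database` are hypothetical. The sketch only approximates the widget settings and does not reproduce the widget's internals.

```python
from typing import Dict, List, Optional


def problems_request(auth: str, limit: int = 20,
                     severities: Optional[List[int]] = None,
                     groupids: Optional[List[str]] = None,
                     tags: Optional[List[Dict[str, str]]] = None,
                     unacknowledged_only: bool = False) -> dict:
    """Build a problem.get request roughly mirroring the widget filters."""
    params: dict = {
        "output": "extend",
        "sortfield": ["eventid"],  # newest problems first
        "sortorder": "DESC",
        "limit": limit,            # similar to the widget's Show lines
    }
    if severities:
        params["severities"] = severities  # 0 (Not classified) .. 5 (Disaster)
    if groupids:
        params["groupids"] = groupids
    if tags:
        params["tags"] = tags  # e.g. [{"tag": "layer", "value": "database"}]
    if unacknowledged_only:
        params["acknowledged"] = False  # like "Show unacknowledged only"
    return {"jsonrpc": "2.0", "method": "problem.get", "params": params,
            "auth": auth, "id": 1}


req = problems_request("token", limit=200, severities=[4, 5],
                       tags=[{"tag": "layer", "value": "database"}],
                       unacknowledged_only=True)
```
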

Host configuration

There are many other configuration options that make an engineer’s life easier. For instance, new features are available in Configuration > Hosts.

New Hosts configuration

  • Unlike in earlier Zabbix versions, you can now filter hosts by a specific proxy or by specific tags. Previously, it was hard to tell which proxy was monitoring a specific host, especially if you were monitoring one or two thousand hosts. The new filter saves you a lot of time, as you don’t have to open other pages to find this information.
  • Another new feature on the Hosts page is the improved Items configuration.
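The proxy filter described above also has an API counterpart: `host.get` accepts a `proxyids` parameter. This is a sketch that assumes the 5.x field name `proxy_hostid`, which newer releases renamed. The function names and the sample result are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def hosts_by_proxy_request(auth: str, proxyids: List[str]) -> dict:
    """Build a host.get request for hosts monitored by the given proxies."""
    return {"jsonrpc": "2.0", "method": "host.get",
            "params": {"output": ["hostid", "host", "proxy_hostid"],
                       "proxyids": proxyids},
            "auth": auth, "id": 1}


def group_by_proxy(hosts: List[dict]) -> Dict[str, List[str]]:
    """Group host names from a host.get result by their proxy ID."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for host in hosts:
        # In 5.x, proxy_hostid is "0" for hosts polled directly by the server.
        grouped[host["proxy_hostid"]].append(host["host"])
    return dict(grouped)


sample = [  # made-up host.get result
    {"hostid": "10501", "host": "db-01", "proxy_hostid": "10450"},
    {"hostid": "10502", "host": "web-01", "proxy_hostid": "0"},
    {"hostid": "10503", "host": "db-02", "proxy_hostid": "10450"},
]
```
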

Improved Items configuration

Here, if you open any item, for instance the one collecting CPU data, you can use the new Execute now and Test buttons to get values without waiting for the update interval.

New Execute now and Test buttons

So, if you click Test > Get value and test, you can get the value from a remote host immediately.

Using Get value and test button

The Test button also lets you check that you chose the correct Type for your data collection. Execute now sends an immediate check request to the remote host, so the new value appears in Latest data right away instead of after the next update interval.

Requesting data without waiting for update interval

Data such as the hostname or OS name rarely changes, so it is normally collected only once per hour or once per day. Rather than waiting for the next collection, you can click Execute now and collect the data immediately.

NOTE. Execute now and Test buttons are available only starting from Zabbix 5.x.
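Execute now can also be triggered from a script. As far as I know, the API equivalent is `task.create` with task type 6 ("check now"), and like the button it works only for passive checks. This sketch assumes the 5.2-style payload, where each task carries a `request` object. Zabbix 5.0 used an `itemids` list instead, so check the API reference for your version. The item IDs shown in the test are hypothetical.

```python
from typing import List


def check_now_request(auth: str, itemids: List[str]) -> dict:
    """Build a task.create request asking Zabbix to poll items right away."""
    tasks = [{"type": 6,                      # 6 = "check now"
              "request": {"itemid": itemid}}  # 5.2-style request object
             for itemid in itemids]
    return {"jsonrpc": "2.0", "method": "task.create", "params": tasks,
            "auth": auth, "id": 1}
```
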

Discovery rules

  • Discovery rules, another Zabbix configuration tool, were also improved. Previously, if we needed to discover data from a Linux server, for instance with Mounted filesystem discovery or Network interface discovery, we had to stay online and wait for the data to be collected. Now, with the Execute now and Test buttons, you don’t have to wait for the configured update interval; you get the values immediately.

New Discovery rules options

So, if you click Get value and test, you immediately get the file system types and names for all partitions on the server as a JSON array. You can then check which data you do and don’t need and exclude the rest using regular expressions. Adding the Test and Execute now buttons everywhere is a real achievement, because it makes the system more flexible and dynamic.
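To illustrate the exclusion step, here is a rough Python model of an LLD filter applied to a filesystem discovery result. It assumes the `{#FSNAME}`/`{#FSTYPE}` macros and the plain JSON array format used by recent agents. The sample data and the function name are illustrative only. Zabbix itself uses PCRE, so Python's `re` is only a stand-in here.

```python
import json
import re
from typing import List


def apply_lld_filter(lld_json: str, macro: str, pattern: str,
                     matches: bool = True) -> List[dict]:
    """Keep LLD rows whose macro value matches (or does not match) a regex."""
    rows = json.loads(lld_json)
    if isinstance(rows, dict):  # older format: {"data": [...]}
        rows = rows.get("data", [])
    regex = re.compile(pattern)
    # The regex may match anywhere in the value unless anchored with ^...$.
    # Rows without the macro are treated as empty strings (a simplification).
    return [row for row in rows
            if bool(regex.search(row.get(macro, ""))) == matches]


sample = ('[{"{#FSNAME}": "/", "{#FSTYPE}": "ext4"},'
          ' {"{#FSNAME}": "/run", "{#FSTYPE}": "tmpfs"},'
          ' {"{#FSNAME}": "/boot", "{#FSTYPE}": "xfs"}]')
kept = apply_lld_filter(sample, "{#FSTYPE}", r"^(tmpfs|devtmpfs)$", matches=False)
```

With `matches=False`, the filter behaves like a "does not match" condition: the `tmpfs` mount is dropped, and only `/` and `/boot` would get items from the prototypes.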

  • In earlier Zabbix versions, we couldn’t change anything in bulk in Item prototypes. You had to open each item prototype, for instance, Free inodes or Space utilization, and change what you needed in each of them. Now, you can select all item prototypes with the checkbox and click the Mass update button.

Mass update for Items prototype

For instance, we can change all update intervals for all items at once.

Changing all update intervals at once

Previously, we could mass update only items and some triggers; now we can use Mass update for item prototypes as well. Item prototypes are used very often in our everyday operations, for instance, when discovering data over SNMP from network or storage devices, where item prototypes are really important. A NetApp storage system, for instance, may have about 1,500 items, and it is really difficult to change the update interval or history period for such an enormous number of items. Now you just click Mass update, change the parameters of the item prototypes, and apply the changes to all items at once.
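For even larger changes, or for changes made from automation, the same bulk edit can be expressed through the API. `itemprototype.get` can list the prototypes of one discovery rule, and `itemprototype.update` accepts an array of objects, each with an `itemid`. This is a sketch only. The prototype IDs and the new `delay`/`history` values are hypothetical, and the payloads follow the 5.x format.

```python
from typing import List


def rpc(method: str, params, auth: str) -> dict:
    """Build a Zabbix JSON-RPC 2.0 request body (5.x style)."""
    return {"jsonrpc": "2.0", "method": method, "params": params,
            "auth": auth, "id": 1}


def list_prototypes(auth: str, discoveryid: str) -> dict:
    """Request the item prototypes of one discovery rule."""
    return rpc("itemprototype.get",
               {"output": ["itemid", "name", "delay"],
                "discoveryids": [discoveryid]}, auth)


def mass_update(auth: str, itemids: List[str], **changes) -> dict:
    """Apply the same field changes (e.g. delay, history) to many prototypes."""
    return rpc("itemprototype.update",
               [{"itemid": itemid, **changes} for itemid in itemids], auth)


req = mass_update("token", ["30001", "30002", "30003"], delay="5m", history="14d")
```
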

Maintenance

Maintenance has been a headache for many Zabbix engineers and administrators for ages. In Zabbix 4.2, maintenance was configured in three tabs: Maintenance, Periods, and Hosts and groups.

Maintenance settings in earlier Zabbix versions

Windows or Linux administrators who used Zabbix only to monitor their own systems could simply select the period using Active since and Active till, and then didn’t know what to do when data collection and maintenance didn’t work as expected. For instance, if we started replacing RAM in the data center at 8 a.m. and expected it to take two hours, we could set Active till to 10 a.m. Surprisingly, however, this didn’t work.

In Zabbix 5.x, the team took a different approach and combined the settings that were previously split across three separate tabs into a single form.

Now you can set up all parameters in one window.

Improved Maintenance settings

NOTE. Active since and Active till alone do not set up downtime; they only define the window in which maintenance periods can take effect. To set up the downtime itself, use the Period field to choose the period type, the date, and the number of days or hours needed (to replace the RAM, in our example).
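The same distinction is visible in the API. In `maintenance.create`, `active_since`/`active_till` form the outer window, and the entries in `timeperiods` actually put the host into maintenance. This sketch models the RAM example as a one-time, two-hour period inside a one-day window. The host ID, the date, the UTC time zone and the maintenance name are assumptions, and the field names follow the 5.x API reference.

```python
from datetime import datetime, timedelta, timezone


def maintenance_request(auth: str, hostid: str, start: datetime,
                        duration: timedelta) -> dict:
    """One-time maintenance: an outer Active since/till window plus a Period."""
    day_start = start.replace(hour=0, minute=0, second=0, microsecond=0)
    start_ts = int(start.timestamp())
    period = int(duration.total_seconds())
    return {"jsonrpc": "2.0", "method": "maintenance.create", "params": {
        "name": f"RAM replacement on host {hostid}",
        # Active since / Active till: the window in which periods may apply.
        "active_since": int(day_start.timestamp()),
        "active_till": int((day_start + timedelta(days=1)).timestamp()),
        "hostids": [hostid],
        # The period is what actually puts the host into maintenance.
        "timeperiods": [{
            "timeperiod_type": 0,  # 0 = one time only
            "start_date": start_ts,
            "period": period,      # length in seconds
        }],
    }, "auth": auth, "id": 1}


req = maintenance_request("token", "10501",
                          datetime(2021, 3, 1, 8, 0, tzinfo=timezone.utc),
                          timedelta(hours=2))
```

Here, the period covers 8 a.m. to 10 a.m. (7,200 seconds), while the wider Active since/Active till window covers the whole day.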

Maintenance period settings

Setting downtime period due to maintenance

This behavior is not intuitive, so when your admins and engineers call you about alerts during maintenance, check the Maintenance period settings first. The period settings are also more detailed now, so selecting the required parameters takes some practice. Making these settings more user-friendly is still a task for the Zabbix team.

Cons

Unfortunately, some problems have been inherited from the earlier Zabbix versions.

  • For instance, in Administration > Users you still can’t change parameters in bulk or clone users with the same characteristics; you have to create each user separately. If you have a thousand users, creating all of them manually will give you a headache unless you know your way around the Zabbix API or Ansible.

Limited Users setting options

  • In addition, Zabbix doesn’t have any mechanism for importing LDAP/SAML users and LDAP/SAML groups. It is still hard to create these accounts and keep them synchronized with, for instance, Active Directory or other directory services. An Active Directory administrator might change a user’s surname or move them to another department, and the Zabbix administrator won’t know about it because of this synchronization gap.
  • The Zabbix menu also has obvious drawbacks. For instance, Hosts are still available under the Monitoring, Inventory, and Configuration sections, which can be confusing for newcomers, as it is difficult to decide which menu to use. Merging these menus would be a step forward for usability.
  • Lastly, the Configuration > Hosts form used to have a drop-down list for host groups and templates, but in the newest Zabbix only the Select button is left. Without the drop-down list, it is tricky for newcomers to choose host groups and templates.

Selecting host groups and templates
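The first limitation above, creating users one by one, is the one the Zabbix API can work around, because `user.create` accepts an array of users. This sketch "clones" a template user into many accounts in a single call. The role ID, user group ID and all names and passwords are hypothetical. The field names assume the 5.2 API, where user roles were introduced. Zabbix 5.0 used `type` instead of `roleid`, and later versions renamed `alias` to `username`, so check the reference for your version.

```python
import copy
from typing import Dict, List


def clone_users_request(auth: str, template: dict,
                        people: List[Dict[str, str]]) -> dict:
    """Build one user.create call that creates many users from a template."""
    users = []
    for person in people:
        user = copy.deepcopy(template)  # so users don't share nested lists
        user.update(person)             # per-user fields override the template
        users.append(user)
    return {"jsonrpc": "2.0", "method": "user.create", "params": users,
            "auth": auth, "id": 1}


template = {"roleid": "1", "usrgrps": [{"usrgrpid": "7"}]}  # hypothetical IDs
people = [
    {"alias": "jdoe", "name": "John", "surname": "Doe", "passwd": "Change-me-1"},
    {"alias": "asmith", "name": "Anna", "surname": "Smith", "passwd": "Change-me-2"},
]
req = clone_users_request("token", template, people)
```
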