Tag Archives: Zabbix 6.0 LTS

HPC Monitoring: Transitioning from Nagios and Ganglia to Zabbix 6

Post Syndicated from Mark Vilensky original https://blog.zabbix.com/hpc-monitoring-transitioning-from-nagios-and-ganglia-to-zabbix-6/27313/

My name is Mark Vilensky, and I’m currently the Scientific Computing Manager at the Weizmann Institute of Science in Rehovot, Israel. I’ve been working in High-Performance Computing (HPC) for the past 15 years.

Our base is at the Chemistry Faculty at the Weizmann Institute, where our HPC activities follow a traditional path — extensive number crunching, classical calculations, and a repertoire that includes handling differential equations. Over the years, we’ve embraced a spectrum of technologies, even working with actual supercomputers like the SGI Altix.

Our setup

As of now, our system boasts nearly 600 compute nodes, collectively wielding about 25,000 cores. The interconnect is Infiniband, and for management, provisioning, and monitoring, we rely on Ethernet. Our storage infrastructure is IBM GPFS on DDN hardware, and job submissions are facilitated through PBS Professional.

We use VMware for the system management. Surprisingly, the team managing this extensive system comprises only three individuals. The hardware landscape features HPE, Dell, and Lenovo servers.

The path to Zabbix

Recent challenges have surfaced in the monitoring domain, prompting considerations for an upgrade to Red Hat 8 or a comparable distribution. Our existing monitoring framework involved Nagios and Ganglia, but they had some severe limitations — Nagios’ lack of scalability and Ganglia’s Python 2 compatibility issues have become apparent.

Exploring alternatives led us to Zabbix, a platform not commonly encountered in supercomputing conferences but embraced by the community. Fortunately, we found a great YouTube channel by Dmitry Lambert that not only gives some recipes for doing things but also provides an overview required for planning, sizing, and avowing future troubles.

Our Zabbix setup resides in a modest VM, sporting 16 CPUs, 32 GB RAM, and three Ethernet interfaces, all operating within the Rocky 8.7 environment. The database relies on PostgreSQL 14 and Timescale DB2 version 2.8, with slight adjustments to the default configurations for history and trend settings.

Getting the job done

The stability of our Zabbix system has been noteworthy, showcasing its ability to automate tasks, particularly in scenarios where nodes are taken offline, prompting Zabbix to initiate maintenance cycles automatically. Beyond conventional monitoring, we’ve tapped into Zabbix’s capabilities for external scripts, querying the PBS server and GPFS server, and even managing specific hardware anomalies.

The Zabbix dashboard has emerged as a comprehensive tool, offering a differentiated approach through host groups. These groups categorize our hosts, differentiating between CPU compute nodes, GPU compute nodes, and infrastructure nodes, allowing tailored alerts based on node types.

Alerting and visualization

Our alerting strategy involves receiving email alerts only for significant disasters, a conscious effort to avoid alert fatigue. The presentation emphasizes the nuanced differences in monitoring compute nodes versus infrastructure nodes, focusing on availability and potential job performance issues for the former and services, memory, and memory leaks for the latter.

The power of visual representations is underscored, with the utilization of heat maps offering quick insights into the cluster’s performance.

Final thoughts

In conclusion, our journey with Zabbix has not only delivered stability and automation but has also provided invaluable insights for optimizing resource utilization. I’d like to express my special appreciation for Andrei Vasilev, a member of our team whose efforts have been instrumental in making the transition to Zabbix.

The post HPC Monitoring: Transitioning from Nagios and Ganglia to Zabbix 6 appeared first on Zabbix Blog.

Kubernetes monitoring with Zabbix – Part 3: Extracting Prometheus metrics with Zabbix preprocessing

Post Syndicated from Michaela DeForest original https://blog.zabbix.com/kubernetes-monitoring-with-zabbix-part-3-extracting-prometheus-metrics-with-zabbix-preprocessing/25639/

In the previous Kubernetes monitoring blog post, we explored the functionality provided by the Kubernetes integration in Zabbix and discussed use cases for monitoring and alerting to events in a cluster, such as changes in replicas or CPU pressure.

In the final part of this series on monitoring Kubernetes with Zabbix, we will show how the Kubernetes integration uses Prometheus to parse data from kube-state-metrics and how users can leverage this functionality to monitor the many cloud-native applications that expose Prometheus metrics by default.

Want to see Kubernetes monitoring in action? Watch Part 3 of our Kubernetes monitoring video guide.

Prometheus Data Model

Prometheus is an open-source toolkit for monitoring and alerting created by SoundCloud. Prometheus was the second hosted project to join the Cloud-native Computing Foundation in 2016, after Kubernetes. As such, users of Kubernetes have adopted Prometheus extensively.

Lines in the model begin with or without a pound sign. Lines beginning with a pound sign specify metadata that includes help text and type information. Additional lines follow where the first key is the metric name with optional labels specified, followed by the value, and optionally concluding with a timestamp. If a timestamp is absent, the assumption is that the timestamp is equal to the time of collection.

http_requests_total{job=”nginx”,instance=”10.0.0.1:443”} 15 1677507349983

Using Prometheus with Kubernetes Monitoring

Let’s start with an example from the kube-state-metrics endpoint, installed in the first part of this series. Below is the output for the /metrics endpoint used by the Kubernetes integration, showing the metric kube_job_created. Each metric has help text followed by a line starting with that metric name, labels describing each job, and creation time as the sample value.

# HELP kube_job_created Unix creation timestamp
# TYPE kube_job_created gauge
kube_job_created{namespace="jdoe",job_name="supportreport-supportreport-27956880"} 1.6774128e+09
kube_job_created{namespace="default",job_name="core-backup-data-default-0-27957840"} 1.6774704e+09
kube_job_created{namespace="default",job_name="core-backup-data-default-1-27956280"} 1.6773768e+09
kube_job_created{namespace="jdoe",job_name="activetrials-activetrials-27958380"} 1.6775028e+09
kube_job_created{namespace="default",job_name="core-cache-tags-27900015"} 1.6740009e+09
kube_job_created{namespace="default",job_name="core-cleanup-pipes-27954860"} 1.6772916e+09
kube_job_created{namespace="jdoe",job_name="salesreport-salesreport-27954060"} 1.6772436e+09
kube_job_created{namespace="default",job_name="core-correlation-cron-1671562914"} 1.671562914e+09
kube_job_created{namespace="jtroy",job_name="jtroy-clickhouse-default-0-maintenance-27613440"} 1.6568064e+09
kube_job_created{namespace="default",job_name="core-backup-data-default-0-27956880"} 1.6774128e+09
kube_job_created{namespace="default",job_name="core-cleanup-sessions-27896445"} 1.6737867e+09
kube_job_created{namespace="default",job_name="report-image-findings-report-27937095"} 1.6762257e+09
kube_job_created{namespace="jdoe",job_name="salesreport-salesreport-27933900"} 1.676034e+09
kube_job_created{namespace="default",job_name="core-cache-tags-27899775"} 1.6739865e+09
kube_job_created{namespace="ssmith",job_name="test-auto-merger"} 1.653574763e+09
kube_job_created{namespace="default",job_name="report-image-findings-report-1650569984"} 1.650569984e+09
kube_job_created{namespace="ssmith",job_name="auto-merger-and-mailer-auto-merger-and-mailer-27952200"} 1.677132e+09
kube_job_created{namespace="default",job_name="core-create-pipes-pxc-user"} 1.673279381e+09
kube_job_created{namespace="jdoe",job_name="activetrials-activetrials-1640610000"} 1.640610005e+09
kube_job_created{namespace="jdoe",job_name="salesreport-salesreport-27943980"} 1.6766388e+09
kube_job_created{namespace="default",job_name="core-cache-accounting-map-27958085"} 1.6774851e+09

Zabbix collects data from this endpoint in the “Get state metrics.” The item uses a script item type to get data from the /metrics endpoint. Dependent items that use a Prometheus pattern as a preprocessing step to obtain data relevant to the dependent item are created.

Prometheus and Out-Of-The-Box Templates

Zabbix also offers many templates for applications that expose Prometheus metrics, including etcd. Etcd is a distributed key-value store that uses a simple HTTP interface. Many cloud applications use etcd, including Kubernetes. Following is a description of how to set up an etcd “host” using the built-in etcd template.

A new host is created called “Etcd Application” with an agent interface specified that provides the location of the application API. The interface port does not matter because a macro sets the port. The “Etcd by HTTP” template is attached to the host.

The “Get node metrics” item is the master item that collects Prometheus metrics. Testing this item shows that it returns Prometheus formatted metrics. The master item creates many dependent items that parse the Prometheus metrics. In the dependent item, “Maximum open file descriptors,” the maximum number of open file descriptors is obtained by adding the “Prometheus pattern” preprocessing step. This metric is available with the metric name process_max_fds.

Custom Prometheus Templates

 

While it is convenient when Zabbix has a template for the application you want to monitor, creating a new template for an application that exposes a /metrics endpoint but does not have an associated template is easy.

One such application is Argo CD. Argo CD is a GitOps continuous delivery tool for Kubernetes. An “application” represents each deployment in Kubernetes. Argo CD uses Git to keep applications in sync.

Argo CD exposes a Prometheus metrics endpoint that we can be used to monitor the application. The Argo CD documentation site includes information about available metrics.

In Argo CD, the metrics service is available at the argocd-metrics service. Following is a demonstration of creating an Argo CD template that collects Prometheus metrics. Install Argo CD in a cluster with a Zabbix proxy installed before starting. To do this, follow the Argo CD “Getting Started” guide.

Create a new template called, “Argo CD by HTTP” in the “Templates/Applications” group. Add three macros to the template. Set {$ARGO.METRICS.SERVICE.PORT} to the default of 8082. Set {$ARGO.METRICS.API.PATH} to “/metrics.” Set the last macro, {$ARGO.METRICS.SCHEME} to the default of “http.”

Open the template and click “Items -> Create item.” Name this item “Get Application Metrics” and give it the “HTTP agent” type. Set the key to argocd.get_metrics with a “Text” information type. Set the URL to {$ARGO.METRICS.SCHEME}://{HOST.CONN}:{$ARGO.METRICS.SERVICE.PORT}/metrics. Set the History storage period to “Do not keep history.”

Create a new host to represent Argo. Go to “Hosts -> Create host”. Name the host “Argo CD Application” and assign the newly created template. Define an interface and set the DNS name to the name of the metrics service, including the namespace, if the Argo CD deployment is not in the same namespace as the Zabbix proxy deployment. Connect to DNS and leave the port as the default because the template does not use this value. Like in the etcd template, a macro sets the port. Set the proxy to the proxy located in the cluster. In most cases, the macros do not need to be updated.

Click “Test -> Get value and test” to test the item. Prometheus metrics are returned, including a metric called argocd_app_info. This metric collects the status of the applications in Argo. We can collect all deployed applications with a discovery rule.

Navigate to the Argo CD template and click “Discovery rules -> Create discovery rule.” Call the rule “Discover Applications.” The type should be “Dependent item” because it depends on the metrics collection item. Set the master item to the “Get Application Metrics” item. The key will be argocd.applications.discovery. Go to the preprocessing tab and add a new step called, “Prometheus to JSON.” The preprocessing step will convert the application data to JSON, which will look like the one below.

[{"name":"argocd_app_info","value":"1","line_raw":"argocd_app_info{dest_namespace=\"monitoring\",dest_server=\"https://kubernetes.default.svc\",health_status=\"Healthy\",name=\"guestbook\",namespace=\"argocd\",operation=\"\",project=\"default\",repo=\"https://github.com/argoproj/argocd-example-apps\",sync_status=\"Synced\"} 1","labels":{"dest_namespace":"monitoring","dest_server":"https://kubernetes.default.svc","health_status":"Healthy","name":"guestbook","namespace":"argocd","operation":"","project":"default","repo":"https://github.com/argoproj/argocd-example-apps","sync_status":"Synced"},"type":"gauge","help":"Information about application."}]

Set the parameters to “argocd_app_info” to gather all metrics with that name. Under “LLD Macros”, set three macros. {#NAME} is set to the .labels.name key, {#NAMESPACE} is set to the .labels.dest_namespace key, and {#SERVER} is set to .labels.dest_server.

Let us create some item prototypes. Click “Create item prototype” and name it “{#NAME}: Health Status.” Set it as a dependent item with a key of argocd.applications[{#NAME}].health. The type of information will be “Character.” Set the master item to “Get Application Metrics.”

In preprocessing, add a Prometheus pattern step with parameters argocd_app_info{name=”{#NAME}”}. Use “label” and set the label to health_status. Add a second step to “Discard unchanged with heartbeat” with the heartbeat set to 2h.

Clone the prototype to create another item called “{#NAME}: Sync status.” Change the key to argocd.applications.sync[{#NAME}]. Under “Preprocessing” change the label to sync_status.

Now, when viewing “Latest Data” the sync and health status are available for each discovered application.

Conclusion

We have shown how Zabbix templates, such as the Kubernetes template, and the etcd template utilize Prometheus patterns to extract metric data. We have also created templates for new applications that expose Prometheus data. Because of the adoption of Prometheus in Kubernetes and cloud-native applications, Zabbix benefits by parsing this data so that Zabbix can monitor Kubernetes and cloud-native applications.

I hope you enjoyed this series on monitoring Kubernetes and cloud-native applications with Zabbix. Good luck on your monitoring journey as you learn to monitor with Zabbix in a containerized world.

About the Author

Michaela DeForest is a Platform Engineer for The ATS Group. She is a Zabbix Certified Specialist on Zabbix 6.0 with additional areas of expertise, including Terraform, Amazon Web Services (AWS), Ansible, and Kubernetes, to name a few. As ATS’s resident authority in DevOps, Michaela is critical in delivering cutting-edge solutions that help businesses improve efficiency, reduce errors, and achieve a faster ROI.

About ATS Group:

The ATS Group provides a fully inclusive set of technology services and tools designed to innovate and transform IT. Their systems integration, business resiliency, cloud enablement, infrastructure intelligence, and managed services help businesses of all sizes “get IT done.” With over 20 years in business, ATS has become the trusted advisor to nearly 500 customers across multiple industries. They have built their reputation around honesty, integrity, and technical expertise unrivaled by the competition.

Kubernetes monitoring with Zabbix – Part 2: Understanding the discovered resources

Post Syndicated from Michaela DeForest original https://blog.zabbix.com/kubernetes-monitoring-with-zabbix-part-2-understanding-the-discovered-resources/25476/

In the previous blog post, we installed the Zabbix Agent Helm Chart and set up official Kubernetes templates to monitor a cluster in Zabbix. In this edition, part 2, we will explore the functionality provided by the Kubernetes integration in Zabbix and discuss use cases for monitoring and alerting on events in a cluster. (This post assumes that the Kubernetes integration has been set up in at least one cluster using the helm chart and provided templates.)

Want to see Kubernetes monitoring in action? Watch Part 2 of our Kubernetes monitoring video guide.

Node and Component Discovery

Following integration setup, the templates will discover control plane components, each node, and the kubelet associated with it using the Kubernetes API via a “Script” item type.

Note:

In the last blog post, I showed a managed EKS cluster. Control plane components cannot be discovered in an EKS cluster because AWS does not make them directly available through the API. For the sake of demonstrating the full capabilities of the integration, this post will use screenshots depicting a cluster that was created using the kubeadm utility.

In the latest version of Zabbix (6.2 at the time of writing), control plane components are discovered via node labels added only for clusters created with kubeadm. Depending on your setup, you may be able to add the same node labels to your own control plane nodes or modify the template to use your specific labels.

This example cluster has 4 worker nodes and 1 master node. The control plane runs entirely on the master node.

Zabbix’s “Low-Level Discovery” is the backbone of the Kubernetes integration. Zabbix discovers each node and creates two hosts to represent them in the cluster. The first host attaches the “Linux by Zabbix Agent” template to it, and the second attaches a custom Kubelet template called “Kubernetes Kubelet by HTTP. Zabbix also creates items for most standard objects like pods, deployments, replicasets, job, cronjob, etc.

Node and Kubernetes Performance Metrics

In this example, there are four discovered worker nodes with the “Linux by Zabbix Agent” template attached to them. The template will provide metrics about the machines running in the cluster.

Each worker host’s “System performance” dashboard shows system load, CPU usage, and memory usage metrics.

Zabbix will also collect Kubernetes-specific metrics related to the nodes. “Latest Data” for the Kubernetes Nodes host shows metrics such as the Allocatable CPU available to pods and the node’s memory capacity.

Alerts are generated for events such as the allocation of too much CPU. This could indicate that capacity should be increased, assuming that the memory and CPU limits set on the pod label are accurate.

The Kubernetes integration also monitors object states. As a best practice, any tool used to monitor Kubernetes should be monitoring and alerting critical status changes within the cluster. The image above shows the triggers related to the health of a pod. There are also triggers when certain conditions are detected by the nodes, like memory or CPU pressure.

Zabbix discovers objects like pods, deployments, and Replicasets, and triggers on object states.  For example, pods that are not up or deployments that do not have the correct number of replicas up.

In this example, a cluster is running a Kubernetes dashboard deployment with 3 replicas. By running the following command, we can see that all 3 replicas are up. Under “Latest Data,” Zabbix shows those 3 replicas available out of the 3 desired.

kubectl get deployment kubernetes-dashboard



To mimic a pod crashing, the pod is edited to use an invalid image tag.

kubectl edit pod <pod name>

The image tag is changed to  “invalid.tag, “ which is unavailable for the image. This causes the pod to fail because it can no longer pull the image. Output now shows that one pod is no longer ready.

Looking at the data in Zabbix, the number of available replicas is only 3, while the number of unavailable replicas is now 1.

On the problems page, there are two new problems. Both alerted that there is a mismatch between the number of replicas for the dashboard and the number of desired replicas.

Changing the tag back to a valid one should cause those problems to be resolved.

The Kubernetes templates offer many metrics and triggers, including most provided by Prometheus and Alert Manager. With some Zabbix experience and the ability to navigate kube-state-metrics and Kubernetes APIs, creating new items is possible.

What’s Next?

Above is an example of the output from the kube-state-metrics API. Unlike most APIs that return data in JSON format, the kube-state-metrics API uses the Prometheus data model to supply metrics.

As you get comfortable with Kubernetes monitoring in Zabbix, you may want to parse your own metrics from kube-state-metrics and create new items.

In the next video, we will learn how to monitor applications with Prometheus in Zabbix.

About the Author

Michaela DeForest is a Platform Engineer for The ATS Group.  She is a Zabbix Certified Specialist on Zabbix 6.0 with additional areas of expertise, including Terraform, Amazon Web Services (AWS), Ansible, and Kubernetes, to name a few.  As ATS’s resident authority in DevOps, Michaela is critical in delivering cutting-edge solutions that help businesses improve efficiency, reduce errors, and achieve a faster ROI.

About ATS Group: The ATS Group provides a fully inclusive set of technology services and tools designed to innovate and transform IT.  Their systems integration, business resiliency, cloud enablement, infrastructure intelligence, and managed services help businesses of all sizes “get IT done.” With over 20 years in business, ATS has become the trusted advisor to nearly 500 customers across multiple industries.  They have built their reputation around honesty, integrity, and technical expertise unrivaled by the competition.

A guide to migrating to Zabbix 6.0 LTS by Edgars Melveris / Zabbix Summit Online 2021

Post Syndicated from Edgars Melveris original https://blog.zabbix.com/a-guide-to-migrating-to-zabbix-6-0-lts-by-edgars-melveris-zabbix-summit-online-2021/18569/

Upgrading to a new software version can be an intimidating process, especially if you are upgrading your Zabbix instance for the first time. In this blog post, we will take a look at the upgrade process itself, the necessary pre-requisites, and also what changes you can expect to the existing functionality when you’ve migrated to Zabbix 6.0 LTS.

The full recording of the speech is available on the official Zabbix Youtube channel.

Pre-upgrade checklist

Database versions

The first step before performing the upgrade to a new Zabbix version is ensuring that your underlying infrastructure is ready for the upgrade process. There are some changes in Zabbix that you should be aware of and address before the upgrade. One of these changes is the list of supported database engines and their versions for Zabbix 6.0 LTS:

  • MySQL/Percona 8.0.x
  • MariaDB 10.5.0 -10.6.x
  • PostgreSQL 13.x
  • Oracle 19c – 21c

And if you’re using PostgreSQL + TimescaleDB or Zabbix proxies:

  • TimescaleDB 2.0.1-2.3
  • SQLite 3.3.5 – 3.34.x

You may have noticed that we have increased the version requirements for the Zabbix backend databases. The reason for this is Zabbix utilizes the features that only these newer database versions provide, thus ensuring optimal Zabbix performance. If you’re using an unsupported database version, Zabbix will not start. There will be a configuration parameter to override this behavior, but that is not recommended since we cannot ensure that your Zabbix version will work without encountering any performance issues or crashes. A database upgrade to a supported version should be performed first before moving to Zabbix 6.0 LTS.

Supported operating systems

Zabbix supports all Linux distributions and many other Unix-like operating systems. Unfortunately, it is not feasible to provide Zabbix packages for each and every distribution out there. One of the major changes that were made back in Zabbix 5.2 – we no longer provide packages for RHEL/CentOS 7. This is because some of the libraries included in these distributions are outdated, and it becomes more and more complicated to build Zabbix on these OS versions. Though it is still possible to build Zabbix from sources if you provide the correct versions of the required libraries.

Some of the officially supported operating systems for Zabbix 6.0 LTS:

  • RHEL/CentOS/Oracle Linux 8
  • Ubuntu 18.04+
  • Debian 10+
  • SLES 12+

Other installation options

There are additional Zabbix deployment options:

  • Docker – all of the dependencies are already provided in the official docker images
  • Cloud image – the image includes all of the required dependencies
  • Zabbix appliance – All of the available Zabbix appliance images contain the required dependencies

Environment review

Before upgrading between major Zabbix versions, it is very much recommended to do an environment review and take a look at the pending maintenance tasks for our environment and also do a health check. Some of the things that we should consider before performing the upgrade to Zabbix 6.0 LTS:

  • Apply any required OS or DB upgrades before upgrading Zabbix and check for any issues before moving on
  • Check for any customizations in your installation – are there any DB schema changes? Any custom modules or patches?
    • The best way to test this is to make a copy of your existing Zabbix instance and test the upgrade in a QA environment.
  • Are the required packages available for all of the Zabbix components?
  • Are all proxies installed on the supported OS versions?
  • Check the documentation for any known issues in the version to which you are upgrading

Important changes that affect the upgrade process

There are some changes in Zabbix 6.0 LTS that could potentially affect the upgrade process or your existing Zabbix workflows.

API Changes

Below is a list of documentation pages related to API changes between versions 5.0 and 6.0:

  • https://www.zabbix.com/documentation/6.0/manual/api/changes_5.4_-_6.0
  • https://www.zabbix.com/documentation/current/manual/api/changes_5.2_-_5.4
  • https://www.zabbix.com/documentation/5.2/manual/api/changes_5.0_-_5.2

Some of the more important API changes:

  • Trigger and calculated/aggregated item syntax change introduced in Zabbix 5.4 also change the API calls responsible for creating triggers (ZBXNEXT-6451)
    • You will need to change the trigger syntax in your API calls to avoid any issues
  • For user.create and user.update methods the user_medias parameter was renamed to medias (ZBX-17955)
    • user_medias parameter is now deprecated
  • The type property is no longer supported for user.create, user.update, and user.get methods (ZBXNEXT-6148)
    • The type property is not supported for the user object since we now define it in user roles
  • Items no longer support applications. Applications have been replaced with tags (ZBXNEXT-2976)
  • Since value maps can no longer be defined globally, the valuemap.create and valuemap.get methods now require a hostid field (ZBXNEXT-5868)

Other important changes

There are a couple of important changes that users should be aware of when migrating to Zabbix 6.0 LTS:

  • Previously, trailing spaces in passwords – both when setting a password and entering it, were trimmed. This has been changed, and trailing spaces in passwords are no longer trimmed.
  • Global value maps that remain unused will be removed
  • Existing audit log records will be removed due to major changes in the audit log design.

Upgrade steps

Next,  let’s discuss the steps that you should take to perform the upgrade procedure in a correct and safe manner:

  • Backup your database, as well as any customizations (external scripts, alert scripts) and configuration files
  • Update the Zabbix server and Zabbix frontend
    • Once the new Zabbix server process is started, it will automatically check the database schema version and automatically upgrade it
    • Depending on the database size and the version from which you are migrating – this can take a while
    • Once the automatic database schema upgrade is done, the Zabbix server will be started automatically
  • Update your proxies. Proxies are required to have the same major version as the Zabbix server
  • Check if there are no issues and your Zabbix instance is up and running
    • Check if the metrics are being collected by your Zabbix server and Zabbix proxies
    • Check if the triggers are detecting any problems and if you’re receiving notifications about them

Backup

Let’s take a more in-depth look at the backup process and discuss the required steps with some examples:

  • Backup the database – methods depend on the DB type
    • In most cases, you can ignore history and trends tables – simply backing up only your configuration data
    • History and trends tables tend to be extremely large. That’s why the above approach is a lot faster
    • If at some point you are required to perform a restore from this backup, history, and trends tables will have to be manually recreated
  • Backup the Zabbix configuration files
  • Optionally – backup any custom alert scripts, external scripts, and any other customizations

Example MySQL database backup with history and trends tables ignored:

mysqldump -uroot -p --single-transaction --ignore-table=zabbix.history --ignore-table=zabbix.history_uint --ignoretable=zabbix.history_text --ignore-table=zabbix.history_log --ignore-table=zabbix.history_str --ignore-table=zabbix.trends --ignore-table=zabbix.trends_uint zabbix | gzip > zabbix_backup.sql.gz

Backing up the configuration

At the very least, you should back up the configuration files located in:

  • /etc/zabbix/*
  • external scripts from /usr/lib/zabbix/externalscripts/
  • alert scripts from /usr/lib/zabbix/alertscripts
  • /etc/httpd/conf.d/zabbix.conf
  • /etc/php-fpm.d/zabbix.conf

Upgrade process with docker

There are multiple approaches to running Zabbix in docker. For this example, we will assume that you’re running Zabbix server and Zabbix frontend in docker and using the official Zabbix docker images with a MySQL backend database and apache web backend.

Stop the Zabbix server, frontend, and proxy containers:

docker stop my-zabbix-server
docker stop my-zabbix-frontend

Start the Zabbix 6.0 LTS container and point it at the same backend database:

docker run --name my-zabbix-server-6.0 -e DB_SERVER_HOST="some-mysql-server" -e
MYSQL_USER="some-user" -e MYSQL_PASSWORD="some-password" -d zabbix/zabbixserver-mysql:6.0-latest

Once again, an automatic DB schema upgrade will be started

Lastly, start the Zabbix frontend container:

docker run --name my-zabbix-web-apache-6.0 -e DB_SERVER_HOST="some-mysqlserver" -e MYSQL_USER="some-user" -e MYSQL_PASSWORD="some-password" -e ZBX_SERVER_HOST= "my-zabbix-server-6.0" -d zabbix/zabbix-web-apache-mysql:6.0-latest

Upgrade process with Zabbix packages

Upgrading the main Zabbix components

If you’re using the official Zabbix packages, then the upgrade process will take a few more steps and can seem a bit more complicated. Let’s take a look at the required upgrade steps in detail. For our example, we will use a CentOS 8 OS distribution.

Install the Zabbix 6.0 LTS release package. This will add the necessary Zabbix 6.0 LTS repository information:

rpm -Uvh https://repo.zabbix.com/zabbix/6.0/rhel/8/x86_64/zabbix-release-6.0-1.el8.noarch.rpm

Clear the DNF package manager cache:

dnf clean all

Install all of the required packages:

dnf install zabbix-server-mysql zabbix-web zabbix-web-mysql zabbix-web-deps
zabbix-apache-conf zabbix-selinux-policy

Start Zabbix components and observe the log file. You should see that the database schema upgrade is in progress. Once it has finished, all of the internal Zabbix processes should be started without any issues:

17602:20210921:131335.333 completed 96% of database upgrade
17602:20210921:131335.355 completed 97% of database upgrade
17602:20210921:131335.379 completed 98% of database upgrade
17602:20210921:131335.606 completed 99% of database upgrade
17602:20210921:131335.711 completed 100% of database upgrade
17602:20210921:131335.711 database upgrade fully completed
17602:20210921:131335.804 server #0 started [main process]
17602:20210921:131335.808 server #2 started [configuration syncer #1]
17602:20210921:131335.810 server #1 started [service manager #1]

Upgrading the Zabbix proxies

In addition, we are also required to update our Zabbix proxies. The procedure is very similar to what we did in our previous steps:

Install the Zabbix 6.0 LTS release package:

rpm -Uvh https://repo.zabbix.com/zabbix/6.0/rhel/8/x86_64/zabbix-release-6.0-1.el8.noarch.rpm

Clear the DNF package manager cache:

dnf clean all

Update the Zabbix proxy packages:

dnf update zabbix-proxy-mysql(pgsql, sqlite3)

For MySQL, PostgreSQL, and Oracle proxy backend databases, the DB schema is performed automatically.

For Zabbix proxies using SQLite3 backend databases, automatic database schema upgrade is not supported. We will simply have to remove the old SQLite3 database file – it will then be automatically recreated once we start the Zabbix proxy.

rm –rf /tmp/proxy.sqlite

Post-upgrade tasks

After the upgrade to Zabbix 6.0 LTS, there are a few additional tasks that we should take care of. Let’s take a look at what needs to be done.

History table primary key

Zabbix 6.0 LTS backend database history table schema has been changed. These tables now contain primary keys. The upgrade or these history tables is not done automatically since it can cause additional downtime. Depending on the size of the database, executing the required changes can be extremely slow since every record in the history tables needs to be altered. In addition, duplicate entries in history tables could potentially cause this manual database schema upgrade to fail. There are multiple benefits to the history table schema changes:

  • All history tables will now have primary keys
  • Decreased history table storage size
  • Increased history table query performance
  • Not recommended when upgrading an existing instance

For new Zabbix 6.0 LTS installations, this change will be included by default, while for the existing installations, it is recommended to thoroughly test the history table schema change procedure and evaluate the potential downtimes. The exact history table upgrade steps will be documented with the release of Zabbix 6.0 LTS.

Check new processes

There are some new Zabbix processes that have been added to Zabbix 6.0 LTS that yous should be aware of:

  • StartHistoryPollers
    • The process responsible for handling calculated, aggregated, and internal checks requiring a database connection
    • The default value is 5. Consider increasing this number if you have many such items
  • If migrating from 4.0: StartLLDProcessors
    • Worker process for low-level discovery tasks
    • The default value is 2. Consider increasing if you have many low-level discovery rules.

Update the existing templates

If you’ve performed a Zabbix upgrade before, you will be aware of the fact that Zabbix does not update your existing templates automatically since we assume there could be some custom changes performed by the end-users on said templates. Therefore, to see the aforementioned new processes in the Zabbix server internal monitoring graphs, you should download and import the latest Zabbix server template.

You can download the templates from the official Zabbix git page. You can read the release notes to see the full list of the updated templates and changes that have been performed on said templates.

Update the Zabbix agents

You may also consider upgrading your Zabbix agents. This is not mandatory since Zabbix agents are backward compatible, so you can use an older version of Zabbix agents with Zabbix 6.0 LTS. All of the previous functionality will continue to function, but you may still consider updating the agents since the updates could contain some bug fixes or support for a brand new set of items.

Upgrading the Zabbix agent:

dnf install zabbix-agent

Upgrading the Zabbix agent 2:

dnf install zabbix-agent2

New Zabbix packages

You may have noticed that in Zabbix 6.0 LTS, there are multiple new packages. Most of these packages are repackagings of some of the old components for better package management, but there are exceptions:

  • zabbix-selinux-policy – basic SELinux policy for Zabbix
  • zabbix-sql-scripts – All of the .sql backend database scripts
    • These used to be a part of the zabbix-server package
    • This package is required to, for example, deploy the initial Zabbix database schema or data during the Zabbix install process.
  • zabbix-web-service – The service responsible for the scheduled report generation

Questions

Q: Are my custom templates going to be affected in any way by the upgrade process?

A: Yes, all of your templates will continue to work. Any changes that we have made to trigger syntax, for example, will be automatically applied to your existing entities.

 

Q: How long will the migration process take? How can I estimate the downtime?

A: Unfortunately, it’s impossible to estimate a precise downtime duration without creating a QA copy of your existing Zabbix instance with the same exact hardware and checking the downtime duration there with a test upgrade. At the end of the day, this will depend not only on the size of the database but also on the size of individual tables, the version from which you are upgrading, and how optimized your software and hardware are.

 

Q: What about migrating from a very old version – say Zabbix 3.0 or older?

A: It should work, but there can be some caveats and additional pre-requisites required for the older version upgrades. I would recommend going through our previous Summit recordings since we have covered the upgrade process for older versions in previous years. Those should provide you a pre-requisite checklist that you can perform before upgrading to Zabbix 6.0 LTS.

The post A guide to migrating to Zabbix 6.0 LTS by Edgars Melveris / Zabbix Summit Online 2021 appeared first on Zabbix Blog.

Top 10 reasons to migrate to Zabbix 6.0 LTS by Dmitry Krupornitsky / Zabbix Summit Online 2021

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/top-10-reasons-to-migrate-to-zabbix-6-0-lts-by-dmitry-krupornitsky-zabbix-summit-online-2021/18445/

Today we will take a look at the top 10 reasons to migrate to Zabbix 6.0 LTS. We will discuss features and changes included not only in Zabbix 6.0 LTS but also in intermediate major versions – Zabbix 5.2 and Zabbix 5.4.

The full recording of the speech is available on the official Zabbix Youtube channel.

High availability

With Zabbix 6.0 LTS, native support for Zabbix server high availability clusters is finally here. High availability setups can protect you from software and hardware failures and allow you to minimize downtime while performing maintenance tasks. Before Zabbix 6.0 LTS, users were required to use a dedicated piece of clustering software to enable high availability. Most users used a combination of Corosync + pacemaker software. This required additional knowledge related to these tools, to ensure a proper high availability cluster setup, configuration, maintenance, and other tasks related to managing your Zabbix high availability cluster. You could also use other 3rd party vendor solutions, but such solutions also require additional knowledge and in many cases incur additional licensing costs.

The native Zabbix server high availability cluster is an opt-in solution that provides high availability for the Zabbix server component. This solution consists of multiple Zabbix server instances – nodes, where each node is configured separately and uses the same database. Each node has two modes of operation – active or standby. Only a single node can be active at a time. The standby nodes do not perform any data collection, data processing, or any other Zabbix server activities. The standby nodes do not listen for connection on ports and have a minimal number of connections established to the Zabbix backend database. The high availability nodes are compatible with one another across different minor Zabbix server versions.

Learn how to deploy your own Zabbix server high availability cluster by following the steps provided in our Zabbix Summit blog post dedicated to this topic.

New Zabbix interface options

Zabbix 6.0 LTS provides multiple Zabbix interface improvements. One of the major changes that the users will notice when switching to Zabbix 6.0 LTS is the migration from screens to dashboards. The screens will be migrated to dashboards automatically during the upgrade. Dashboards consist of multiple highly customizable widgets, which can be placed on a dashboard with a click of a button. With Zabbix 6.0 LTS many new widgets will be available for different purposes – more flexible views of your metrics with the Single item value widget, a Geomap widget for a better overview of your infrastructure state, Top N/Bottom N views provide a whole new way to look at your metrics and more.

Now you will be able to save your favorite problem filters and access your filters in tabs for more simple filtering of the commonly accessed problem views.

Zabbix 6.0 LTS introduces timezone configuration on a per-user basis. Users can now have their preferred timezone configured via the user settings in the Zabbix frontend. The same is also true for language – this can also now be configured individually for each user.

The Zabbix frontend is now more customizable than ever. There are several ways in which you can customize your Zabbix frontend:

  • Replace the Zabbix logo with your company’s branding
  • Hide links to Zabbix support/integration pages
  • Set a custom help page link
  • Change the copyright notice in the footer of the frontend.

Implementing these changes requires customizing the underlying PHP code – we tried to make this as simple and accessible as possible, so you can quickly make the necessary changes yourself.

There are also many other Interface improvements, such as multi-page dashboards, third-level menus, graph improvements, and many others.

Improved security

Security is always something that we focus on when developing Zabbix. Zabbix 6.0 LTS brings many new security-related improvements and features:

  • User roles allow you to define roles with granular permissions related to the frontend access and the actions that each user role is permitted to perform
    • Roles are still based on user types – Zabbix User, Admin, Super admin, and user type restrictions still apply, but can be further customized per each role
    • User group to host group permissions (Read, Read/Write, Deny) still need to be used in combination with roles to ensure granular access to your data
    • For example, now we can define users that have access to host configuration but restrict access to other configuration sections.

In Zabbix 6.0 LTS it is possible to define custom password complexity requirements for Zabbix frontend logins. We can define password length/complexity policies and prohibit the usage of easy to guess common passwords.

The Zabbix API has also seen some security improvements. Now it is possible to generate a persistent API token for a particular user, define an expiration date and use the token in your API calls, without the need to regularly re-issue a new API token.

Zabbix 5.2 release also added the ability to store sensitive information in an external vault. As of the release of Zabbix 6.0 LTS, only HashiCorp Vault is supported, but CyberArk Vault support is also coming in Zabbix 6.2 release.

A set of architectural and structural measures have been taken to completely restructure the Zabbix Audit log. The updated Audit log entry contains records of all configuration changes made by the Zabbix server and Zabbix frontend. The new Audit log also contains additional filtering options, such as filtering Audit log entries based on the operation during which the changes were performed. The new Audit log is not only more detailed but also reworked with minimum performance impact in mind.

Scalability improvements

Many scalability improvements have been introduced between the Zabbix 5.0 LTS release and Zabbix 6.0 LTS release. These improvements not only improve the performance of existing Zabbix instances but also lay the groundwork for the design of upcoming features in later releases.

Previously, trend-based trigger functions would always use database queries to obtain the required data. Starting from Zabbix 5.4, a new type of cache – Trend function cache, has been introduced. This cache stores the results of calculated trend functions. When processing the trend functions, the Zabbix server will check the Trend function cache for the cached results. In case of failure, the Zabbix server will read the data from the database and cache the results.

The scalability improvements allow for better parallel data processing on Zabbix servers with heavy loads. Zabbix Instances with tens of thousands or more new values per second will greatly benefit from the improved performance.

The introduction of the graceful startup of the Zabbix server can help you improve performance and prevent unwanted downtimes, especially with large distributed environments. Whenever a Zabbix server gets started up after downtime, the existing Zabbix proxies start sending the data backlog to the Zabbix server. it is extremely important to maintain the stability and performance of the Zabbix server during this time window. Graceful startup improves the Zabbix server data backlog handling logic during such situations.

To prevent unwanted delays and other issues when using zabbix_get and zabbix_sender command-line tools, it is now possible to define a custom Timeout parameter for these tools.

Advanced business service monitoring

The new Busines service monitoring features allow Zabbix users to not only define complex service trees but also receive alerts in situations where the status of a business service has been changed. This is valuable to every user that wishes to monitor their business services, no matter how simple or complex the service is.

Combined with a large number of new and improved service status calculation rules. By defining custom service weights and advanced service status propagation rules, the business services can be defined in an extremely flexible fashion. Services are also not linked to individual triggers anymore, instead, we use tag-based service mapping to map our services to problem events.

The service functionality has also received scalability improvements. Zabbix can support the monitoring of over 100 000 business services. The scalability improvements have been implemented from both the UI/UX and the performance perspectives.

The old all-or-nothing business service permission approach has been redesigned to a granular read/write permissions for individual business services. This is not only an improvement from the security perspective, but also adds the ability to define services in a multi-tenant fashion, where each tenant has access only to the services that they own.

With the redesign of the business services, we have added the support for root cause analysis, allowing users to see the underlying problem which caused a particular service to change its state.

You can read more about Business service monitoring in our Zabbix Summit blog post dedicated to this topic.

Tag and template improvements

Item applications have been replaced with tags. This design decision adds consistency to filtering, mapping, grouping, and other tag-related functions when it comes to different Zabbix entities. Tags can also be used to provide additional information related to your entities in a manner that is much more flexible than it was with applications.

Universal template IDs introduced for each of the template elements allow you to define much more robust template management workflows, especially when you combine this with a CI/CD template management approach. These IDs are unique and can be used to match a particular template entity, such as item, trigger, graph, and so on. By utilizing the Universal template IDs, Zabbix now understands which entity we are trying to update, which entity no longer exists, whether it is a new entity or we are adjusting an existing entity. The default template export format is now YAML, though JSON and XML formats are still supported. This was done to improve the template management usability since the YAML format is more user-friendly and easier to edit manually. All of the official Zabbix templates available on the Zabbix git page have already been converted to the YAML format.

The redesign of the templates has also allowed us to improve the visualization of the changes made when importing a template. Now users can see the list of changes in a diff-like display and understand the impact that the template import will have on the  Zabbix entities.

Value maps have been moved to host and template levels. This is another design decision that we made to enable support for fully self-contained templates, that are easy to manage and deploy, and can be easily imported into different Zabbix environments. While global value maps might be easy to manage in small environments, this is not the case in larger environments, where different teams are working with a single or between multiple Zabbix instances. Therefore, the global value maps have been removed.

Reporting and visualization

With the addition of Scheduled reports functionality, any dashboard can now be converted into a scheduled report. While this feature was originally added in Zabbix 5.4, with the release of Zabbix 6.0 LTS and a set of new widgets, the reporting functionality has gained a lot of additional value that these widgets grant specifically from the reporting perspective. Users can create scheduled reports and receive them in their mailbox at a specific time either on a daily, weekly, monthly, or yearly basis. The time period for which the report will provide the information can also be selected.

The new Geographical map widget allows you to quickly deploy a geomap with an overview of the state of your infrastructure. The geomap widget supports filters, so we can display only a particular part of your infrastructure. Zabbix uses an open-source Javascript interactive maps library called Leaflet and supports multiple map providers such as OpenStreetMap, OpenTopoMap, USGS US Topo, and more. Users also have the ability to define and use a custom map tile provider. The map will display your infrastructure and also highlight any detected problems as well as display problem counters. This is a major step forward from the old approach, which required users to use the regular map functionality together with Zabbix API scripting, to provide information on a geographical map.

Advanced problem detection

Zabbix 5.4 release introduced a new unified syntax for defining trigger expressions, calculated, and aggregated items. There are multiple benefits that come with the new trigger syntax. First off – the syntax is now unified and can be used for defining triggers, calculated items, and providing values in maps or graph names. The syntax also has a more functional approach, instead of being object-oriented. This allows us to solve many complex use cases, for example dynamically calculate or aggregate a value from all hosts tagged with a specific tag or belonging to a specific host group. Aggregated item type has also been removed and users can now define aggregate checks under the calculated item type.

New monitoring functionality and integrations

As with every major release, Zabbix 6.0 LTS comes with a set of new items and improves the functionality of already existing items:

  • It is now possible to monitor SSL certificate validity and expiration data, such as the expiry date, issuer, version, subject, and more
  • New Zabbix Agent 2 metrics allow you to collect file owner information, file properties, extended interface info, extended TCP info, SHA2 hashes for files, and more
  • New templates for NGINX+, HPE/Dell servers, CISCO ASAv, Cloudflare

Finally – Zabbix 6.0 LTS

Many of our users and customers prefer sticking with the LTS releases instead of upgrading between each major version. As with every LTS release, there are major benefits to sticking with Zabbix 6.0 LTS:

  • LTS release receive thorough testing and full long term support
    • 3 years of full support – general, critical, and security fixes/improvements
    • 5 years of limited support – critical and security fixes

Questions

Q: Which of the current versions are still supported and for how long are they going to remain supported? What updates can we expect these versions to receive?

A: Currently we have three supported major versions available. Zabbix 5.4, which will not be supported after the release of Zabbix 6.0 LTS. We also still provide support for Zabbix 5.0 LTS and Zabbix 4.0 LTS. Zabbix 5.0 LTS will continue receiving full support until the middle of 2023 and limited support until the middle of 2025, while Zabbix 4.0 LTS will receive limited support until November 2023.

 

Q: Could you elaborate on how tags are more flexible than applications and are there any other benefits to using tags?

A: Zabbix already supports tags for most of the essential Zabbix objects, such as triggers, hosts, host prototypes, and templates. With the introduction of tags for items, tags can now be found everywhere. This way you can have tags that provide different additional information and assign values for your objects. Tags have several usages – for example, we can use them to mark events. If we have an item with a tag, this tag will mark any problem related to this item. Problem events will inherit tags from the whole tag chain – hosts, templates, triggers, items, and more. Further down the line, we can use our actions to react to specific tags. If you recall, Business services are also mapped to problems based on the tag mapping. Of course, tags can also be used for filtering and grouping different Zabbix objects.

 

Q: Is there a guideline to the migration process from an older version to Zabbix 6.0 LTS? Is there a change list that I can look at to see what other features have received an overhaul?

A: Regarding the upgrade itself – our documentation contains guidelines for both upgrading from packages and upgrading from sources. The documentation may also contain upgrade notes regarding any extra steps or precautions required when upgrading to a particular version. Regarding the feature changes – we recommend reading through the major version release notes. For example, if you’re upgrading from Zabbix 5.0 LTS to Zabbix 6.0 LTS, make sure to familiarize yourself not only with the Zabbix 6.0 LTS release notes, but also read through the Zabbix 5.2 and Zabbix 5.4 release notes, since changes introduced in these versions will also be a part of Zabbix 6.0 LTS.

The post Top 10 reasons to migrate to Zabbix 6.0 LTS by Dmitry Krupornitsky / Zabbix Summit Online 2021 appeared first on Zabbix Blog.

Gaining new insights with Business service monitoring by Aleksandrs Petrovs-Gavrilovs / Zabbix Summit Online 2021

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/gaining-new-insights-with-business-service-monitoring-by-aleksandrs-petrovs-gavrilovs-zabbix-summit-online-2021/17973/

Zabbix 6.0 LTS comes with a complete redesign of the service monitoring. From improved business service scalability to advanced service status calculation logic and alerting. Let’s take a look at the Business Service monitoring feature and how you can use it to ensure full transparency for your business services.

The full recording of the speech is available on the official Zabbix Youtube channel.

Business services can be quite complex. They tend to consist of many different moving parts with redundancy and failover mechanism in place, all of which need to be taken into consideration when we wish to analyze the current status of our services.

BSM Checklist

Let’s take a look at what needs to be done so we can successfully define and monitor our business service:

  • First, we have to define what exactly is our business service and what components does it consist of?
  • We need to understand what are our expectations when it comes to service uptime. When should the service be up and running? What are the acceptable downtimes? Should it run 24/7/365 or maybe it’s a service that is critical only during our working hours?
  • Once we know what needs to be monitored, we need to make sure that we are collecting the data that reflects the status of different service components.
  • Finally – we have to find a suitable tool to track and measure our service.

Define your business

Let’s take a look at how a business may look like. As I mentioned before – business services can consist of many different components. Let’s take a look at an example of how business services may look like:

The tree structure here represents our Business services. We can see that we have classified the services into two branches – Internal services and User services. The User services consist of components such as Websites, Helpdesk services, Phones. These general services are based on lower-level components such as the actual physical phones for the phone service, underlying software for the Website and Helpdesk services, and so on.

This can make things quite complicated since usually, organizations will have many more components to take care of. That’s why, let’s see how we can simplify this tree and define our services in a more simple manner, like the service tree below:

Now we are left with only 3 levels for our services. Let’s take a look at how we can move this to Zabbix:

Here we can see a high-level view of our services. Once again we have our Internal services and User services. These here high-level services consist of child services and define what these components consist of and what their SLAs should be. We can also define tags to provide additional details to our services – which customer uses the service, the type of service, maybe even the location that the service is used in – this part is completely up to your imagination.

Once you have defined the services, their respective components and have linked them to the problems by using tags, you will finally be able to see the full picture. Zabbix will display not only the status of the service but also the root cause of the problem. This way we can provide service status information not only on the service owner level but also provide information that your technical staff can use to fix the issue.

Configuring SLAs

Configuring Business Service monitoring can be done from the MonitoringServices section. In Zabbix 6.0 LTS you are not required to start defining the service tree from the root service. Now you can define your own root level services. To create a service, all we have to do is switch to the Edit mode by clicking the Edit button in the upper right corner of the services screen and click the Create service button right next to it. We have also made some additional changes to the service section UI/UX. Now you also have multiple fast edit buttons next to each service. You can use them to Add a child service, edit an existing service, or delete an existing service.

Next, let’s take a look at the actual service creation steps.

  • We need to provide a name for our service
  • If the service is not a top-level service you have to select a parent service
  • Define problem tags. Problems tagged with the matching tags will affect the service status
  • Define the status calculation rule

Major improvements have been made to status calculation rules. We still support the old logic of the Use the most critical of child services / Most critical if all children have problems / Set status to ok, but there are also many advanced service status calculation rules.

  • Now we have the ability to select a specific status (Warning, Average, High, and so on) for our service in case of a problem
  • Select the number of children, More than/Less than N children, Percentage of children that should be affected for the parent service status change to take place
  • Define weights for child services and perform status changes based on the weight of the affected child services

Child services can also apply different propagation rules for the parent service

  • Child services can Increase or decrease the parent status service status by N severities, ignore the child service, apply a fixed status or apply the status depending on the problem severity

For our example let’s use an HA cluster use case. HA clusters consist of multiple nodes – for our example, we will use 3 nodes.

  • First, we define that the HA cluster consists of 3 nodes – 3 child services.
  • Each node will have equal weight – 1
  • On the parent service, we will define multiple status rules
    • If the weight of the child services is 1 (1 node is down) – the parent service will change its status to Warning
    • If the weight of the child services is 2 (2 nodes are down) – the parent service will change its status to Average
    • If the weight of the child services is 3 (all nodes are down) – the parent service will change its status to Disaster

In the above image, we can see how the corresponding status change will look like in the Services section. Note that we can also see the root cause of the parent service status change in the Root cause column.

We also have the ability to define the acceptable SLAs as well as SLA calculation uptime and downtime periods for our services. We have the option to define scheduled uptimes and downtimes, during which SLA should or shouldn’t be calculated (Such as weekends, for example), as well as one-time downtimes for one-time maintenance purposes.

Services can utilize tags to provide additional information about your services, such as the service type, service customer, service location, and more. On top of that, tags can also be used in the Service action condition logic, so you can define granular alerting logic for your service status changes.

The Child services tab allows you to quickly look at the related child services, their problem tags, and status calculation rules.

Child services can also be crosslinked between multiple parent services. This means that you don’t have to duplicate and recreate child services if they are used as a component of multiple parent services.

Track, solve and measure

Once we have configured our service, what remains is keeping track of our service statuses, SLAs and staying notified about service status changes and their root cause.

For this purpose, it is vital to secure access to our services. This is especially critical for MSPs, which may have multiple customers and each customer should have access only to the services related to that particular customer. To that end, the Roles section has also received an update related to the Service permissions. We can now define Read-Write and Read access to either specific services or services marked with a particular tag.

The Root cause section displays the root cause problems that affected the service status change. You will be able to click on the root cause problem and open it in the Problems section for further analysis of what caused your services to change their status and which host has been affected by it.

Previously I mentioned alerting on service status change, so let’s dig deeper into that. In Zabbix 6.0 LTS we have added a new type of action – Service actions. Zabbix can now react to service status changes and notify you when a service changes its status. The Service action conditions can analyze if a status has been changed on a particular service, a service that matches or contains a specific string in its name, tag, or tag value. If the conditions are true, Zabbix can send out an email, deliver a phone call or an SMS, create a ticket in your helpdesk system or perform any other alerting and notification workflow.

Many other BSM features are coming as we continue the development of Zabbix 6.0 LTS:

  • SLA graphical visualizations with support for over 100k services
  • Daily, Monthly, Weekly SLA reports
  • New service tree and SLA reporting widgets available from the dashboard
  • Service tree import and export
  • Impact analysis – see which service affects other related services in what way.

Questions

Q: Will the existing services be migrated to Zabbix 6.0 LTS?
A: The existing services will be migrated to Zabbix 6.0 LTS during the upgrade. All of the configuration for the existing services will stay intact after the migration.

Q: Does host maintenance suppress service calculation in Zabbix 6.0 LTS?
A: Host maintenance will not affect the service calculation. If you wish to define maintenance periods for your services –  use scheduled or one-time downtime options when configuring an individual service.

Q: How are the Fixed status and Ignore this service calculation rules going to work?
A: Fixed status services will not change their status no matter what happens to the child services – the service status will remain fixed. As for Ignore this service – the service status change will be ignored and will not affect the parent services.

What’s new in Zabbix 6.0 LTS by Artūrs Lontons / Zabbix Summit Online 2021

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/whats-new-in-zabbix-6-0-lts-by-arturs-lontons-zabbix-summit-online-2021/17761/

Zabbix 6.0 LTS comes packed with many new enterprise-level features and improvements. Join Artūrs Lontons and take a look at some of the major features that will be available with the release of Zabbix 6.0 LTS.

The full recording of the speech is available on the official Zabbix Youtube channel.

If we look at the Zabbix roadmap and Zabbix 6.0 LTS release in particular, we can see that one of the main focuses of Zabbix development is releasing features that solve many complex enterprise-grade problems and use cases. Zabbix 6.0 LTS aims to:

  • Solve enterprise-level security and redundancy requirements
  • Improve performance for large Zabbix instances
  • Provide additional value to different types of Zabbix users – DevOPS and ITOps teams, Business process owner, Managers
  • Continue to extend Zabbix monitoring and data collection capabilities
  • Provide continued delivery of official integrations with 3rd party systems

Let’s take a look at the specific Zabbix 6.0 LTS features that can guide us towards achieving these goals.

Zabbix server High Availability cluster

With the release of Zabbix 6.0 LTS, Zabbix administrators will now have the ability to deploy Zabbix server HA cluster out-of-the-box. No additional tools are required to achieve this.

Zabbix server HA cluster supports an unlimited number of Zabbix server nodes. All nodes will use the same database backend – this is where the status of all nodes will be stored in the ha_node table. Nodes will report their status every 5 seconds by updating the corresponding record in the ha_node table.

To enable High availability, you will first have to define a new parameter in the Zabbix server configuration file: HANodeName

  • Empty by default
  • This parameter should contain an arbitrary name of the HA node
  • Providing value to this parameter will enable Zabbix server cluster mode

Standby nodes monitor the last access time of the active node from the ha_node table.

  • If the difference between last access time and current time reaches the failover delay, the cluster fails over to the standby node
  • Failover operation is logged in the Zabbix server log

It is possible to define a custom failover delay – a time window after which an unreachable active node is considered lost and failover to one of the standby nodes takes place.

As for the Zabbix proxies, the Server parameter in the Zabbix proxy configuration file now supports multiple addresses separated by a semicolon. The proxy will attempt to connect to each of the nodes until it succeeds.

Other HA cluster related features:

  • New command-line options to check HA cluster status
  • hanode.get API method to obtain the list of HA nodes
  • The new internal check provides LLD information to discover Zabbix server HA nodes
  • HA Failover event logged in the Zabbix Audit log
  • Zabbix Frontend will automatically switch to the active Zabbix server node

You can find a more detailed look at the Zabbix Server HA cluster feature in the Zabbix Summit Online 2021 speech dedicated to the topic.

Business service monitoring

The Services section has received a complete redesign in Zabbix 6.0 LTS. Business Service Monitoring (BSM) enables Zabbix administrators to define services of varying complexity and monitor their status.

BSM provides added value in a multitude of use cases, where we wish to define and monitor services based on:

  • Server clusters
  • Services that utilize load balancing
  • Services that consist of a complex IT stack
  • Systems with redundant components in place
  • And more

Business Service monitoring has been designed with scalability in mind. Zabbix is capable of monitoring over 100k services on a single Zabbix instance.

For our Business Service example, we used a website, which depends on multiple components such as the network connection, DB backend, Application server, and more. We can see that the service status calculation is done by utilizing tags and deciding if the existing problems will affect the service based on the problem tags.

In Zabbix 6.0 LTS there are many ways how service status calculations can be performed. In case of a problem, the service state can be changed to:

  • The most critical problem severity, based on the child service problem severities
  • The most critical problem severity, based on the child service problem severities, only if all child services are in a problem state
  • The service is set to constantly be in an OK state

Changing the service status to a specific problem severity if:

  • At least N or N% of child services have a specific status
  • Define service weights and calculate the service status based on the service weights

There are many other additional features, all of which are covered in our Zabbix Summit Online 2021 speech dedicated to Business Service monitoring:

  • Ability to define permissions on specific services
  • SLA monitoring
  • Business Service root cause analysis
  • Receive alerts and react on Business Service status change
  • Define Business Service permissions for multi-tenant environments

New Audit log schema

The existing audit log has been redesigned from scratch and now supports detailed logging for both Zabbix server and Zabbix frontend operations:

  • Zabbix 6.0 LTS introduces a new database structure for the Audit log
  • Collision resistant IDs (CUID) will be used for ID generation to prevent audit log row locks
  • Audit log records will be added in bulk SQL requests
  • Introducing Recordset ID column. This will help users recognize which changes have been made in a particular operation

The goal of the Zabbix 6.0 LTS audit log redesign is to provide reliable and detailed audit logging while minimizing the potential performance impact on large Zabbix instances:

  • Detailed logging of both Zabbix frontend and Zabbix server records
  • Designed with minimal performance impact in mind
  • Accessible via Zabbix API

Implementing the new audit log schema is an ongoing effort – further improvements will be done throughout the Zabbix update life cycle.

Machine learning

New trend functions have been added which utilize machine learning to perform anomaly detection and baseline monitoring:

  • New trend function – trendstl, allows you to detect anomalous metric behavior
  • New trend function – baselinewma, returns baseline by averaging data periods in seasons
  • New trend function – baselinedev, returns the number of standard deviations

An in-depth look into Machine learning in Zabbix 6.0 LTS is covered in our Zabbix Summit Online 2021 speech dedicated to machine learning, anomaly detection, and baseline monitoring.

New ways to visualize your data

Collecting and processing metrics is just a part of the monitoring equation. Visualization and the ability to display our infrastructure status in a single pane of glass are also vital to large environments. Zabbix 6.0 LTS adds multiple new visualization options while also improving the existing features.

  • The data table widget allows you to create a summary view for the related metric status on your hosts
  • The Top N and Bottom N functions of the data table widget allow you to have an overview of your highest or lowest item values
  • The single item widget allows you to display values for a single metric
  • Improvements to the existing vector graphs such as the ability to reference individual items and more
  • The SLA report widget displays the current SLA for services filtered by service tags

We are proud to announce that Zabbix 6.0 LTS will provide a native Geomap widget. Now you can take a look at the current status of your IT infrastructure on a geographic map:

  • The host coordinates are provided in the host inventory fields
  • Users will be able to filter the map by host groups and tags
  • Depending on the map zoom level – the hosts will be grouped into a single object
  • Support of multiple Geomap providers, such as OpenStreetMap, OpenTopoMap, Stamen Terrain, USGS US Topo, and others

Zabbix agent – improvements and new items

Zabbix agent and Zabbix agent 2 have also received some improvements. From new items to improved usability – both Zabbix agents are now more flexible than ever. The improvements include such features as:

  • New items to obtain additional file information such as file owner and file permissions
  • New item which can collect agent host metadata as a metric
  • New item with which you can count matching TCP/UDP sockets
  • It is now possible to natively monitor your SSL/TLS certificates with a new Zabbix agent2 item. The item can be used to validate a TLS/SSL certificate and provide you additional certificate details
  • User parameters can now be reloaded without having to restart the Zabbix agent

In addition, a major improvement to introducing new Zabbix agent 2 plugins has been made. Zabbix agent 2 now supports loading stand-alone plugins without having to recompile the Zabbix agent 2.

Custom Zabbix password complexity requirements

One of the main improvements to Zabbix security is the ability to define flexible password complexity requirements. Zabbix Super admins can now define the following password complexity requirements:

  • Set the minimum password length
  • Define password character requirements
  • Mitigate the risk of a dictionary attack by prohibiting the usage of the most common password strings

UI/UX improvements

Improving and simplifying the existing workflows is always a priority for every major Zabbix release. In Zabbix 6.0 LTS we’ve added many seemingly simple improvements, that have major impacts related to the “feel” of the product and can make your day-to-day workflows even smoother:

  • It is now possible to create hosts directly from MonitoringHosts
  • Removed MonitoringOverview section. For improved user experience, the trigger and data overview functionality can now be accessed only via dashboard widgets.
  • The default type of information for items will now be selected automatically depending on the item key.
  • The simple macros in map labels and graph names have been replaced with expression macros to ensure consistency with the new trigger expression syntax

New templates and integrations

Adding new official templates and integrations is an ongoing process and Zabbix 6.0 LTS is no exception here’s a preview for some of the new templates and integrations that you can expect in Zabbix 6.0 LTS:

  • f5 BIG-IP
  • Cisco ASAv
  • HPE ProLiant servers
  • Cloudflare
  • InfluxDB
  • Travis CI
  • Dell PowerEdge

Zabbix 6.0 also brings a new GitHub webhook integration which allows you to generate GitHub issues based on Zabbix events!

Other changes and improvements

But that’s not all! There are more features and improvements that await you in Zabbix 6.0 LTS. From overall performance improvements on specific Zabbix components, to brand new history functions and command-line tool parameters:

  • Detect continuous increase or decrease of values with new monotonic history functions
  • Added utf8mb4 as a supported MySQL character set and collation
  • Added the support of additional HTTP methods for webhooks
  • Timeout settings for Zabbix command-line tools
  • Performance improvements for Zabbix Server, Frontend, and Proxy

Questions and answers

Q: How can you configure geographical maps? Are they similar to regular maps?

A: Geomaps can be used as a Dashboard widget. First, you have to select a Geomap provider in the Administration – General – Geographical maps section. You can either use the pre-defined Geomap providers or define a custom one. Then, you need to make sure that the Location latitude and Location longitude fields are configured in the Inventory section of the hosts which you wish to display on your map. Once that is done, simply deploy a new Geomap widget, filter the required hosts and you’re all set. Geomaps are currently available in the latest alpha release, so you can get some hands-on experience right now.

Q: Any specific performance improvements that we can discuss at this point for Zabbix 6.0 LTS?

A: There have been quite a few. From the frontend side – we have improved the underlying queries that are related to linking new templates, therefore the template linkage performance has increased. This will be very noticeable in large instances, especially when linking or unlinking many templates in a single go.
There have also been improvements to Server – Proxy communication. Specifically – the logic of how proxy frees up uncompressed data. We’ve also introduced improvements on the DB backend side of things – from general improvements to existing queries/logic, to the introduction of primary keys for history tables, which we are still extensively testing at this point.

Q: Will you still be able to change the type of information manually, in case you have some advanced preprocessing rules?

A: In Zabbix 6.0 LTS Zabbix will try and automatically pick the corresponding type of information for your item. This is a great UX improvement since you don’t have to refer to the documentation every time you are defining a new item. And, yes, you will still be able to change the type of information manually – either because of preprocessing rules or if you’re simply doing some troubleshooting.

Zabbix 6.0 LTS – The next great leap in monitoring by Alexei Vladishev / Zabbix Summit Online 2021

Post Syndicated from Alexei Vladishev original https://blog.zabbix.com/zabbix-6-0-lts-the-next-great-leap-in-monitoring-by-alexei-vladishev-zabbix-summit-online-2021/17683/

The Zabbix Summit Online 2021 keynote speech by Zabbix founder and CEO Alexei Vladishev focuses on the role of Zabbix in modern, dynamic IT infrastructures. The keynote speech also highlights the major milestones leading up to Zabbix 6.0 LTS and together we take a look at the future of Zabbix.

The full recording of the speech is available on the official Zabbix Youtube channel.

Digital transformation journey
Infrastructure monitoring challenges
Zabbix – Universal Open Source enterprise-level monitoring solution
Cost-Effectiveness
Deploy Anywhere
Monitor Anything
Monitoring of Kubernetes and Hybrid Clouds
Data collection and Aggregation
Security on all levels
Powerful Solution for MSPs
Scalability and High Availability
Machine learning and Statistical analysis
More value to users
New visualization capabilities
IoT monitoring
Infrastructure as a code
Tags for classification
What’s next?
Advanced event correlation engine
Multi DC Monitoring
Zabbix Release Schedule
Zabbix Roadmap
Questions

Digital transformation journey

First, let’s talk about how Zabbix plays a role as a part of the Digital Transformation journey for many companies.

As IT infrastructures evolve, there are many ongoing challenges. Most larger companies for example have a set of legacy systems that require to be integrated with more modern systems. This results in a mix of legacy and new technologies and protocols. This means that most management and monitoring tools need to support all of these technologies – Zabbix is no exception here.

Hybrid clouds, containers, and container orchestration systems such as K8S and OpenShift have also played an immense part in the digital transformation of enterprises. It has been a very major paradigm shift – from physical machines to virtual machines, to containers and hybrid parts. We certainly must provide the required set of technologies to monitor such environments and the monitoring endpoints unique to them.

The rapid increase in the complexity of IT infrastructures caused by the two previous points requires our tools to be a lot more scalable than before. We have many more moving parts, likely located in different locations that we need to stay aware of. This also means that any downtime is not acceptable – this is why the high availability of our tools is also vital to us.

Let’s not forget that with increased complexity, many new potential security attack vectors arise and our tools need to support features that can help us with minimizing the security risks.

But making our infrastructures more agile usually comes at a very real financial cost. We must not forget that most of the time we are working with a dedicated budget for our tools and procedures.

Infrastructure monitoring challenges

The increase in the complexity of IT infrastructures also poses multiple monitoring challenges that we have to strive to overcome:

  • Requirements for scalability and high availability for our tools
    • The growing number of devices and networks as well as the increased complexity of IT infrastructures
  • Increasingly complex infrastructures often force us to utilize multiple tools to obtain the required metrics
    • This leads to a requirement for a single pane of glass to enable centralized monitoring
  • Collecting values is often not enough – we need to be able to leverage the collected data to gain the most value out of it
  • We need a solution that can deliver centralized visualization and reporting based on the obtained data
  • Our tools need to be hand-picked so that they can deliver the best ROI in an already complex infrastructure

Zabbix – Universal Open Source enterprise-level monitoring solution

Zabbix is a Universal free and Open Source enterprise-level monitoring solution. The tool comes at absolutely no cost and is available for everyone to try out and use. Zabbix provides the monitoring of modern IT infrastructures on multiple levels.

Universal is the term that we are focusing on. Given the open-source nature of the product, Zabbix can be used in infrastructures of different sizes – from small and medium organizations to large, globe-spanning enterprises. Zabbix is also capable of delivering monitoring of the whole IT stack – from hardware and network monitoring to high-level monitoring such as Business Service monitoring and more.

Cost-Effectiveness

Zabbix delivers a large set of enterprise-grade features at no cost! Features such as 2FA, Single sign-on solutions, no restrictions when it comes to data collection methods, number of monitored devices and services, or database size.

  • Exceptionally low total cost of ownership
    • Free and Open Source solution with quality and security in mind
    • Backed by reliable vendors, a global partner network, and commercial services, such as the 24/7 support
    • No limitations regarding how you use the software
    • Free and readily available documentation, HOWTOs, community resources, videos, and more.
    • Zabbix engineers are easy to find and hire for your organization
    • Cost is fully under your control – Zabbix Commercial services are under fixed-price agreements

Deploy Anywhere

Our users always have the choice of where and how they wish to deploy Zabbix. With official packages for the most popular operating systems such as RHEL, Oracle Linux, Ubuntu, Raspberry Pi OS, and more. With official Helm charts, you can quickly also deploy Zabbix in a Kubernetes cluster or in your OpenShift instance. We also provide official Docker container images with pre-installed Zabbix components that you can deploy in your environment.

We also provide one-click deployment options for multiple cloud service providers, such as Amazon AWS, Microsoft Azure, Google Cloud, Openstack, and many other cloud service providers.

Monitor Anything

With Zabbix, you can monitor anything – from legacy solutions to modern systems. With a large selection of official solutions and substantial community backing our users can be sure that they can find a suitable approach to monitor their IT infrastructure components. There are hundreds of ready-to-use monitoring solutions by Zabbix.

Whenever you deploy a new IT solution in your enterprise, you will want to tie it together with the existing toolset. Zabbix provides many out of the box integrations for the most popular ticketing and alerting systems

Recently we have introduced advanced search capabilities for the Zabbix integrations page, which allows you to quickly lookup the integrations that currently exist on the market. If you visit the Zabbix integrations page and look up a specific vendor or tool, you will see a list of both the official solutions supported by Zabbix and also a long list of community solutions backed by our users, partners, and customers.

Monitoring of Kubernetes and Hybrid Clouds

Nowadays many existing companies are considering migrating their existing infrastructure to either solutions such as Kubernetes or OpenShift, or utilizing cloud service providers such as Amazon AWS or Microsoft Azure.

I am proud to announce, that with the release of Zabbix 6.0 LTS, Zabbix will officially support out-of-the-box monitoring of OpenShfit and Kubernetes clusters.

Data collection and Aggregation

Let’s cover a few recent features that improve the out-of-the-box flexibility of Zabbix by a large margin.

Synthetic monitoring is a feature that was introduced a year ago in Zabbix version 5.2 and it has already become quite popular with our user base. The feature enables monitoring of different devices and solutions over the HTTP protocol. By using synthetic monitoring Zabbix can connect to your HTTP endpoints, such as cloud APIs, Kubernetes, and OpenShift APIs, and other HTTP endpoints, collect the metrics and then process them to extract the required information. Synthetic monitoring is extremely transparent and flexible – it can be fine-tuned to communicate with any HTTP endpoints.

Another major feature introduced in Zabbix 5.4 is the new trigger syntax. This enables our users to define much more flexible trigger expressions, supporting many new problem detection use cases. In addition, we can use this syntax to perform flexible data aggregation operations. For example, now we can aggregate data filtered by wildcards, tags, and host groups, instead of specifying individual items. This is extremely valuable for monitoring complex infrastructures, such as Kubernetes or cloud environments. At the same time, the new syntax is a lot more simple to learn and understand when compared to the old trigger syntax.

Security on all levels

Many companies are concerned about security and data protection when it comes to the tools that they are using in their day-to-day tasks. I’m happy to tell you that Zabbix follows the highest security standards when it comes to the development and usage of the product.

Zabbix is secure by design. In the diagram below you can see all of the Zabbix components, all of which are interconnected, like Zabbix Agent, Server, Proxy, Database, and Frontend. All of the communication between different Zabbix components can be encrypted by using strong encryption protocols like TLS.

If you’re using Zabbix Agent, the agent does not require root privileges. You can run Zabbix Agent under a normal user with all of the necessary user level restrictions in place. Zabbix agent can also be restricted with metric allow and deny lists, so it has access only to the metrics which are permitted for collection by your company policies.

The connections between the Zabbix database backend and the Zabbix Frontend and Zabbix Server also support encryption as of version 5.0 LTS.

As for the frontend component – users can add an additional security layer for their Zabbix frontends by configuring 2FA and SSO logins. Zabbix 6.0 LTS also introduces flexible login password complexity requirements, which can reduce the security breach risk if your frontend is exposed to the internet. To ensure that Zabbix meets the highest standards of the company security compliance, the new Audit log, introduced in Zabbix 6.0 LTS, is capable of logging all of the Zabbix Frontend and Zabbix Server operations.

For an additional security layer – sensitive information like Usernames, Passwords, API keys can be stored in an external vault. Currently, Zabbix supports secret storage in the HashiCorp Vault. Support for the CyberArk vault will be added in the Zabbix 6.2 release.

Another Zabbix feature – the Zabbix API, is often used for the automation of day-to-day configuration workflows, as well as custom integrations and data migration tasks. Zabbix 5.4 added the ability to create API tokens for particular frontend users with pre-defined token expiration dates.

In Zabbix 5.2 we added another layer for the Zabbix Frontend user permissions – User Roles. Now it is possible to define granular user roles with different types of rights and privileges, assigned to specific types of users in your organization. With User Roles, we can define which parts of the Zabbix UI the specific user role has access to and which UI actions the members of this role can perform. This can be combined with API method restrictions which can also be defined for a particular role.

Powerful Solution for MSPs

When we combine all of these features, we can see how Zabbix becomes a powerful solution for MSP customers. MSPs can use Zabbix as an added value service. This way they can provide a monitoring service for their customers and get additional revenue out of it. It is possible to build a customer portal which is a combination of User Roles for read-only access to dashboards and customized UI, rebranding option – which was just introduced in Zabbix 6.0 LTS, and a combination of SLA reporting together with scheduled PDF reports, so the customers can receive reports on a weekly, daily or monthly basis.

Scalability and High Availability

With a growing number of devices and ever-increasing network complexity, Scalability and High availability are extremely important requirements.

Zabbix provides Load balancing options for Zabbix UI and Zabbix API. In order to scale the Zabbix Frontend and Zabbix API, we can simply deploy additional Zabbix Frontend nodes, thus introducing redundancy and high availability.

Zabbix 6.0 LTS comes with out-of-the-box support for the Zabbix Server High Availability cluster. If one of the Zabbix Server nodes goes down, Zabbix will automatically switch to one of the standby nodes. And the best thing about the Zabbix Server High Availability cluster – it takes only 5 minutes to get it up and running. the HA cluster is very easy to configure and use.

One of the features in our future roadmap is introducing support for the History API to work with different time-series DB backends for extra efficiency and scalability. Another feature that we would like to implement in the future is load balancing for Zabbix Servers and Zabbix Proxies. Combining all of these features would truly make Zabbix a cloud-native application with unlimited horizontal scalability.

Machine learning and Statistical analysis

Defining static trigger thresholds is a relatively simple task, but it doesn’t scale too well in dynamic environments. With Machine Learning and Statistical Analysis, we can analyze our data trends and perform anomaly detection. This has been greatly extended in Zabbix 6.0 LTS with Anomaly Detection and Baseline Monitoring functionality.

Zabbix 6.0 Adds an extended set of functions for trend analysis and trend prediction. These support multiple flexible parameters, such as the ability to define seasonality for your data analysis. This is another way how to get additional insights out of the data collected by Zabbix

More value to users

When I think about the direction that Zabbix is headed in, and look at the Zabbix roadmap, one of the main questions I ask is “How can we deliver more value to our enterprise users?”

In Zabbix 6.0 LTS we made some major steps to make Zabbix fit not only for infrastructure monitoring but also fit for Business Service monitoring – the monitoring of services that we provide for our end-users or internal company users. Zabbix 6.0 LTS comes with complex service level object definitions, real-time SLA reporting, multi-tenancy options, Business Service alerting options, and root cause and Impact analysis.

New visualization capabilities

It is important to present the collected data in a human-readable way. That’s why we invest a lot of time and effort in order to improve the native visualization capabilities. In Zabbix 6.0 LTS we have introduced Geographical Maps together with additional widgets for TOP N reporting and templated and multi-page dashboards.

The introduction of reports in Zabbix 5.2 allowed our users to leverage their Zabbix Dashboards to generate scheduled PDF reports with respect to user permissions. Our users can generate daily, weekly, monthly or yearly reports and send them to their infrastructure administrators or customers.

IoT monitoring

With the introduction of support for Modbus and MQTT protocols, Zabbix can be used to monitor IoT devices and obtain environmental information from different sensors such as temperature, humidity, and more. In addition, Zabbix can now be used to monitor factory equipment, building management systems, IoT gateways, and more.

Infrastructure as a code

With IT infrastructures growing in scale, automation is more important than ever. For this reason, many companies prefer preserving and deploying their infrastructure as code. With the support of YAML format for our templates, you can now keep them in a git repository and by utilizing CI/CD tools you can now deploy your templates automatically.

This enables our users to manage their templates in a central location – the git repository, which helps users to perform change management and versioning and then deploy the template to Zabbix by using CI/CD tools.

Tags for classification

Over the past few versions, we have made a major push to support tags for most Zabbix entities. The switch from applications to tags in Zabbix 5.4 made the tool much more flexible. Tags can now be used for the classification of items, triggers, hosts, business services. The tags that the users define can also be used in alerting, filtering, and reporting.

What’s next?

You’re probably wondering – what’s coming next? What are the main vectors for the future development of Zabbix?

First off – we will continue to invest in usability. While the tool is made by professionals for professionals, it is important for us to make using the tool as easy as possible. Improvements to the Zabbix Frontend, general usability, and UX can be expected very soon.

We plan to continue to invest in the visualization and reporting capabilities of Zabbix. We want all data collected by our monitoring tool to provide information in a single pane of glass. This way our users can see the full picture of their environment while also seeing the root cause analysis for the ongoing problems that we face. This way we can get most of the data that Zabbix collects.

Extending the scope of monitoring is an ongoing process for us. We would like to implement additional features for compliance monitoring. I think that we will be able to introduce a solution for application performance monitoring very soon. We’d like to make log monitoring more powerful and comprehensive. monitoring of public and private clouds is also very important for us, given the current IT paradigms.

We’d like to make sure that Zabbix is absolutely extendable on all levels. While we can already extend Zabbix with different types of plugins, webhooks, and UI modules there’s more to come in the near future.

The topic of high availability, scalability, and load balancing is extremely important to us. We will continue building on the existing foundations to make Zabbix a truly cloud-native solution.

Advanced event correlation engine

Advanced event processing is a really important topic. When we talk about a monitoring solution, we pay very much attention to the number of metrics that we are collecting. We mustn’t forget, that for large-scale environments the number of events that we generate based on those metrics is also extremely important. We need to keep control and manage the ever-growing number of different events coming from different sources. This is why we would like to focus on noise reduction, specifically – root cause analysis.

For this reason, we can expect Zabbix to introduce an advanced event correlation model in the future. This model should have the ability to filter and deduplicate the events as well as perform event enrichment, thus leading to a much better root cause analysis.

Multi DC Monitoring

Currently, Multi DC monitoring can be done with Zabbix by deploying a distributed Zabbix instance that utilizes Zabbix proxies. But there are use cases, where it would be more beneficial to have multiple Zabbix servers deployed across different datacenters – all reporting to a single location for centralized event processing, centralized visualization, and reporting as well as centralized dashboards. This is something that is coming soon to Zabbix.

Zabbix Release Schedule

Of course, the burning question is – when is Zabbix 6.0 LTS going to be released? And we are very close to finalizing the next LTS release. I would expect Zabbix 6.0 LTS to be officially released in January 2022.

As for Zabbix 6.2 and 6.4 – these releases are still planned for Q2 and Q4, 2022. The next LTS release – Zabbix 7.0 LTS is planned to be released in Q2, 2023.

Zabbix Roadmap

If you want to follow the development of Zabbix – we have a special page just for that – the Zabbix Roadmap. Here you can find up-to-date information about the development plans for Zabbix 6.2, 6.4, and 7.0 LTS. The Roadmap also represents the current development status of Zabbix 6.0 LTS.

Questions

Q: What would you say is the main benefit of why users should migrate from Zabbix 5.0/4.0 or older versions to 6.0 LTS?

A: I think that Zabbix 6.0 LTS is a very different product – even when you compare it with the relatively recent Zabbix 5.0 LTS. It comes with many improvements, some of which I mentioned here in my keynote. For example, Business Service monitoring provides huge added value to enterprise customers.

With the new trigger syntax and the new functions related to anomaly detection and baseline monitoring our users can get much more out of the data that they already have in their monitoring tool.

The new visualization options – multiple new widgets, geographical maps, scheduled PDF reporting provide a lot of added value to our end-users and to their customers as well.

Q: Any plans to make changes on the Zabbix DB backend level – make it more scaleable or completely redesign it?

A: Right now we keep all of our information in a relational database such as MySQL or PostgreSQL. We have added the support for TimescaleDB which brings some huge advantages to our users, thanks to improved data storage and performance efficiency.

But we still have users that wish to connect different storage engines to Zabbix – maybe specifically optimized to keep time-series data. Actually, this is already on our roadmap. Our plan is to introduce a unified API for historical data so that if you wish to attach your own storage, we just have to deploy a plugin that will communicate both with our historical API and also talk to the storage engine of your choosing. This feature is coming and is already on our Roadmap.

Q: What is your personal favorite feature? Something that you 100% wanted to see implemented in Zabbix 6.0 LTS?

A: I see Zabbix 6.0 LTS as a combination of Zabbix 5.2, 5.4, and finally the features introduced directly in Zabbix 6.0 LTS. Personally, I think that my favorite features in Zabbix 6.0 LTS are features that make up the latest implementation of Anomaly detection.

We could be at the very beginning of exploring more advanced machine learning and statistical analysis capabilities, but I’m pretty sure that with every new release of Zabbix there will be new features related to machine learning, anomaly detection, and trend prediction.

This could provide a way for Zabbix to generate and share insights with our users. Analysis of what’s happening with your system, with your metrics – how the metrics in your system behave.

Summary of Zabbix Summit Online 2021, Zabbix 6.0 LTS release date and Zabbix Workshops

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/summary-of-zabbix-summit-online-2021-zabbix-6-0-lts-release-date-and-zabbix-workshops/17155/

Now that the Zabbix Summit Online 2021 has concluded, we are thrilled to report we hosted attendees from over 3000 organizations from more than 130 countries all across the globe.

This year, the main focus of the speeches was the upcoming Zabbix 6.0 LTS release, as well as speeches focused on automating Zabbix data collection and configuration, Integrating Zabbix within existing company infrastructures, and migrating from legacy tools to Zabbix. 21 speakers in total presented their use cases and talked about new Zabbix features during the Summit with over 8 hours of content.

In case you missed the Summit or wish to come back to some of the speeches – both the presentations (in PDF format) and the videos of the speeches are available on the Zabbix Summit Online 2021 Event page.

Zabbix 6.0 LTS release date

As for Zabbix 6.0 LTS – as per our statement during the event, you can expect Zabbix 6.0 LTS to release in early 2022. At the time of this post, the latest pre-release version is Zabbix 6.0 Alpha 7, with the first Beta version scheduled for release VERY soon. Feel free to deploy the latest pre-release version and take a look at features such as Geomaps, Business Service monitoring, improved Audit log, UX improvements, Anomaly detection with Machine Learning, and more! The list of the latest released Zabbix 6.0 versions as well as the improvements and fixes they contain is available in the Release notes section of our website.

Zabbix 6.0 LTS Workshops

The workshops will focus on particular Zabbix 6.0 LTS features and will be available once the Zabbix 6.0 LTS is released. The workshops will provide a unique chance to learn and practice the configuration of specific Zabbix 6.0 LTS features under the guidance of a certified Zabbix trainer at absolutely no cost! Some of the topics covered in the workshops will include – Deploying Zabbix server HA cluster, Creating triggers for Baseline monitoring and Anomaly detection, Displaying your infrastructure status on Geomaps, Deploying Business Service monitoring with root cause analysis, and more!

Upcoming events

But there’s more! On December 9 2021 Zabbix will host PostgreSQL Monitoring Day with Zabbix & Postgres Pro. The speeches will focus on monitoring PostgreSQL databases, running Zabbix on PostgreSQL DB backends with TimescaleDB, and securing your Zabbix + PostgreSQL instances. If you’re currently using PostgreSQL DB backends r plan to do so in the future – you definitely don’t want to miss out!

As for 2022 – you can expect multiple meetups regarding Zabbix 6.0 LTS features and use cases, as well as events focused on specific monitoring use cases. More information will be publicly available with the release of Zabbix 6.0 LTS.