Tag Archives: conferences

Reimagining Democracy

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2022/12/reimagining-democracy.html

Last week, I hosted a two-day workshop on reimagining democracy.

The idea was to bring together people from a variety of disciplines who are all thinking about different aspects of democracy, less from a “what we need to do today” perspective and more from a blue-sky future perspective. My remit to the participants was this:

The idea is to start from scratch, to pretend we’re forming a new country and don’t have any precedent to deal with. And that we don’t have any unique interests to perturb our thinking. The modern representative democracy was the best form of government that mid-eighteenth-century technology could invent. The twenty-first century is a very different place technically, scientifically, and philosophically. What could democracy look like if it were reinvented today? Would it even be democracy—what comes after democracy?

Some questions to think about:

  • Representative democracies were built under the assumption that travel and communications were difficult. Does it still make sense to organize our representative units by geography? Or to send representatives far away to create laws in our name? Is there a better way for people to choose collective representatives?
  • Indeed, the very idea of representative government is due to technological limitations. If an AI system could find the optimal solution for balancing every voter’s preferences, would it still make sense to have representatives—or should we vote for ideas and goals instead?
  • With today’s technology, we can vote anywhere and any time. How should we organize the temporal pattern of voting—and of other forms of participation?
  • Starting from scratch, what is today’s ideal government structure? Does it make sense to have a singular leader “in charge” of everything? How should we constrain power—is there something better than the legislative/judicial/executive set of checks and balances?
  • The size of contemporary political units ranges from a few people in a room to vast nation-states and alliances. Within one country, what might the smaller units be—and how do they relate to one another?
  • Who has a voice in the government? What does “citizen” mean? What about children? Animals? Future people (and animals)? Corporations? The land?
  • And much more: What about the justice system? Is the twelfth-century jury form still relevant? How do we define fairness? Limit financial and military power? Keep our system robust to psychological manipulation?

My perspective, of course, is security. I want to create a system that is resilient against hacking: one that can evolve as both technologies and threats evolve.

The format was one that I have used before. Forty-eight people meet over two days. There are four ninety-minute panels per day, with six people on each. Everyone speaks for ten minutes, and the rest of the time is devoted to questions and comments. Ten minutes means that no one gets bogged down in jargon or details. Long breaks between sessions and evening dinners allow people to talk more informally. The result is a very dense, idea-rich environment that I find extremely valuable.

It was an amazing event. Everyone participated. Everyone was interesting. (Details of the event—emerging themes, notes from the speakers—are in the comments.) It’s a week later and I am still buzzing with ideas. I hope this is only the first of an ongoing series of similar workshops.

Security and Human Behavior (SHB) 2022

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2022/05/security-and-human-behavior-shb-2022.html

Today is the second day of the fifteenth Workshop on Security and Human Behavior, hosted by Ross Anderson and Alice Hutchings at the University of Cambridge. After two years of having this conference remotely on Zoom, it’s nice to be back together in person.

SHB is a small, annual, invitational workshop of people studying various aspects of the human side of security, organized each year by Alessandro Acquisti, Ross Anderson, Alice Hutchings, and myself. The forty or so attendees include psychologists, economists, computer security researchers, sociologists, political scientists, criminologists, neuroscientists, designers, lawyers, philosophers, anthropologists, geographers, business school professors, and a smattering of others. It’s not just an interdisciplinary event; most of the people here are individually interdisciplinary.

For the past decade and a half, this workshop has been the most intellectually stimulating two days of my professional year. It influences my thinking in different and sometimes surprising ways—and has resulted in some unexpected collaborations.

Our goal is always to maximize discussion and interaction. We do that by putting everyone on panels, and limiting talks to six to eight minutes, with the rest of the time for open discussion. Because not everyone was able to attend in person, our panels all include remote participants as well. The hybrid structure is working well, even though our remote participants aren’t around for the social program.

This year’s schedule is here. This page lists the participants and includes links to some of their work. As he does every year, Ross Anderson is liveblogging the talks.

Here are my posts on the first, second, third, fourth, fifth, sixth, seventh, eighth, ninth, tenth, eleventh, twelfth, thirteenth, and fourteenth SHB workshops. Follow those links to find summaries, papers, and occasionally audio/video recordings of the various workshops. Ross also maintains a good webpage of psychology and security resources.

Zabbix meets television – Clever use of Zabbix features by Wolfgang Alper / Zabbix Summit Online 2021

Post Syndicated from Wolfgang Alper original https://blog.zabbix.com/zabbix-meets-television-clever-use-of-zabbix-features-by-wolfgang-alper-zabbix-summit-online-2021/19181/

TV broadcasting infrastructures have seen many great paradigm shifts over the years. From TV to live streaming – the underlying architecture consists of many moving parts supplied by different vendors and solutions. Any potential problems can cause critical downtimes, which are simply not acceptable. Let’s look at how Zabbix fits right into such a dynamic and ever-changing environment.

The full recording of the speech is available on the official Zabbix Youtube channel.

In this post, I will talk about how Zabbix is used in ZDF – Zweites Deutsche Fernsehen (Second German Television). I will specifically focus on the most unique and interesting use cases, and I hope that you will be able to use this knowledge in your next project.

ZDF – Some history

Before we move on with our unique use cases, I would like to introduce you to the history of ZDF. This will help you understand the scope and the potential complexity and scale of the underlying systems and company policies.

  • In 1961, the federal states established a central non-profit television broadcaster – Zweites Deutsches Fernsehen
  • On April 1, 1963, ZDF officially went on air, reaching 61 percent of television viewers
  • On the Internet, a selection of programs is offered via live stream or video-on-demand through the ZDFmediathek, which has been in existence since 2001
  • Since February 2013, ZDF has been broadcasting its programs around the clock as an internet live stream
  • As of today, ZDF is one of the largest public broadcasters in Europe with permanent bureaus worldwide and is also present on various platforms like Youtube, Facebook, etc.

Here we can see that over the years, ZDF has made some major leaps – from a traditional television broadcaster reaching a majority of viewers to offering on-demand video services and 24/7 internet live streams. ZDF has also scaled up its presence across multiple digital platforms as well as its physical presence around the globe.

Integrating Zabbix with an external infrastructure monitoring system

In our first use case, we will cover integrating Zabbix with an external infrastructure monitoring system. As opposed to monitoring IT metrics like hard drive space, memory usage, or CPU loads – this external system is responsible for monitoring devices like power generators, transmission stations, and other similar components. The idea was to pass the states of these components to Zabbix. This way, Zabbix would serve as a central “Umbrella” monitoring system.

In addition, the components that are monitored by the external system have states and severities, but the severities are not static and can vary depending on the monitored component. What this means is that each component could generate problems of varying severities. We had to figure out a way to assign the correct severities to each of the external components. Our approach was split into multiple steps:

  • Use Zabbix built-in HTTP check to get LLD discovery data
    • The external monitoring system provides an API, which we can use to obtain the necessary LLD information by using the HTTP checks
    • Zabbix-sender was used for testing since the HTTP items support receiving data from it
  • Use Zabbix built-in HTTP check as a collector to obtain the component status metrics
  • Define item prototypes as dependent items to extract data from the collector item
  • Create “smart” trigger prototypes to respect the severity information from the LLD data

The JSON below is an example of the LLD data that we are receiving from the external monitoring systems. In addition to component names, descriptions, and categories, we are also providing the severity information. The severities that have a value of -1 are not used, while other severities are cross-checked with the status value retrieved from the returned metrics:

{
  "{#NAME}": "generator-secondary",
  "{#DISPLAYNAME}": "Secondary power generator",
  "{#DESCRIPTION}": "Secondary emergency power generator",
  "{#CATEGORY}": "Powersupply",
  "{#PRIORITY.INFORMATION}": -1,
  "{#PRIORITY.WARNING}": -1,
  "{#PRIORITY.AVERAGE}": -1,
  "{#PRIORITY.HIGH}": 1,
  "{#PRIORITY.DISASTER}": 2
}

Below we can see the returned metrics – the component name and its current status. For example, a status value of 1 references {#PRIORITY.HIGH} from the LLD JSON data.

"generator-primary": {
"status": 0,
"message": "Generator is healthy."
},
"generator-secondary": {
"status": 1,
"message": "Generator is not working properly."
},

We can see that the first generator returns status = 0, which means that the generator is healthy and there are no problems, while the secondary generator is currently not working properly – status = 1 and should generate a problem with severity High.

Below we can see how the item prototypes are created for each of the components – one item prototype collects the message information, while the other collects the current status of the component. We use JSONPath preprocessing to obtain these values from our master item.
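As an illustration (the item names and keys below are hypothetical, only the LLD macros come from the data above), the two dependent item prototypes could look roughly like this, with the JSONPath parameter built from the {#NAME} macro:

Item prototype: Component {#DISPLAYNAME}: status
Key: component.status[{#NAME}]
Preprocessing: JSONPath $['{#NAME}'].status

Item prototype: Component {#DISPLAYNAME}: message
Key: component.message[{#NAME}]
Preprocessing: JSONPath $['{#NAME}'].message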

As for the trigger prototypes – we have defined a trigger prototype for each of the trigger severities. The trigger prototypes will then create triggers depending on the information contained in the LLD macros for a given component.

As you can see, the trigger expressions are also quite simple – each trigger simply checks if the last received component status matches the specific trigger threshold status value.
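For example, the trigger prototype for the High severity could use an expression along these lines (the item key is the hypothetical one from the sketch above, and /host/ stands for the actual template):

last(/host/component.status[{#NAME}])={#PRIORITY.HIGH}

The prototype itself is configured with the High severity, and the remaining prototypes repeat the same pattern with their respective {#PRIORITY.*} macros.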

The resulting metrics provide us both the status value and the component status message. As we can see, the triggers are also generating problems with dynamic severities.

Improving the solution with LLD overrides

The solution works – but we can do better! You might have already guessed the underlying issue with this approach: our LLD rule creates triggers for every severity, even if it isn’t used. The threshold for these unused triggers is -1 – a value we will never receive – so the unused triggers will always stay in the OK state. Effectively, we have created five trigger definitions, while in our example we require only two.

How can we resolve this? Thankfully, Zabbix provides just the right tool for the job – LLD Overrides! We have created 5 overrides on our discovery rule – one for each severity:

In the override conditions, we specify that if the value contained in the priority LLD macro is equal to -1, the trigger of that severity will not be discovered.

The final result looks much cleaner – now we have only two trigger definitions instead of five. 


This is a good example of how we can use LLD together with master items obtaining data from external APIs and also improve the LLD logic by using LLD overrides.

“Sphinx” application monitoring using Graylog REST API

For our second example, we will be monitoring the Sphinx application by using the Graylog REST API. Graylog is a log management tool that we use for log collection – it is not used for any kind of alerting. We also have an application called Sphinx, which consists of three components – a Web component, an App component, and a WCF Gateway component. Our goal here is to:

  • Use Zabbix for evaluating error messages related to Sphinx from Graylog
  • Monitor the number of errors in user-defined time intervals for different components and alert when a threshold is exceeded
  • Analyze the incoming error messages and prepare them for a user-friendly output sorted by error types

The main challenges posed by this use-case are:

  • How to obtain Sphinx component information from Graylog
  • How to handle certificate problems (DH_KEY_TOO_SMALL / Diffie-Hellman key) due to an outdated version of the installed Graylog server
  • How to sort the error messages coming in “Free form” without explicit error types

Collecting the data from Graylog

Since the Graylog application used in the current scenario was outdated, we had to work around the certificate issues by using the Zabbix external check item type. Once again, we will be using master and dependent item logic – we will create three master items (one for each component) and retrieve the component data. All additional information will be retrieved by the dependent items, so as not to cause extra performance impact by flooding the Graylog API endpoint. The data itself is parsed and sorted by using JavaScript preprocessing. The dependent item prototypes are used here to create the items for the obtained stats and the data used for visualizing each error type on a user-friendly dashboard.

Let’s take a look at the detailed workflow for this use case:

  • An external check for scanning the Graylog stream – “Sphinx App Raw”
  • A dependent item which analyzes and filters the raw data by using preprocessing – “Sphinx App Raw Filtered”
  • This dependent item is used as a master item for our LLD rule – “Sphinx App Error LLD”
  • The same dependent item is also used as a master item for our item prototypes – “Sphinx App Error count” and “Sphinx App Error List”

Effectively this means that we perform only a single call to the Graylog API, and all of the heavy lifting is done by the dependent item in the middle of our workflow.
This workflow obtains the information only about the App component – remember, we have two other components, Web and Gateway, where the same approach has to be implemented.

In total, we will have three master items – one for each of the components:

They will use the following shell script to execute the REST API call to the Graylog API:

graylog2zabbix.sh[{$GRAYLOG_USERNAME},{$GRAYLOG_PASSWORD},{HOST.CONN},{$GRAYLOG_PORT},search/universal/relative?
query=name%3Asphinx-app%20AND%20stage%3Aproduction%20AND%20level%3A(ERROR%20OR
%20FATAL)&range=1800&limit=50&filter=streams%3A60000a8c1c09f9862279966e&fields=name%2Clevel
%2Cmessage&decorate=true]

The data that we obtain this way is extremely hard to work with without any additional processing. It very much looks like a set of regular log entries – this complicates the execution of any kind of logic in reaction to receiving this kind of data:

For this reason, we have created a dependent item, which uses preprocessing to filter and sort this data. The dependent item preprocessing is responsible for:

  • Analyzing the error messages
  • Defining the error type
  • Sorting the raw data so that we can work with it more easily

We have defined two preprocessing steps to process this data: a JSONPath preprocessing step to select the messages from the response and a JavaScript preprocessing script that does the heavy lifting. The JavaScript script uses regular expressions and performs data preparation and sorting. In its last line, the data is transformed back into JSON, so we can work with it further down the line by using JSONPath preprocessing steps for our dependent items.
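The original preprocessing script is not reproduced here, but a minimal sketch of such a JavaScript step could look like the following – assuming the preceding JSONPath step hands over a JSON array of raw message strings, and using a simple, made-up rule that treats the text before the first colon as the error type:

// Zabbix JavaScript preprocessing: "value" holds the output of the previous step
var messages = JSON.parse(value);   // array of raw log messages
var sorted = {};                    // error type -> list of messages
messages.forEach(function (msg) {
    var match = /^([^:]+):/.exec(msg);              // text before the first ":" as the error type
    var type = match ? match[1].trim() : "Other";   // fall back to a generic bucket
    if (!sorted[type]) {
        sorted[type] = [];
    }
    sorted[type].push(msg);
});
return JSON.stringify(sorted);      // hand structured JSON to the dependent items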

Below we can see the result. The data stream has been sorted and arranged by error types, which you can see on the left-hand side. All of the logged messages are now children that belong to one of these error types.

We have also created three LLD rules – one for each component. These LLD rules create items for each error type of each component. To achieve this, some additional JSONPath and JavaScript preprocessing is done on the LLD rule itself:

The end result is a dashboard that uses the collected information to display the error count per component. Attached to the graph, we can see some additional details regarding the log messages related to the detected errors.

Monitoring of TV broadcast trucks

I would like to finish up this post by talking about a completely different use case – monitoring of TV broadcast trucks!

In comparison to the previous use cases – the goals and challenges here are quite unique. We are interested in a completely different set of metrics and have to utilize a different approach to obtain them. Our goals are:

  • Monitor several metrics from different systems used in the TV broadcast truck
  • Monitor the communication availability and quality between the broadcast truck and the transmitting station
  • Only monitor the broadcast truck when it is in use

One of the main challenges for this use case is avoiding false alarms. How can we avoid false positives if a broadcast truck can be put into operation at any time without notifying the monitoring team? The end goal is to monitor the truck when it’s in use and stop monitoring it when it’s not in use.

  • Each broadcast truck is represented by a host in Zabbix – this way, we can easily put it into maintenance
  • A control host is used to monitor the connection states of all broadcasting trucks
  • We decided on creating a middleware application that would be able to implement start/stop monitoring logic
    • This was achieved by switching the maintenance on/off by using the Zabbix API
  • A specific application in the broadcasting truck then tells Zabbix how long to monitor it and when to enable the maintenance for that truck

Below we can see the truck monitoring workflow. The truck control host gets the status for each truck to decide when to start monitoring the truck. The middleware then starts/stops the monitoring of a truck by using Zabbix API to control the maintenance periods for the trucks. Once a truck is in service, it also passes the monitoring duration to the middleware, so the middleware can decide when the monitoring of a specific truck should be turned off.
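To give a rough idea of what such a middleware call could look like, here is a sketch of a maintenance.create request that suppresses monitoring for a single truck (the host ID, token, and timestamps are placeholders; parameter names follow the Zabbix 5.x/6.0 JSON-RPC API):

{
  "jsonrpc": "2.0",
  "method": "maintenance.create",
  "params": {
    "name": "Broadcast truck 01 - not in use",
    "active_since": 1638316800,
    "active_till": 1638403200,
    "hostids": ["10084"],
    "timeperiods": [
      { "timeperiod_type": 0, "start_date": 1638316800, "period": 86400 }
    ]
  },
  "auth": "<API token>",
  "id": 1
}

Deleting or updating this maintenance period again (maintenance.delete / maintenance.update) is what effectively re-enables alerting for the truck once it goes back on air.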

Next, let’s look at the truck control workflow from the Zabbix side.

  • Each broadcast truck is represented by a single trigger on the control host
    • The trigger actions forward the information to the middleware that the truck’s maintenance period should be disabled
  • Middleware uses the Zabbix API to disable the maintenance for the specific truck
  • The truck is now monitored
  • The truck forwards the Monitoring duration to the middleware
  • Once the monitoring duration is over, the middleware enables the maintenance for the specific truck

Finally, the trucks are displayed on a map which can be placed on our dashboards. The map displays whether the truck is in maintenance (not active) and whether it has any problems. This way, we can easily monitor our broadcast truck fleet.

From gathering data from external systems to performing complex data transformations with preprocessing and monitoring our whole fleet of broadcast trucks – I hope you found these use cases useful and were able to learn a thing or two about the flexibility of different Zabbix features!

The post Zabbix meets television – Clever use of Zabbix features by Wolfgang Alper / Zabbix Summit Online 2021 appeared first on Zabbix Blog.

Defining flexible problem thresholds with the new trigger syntax by Sergey Simonenko / Zabbix Summit Online 2021

Post Syndicated from Sergey Simonenko original https://blog.zabbix.com/defining-flexible-problem-thresholds-with-the-new-trigger-syntax-by-sergey-simonenko-zabbix-summit-online-2021/19091/

Introduced in Zabbix 5.4, the new trigger expression syntax enables a problem detection logic that is more sophisticated and flexible than ever before. In addition to changing the syntax, the existing trigger functions have also been reworked. Many new functions have been added, redundant functions have been removed while existing functions have been improved to support many new use cases. In this blog post, we will take a look at the new trigger syntax and functions, as well as further trigger function improvements that will be added in Zabbix 6.0 LTS.

The full recording of the speech is available on the official Zabbix Youtube channel.

New syntax and functions

Changing the trigger syntax was one of the major improvements that we rolled out between Zabbix 5.0 LTS and Zabbix 6.0 LTS. The new syntax helped us get rid of multiple limitations that restricted the flexibility of the old syntax. At the same time, we were able to make the syntax simpler and more intuitive, as well as unify it for usage in trigger expressions, calculated items, map labels, and more.

Let’s compare how calculated items and triggers look with the new syntax:

  • Let’s look at a calculated item
avg(/host/system.cpu.util,1h)
  • And compare it with a trigger expression
avg(/host/system.cpu.util,1h)>25

As you can see, both expressions look extremely similar. The only major difference is the usage of a threshold value in the trigger expression. Note that most of the functions can be used in calculated items and triggers.

Note – when you’re upgrading to Zabbix 6.0 LTS, your triggers, calculated items, and aggregated items will be automatically converted to the new syntax!

Smart parameters

One of the major improvements brought on by the new syntax is the support of smart parameters. It is now no longer necessary to pass an exact host and item to every function – only functions that operate on item history (such as history and prediction functions) require them. This means that, for example, date and time functions like now() and time() don’t require a host/item reference:

  • History function:
last(/host/item)="success"
  • Date and time function:
time()>=090000

Note that it is still required to have at least one function which references a host/item in the expression.
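Putting the two together, a hypothetical trigger could combine a history function with date and time functions, for example to alert on a non-success value only during working hours:

last(/host/item)<>"success" and time()>=090000 and time()<=180000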

Time and time shift

While designing the new trigger syntax, we also made a decision to combine time and time shift parameters into a single parameter:

(sec|#num)<:time shift>

We can now group the time shift expression into two types: absolute time shift and relative time shift.

With relative time shift, we can add or subtract time units to analyze metrics collected during some time period. Here  you can see a relative time shift for analyzing the data for one hour (1h) from the previous day (now-1d):

1h:now-1d

The absolute time periods can be recognized by the forward-slash symbol after now, which references the current time. We then have to specify the time unit that we wish to use, like day (d) or week (w). Absolute time periods analyze data based on the time interval which is used. For example, in the case of the day period, the function will analyze the data from midnight to midnight, and in the case of the week – from Monday to Sunday. Here are some examples of absolute time shifts:

  • An hour one day ago:
1h:now-1d
  • Yesterday
1d:now/d
  • Today
1d:now/d+1d
  • Last 2 days
2d:now/d+1d
  • Last week
1w:now/w
  • This week
1w:now/w+1w
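A time shift becomes really useful once it is combined with a history function. For instance, a hypothetical trigger could compare the average CPU utilization of the last hour with the same hour one day ago:

avg(/host/system.cpu.util,1h) > 2 * avg(/host/system.cpu.util,1h:now-1d)

Here a problem is raised when the current hourly average is more than twice the value observed during the same hour yesterday.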

Nested functions

The new trigger syntax also allows us to write nested functions. This means that we can now use the returned value of one function as a parameter for another function. For example, instead of using the abschange function, which has now been removed, we can obtain an absolute value by using abs as a nested function. This way, we can obtain an absolute value for the result of another function:

abs(last(/host/item))

Similarly, we have replaced the strlen function. Now we can use the length function and obtain the length from any string value returned by another nested function:

length(find(/host/item,"pattern"))

We can also use functions such as min, max, and many others to obtain a value from multiple nested functions. For example, here is how we can obtain the minimum of two last values:

min(last(/host1/item),last(/host2/item))

New trigger functions

Trigger functions are now grouped according to their purpose and functionality. This can be seen both in the frontend and in our documentation:

  • History functions – operate with historical data
  • Aggregate functions – allow you to sum, find minimums and maximums, and perform other aggregations on your values
  • Operator functions – check if a value belongs in range/is one of the acceptable values.
  • Mathematical functions – perform mathematical operations like finding absolute values, rounding your values, obtaining logarithm values, and more.
  • Date and time functions like date, now, time, etc.

New string and math functions

We have greatly expanded the number of available string and math functions. Now you can find a specific character in a string, perform multiple types of trims on the value, obtain byte or bit lengths, and more:

  • left, right, mid – character(s) at a given position
  • insert, replace, concat – modify the string value
  • trim, ltrim, rtrim – different types of trim functions
  • ascii, bitlength, bytelength – obtain ascii code of the leftmost character or value length in bits or bytes

The greatly expanded set of mathematical functions enables our users to analyze different types of metrics:

  • sin, tan, cos – functions for angle values
  • exp, expm1 – Euler’s number raised to the power of a value (expm1 subtracts 1 from the result)
  • log, log10 – logarithm of a value
  • rand – return a random integer

Operator functions

Operator functions simplify expressions that required long chains of conditions in the old syntax. Now you can use these functions to write more compact and more readable trigger expressions:

  • Detecting if the obtained value is between two values with the old trigger syntax:
{HOST:ITEM.last()}>=1 and {HOST:ITEM.last()}<=10
  • Detecting if the obtained value is between two values with the new trigger syntax:
between(last(/host/item),1,10)=1
  • Detecting if the obtained value is equal to a value within a set of values with the old syntax:
{HOST:ITEM.last()}=1 or {HOST:ITEM.last()}=2 or {HOST:ITEM.last()}=3…
  • Detecting if the obtained value is equal to a value within a set of values with the new syntax:
in(last(/host/item),1,2,3,…)=1

New history and aggregate functions

Zabbix 6.0 LTS adds a couple of new history and aggregate functions which once again help you to define dynamic expressions in a very simple manner:

  • monoinc, monodec – detect monotonic increase or decrease in a set of historical values
    • Allows you to detect unexpected data growth or decrease – for example, growth in a message queue (see the sketch after this list)
  • changecount – count the number of changes (all changes or only increases or decreases) between adjacent historical values
  • rate, bucket_percentile, histogram_quantile – Functions that improve the analysis of Prometheus exporter metrics
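For the message-queue example mentioned above, such expressions could look roughly like this (the item key is hypothetical, and the exact function parameters are described in the documentation):

monoinc(/host/mq.queue.size,#10,"strict")=1
changecount(/host/mq.queue.size,10m,"inc")>5

Roughly speaking, the first expression fires when the last ten queue-size values keep increasing, and the second when more than five increases have been observed within the last ten minutes.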

Additional changes

Some redundant functions have also been removed. We observed that they often caused additional confusion and clutter, so functions such as delta, diff, and prev are gone:

  • Instead of  delta use:
max(/host/item, #100) - min(/host/item, #100)
  • Instead of diff use:
last(/host/item) != last(/host/item, #2)
  • Instead of prev use:
last(/host/item, #2)

Aggregate calculations

If you have used aggregate calculations before Zabbix 5.4, you may recall that we had a separate type of item explicitly for defining aggregate checks. This could cause some confusion, since both calculated items and aggregate checks served a similar purpose but had to be configured in different ways. For example, calculated items had a separate formula field, where the calculated item logic was performed, while the item key could be defined in an arbitrary fashion. On the other hand, in aggregate checks, the aggregation formula had to be defined in the item key itself – with strict key syntax in mind. In Zabbix 5.4, we finally solved this by removing the aggregate check item type and allowing for aggregate checks to be defined as a calculated item:

  • Aggregate checks are now a part of calculated items
  • The old syntax allowed aggregate calculations to be performed only on a single host group and an exact item key
    • We have introduced the ability to use filters and wildcards to address multiple host groups and keys
    • This was a top-voted feature request from the Zabbix community

The new syntax is not limited to a single host group for aggregate calculations. You can use tags, multiple host groups, and complex and/or logical operations with multiple clauses. For example, this is how you would calculate the average CPU load on a certain set of servers:

avg(last_foreach(/*/system.cpu.load?[group="Servers A" or group="Servers B" or (group="Servers C" and tag="Importance:High")]))

Let’s deconstruct the expression above:

  • We can see that this is a nested expression. The last_foreach function returns an array of values – the last value for each matching item. These values will be used to calculate the average value of our CPU load, as per the initial avg function
  • You can think of the question mark as a WHERE statement in SQL. It signifies that we will try and pick up the item values from hosts in matching host groups or matching specific tags
  • We are collecting CPU load values from hosts in Servers A or Servers B host groups
  • We are also picking up CPU load values from the Servers C host group if the tag and tag value match Importance:High

Aggregating discovered items

The new syntax can also be extremely helpful in use cases where we use low-level discovery (LLD) to discover our entities. For the sake of an example, let’s imagine that we are discovering network interfaces. Previously, if we wanted to perform some aggregations on the discovered entities, we had to create an aggregate item that would contain information about all of the discovered items. This caused an issue when a new interface was discovered – we had to manually adjust the aggregate item.

The support of wildcards in aggregate calculations resolves this problem. Let’s look at an example:

sum(last_foreach(/*/net.if.in[*,bytes]?[group="Customer A"]))

Instead of explicitly specifying an interface in the item key parameters, we are using a wildcard – any interface discovered on hosts in the Customer A host group will be used in the aggregate calculation. This way, we will obtain the sum of incoming traffic for Customer A.

But this is just a high-level overview of the most commonly used functions and new interesting use cases. If you wish to see the full list of supported functions together with examples of how to use them – please take a look at our documentation. If you have any questions or additional use-cases that you wish to discuss, don’t hesitate and leave a comment right below this post!

The post Defining flexible problem thresholds with the new trigger syntax by Sergey Simonenko / Zabbix Summit Online 2021 appeared first on Zabbix Blog.

Securing Zabbix 6.0 LTS by Kārlis Saliņš / Zabbix Summit Online 2021

Post Syndicated from Kārlis Saliņš original https://blog.zabbix.com/securing-zabbix-6-0-lts-by-karlis-salins-zabbix-summit-online-2021/18962/

Security is an essential dimension of any tool in your IT infrastructure, and Zabbix is no exception. With Zabbix 6.0 LTS, our users will be able to secure their Zabbix instance on multiple layers – from encrypting your network communication to flexible user access control,  API token provisioning, and custom user password policies. Let’s take a look at the full set of security features that Zabbix 6.0 LTS provides to its users.

The full recording of the speech is available on the official Zabbix Youtube channel.

Why do we need security?

Over time the security standards for IT infrastructures and software have greatly developed. We now have a robust set of standards that we have to follow to ensure the uninterrupted and secure delivery of internal and external services. We also need to ensure that any sensitive information stays confidential and minimize the risk for any possible data breaches. In the case of Zabbix, we have the ability to define a flexible set of security measures to ensure our data stays secure:

  • We can encrypt our connections between every Zabbix component to protect our data
    • Multiple encryption methods are supported
    • Zabbix administrators can configure supported cipher suites based on their company policy
  • Role-based access enables Zabbix administrators to define a flexible set of roles to restrict access to confidential information
    • This is especially vital for multi-tenant environments, where a single Zabbix instance is shared between multiple customers
  • Audit logging adds a layer of visibility that can help us detect potential security or configuration problems before they have a real impact

Security in Zabbix

Zabbix consists of multiple components – some of which are mandatory while others are optional. In the diagram below, you can see how the connection between every component can be encrypted. Users have the ability to choose which parts of their Zabbix infrastructure should be encrypted.

In addition to encrypting the communication between all of the Zabbix components, Zabbix administrators can also create flexible user roles for granular access control. The user passwords are stored in the database hashed with bcrypt. With Zabbix 6.0 LTS, we can also define Zabbix password complexity policies – I’m going to cover that specific feature a bit further in this blog post.

As for your secrets – authentication credentials for your devices, SSH usernames and passwords, and other sensitive information – these can be stored in the HashiCorp Vault, which is an optional component specifically for storing your secrets in a very secure fashion.

Zabbix user types

You can have 3 types of users in Zabbix. We need to understand the access restrictions that these user types enforce, before we can talk more about user roles:

  • Zabbix Super Admin
    • Unlimited access to everything
  • Zabbix Admin
    • Can create hosts and templates
    • Permission-based access to Zabbix entities
  • Zabbix User
    • Permission-based access to Zabbix entities
    • Has access only to the monitoring information
    • Has no access to configuration sections in Zabbix GUI

User roles can be used to create a user role based on a particular user type and further restrict the access for all users that belong to this role. For example, we can have a user role that is based on the Super Admin type but, instead of having access to every Zabbix section, we can restrict the members of this role to have access only to parts of the Zabbix Administration or Configuration sections.

Obtaining API tokens

If you plan on using the Zabbix API, the first step to executing your API workflows is issuing a user.login API call and obtaining an API token, which will be used in all future API calls until the user is logged out.

Now we have a better way of generating the API tokens. You can now generate the API tokens by accessing Administration – General – API tokens. Here you can generate new API tokens, select the user who can use the particular token, and also define an expiration date for the token.

Once the token has been created, we will see a confirmation screen with the generated token information. You should copy and save the token here since you will not be able to obtain the token later! If you forget to save it, you can always generate a new token.

Below we can see a list of our API tokens, their expiration dates, names, users, and other information:

Using the API tokens is exactly the same as you would use them with the old approach. Simply place your token in the auth field of your API call and, if the token is valid and the user has the necessary permissions, the API call will succeed:
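For illustration, a host.get request using such a token could look like this (the token value is a placeholder):

{
  "jsonrpc": "2.0",
  "method": "host.get",
  "params": {
    "output": ["hostid", "name"]
  },
  "auth": "<your API token>",
  "id": 1
}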

Secret macros

Macros can be used in many different parts of Zabbix – from using them in your triggers to create dynamic trigger thresholds to using them to define user names and passwords for different types of checks, such as SNMP, SSH, Zabbix agent checks, and many others. Now we can add an extra layer of obfuscation for our macros by creating secret macros:

  • The value of a secret macro is not displayed in the frontend
  • The value of a secret macro does not get cloned or exported with host/template export
  • Secret macros are stored in the Zabbix database
  • Database connections and access must be secured to prevent the potential attackers from obtaining these values

Zabbix + HashiCorp vault


The HashiCorp Vault is an optional component that we can use for storing Zabbix secrets, thus providing an additional layer of security. The secrets are stored in the vault, and every time Zabbix reloads its configuration cache, the secrets are picked up from the vault (every minute by default):

  • HashiCorp Vault can be used for storing secrets such as authentication information and Zabbix database credentials
  • The HashiCorp Vault provides a unified interface to access any secret while ensuring tight access control and keeping a detailed audit log
  • Initially, the vault is sealed and must be unsealed using unseal keys
  • Secrets are stored in the Zabbix configuration cache
  • The values of secrets are retrieved on every Zabbix configuration cache update

Initially, we have to provide the vault token and vault URL in the Zabbix server configuration file. If we choose to store the Zabbix database credentials in the secret vault, we also have to provide the path to the location of the secret:

### Option: VaultToken
# Vault authentication token that should have been generated
# exclusively for Zabbix server with read only permission
VaultToken=verysecretrandomlygeneretedvaultstring

### Option: VaultURL
# Vault server HTTP[S] URL. System wide CA certificates directory
# will be used if SSLCALocation is not specified.
VaultURL=https://my.organization.vault:8200

### Option: VaultDBPath
# Vault path from where credentials for database will be retrieved
# by keys 'password' and 'username'.
VaultDBPath=my/secret/location

Security improvements in Zabbix 6.0 LTS

The security improvements that we discussed previously have been available in the previous non-LTS versions. If you’re migrating from Zabbix 5.0 LTS to Zabbix 6.0 LTS all of these will be new to you. Next, let’s cover the improvements that have been added specifically in the Zabbix 6.0 LTS. If you’re using the latest major Zabbix version –  Zabbix 5.4, then all of the following features await you in Zabbix 6.0 LTS.

Improved Audit log

The audit log has seen some major improvements. Not only is the audit log capable of logging every operation performed by the Zabbix server and Zabbix frontend, but it has also received an internal re-design to ensure minimal possible performance impact and improved UX:

  • Improved API operation logging
  • Scalability improvements when logging a large number of audit log entries
  • Quality of life improvements when it comes to filtering the audit log
  • More types of operations are now logged in the audit log
    • Script execution
    • Global macro changes
    • Changes resulting from the execution of low-level discovery rules
    • And much more

The above image shows that not only are new types of operations being logged in the audit log, but we can also see the result of those operations, such as the results of script executions, macro changes, new users being created in Zabbix, and more.

User password complexity settings

One of the major security improvements introduced in Zabbix 6.0 LTS is the ability to define custom password complexity requirements. Zabbix administrators can select between multiple password complexity requirements and apply them to their Zabbix instance:

  • Prevent using simple passwords like password
  • Set the minimum password length
  • Prevent the usage of the user’s name, last name, or username in the password
  • Prevent usage of easy to guess passwords, such as abcd1234, asdf1234
    • The password analysis is based on a dictionary of roughly one million of the most common passwords
  • The default Admin password on a fresh Zabbix instance is not changed automatically – you have to update it after the new Zabbix instance is deployed!

Monitoring of certificate attributes

Another dimension of security that has improved with the release of Zabbix 6.0 LTS – security monitoring. Now we can use Zabbix Agent 2 to monitor SSL certificate attributes:

  • Zabbix Agent 2 built-in plugin for certificate monitoring available starting from Zabbix Agent 2 5.0.15
  • Collects information about the website certificate
  • Official templates are available out of the box and can be downloaded on our git page
  • web.certificate.get[hostname,<port>,<address>]

Below you can see an example of the certificate information obtained from the www.zabbix.com website. Here’s how we can test the item for our example by using the agent’s -t option:

zabbix_agent2 -t web.certificate.get[www.zabbix.com]

And the output that is received by Zabbix:

As we can see, the resulting information tells us that the certificate is valid and has been verified successfully. Next, let’s take a look at a different kind of result:

Here we can see that the certificate is still valid, but we also receive a warning that it’s a self-signed certificate.

You may have noticed that the information obtained by this item comes in a JSON containing multiple values. That’s because most of the .get items obtain information in bulk. We can use this item as a master item and create multiple dependent items, each of which will collect an individual value. This can be configured by using JSONPath preprocessing.

Let’s take a look at some trigger examples that we can use together with the certificate monitoring item:

  • Check if the certificate is invalid
    • {HOST:cert.validation.str("invalid")} = 1
  • Check if the certificate expires in less than 7 days
    • ({HOST:cert.not_after.last()} - {HOST:cert.not_after.now()}) / 86400 < 7
  • Check if the Certificate fingerprint has changed
    • {TEMPLATE_NAME:cert.sha1_fingerprint.diff()}=1

User permissions in service trees

With the reworked business service monitoring, we have also added the ability to define access permissions for a particular service or multiple services that have a matching tag. Permissions can be defined for Read-only access or Read-write access.

In the example below, we can see three child services. Our user has read-only access for the first two services while having read-write permissions for the third service.

If you wish to read more about the re-designed Services section and Zabbix business service monitoring, you can find a blog post based on the Zabbix Summit 2021 speech on the topic here.

Zabbix security best practices

Finally, let’s go through a checklist of best practices that will help you to secure your environment on all levels.

  • Define proper user roles for different types of Zabbix users in your organization
    • If you have a multi-tenant Zabbix environment, make sure that each tenant can access only the information related to their organization!
  • Enable HTTPS on your Zabbix web frontend
    • Unencrypted connections to the web frontend mean that an attacker could potentially obtain sensitive information
  • Encrypt your connections to the Zabbix database
  • Use the HashiCorp Vault integration to harden your security and store your secrets in a vault
  • Encrypt the traffic between the Zabbix server, Zabbix proxies, and Zabbix agents
    • Here you can choose between PSK and Certificate-based encryptions
    • Connections between command-line tools like zabbix-sender can and should also be encrypted!
  • Agentless checks should also be configured in a secure manner
    • HTTP authentication and SSL for HTTP checks
    • Public key authentication for SSH checks
  • Define custom password complexity settings to prevent weak passwords for your Zabbix users
  • Define allow and deny lists for the item keys in your Zabbix agent configuration (a minimal example follows this list)
    • This way you can restrict the collection of sensitive metrics by potential attackers
  • Use secret macros instead of displaying your macro values in plain-text
  • Keep Zabbix updated and update to the latest minor Zabbix release
    • Minor releases contain critical bug and security fixes
    • The upgrade process is extremely simple since no changes are made to the database schema in the minor version updates
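As a minimal sketch of the allow/deny list idea mentioned above (the keys and paths are purely illustrative), the agent configuration file could contain something like:

# Block remote command execution through the agent
DenyKey=system.run[*]
# Allow reading files only under /var/log, deny all other vfs.file keys
AllowKey=vfs.file.*[/var/log/*]
DenyKey=vfs.file.*[*]

The rules are evaluated in the order they appear, so the more specific AllowKey entry has to come before the broader DenyKey entry.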

Advanced Zabbix Security Administration course

If you wish to learn more about securing your Zabbix instance and get some practical experience in a controlled environment while being guided by Zabbix certified trainers – then the Advanced Zabbix Security Administration course is for you! In this course, you will have the chance to learn about and practice these topics hands-on.

You can find more information about the Advanced Zabbix Security Administration course as well as the course pricing on the Zabbix website.

Questions

Q: Can we encrypt individual Zabbix components, or is it an all-or-nothing approach?

A: Connections between each component can be encrypted individually. You can, for example, encrypt only connections between Zabbix server and Zabbix proxies if you wish to do so. You can even leave connections between some proxies unencrypted, while others can be fully secured.


Q: Can I mix different Zabbix authentication approaches – for example, can I use LDAP together with internal authentication?

A: It is not only possible, but it’s also recommended. If you’re using LDAP authentication – always have a user with internal authentication configured for it. This will save you in situations where something goes wrong with the LDAP servers or connectivity.

The post Securing Zabbix 6.0 LTS by Kārlis Saliņš / Zabbix Summit Online 2021 appeared first on Zabbix Blog.

New Agent 2 features in Zabbix 6.0 LTS by Aigars Kadiķis / Zabbix Summit Online 2021

Post Syndicated from Aigars Kadiķis original https://blog.zabbix.com/new-agent-2-features-in-zabbix-6-0-lts-by-aigars-kadikis-zabbix-summit-online-2021/18929/

Zabbix Agent 2 has been developed to provide additional benefits to our users – from a larger set of supported metrics to metric collection logic improvements and simplified custom monitoring plugin development. Let’s look at what new features Zabbix Agent 2 will receive in Zabbix 6.0 LTS.

The full recording of the speech is available on the official Zabbix Youtube channel.

What is Zabbix agent?

First, let’s talk about the key benefits you gain with Zabbix agent and how it can add an additional layer of flexibility to your monitoring:

  • Zabbix Agent is a daemon that collects your metrics
  • Available on Windows and Unix-like systems
  • Rich capabilities out of the box
    • Natively supports the collection of a large set of OS-level metrics such as memory/CPU/storage/file statistics and much more
    • Provides native log monitoring capabilities
    • Can be extended
  • Select the direction of the communication between the Zabbix server and the Zabbix agent
    • Push the metrics to the Zabbix server with active checks
    • Let the Zabbix server poll the agent with passive checks
  • Control over the data collection interval
    • Ability to schedule checks and define flexible metric collection intervals
    • For example, you can collect metrics at a specific time or only during working hours (sketched after this list)
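As a sketch of the last point above: one common pattern is to set the default update interval to 0 and add a flexible custom interval for the desired window, so the item is polled only while the flexible interval is active – for example, every minute on working days between 09:00 and 18:00 (see the documentation for the exact interval behavior):

Update interval: 0
Custom interval (Flexible): 1m, period 1-5,09:00-18:00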

Why Zabbix agent 2?

Now that the key benefits of using a Zabbix agent are clear, let’s answer the question – why should I consider using Zabbix agent 2 instead of sticking with the classical agent?

The main goal of Zabbix Agent 2 is to make it simple and flexible to extend the metric collection capabilities of the agent. This is true both for the internal development of new native Zabbix Agent 2 metrics and for custom Zabbix Agent 2 plugin development done by our community. We managed to achieve this goal by developing Zabbix Agent 2 in Go. Less code, more flexibility, and a much more modular approach – all of this thanks to the Go language.

In addition to the aforementioned metric collection improvements, with Zabbix agent 2, we were also able to solve many ongoing design problems. The Zabbix agent 2 introduces improvements such as:

  • Support for check concurrency for active checks (this was not the case with the classical Zabbix Agent – active check metrics were collected one at a time)
  • Support for persistent data storage on the Agent side
  • Reduced number of TCP connections between the Zabbix agent 2 and Zabbix server
  • HTTPS web site checks out of the box on Windows
  • Concurrency support provides the ability to read multiple log files in parallel
  • Out of the box monitoring for many different applications

Let’s take a look at some of the more popular systems that Zabbix Agent 2 can monitor out of the box.

Certificate monitoring

The ability to perform certificate monitoring out of the box has been a long-awaited feature. One of the more common requests was monitoring the certificate expiry date. With Zabbix agent 2, it is possible to perform certificate monitoring with a native Zabbix agent item:

Zabbix agent item key for certificate monitoring:

web.certificate.get[hostname,<port>,<IP>]

This item will return:

  • X.509 fields
  • Validation result
  • Fingerprint field

Example:

web.certificate.get[blog.zabbix.com,443]

This item will collect multiple certificate metrics in bulk. We can then obtain the necessary information by using the Zabbix dependent items. You can take a look and download the latest official template from our git page. The template already contains the necessary master/dependent items – all you have to do is import the template and apply it to your hosts.

IoT monitoring – MQTT

Zabbix Agent 2 is capable of performing IoT monitoring out of the box. Zabbix Agent 2 provides items for both MQTT and Modbus monitoring.

Below you can find an example of how the mqtt.get item can obtain metrics on specific MQTT topics:

mqtt.get["tcp://host:1883","path/to/topic"]
mqtt.get["tcp://host:1883","path/to/#"]

Zabbix Agent 2 is also officially supported on Raspberry Pi devices. This makes things even easier for IoT monitoring since we can simply deploy our Zabbix Agent 2 on a Raspberry Pi device in close proximity to our monitored IoT devices.

Out of the box database monitoring

With the classical agent, we had to resort to custom monitoring approaches for database monitoring. This was achieved either by using UserParameters, external scripts, or some other custom approach. With Zabbix agent 2, we provide native database monitoring for a large selection of SQL and NoSQL database engines.

You can find the official Zabbix database monitoring templates on our git page. 

Systemd monitoring

Another long-awaited feature is native systemd monitoring. Zabbix Agent 2 provides a flexible set of items and discovery rules with which you can monitor a specific systemd unit property, discover systemd services in an automated fashion and retrieve all of the systemd unit properties in bulk.

Discover a list of systemd units and their details:

systemd.unit.discovery[<type>]

Retrieve all properties of a systemd unit:

systemd.unit.get[unit name,<interface>]

Retrieve information about a specific property of a systemd unit:

systemd.unit.info[unit name,<property>,<interface>]

These items can then be used to define triggers like:

  • If a service is scheduled at system bootup but not running right now, then generate a problem (see the sketch after this list)
  • If a service is not scheduled at startup but is running right now, notify us that we forgot to enable the service
  • and much more!
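A sketch of the first idea above, written in the new trigger syntax (the unit name is a placeholder, /host/ stands for the actual host; ActiveState and UnitFileState are standard systemd unit properties):

last(/host/systemd.unit.info["nginx.service",UnitFileState])="enabled"
and last(/host/systemd.unit.info["nginx.service",ActiveState])<>"active"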

You can find more information about the official systemd template on our git page.

Docker monitoring

As with previous templates, the Zabbix Agent 2 docker monitoring also provides items for individual metrics and discovery rules for automated container discovery:

  • Discover all containers or only currently running containers automatically
  • Per container monitoring
    • CPU
    • Memory
    • Network

You can find more information about the official Docker template on our git page

Additional applications supported by Zabbix agent 2

And that’s not all! Zabbix Agent 2 provides out of the box monitoring for many other systems, like:

  • Ceph –  an open-source software storage platform
  • Memcached –  a general-purpose distributed memory-caching system
  • Smart – Self-Monitoring, Analysis, and Reporting Technology

If you’re interested in the full list of the official Zabbix templates, you can find all of them on our git page

Agent 2 plugins

The underlying Zabbix Agent 2 structure is based on GO plugins. This approach is used for both the official Zabbix Agent 2 items and should be used for the development of custom community extensions.

On startup, Zabbix agent 2 scans a specific directory and determines the supported interfaces per each plugin. Next, Zabbix will validate the existing plugin configuration and register each plugin in the aforementioned directory. Now we can begin the monitoring workflow. Once a metric has been requested, Zabbix agent 2 will check if the plugin responsible for collecting the particular metric is currently active. If it’s inactive – Agent 2 will check if the specific plugin supports the Runner interface and attempt to start it. Next, Agent 2 will check if the Configurator interface is available and perform the plugin configuration. Lastly, once the plugin is active, Agent 2 will collect the metric by using the Exporter interface. Next time the metric is requested – the plugin will already be active, and Agent 2 can immediately request the metric from the Exporter interface.

But is there a situation when a plugin can remain inactive – does it get unloaded after some time? The plugin does not stay loaded in memory indefinitely. If a plugin hasn’t received a request for 24 hours, the plugin will be deactivated, and it will get unloaded from the memory.

Loadable plugins

Let’s summarize the Zabbix Agent 2 plugin logic:

  • External plugins are loadable on Zabbix agent 2 startup, with no need to recompile Zabbix Agent 2
  • Connects bidirectionally to the plugins using Unix sockets on Linux and named pipes on Windows
  • Backward compatible with older plugins
  • The plugin is deactivated if:
    • any related passive item key has not been used for 24h
    • the active item is not in the active checklist
  • Custom plugin architecture remains the same as it was for the internal plugins
  • Separate repository for community plugins and integrations

Supported platforms for Agent 2

At this point, you may be wondering – what about compatibility? Can I use Zabbix Agent 2 as a replacement for the classical Zabbix Agent? Can it be used on the same platforms? Let’s take a look at the platforms on which you can deploy Zabbix Agent 2:

  • RHEL/CentOS 6,7,8
  • SLES 15 SP1+
  • Debian 9,10,11, Ubuntu 18.04, 20.04
  • Raspberry Pi OS, Ubuntu ARM64
  • Windows 7 and later, Windows Server 2008R2 and later

If you wish to deploy Agent 2 on a system that is not officially supported, the main takeaway is that the Go environment needs to be supported on the system. This means that for Zabbix Agent 2 to run, you will have to provide a set of dependencies for Go language support. If that’s the case – you should be able to compile Zabbix Agent 2 on your system.

New Agent keys

Finally, let’s cover some new Zabbix agent item keys that are available in Zabbix 6.0 LTS. Since we don’t plan on halting the support for the classical Zabbix Agent, these item keys will be supported by both Zabbix Agent and Zabbix Agent 2.

Agent variant

  • agent.hostmetadata – obtains the agent metadata from the Zabbix agent configuration
  • agent.variant
    • Returns 1 for C agent – Zabbix agent
    • Returns 2 for Go agent – Zabbix agent 2

File properties

  • vfs.file.permissions – returns a 4-digit string containing the octal number with Unix permissions
  • vfs.file.owner – returns the user ownership of a file
  • vfs.file.get – returns information about a file. Similar to the stat command result
  • vfs.dir.get – gets information about directories and files
  • vfs.file.cksum – now supports both MD5 and SHA256 checksums
  • vfs.file.size – measures the file size in bytes or in lines
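If you want to quickly verify how these keys behave on a particular host, both agents support testing an item key from the command line, and you can also request a value remotely with zabbix_get – the host address below is illustrative:

# Test an item key locally on the monitored host
zabbix_agent2 -t vfs.file.permissions[/etc/passwd]

# Request a key remotely from the Zabbix server or proxy
zabbix_get -s 192.168.1.10 -k vfs.file.owner[/etc/passwd]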

vfs.dir.get on Windows

Below is an example of how most .get item keys behave – they return bulk information, in this case about the contents of a directory, as a JSON array. This output can then be used in low-level discovery to automatically monitor the parameters of each entity obtained by the vfs.dir.get item. The example shows the output of the vfs.dir.get key executed on Windows. Note that this is just a partial output – the real JSON will most likely contain one such element for each file discovered in the directory.

[{
  "basename": "input.json",
  "pathname": "c:\\app1\\temp\\input.json",
  "dirname": "c:\\app1\\temp",
  "type": "file",
  "user": "AKADIKIS-840-G2\\aigars",
  "SID": "S-1-5-21-341453538-698488186-381249278-1001",
  "size": 2506752,
  "time": {
    "access": "2021-11-03T09:19:42.5662347+02:00",
    "modify": "2020-12-21T16:00:46+02:00",
    "change": "2020-12-29T12:20:10.0104822+02:00"
  },
  "timestamp": {
    "access": 1635923982,
    "modify": 1608559246,
    "change": 1609237210
  }
}]

vfs.file.get on Linux

As we can see, the output of vfs.file.get is also very similar to the previous get request. As I’ve mentioned before – the information here is similar to what the stat command provides.

{
  "basename": "passwd",
  "pathname": "/etc/passwd",
  "dirname": "/etc",
  "type": "file",
  "user": "root",
  "group": "root",
  "permissions": "0644",
  "uid": 0,
  "gid": 0,
  "size": 3348,
  "time": {
    "access": "2021-11-03T09:27:21+0200",
    "modify": "2021-10-24T13:18:18+0300",
    "change": "2021-10-24T13:18:18+0300"
  },
  "timestamp": {
    "access": 1635924441,
    "modify": 1635070698,
    "change": 1635070698
  }
}

More dimensions for discovery keys

The functionality of some of the existing keys has also been improved in Zabbix 6.0 LTS. For example, for vfs.fs.discovery and vfs.fs.get keys Zabbix will now also collect the file system label as the value of the {#FSLABEL} macro.

  • vfs.fs.discovery – will now retrieve an additional label value – {#FSLABEL}
  • vfs.fs.get – will now retrieve an additional label value – {#FSLABEL}

[{
  "{#FSNAME}": "C:",
  "{#FSTYPE}": "NTFS",
  "{#FSLABEL}": "System",
  "{#FSDRIVETYPE}": "fixed"
}]

Questions

Q: Can we run both of the agents at the same time – Zabbix Agent and Zabbix Agent 2?

A: Yes, both agents can be started on the same machine. All we have to do is adjust the listen port for one of them since, by default, both will try to listen on port 10050. Alternatively, you can simply disable passive checks for one of the agents so that it doesn’t listen for incoming connections at all – that approach will also work.
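A minimal sketch of such a port change for Agent 2, assuming the default configuration file location – the server address and port value are illustrative, and the matching host interface in the frontend must be set to the same port:

# /etc/zabbix/zabbix_agent2.conf
Server=192.168.1.5
ListenPort=10060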

 

Q: Can I use the Zabbix agent if I don’t have administrative privileges?

A: Yes, most definitely. You can run the agent under any other user both on Windows and Linux. Just make sure that the user has access to the information (logs, files, folders, for example) that the Zabbix agent needs to monitor.
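On systemd-based Linux distributions, one possible approach is a drop-in override for the agent service – the account name below is an assumption, and that user must exist and be able to read the agent configuration, log, and PID file locations:

# systemctl edit zabbix-agent2
[Service]
User=monitoring
Group=monitoring

After saving the override, reload systemd and restart the agent service for the change to take effect.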

 

Q: Are there any use cases where the classical C Zabbix agent is better than Zabbix agent 2?

A: First off, the binary size of the classical Zabbix agent is definitely smaller, so that’s one benefit. Zabbix Agent 2 also has a more complex set of dependencies required to run it, so if for some reason you cannot provide the necessary Go dependencies for Zabbix Agent 2, then the classical Zabbix agent is the way to go. In addition, if you’re using some kind of automation or orchestration tool to deploy Zabbix agents – having the same type of agent everywhere will make life easier for you, so that’s something else to take into account when picking the agent for your deployment.

The post New Agent 2 features in Zabbix 6.0 LTS by Aigars Kadiķis / Zabbix Summit Online 2021 appeared first on Zabbix Blog.

Take advantage of Zabbix services online by Sergejs Sorokins / Zabbix Summit Online 2021

Post Syndicated from sersor original https://blog.zabbix.com/take-advantage-of-zabbix-services-online-by-sergejs-sorokins-zabbix-summit-online-2021/18716/

From Turnkeys to Upgrades and Training courses – Zabbix offers a vast selection of services to help you get the most value out of your Zabbix environment. In this blog post, we will take a look at the new and existing Zabbix online services and learn how they can benefit both Zabbix veterans and newcomers to the monitoring world.

Zabbix in public clouds

Zabbix images are available for the most popular cloud service providers, including AWS, Azure, Google Cloud Platform, and many others. Let’s talk about what applications and services are available for those of you who choose to take this Zabbix deployment route.

  • Virtual Appliances of Zabbix server and Zabbix proxy
  • Technical support services can be obtained directly in the cloud platform
  • Some cloud platforms provide containers for Zabbix server, Zabbix proxy, and Zabbix agent components

What does it take to deploy Zabbix in a public cloud?

  • 2-5 minutes to deploy a fully functional Zabbix instance
  • Easy to scale up and scale down according to your requirements
  • Select a geographical region where you wish to deploy your Zabbix component
    • For example, you could deploy the Zabbix server in one region and deploy a set of proxies across multiple other regions
  • The prices are very affordable and flexible
    • Depending on the cloud provider, the price can go as low as $5 per month for the virtual appliance of the Zabbix server

Zabbix technical support services

Zabbix technical support services are among the most popular services with our customers. Zabbix offers 5 tiers of support.

Starting from the Silver tier, which covers the basic customer requirements such as answering customer tech support questions, to the Enterprise and Global tiers, which include not only technical support services but also remote assistance, performance tuning, training, on-site visits, and more. Each customer can pick a support tier depending on their requirements and budget.

The technical support service provides multiple obvious and not so obvious benefits:

  • Support on demand
  • You will receive the answers to your questions strictly within your SLA
  • Insurance against “storms”
    • Whenever a major outage or issue arises in your infrastructure, Zabbix support can assist you with fixing the problems and ensuring that your Zabbix instance is in a healthy state
  • A way to influence the Zabbix road-map
    • Commercial customers of Zabbix have the advantage of a direct line of communication with Zabbix regarding their feature requests or potential bug reports

Zabbix professional training

We can divide the Zabbix training into two types of courses – core courses and one-day courses. Both of these types of courses are unique in the knowledge that they aim to deliver.

Core courses

The core courses are multi-day courses that range from the Zabbix Certified User course to Zabbix Certified Expert.

  • The Zabbix Certified User course is aimed at Zabbix users who are not involved in configuring Zabbix but still need to access the Zabbix dashboards and read metrics on a regular or semi-regular basis
  • On the other hand, the Zabbix Certified Specialist course is aimed at Zabbix users that deal with managing and configuring Zabbix – configuring metric collection, problem thresholds, data processing, and more
  • The Zabbix Certified Professional course takes another step forward and is tailored for up-and-coming Zabbix power users. The course deals in deploying and managing advanced Zabbix infrastructures with automation and custom data collection in place
  • Lastly, the Zabbix Certified Expert course is aimed at potential Zabbix architects. By participating in this course users will learn about best practices when designing Zabbix infrastructures, the inner working of Zabbix processes, and all of the underlying logic that Zabbix performs when processing data and generating alerts

One-day courses

The goal of the Zabbix one-day courses is to take an in-depth look at a particular topic and provide all of the available knowledge on a specific Zabbix feature. These are one-day courses with a large focus on practical tasks. The content of these courses is unique. Therefore, it doesn’t matter if you’re a newcomer or a seasoned Zabbix expert – the information covered in the courses will be useful to you in any scenario.

Course availability

All of the aforementioned courses are widely available all over the world. Zabbix provides courses in different languages, for different time zones both online and on-site.

  • The courses are available in 11 languages
  • Currently, there are 8 types of courses with more to come down the line
  • 21 official training partners, providing Zabbix training in different languages and time zones
  • Available both online and on-site

What we covered here is just a small part of the overall service offering that Zabbix provides. In addition to the aforementioned services, you can visit the Zabbix Professional Services page to find a full list of Zabbix services – from consultancy hours to turnkey deployments and integration services. If you’re interested in any kind of assistance from Zabbix, feel free to get in touch with our Sales department, and together we will tailor the best service offering for you.

Questions

Q: What prerequisites are required for attendees of the Zabbix training courses?

A: For the Zabbix Certified User course, there are no prerequisites. As for the Zabbix Certified Specialist course – some basic IT knowledge and at least a surface-level ability to navigate the Linux operating system would be useful for attendees. On the other hand, the Zabbix Certified Professional course requires the completion of the Zabbix Certified Specialist course. Similarly, the Zabbix Certified Expert course requires the completion of the Zabbix Certified Professional course.

 

Q: With the release of Zabbix 6.0 LTS right around the corner – should our users simply wait until Zabbix 6.0 LTS training is available, or can they certify in Zabbix 5.0 LTS?

A: Since Zabbix 5.0 LTS courses are still available, I would look at what you plan to stick with for the foreseeable future. If there are no plans to upgrade to Zabbix 6.0 LTS, then the Zabbix 5.0 LTS course could be the best for you. On the other hand, if a Zabbix upgrade is already scheduled – then 6.0 LTS might just benefit you more.

 

Q: What if I have the Zabbix 5.0 LTS specialist certificate – will I be able to attend the Zabbix 6.0 LTS professional course or do I have to start over?

A: With the release of Zabbix 6.0 LTS we plan to introduce the Zabbix Certified Upgrade course. This will be a one-day course that will allow Zabbix 5.0 certified specialists and professionals to upgrade their certification to the Zabbix 6.0 certified specialist or professional.

The upgrade course has the extra benefit of preparing you for the Zabbix 6.0 LTS changes before you actually perform the jump to the Zabbix 6.0 LTS release. With the upgrade certificate in hand, you will be aware of all of the changes and new features that await you in the latest Zabbix LTS release.

 

The post Take advantage of Zabbix services online by Sergejs Sorokins / Zabbix Summit Online 2021 appeared first on Zabbix Blog.

Integrating Zabbix with your existing IT solutions by Aleksandrs Larionovs / Zabbix Summit Online 2021

Post Syndicated from Alexandrs Larionovs original https://blog.zabbix.com/integrating-zabbix-with-your-existing-it-solutions-by-aleksandrs-larionovs-zabbix-summit-online-2021/18671/

Zabbix 6.0 LTS comes packed with many new integrations and templates. As the total number of templates and integrations grows, we plan to make major improvements to our template repository. This will greatly improve the workflow of developing a new community template, submitting template pull requests, following the development process of a template, and much more.

The full recording of the speech is available on the official Zabbix Youtube channel.

What are integrations?

By definition, integrations are connections between systems and applications that work together as a whole to share information and data. In Zabbix, we separate integrations into two types:

  • Out-of-the-box templates
    • Templates contain items, triggers, graphs, and other entities that allow you to monitor any device, service, application, and other monitoring endpoints.
  • Webhook integrations
    • Webhooks allow you to send the information from Zabbix to any sort of 3rd party system like ITSM or messaging applications.

Where to find the latest integrations?

If you’re installing a fresh Zabbix instance, it will come pre-packaged with the latest set of official templates and webhooks. If you wish to download and import the integrations manually, you can find them in:

How do you benefit from integrations?

What are the benefits of using the official Zabbix integrations for you as a Zabbix user?

  • Monitor your endpoints in a tested and optimized manner
  • Monitor a large variety of 3rd party systems
  • Official templates come with a guarantee of quality and official support
  • Official templates provide quick deployment of monitoring logic for your monitoring endpoints

Having supported integrations can also be important from the standpoint of a vendor. Having an official Zabbix integration can provide multiple benefits for vendors:

  • Supported vendors get the ability to engage the Zabbix community
  • Collaborate with Zabbix and receive additional recognition
  • Provide higher quality monitoring support by collaborating in the development of the integration
  • Set higher monitoring standards for your product
  • Improve your public image

What if I wish to request a new official integration?

There are multiple ways to approach a scenario where there is no official integration for a specific product:

  • Option 1:
    • Create a ZBXNEXT ticket with your request at support.zabbix.com
    • Ask your friends and colleagues to vote on a request
    • If there is community interest in the integration – we will develop it!
  • Option 2:
  • Option 3:
    • Look for an unofficial community template

How are the official Zabbix integrations made?

Our first step in developing a new template is prioritizing which template we should tackle first. This includes looking at the current IT landscape and deciding which of the components are widely considered essential services. Then, we look at the community requests and the number of votes behind each request. We also continuously work on improving the existing templates and evaluating the priority of the requested improvements. There is also the option to sponsor an integration by contacting our Sales department.

After we prioritize our list, we proceed to development – we do research, talk to community experts, create focus groups and proceed with the development. Once the development is finished, we proceed with validation – this includes internal reviews from the Integrations team as well as giving our colleagues from the Support team the chance to take a look at the newly developed template. Community feedback is also important for us – the feedback regarding the template can be left either in the comments under the specific feature request or in our Suggestions and Feedback forum section.

Community templates

While we pride ourselves on the rapid growth of our integrations team and the pace at which we have been delivering new official templates and integrations, we, of course, can’t instantly develop a template for every vendor and device out there. This is where our community has been of great help to us.

Moving from share.zabbix.com

Previously, if you were to find Zabbix lacking a template or an integration that you require, you would visit share.zabbix.com and look for a community solution to your problem. At this point, we have decided to migrate away from share.zabbix.com since, over the years, we have found it lacking in multiple aspects:

  • The website was hard to navigate
  • The underlying platform was outdated
  • Once uploaded, the templates were rarely maintained
  • It was hard to collaborate on templates
  • Lacking standardization – each template could use different naming conventions or metric collection approaches
  • Zombie templates – templates developed for old versions but never updated along the way.

Community template repository

The new go-to place for community templates will be our Community template repository. The repository will serve as a platform for collaboration. Once uploaded, the templates can be maintained by either the original developer or other community members. The platform will be moderated by the Integrations team, who will also provide feedback on the community templates to ensure a higher quality of the templates and additional validation. The documentation will also be generated for the community templates, containing the contents of each template – this way, you can have a transparent look at the template before downloading it.

The process

Let’s go through the whole process of developing and maintaining a community template.

1. Collaborate

  • For existing templates – you can start a discussion on Github issues to discuss issues or potential improvements on the template.
  • You can create a new bug report related to the template
  • For older community templates – you will be able to take over the maintenance of this template and continue improving it down the line
  • Develop and publish a new template or an integration

When it comes to community development, Zabbix does provide an official set of guidelines that the developers can follow to ensure that the template uses the official best practice conventions and approaches:

  • Naming conventions for templates and template entities
  • A set of best practices helps the community developers to simplify the decision making regarding best template and integration development approaches
  • Practical and ethical framework for template and integration development
  • This enables the community developers to follow the same set of development guidelines as the official Zabbix Integrations team

2. Pull request

Once you have decided to make a new integration or modify an existing integration, create a pull request describing the proposed changes and be ready to participate in a discussion related to the proposed set of changes. We will review and moderate the discussion and assist you in ensuring a smooth template development process.

3. Validation

The validation process consists of two parts. First, we will review if the template is valid, can be imported in Zabbix, and is usable by our community members. Next, the Integrations team will check if the template is developed according to the Zabbix standard and suggest any necessary changes.

4. Merge

If the validation process has been passed, we will accept the pull request and merge the integration into the repository. Afterward, the readme file for the integration will be generated. Finally, the template will be added to the template directory, and it will be available for everyone to see and download.

The Templates directory will have a similar structure to what you are used to in share.zabbix.com, so you should feel right at home here. We tried to check and migrate each and every valid template, but if you don’t find your template in the list – simply submit a pull request to us, and we will review it.

The generated Readme file will contain a list of entities included in the template – such as User macros, Template linkages, Discovery rules, Items, and more.

Where can I find the repository?

The Zabbix community template repository can be found at https://github.com/zabbix/community-templates. All you need to participate is a GitHub account and the willingness to take part in the integration development process.

To report an issue with the template repository or the official integrations, feel free to use our support portal: https://support.zabbix.com/

  • To report a bug – open a ZBX ticket
  • To suggest an improvement – open a ZBXNEXT ticket

For any discussions related to the Zabbix integrations, you can use (but are not limited to) the following channels:

Questions

Q: What is the workflow for users who wish for Zabbix to develop a new integration for a specific product?

A: We’re actively listening to our community. The best way to voice your request is to look for an existing feature request at https://support.zabbix.com/ and vote for it. If there is no such feature request – feel free to create it and vote for it. Finally, you can always contact our Sales department and use our integration services to have the required template developed for you.

 

Q: Where can I see which integrations are currently developed or scheduled for the next release?

A: You can track the development process of a template by following the particular feature request in our support portal. You can also take a look at the official Zabbix roadmap and see what features, fixes, and integrations are currently scheduled for the upcoming Zabbix versions.

 

The post Integrating Zabbix with your existing IT solutions by Aleksandrs Larionovs / Zabbix Summit Online 2021 appeared first on Zabbix Blog.

A guide to migrating to Zabbix 6.0 LTS by Edgars Melveris / Zabbix Summit Online 2021

Post Syndicated from Edgars Melveris original https://blog.zabbix.com/a-guide-to-migrating-to-zabbix-6-0-lts-by-edgars-melveris-zabbix-summit-online-2021/18569/

Upgrading to a new software version can be an intimidating process, especially if you are upgrading your Zabbix instance for the first time. In this blog post, we will take a look at the upgrade process itself, the necessary prerequisites, and the changes you can expect to the existing functionality once you’ve migrated to Zabbix 6.0 LTS.

The full recording of the speech is available on the official Zabbix Youtube channel.

Pre-upgrade checklist

Database versions

The first step before performing the upgrade to a new Zabbix version is ensuring that your underlying infrastructure is ready for the upgrade process. There are some changes in Zabbix that you should be aware of and address before the upgrade. One of these changes is the list of supported database engines and their versions for Zabbix 6.0 LTS:

  • MySQL/Percona 8.0.x
  • MariaDB 10.5.0 -10.6.x
  • PostgreSQL 13.x
  • Oracle 19c – 21c

And if you’re using PostgreSQL + TimescaleDB or Zabbix proxies:

  • TimescaleDB 2.0.1-2.3
  • SQLite 3.3.5 – 3.34.x

You may have noticed that we have increased the version requirements for the Zabbix backend databases. The reason for this is that Zabbix utilizes features that only these newer database versions provide, thus ensuring optimal Zabbix performance. If you’re using an unsupported database version, Zabbix will not start. There is a configuration parameter to override this behavior, but that is not recommended, since we cannot guarantee that your Zabbix instance will work without encountering performance issues or crashes. A database upgrade to a supported version should be performed before moving to Zabbix 6.0 LTS.
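For reference, a minimal sketch of the override mentioned above – in the released Zabbix 6.0 server configuration this is exposed as the AllowUnsupportedDBVersions parameter (check the documentation of your exact version), and it should only be used in test environments:

# /etc/zabbix/zabbix_server.conf
# Allow the server to start with an unsupported database version (not recommended)
AllowUnsupportedDBVersions=1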

Supported operating systems

Zabbix supports all Linux distributions and many other Unix-like operating systems. Unfortunately, it is not feasible to provide Zabbix packages for each and every distribution out there. One of the major changes made back in Zabbix 5.2 is that we no longer provide packages for RHEL/CentOS 7. This is because some of the libraries included in these distributions are outdated, and it becomes more and more complicated to build Zabbix on these OS versions. It is, however, still possible to build Zabbix from sources if you provide the correct versions of the required libraries.

Some of the officially supported operating systems for Zabbix 6.0 LTS:

  • RHEL/CentOS/Oracle Linux 8
  • Ubuntu 18.04+
  • Debian 10+
  • SLES 12+

Other installation options

There are additional Zabbix deployment options:

  • Docker – all of the dependencies are already provided in the official docker images
  • Cloud image – the image includes all of the required dependencies
  • Zabbix appliance – All of the available Zabbix appliance images contain the required dependencies

Environment review

Before upgrading between major Zabbix versions, it is highly recommended to do an environment review, take a look at the pending maintenance tasks for your environment, and also do a health check. Some of the things that you should consider before performing the upgrade to Zabbix 6.0 LTS:

  • Apply any required OS or DB upgrades before upgrading Zabbix and check for any issues before moving on
  • Check for any customizations in your installation – are there any DB schema changes? Any custom modules or patches?
    • The best way to test this is to make a copy of your existing Zabbix instance and test the upgrade in a QA environment.
  • Are the required packages available for all of the Zabbix components?
  • Are all proxies installed on the supported OS versions?
  • Check the documentation for any known issues in the version to which you are upgrading

Important changes that affect the upgrade process

There are some changes in Zabbix 6.0 LTS that could potentially affect the upgrade process or your existing Zabbix workflows.

API Changes

Below is a list of documentation pages related to API changes between versions 5.0 and 6.0:

  • https://www.zabbix.com/documentation/6.0/manual/api/changes_5.4_-_6.0
  • https://www.zabbix.com/documentation/current/manual/api/changes_5.2_-_5.4
  • https://www.zabbix.com/documentation/5.2/manual/api/changes_5.0_-_5.2

Some of the more important API changes:

  • The trigger and calculated/aggregated item syntax changes introduced in Zabbix 5.4 also affect the API calls responsible for creating triggers (ZBXNEXT-6451)
    • You will need to change the trigger syntax in your API calls to avoid any issues
  • For the user.create and user.update methods, the user_medias parameter was renamed to medias (ZBX-17955) – see the example after this list
    • The user_medias parameter is now deprecated
  • The type property is no longer supported for user.create, user.update, and user.get methods (ZBXNEXT-6148)
    • The type property is not supported for the user object since we now define it in user roles
  • Items no longer support applications. Applications have been replaced with tags (ZBXNEXT-2976)
  • Since value maps can no longer be defined globally, the valuemap.create and valuemap.get methods now require a hostid field (ZBXNEXT-5868)
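For example, the user_medias → medias rename means that a user.update request now looks roughly like this – the user ID, media type ID, address, and token below are purely illustrative:

{
    "jsonrpc": "2.0",
    "method": "user.update",
    "params": {
        "userid": "5",
        "medias": [
            {
                "mediatypeid": "1",
                "sendto": ["user@example.com"]
            }
        ]
    },
    "auth": "<API token>",
    "id": 1
}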

Other important changes

There are a couple of important changes that users should be aware of when migrating to Zabbix 6.0 LTS:

  • Previously, trailing spaces in passwords – both when setting a password and entering it, were trimmed. This has been changed, and trailing spaces in passwords are no longer trimmed.
  • Global value maps that remain unused will be removed
  • Existing audit log records will be removed due to major changes in the audit log design.

Upgrade steps

Next,  let’s discuss the steps that you should take to perform the upgrade procedure in a correct and safe manner:

  • Backup your database, as well as any customizations (external scripts, alert scripts) and configuration files
  • Update the Zabbix server and Zabbix frontend
    • Once the new Zabbix server process is started, it will automatically check the database schema version and automatically upgrade it
    • Depending on the database size and the version from which you are migrating – this can take a while
    • Once the automatic database schema upgrade is done, the Zabbix server will be started automatically
  • Update your proxies. Proxies are required to have the same major version as the Zabbix server
  • Check if there are no issues and your Zabbix instance is up and running
    • Check if the metrics are being collected by your Zabbix server and Zabbix proxies
    • Check if the triggers are detecting any problems and if you’re receiving notifications about them

Backup

Let’s take a more in-depth look at the backup process and discuss the required steps with some examples:

  • Backup the database – methods depend on the DB type
    • In most cases, you can ignore history and trends tables – simply backing up only your configuration data
    • History and trends tables tend to be extremely large. That’s why the above approach is a lot faster
    • If at some point you are required to perform a restore from this backup, the history and trends tables will have to be manually recreated
  • Backup the Zabbix configuration files
  • Optionally – backup any custom alert scripts, external scripts, and any other customizations

Example MySQL database backup with history and trends tables ignored:

mysqldump -uroot -p --single-transaction --ignore-table=zabbix.history --ignore-table=zabbix.history_uint --ignore-table=zabbix.history_text --ignore-table=zabbix.history_log --ignore-table=zabbix.history_str --ignore-table=zabbix.trends --ignore-table=zabbix.trends_uint zabbix | gzip > zabbix_backup.sql.gz

Backing up the configuration

At the very least, you should back up the configuration files located in:

  • /etc/zabbix/*
  • external scripts from /usr/lib/zabbix/externalscripts/
  • alert scripts from /usr/lib/zabbix/alertscripts
  • /etc/httpd/conf.d/zabbix.conf
  • /etc/php-fpm.d/zabbix.conf
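A minimal sketch of archiving these files with tar, assuming the default package locations listed above – adjust the paths if your web server or PHP-FPM configuration lives elsewhere:

# Archive the Zabbix configuration, custom scripts, and web server config
tar -czf zabbix-config-backup.tar.gz \
    /etc/zabbix \
    /usr/lib/zabbix/externalscripts \
    /usr/lib/zabbix/alertscripts \
    /etc/httpd/conf.d/zabbix.conf \
    /etc/php-fpm.d/zabbix.conf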

Upgrade process with docker

There are multiple approaches to running Zabbix in Docker. For this example, we will assume that you’re running the Zabbix server and Zabbix frontend in Docker, using the official Zabbix Docker images with a MySQL backend database and an Apache-based web frontend.

Stop the Zabbix server and frontend containers (and any proxy containers you plan to upgrade):

docker stop my-zabbix-server
docker stop my-zabbix-frontend

Start the Zabbix 6.0 LTS container and point it at the same backend database:

docker run --name my-zabbix-server-6.0 \
  -e DB_SERVER_HOST="some-mysql-server" \
  -e MYSQL_USER="some-user" \
  -e MYSQL_PASSWORD="some-password" \
  -d zabbix/zabbix-server-mysql:6.0-latest

Once again, an automatic DB schema upgrade will be started

Lastly, start the Zabbix frontend container:

docker run --name my-zabbix-web-apache-6.0 \
  -e DB_SERVER_HOST="some-mysql-server" \
  -e MYSQL_USER="some-user" \
  -e MYSQL_PASSWORD="some-password" \
  -e ZBX_SERVER_HOST="my-zabbix-server-6.0" \
  -d zabbix/zabbix-web-apache-mysql:6.0-latest

Upgrade process with Zabbix packages

Upgrading the main Zabbix components

If you’re using the official Zabbix packages, then the upgrade process will take a few more steps and can seem a bit more complicated. Let’s take a look at the required upgrade steps in detail. For our example, we will use a CentOS 8 OS distribution.

Install the Zabbix 6.0 LTS release package. This will add the necessary Zabbix 6.0 LTS repository information:

rpm -Uvh https://repo.zabbix.com/zabbix/6.0/rhel/8/x86_64/zabbix-release-6.0-1.el8.noarch.rpm

Clear the DNF package manager cache:

dnf clean all

Install all of the required packages:

dnf install zabbix-server-mysql zabbix-web zabbix-web-mysql zabbix-web-deps \
    zabbix-apache-conf zabbix-selinux-policy

Start Zabbix components and observe the log file. You should see that the database schema upgrade is in progress. Once it has finished, all of the internal Zabbix processes should be started without any issues:

17602:20210921:131335.333 completed 96% of database upgrade
17602:20210921:131335.355 completed 97% of database upgrade
17602:20210921:131335.379 completed 98% of database upgrade
17602:20210921:131335.606 completed 99% of database upgrade
17602:20210921:131335.711 completed 100% of database upgrade
17602:20210921:131335.711 database upgrade fully completed
17602:20210921:131335.804 server #0 started [main process]
17602:20210921:131335.808 server #2 started [configuration syncer #1]
17602:20210921:131335.810 server #1 started [service manager #1]

Upgrading the Zabbix proxies

In addition, we are also required to update our Zabbix proxies. The procedure is very similar to what we did in our previous steps:

Install the Zabbix 6.0 LTS release package:

rpm -Uvh https://repo.zabbix.com/zabbix/6.0/rhel/8/x86_64/zabbix-release-6.0-1.el8.noarch.rpm

Clear the DNF package manager cache:

dnf clean all

Update the Zabbix proxy packages:

dnf update zabbix-proxy-mysql    # or zabbix-proxy-pgsql / zabbix-proxy-sqlite3, depending on the proxy DB backend

For MySQL, PostgreSQL, and Oracle proxy backend databases, the DB schema upgrade is performed automatically.

For Zabbix proxies using SQLite3 backend databases, automatic database schema upgrade is not supported. We will simply have to remove the old SQLite3 database file – it will then be automatically recreated once we start the Zabbix proxy.

rm -rf /tmp/proxy.sqlite

Post-upgrade tasks

After the upgrade to Zabbix 6.0 LTS, there are a few additional tasks that we should take care of. Let’s take a look at what needs to be done.

History table primary key

The Zabbix 6.0 LTS backend database history table schema has been changed – these tables now contain primary keys. The upgrade of these history tables is not done automatically since it can cause additional downtime. Depending on the size of the database, executing the required changes can be extremely slow since every record in the history tables needs to be altered. In addition, duplicate entries in history tables could potentially cause this manual database schema upgrade to fail. There are multiple benefits to the history table schema changes:

  • All history tables will now have primary keys
  • Decreased history table storage size
  • Increased history table query performance
  • Applying the change to an existing instance during the upgrade is not recommended without prior testing (see below)

For new Zabbix 6.0 LTS installations, this change will be included by default, while for the existing installations, it is recommended to thoroughly test the history table schema change procedure and evaluate the potential downtimes. The exact history table upgrade steps will be documented with the release of Zabbix 6.0 LTS.

Check new processes

There are some new Zabbix processes that have been added in Zabbix 6.0 LTS that you should be aware of:

  • StartHistoryPollers
    • The process responsible for handling calculated, aggregated, and internal checks requiring a database connection
    • The default value is 5. Consider increasing this number if you have many such items
  • If migrating from 4.0: StartLLDProcessors
    • Worker process for low-level discovery tasks
    • The default value is 2. Consider increasing if you have many low-level discovery rules.
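As an illustrative sketch, these processes are tuned in the server configuration file – the values below are assumptions and should be adjusted to your instance’s actual load:

# /etc/zabbix/zabbix_server.conf
# Pollers for calculated, aggregated, and internal checks that require a DB connection
StartHistoryPollers=10
# Worker processes for low-level discovery tasks
StartLLDProcessors=4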

Update the existing templates

If you’ve performed a Zabbix upgrade before, you will be aware of the fact that Zabbix does not update your existing templates automatically since we assume there could be some custom changes performed by the end-users on said templates. Therefore, to see the aforementioned new processes in the Zabbix server internal monitoring graphs, you should download and import the latest Zabbix server template.

You can download the templates from the official Zabbix git page. You can read the release notes to see the full list of the updated templates and changes that have been performed on said templates.

Update the Zabbix agents

You may also consider upgrading your Zabbix agents. This is not mandatory since Zabbix agents are backward compatible, so you can use an older version of Zabbix agents with Zabbix 6.0 LTS. All of the previous functionality will continue to function, but you may still consider updating the agents since the updates could contain some bug fixes or support for a brand new set of items.

Upgrading the Zabbix agent:

dnf install zabbix-agent

Upgrading the Zabbix agent 2:

dnf install zabbix-agent2

New Zabbix packages

You may have noticed that in Zabbix 6.0 LTS, there are multiple new packages. Most of these packages are repackagings of some of the old components for better package management, but there are exceptions:

  • zabbix-selinux-policy – basic SELinux policy for Zabbix
  • zabbix-sql-scripts – All of the .sql backend database scripts
    • These used to be a part of the zabbix-server package
    • This package is required to, for example, deploy the initial Zabbix database schema or data during the Zabbix install process.
  • zabbix-web-service – The service responsible for the scheduled report generation

Questions

Q: Are my custom templates going to be affected in any way by the upgrade process?

A: Yes – but all of your templates will continue to work. Any changes that we have made to the trigger syntax, for example, will be automatically applied to your existing entities.

 

Q: How long will the migration process take? How can I estimate the downtime?

A: Unfortunately, it’s impossible to estimate a precise downtime duration without creating a QA copy of your existing Zabbix instance with the same exact hardware and checking the downtime duration there with a test upgrade. At the end of the day, this will depend not only on the size of the database but also on the size of individual tables, the version from which you are upgrading, and how optimized your software and hardware are.

 

Q: What about migrating from a very old version – say Zabbix 3.0 or older?

A: It should work, but there can be some caveats and additional pre-requisites required for the older version upgrades. I would recommend going through our previous Summit recordings since we have covered the upgrade process for older versions in previous years. Those should provide you a pre-requisite checklist that you can perform before upgrading to Zabbix 6.0 LTS.

The post A guide to migrating to Zabbix 6.0 LTS by Edgars Melveris / Zabbix Summit Online 2021 appeared first on Zabbix Blog.

Top 10 reasons to migrate to Zabbix 6.0 LTS by Dmitry Krupornitsky / Zabbix Summit Online 2021

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/top-10-reasons-to-migrate-to-zabbix-6-0-lts-by-dmitry-krupornitsky-zabbix-summit-online-2021/18445/

Today we will take a look at the top 10 reasons to migrate to Zabbix 6.0 LTS. We will discuss features and changes included not only in Zabbix 6.0 LTS but also in intermediate major versions – Zabbix 5.2 and Zabbix 5.4.

The full recording of the speech is available on the official Zabbix Youtube channel.

High availability

With Zabbix 6.0 LTS, native support for Zabbix server high availability clusters is finally here. High availability setups can protect you from software and hardware failures and allow you to minimize downtime while performing maintenance tasks. Before Zabbix 6.0 LTS, users were required to use a dedicated piece of clustering software to enable high availability. Most users used a combination of Corosync and Pacemaker. This required additional knowledge of these tools to ensure proper high availability cluster setup, configuration, maintenance, and other tasks related to managing a Zabbix high availability cluster. You could also use other 3rd party vendor solutions, but such solutions also require additional knowledge and in many cases incur additional licensing costs.

The native Zabbix server high availability cluster is an opt-in solution that provides high availability for the Zabbix server component. This solution consists of multiple Zabbix server instances (nodes), where each node is configured separately but uses the same database. Each node has two modes of operation – active or standby. Only a single node can be active at a time. The standby nodes do not perform any data collection, data processing, or any other Zabbix server activities. The standby nodes do not listen for connections on their ports and maintain only a minimal number of connections to the Zabbix backend database. The high availability nodes are compatible with one another across different minor Zabbix server versions.

Learn how to deploy your own Zabbix server high availability cluster by following the steps provided in our Zabbix Summit blog post dedicated to this topic.

New Zabbix interface options

Zabbix 6.0 LTS provides multiple Zabbix interface improvements. One of the major changes that the users will notice when switching to Zabbix 6.0 LTS is the migration from screens to dashboards. The screens will be migrated to dashboards automatically during the upgrade. Dashboards consist of multiple highly customizable widgets, which can be placed on a dashboard with a click of a button. With Zabbix 6.0 LTS many new widgets will be available for different purposes – more flexible views of your metrics with the Single item value widget, a Geomap widget for a better overview of your infrastructure state, Top N/Bottom N views provide a whole new way to look at your metrics and more.

You can now save your favorite problem filters and access them in tabs for simpler filtering of commonly accessed problem views.

Zabbix 6.0 LTS introduces timezone configuration on a per-user basis. Users can now have their preferred timezone configured via the user settings in the Zabbix frontend. The same is also true for language – this can also now be configured individually for each user.

The Zabbix frontend is now more customizable than ever. There are several ways in which you can customize your Zabbix frontend:

  • Replace the Zabbix logo with your company’s branding
  • Hide links to Zabbix support/integration pages
  • Set a custom help page link
  • Change the copyright notice in the footer of the frontend.

Implementing these changes requires customizing the underlying PHP code – we tried to make this as simple and accessible as possible, so you can quickly make the necessary changes yourself.

There are also many other Interface improvements, such as multi-page dashboards, third-level menus, graph improvements, and many others.

Improved security

Security is always something that we focus on when developing Zabbix. Zabbix 6.0 LTS brings many new security-related improvements and features:

  • User roles allow you to define roles with granular permissions related to the frontend access and the actions that each user role is permitted to perform
    • Roles are still based on user types – Zabbix User, Admin, Super admin, and user type restrictions still apply, but can be further customized per each role
    • User group to host group permissions (Read, Read/Write, Deny) still need to be used in combination with roles to ensure granular access to your data
    • For example, now we can define users that have access to host configuration but restrict access to other configuration sections.

In Zabbix 6.0 LTS it is possible to define custom password complexity requirements for Zabbix frontend logins. We can define password length/complexity policies and prohibit the usage of easy to guess common passwords.

The Zabbix API has also seen some security improvements. Now it is possible to generate a persistent API token for a particular user, define an expiration date and use the token in your API calls, without the need to regularly re-issue a new API token.

Zabbix 5.2 release also added the ability to store sensitive information in an external vault. As of the release of Zabbix 6.0 LTS, only HashiCorp Vault is supported, but CyberArk Vault support is also coming in Zabbix 6.2 release.

A set of architectural and structural measures have been taken to completely restructure the Zabbix Audit log. The updated Audit log entry contains records of all configuration changes made by the Zabbix server and Zabbix frontend. The new Audit log also contains additional filtering options, such as filtering Audit log entries based on the operation during which the changes were performed. The new Audit log is not only more detailed but also reworked with minimum performance impact in mind.

Scalability improvements

Many scalability improvements have been introduced between the Zabbix 5.0 LTS release and Zabbix 6.0 LTS release. These improvements not only improve the performance of existing Zabbix instances but also lay the groundwork for the design of upcoming features in later releases.

Previously, trend-based trigger functions would always use database queries to obtain the required data. Starting from Zabbix 5.4, a new type of cache – the trend function cache – has been introduced. This cache stores the results of calculated trend functions. When processing trend functions, the Zabbix server first checks the trend function cache for a cached result. If the result is not cached, the Zabbix server reads the data from the database and caches the result.

The scalability improvements allow for better parallel data processing on Zabbix servers with heavy loads. Zabbix Instances with tens of thousands or more new values per second will greatly benefit from the improved performance.

The introduction of the graceful startup of the Zabbix server can help you improve performance and prevent unwanted downtimes, especially in large distributed environments. Whenever a Zabbix server is started up after downtime, the existing Zabbix proxies start sending their data backlog to the Zabbix server. It is extremely important to maintain the stability and performance of the Zabbix server during this time window. Graceful startup improves the Zabbix server’s data backlog handling logic in such situations.

To prevent unwanted delays and other issues when using zabbix_get and zabbix_sender command-line tools, it is now possible to define a custom Timeout parameter for these tools.

Advanced business service monitoring

The new Business service monitoring features allow Zabbix users not only to define complex service trees but also to receive alerts whenever the status of a business service changes. This is valuable to every user that wishes to monitor their business services, no matter how simple or complex the service is.

This is combined with a large number of new and improved service status calculation rules. By defining custom service weights and advanced service status propagation rules, business services can be defined in an extremely flexible fashion. Services are also no longer linked to individual triggers; instead, tag-based service mapping is used to map services to problem events.

The service functionality has also received scalability improvements. Zabbix can support the monitoring of over 100 000 business services. The scalability improvements have been implemented from both the UI/UX and the performance perspectives.

The old all-or-nothing business service permission approach has been redesigned into granular read/write permissions for individual business services. This is not only an improvement from the security perspective but also adds the ability to define services in a multi-tenant fashion, where each tenant has access only to the services that they own.

With the redesign of the business services, we have added the support for root cause analysis, allowing users to see the underlying problem which caused a particular service to change its state.

You can read more about Business service monitoring in our Zabbix Summit blog post dedicated to this topic.

Tag and template improvements

Item applications have been replaced with tags. This design decision adds consistency to filtering, mapping, grouping, and other tag-related functions when it comes to different Zabbix entities. Tags can also be used to provide additional information related to your entities in a manner that is much more flexible than it was with applications.

Universal template IDs introduced for each of the template elements allow you to define much more robust template management workflows, especially when you combine this with a CI/CD template management approach. These IDs are unique and can be used to match a particular template entity, such as item, trigger, graph, and so on. By utilizing the Universal template IDs, Zabbix now understands which entity we are trying to update, which entity no longer exists, whether it is a new entity or we are adjusting an existing entity. The default template export format is now YAML, though JSON and XML formats are still supported. This was done to improve the template management usability since the YAML format is more user-friendly and easier to edit manually. All of the official Zabbix templates available on the Zabbix git page have already been converted to the YAML format.

The redesign of the templates has also allowed us to improve the visualization of the changes made when importing a template. Now users can see the list of changes in a diff-like display and understand the impact that the template import will have on the Zabbix entities.

Value maps have been moved to host and template levels. This is another design decision that we made to enable support for fully self-contained templates, that are easy to manage and deploy, and can be easily imported into different Zabbix environments. While global value maps might be easy to manage in small environments, this is not the case in larger environments, where different teams are working with a single or between multiple Zabbix instances. Therefore, the global value maps have been removed.

Reporting and visualization

With the addition of Scheduled reports functionality, any dashboard can now be converted into a scheduled report. While this feature was originally added in Zabbix 5.4, with the release of Zabbix 6.0 LTS and a set of new widgets, the reporting functionality has gained a lot of additional value that these widgets grant specifically from the reporting perspective. Users can create scheduled reports and receive them in their mailbox at a specific time either on a daily, weekly, monthly, or yearly basis. The time period for which the report will provide the information can also be selected.

The new Geographical map widget allows you to quickly deploy a geomap with an overview of the state of your infrastructure. The geomap widget supports filters, so we can display only a particular part of your infrastructure. Zabbix uses an open-source Javascript interactive maps library called Leaflet and supports multiple map providers such as OpenStreetMap, OpenTopoMap, USGS US Topo, and more. Users also have the ability to define and use a custom map tile provider. The map will display your infrastructure and also highlight any detected problems as well as display problem counters. This is a major step forward from the old approach, which required users to use the regular map functionality together with Zabbix API scripting, to provide information on a geographical map.

Advanced problem detection

Zabbix 5.4 release introduced a new unified syntax for defining trigger expressions, calculated, and aggregated items. There are multiple benefits that come with the new trigger syntax. First off – the syntax is now unified and can be used for defining triggers, calculated items, and providing values in maps or graph names. The syntax also has a more functional approach, instead of being object-oriented. This allows us to solve many complex use cases, for example dynamically calculate or aggregate a value from all hosts tagged with a specific tag or belonging to a specific host group. Aggregated item type has also been removed and users can now define aggregate checks under the calculated item type.
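For illustration, here is how a simple CPU load trigger expression changes between the old and the new syntax – the host and item names are examples, and the aggregation line follows the foreach-function syntax described in the Zabbix 6.0 documentation:

# Old trigger syntax (up to Zabbix 5.2)
{web-server:system.cpu.load[,avg1].last()}>5

# New unified syntax (Zabbix 5.4 and later)
last(/web-server/system.cpu.load[,avg1])>5

# Calculated item aggregating the latest values from all hosts in a host group
avg(last_foreach(/*/system.cpu.load?[group="Linux servers"]))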

New monitoring functionality and integrations

As with every major release, Zabbix 6.0 LTS comes with a set of new items and improves the functionality of already existing items:

  • It is now possible to monitor SSL certificate validity and expiration data, such as the expiry date, issuer, version, subject, and more
  • New Zabbix Agent 2 metrics allow you to collect file owner information, file properties, extended interface info, extended TCP info, SHA2 hashes for files, and more
  • New templates for NGINX+, HPE/Dell servers, CISCO ASAv, Cloudflare

Finally – Zabbix 6.0 LTS

Many of our users and customers prefer sticking with the LTS releases instead of upgrading between each major version. As with every LTS release, there are major benefits to sticking with Zabbix 6.0 LTS:

  • LTS releases receive thorough testing and full long-term support
    • 3 years of full support – general, critical, and security fixes/improvements
    • 5 years of limited support – critical and security fixes

Questions

Q: Which of the current versions are still supported and for how long are they going to remain supported? What updates can we expect these versions to receive?

A: Currently we have three supported major versions available. Zabbix 5.4, which will not be supported after the release of Zabbix 6.0 LTS. We also still provide support for Zabbix 5.0 LTS and Zabbix 4.0 LTS. Zabbix 5.0 LTS will continue receiving full support until the middle of 2023 and limited support until the middle of 2025, while Zabbix 4.0 LTS will receive limited support until November 2023.

 

Q: Could you elaborate on how tags are more flexible than applications and are there any other benefits to using tags?

A: Zabbix already supports tags for most of the essential Zabbix objects, such as triggers, hosts, host prototypes, and templates. With the introduction of tags for items, tags can now be found everywhere. This way you can have tags that provide different additional information and assign values for your objects. Tags have several usages – for example, we can use them to mark events. If we have an item with a tag, this tag will mark any problem related to this item. Problem events will inherit tags from the whole tag chain – hosts, templates, triggers, items, and more. Further down the line, we can use our actions to react to specific tags. If you recall, Business services are also mapped to problems based on the tag mapping. Of course, tags can also be used for filtering and grouping different Zabbix objects.

 

Q: Is there a guideline to the migration process from an older version to Zabbix 6.0 LTS? Is there a change list that I can look at to see what other features have received an overhaul?

A: Regarding the upgrade itself – our documentation contains guidelines for both upgrading from packages and upgrading from sources. The documentation may also contain upgrade notes regarding any extra steps or precautions required when upgrading to a particular version. Regarding the feature changes – we recommend reading through the major version release notes. For example, if you’re upgrading from Zabbix 5.0 LTS to Zabbix 6.0 LTS, make sure to familiarize yourself not only with the Zabbix 6.0 LTS release notes, but also read through the Zabbix 5.2 and Zabbix 5.4 release notes, since changes introduced in these versions will also be a part of Zabbix 6.0 LTS.

The post Top 10 reasons to migrate to Zabbix 6.0 LTS by Dmitry Krupornitsky / Zabbix Summit Online 2021 appeared first on Zabbix Blog.

Build Zabbix Server HA Cluster in 10 minutes by Kaspars Mednis / Zabbix Summit Online 2021

Post Syndicated from Kaspars Mednis original https://blog.zabbix.com/build-zabbix-server-ha-cluster-in-10-minutes-by-kaspars-mednis-zabbix-summit-online-2021/18155/

With the native Zabbix server HA cluster feature added in Zabbix 6.0 LTS, it is now possible to quickly configure and deploy a multi-node Zabbix Server HA cluster without using any external tools. Let’s take a look at how we can deploy a Zabbix server HA cluster in just 10 minutes.

The full recording of the speech is available on the official Zabbix Youtube channel.

Why Zabbix needs HA

Let’s dive deeper into what high availability is and try to define what the term High availability entails:

  • A system runs in high availability mode if it does not have a single point of failure
  • A single point of failure is a component whose failure halts the whole system
  • Redundancy is a requirement in highly available systems. In our case, we need a redundant component to which we can fail over in case the currently active component encounters an issue.
  • The failover process needs to be transparent and automated

In the case of the Zabbix components, the single point of failure is our Zabbix server. Even though Zabbix in itself is very stable, you can still encounter scenarios when a crash happens due to OS level issues or something more trivial – like running out of disk space. If your Zabbix server goes down, all of the data collection, problem detection, and alerting is stopped. That’s why it’s important to have some form of high availability and redundancy for this particular Zabbix component.

How to choose HA for Zabbix

Before the addition of native HA cluster support in Zabbix 6.0 LTS it was possible to use 3rd party HA solutions for Zabbix. This caused an ongoing discussion – which 3rd party solution should I use and how should I configure it for Zabbix components? On top of this, you would also have a new layer of software that requires proper expertise to deploy, configure and manage. There are also cloud-based HA options, but most of the time these incur an extra cost.

Not having the required expertise for the 3rd party high availability tools can cause unwanted downtimes or, at worst, can cause inconsistencies in the Zabbix DB backend. Here are some of the potential scenarios that can be caused by a misconfigured high availability solution:

  • The automatic failover may not be configured properly
  • A split-brain scenario with two nodes running concurrently, potentially causing inconsistencies in the Zabbix database backend
  • Misconfigured STONITH (Shoot the other node in the head) scenarios – potentially causing both nodes to go down

Native Zabbix HA solution

The Zabbix 6.0 LTS native high availability solution is easy to set up, and all of the required steps are documented in the Zabbix documentation. The native solution does not require any additional expertise and will continue to be officially supported, updated, and improved by Zabbix. It also doesn't require any new software components – the Zabbix server node status information is stored in the Zabbix database backend.

How Zabbix cluster works

To enable the native high availability cluster for our servers, we first need to start the Zabbix server component in the high availability mode. To achieve this, we need to look at the two new parameters in the /etc/zabbix/zabbix_server.conf configuration file:

  • HANodeName – specify an arbitrary name for your Zabbix server cluster node
  • ExternalAddress – specify the address of the cluster node

Once you have made the changes and added these parameters, don’t forget to restart the Zabbix server cluster nodes to apply the changes.

Zabbix HA Node name

Let’s take a look at the HANodeName parameter. This is the most important configuration parameter – it is mandatory to specify it if you wish to run your Zabbix server in the high availability mode.

  • This parameter is used to specify the name of the particular cluster mode
  • If the HANodeName is not specified, Zabbix server will not start in the cluster mode
  • The node name needs to be unique on each of your nodes

In our example, we can observe a two-node cluster, where zbx-node1 is the active node and zbx-node2 is the standby node. Both of these nodes will send their heartbeats to the Zabbix database backend every 5 seconds. If one node stops sending its heartbeat, another node will take over.

Zabbix HA Node External Address

The second parameter that you will also need to specify is the ExternalAddress parameter.

In our example, we are using the address node1.example.com. The purpose of this parameter is to let the Zabbix frontend know the address of the currently active Zabbix server since the Zabbix frontend component also constantly communicates with the Zabbix server component. If this parameter is not specified, the Zabbix frontend might not be able to connect to the active Zabbix server node.
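Putting the two parameters together – and going by the parameter names as presented in this talk, so check the documentation of your Zabbix version for the exact spelling – a minimal sketch of the relevant lines in /etc/zabbix/zabbix_server.conf for the first node could look like this:

  # /etc/zabbix/zabbix_server.conf on the first cluster node
  HANodeName=zbx-node1
  ExternalAddress=node1.example.com

On the second node you would use zbx-node2 and node2.example.com respectively, followed by a restart of the Zabbix server service on both nodes.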

Zabbix frontend setup

Seasoned Zabbix users might know that the Zabbix frontend has its own configuration file, which usually contains the Zabbix server address and the Zabbix server port for establishing connections from the Zabbix frontend to the Zabbix server. If you are using the Zabbix high availability cluster, you will have to comment these parameters out – instead of being static, they now depend on the currently active Zabbix server node and are obtained from the Zabbix backend database.
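As an illustration – assuming a packaged frontend installation, where the configuration file is typically located at /etc/zabbix/web/zabbix.conf.php – the commented-out lines could look roughly like this:

  // Commented out so the frontend reads the active node address from the database
  // $ZBX_SERVER      = 'zbx-node1';
  // $ZBX_SERVER_PORT = '10051';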

Putting it all together

In the above example, we can see that we have two nodes – zbx-node1, which is currently active, and zbx-node2. These nodes are reachable via the external addresses node1.example.com and node2.example.com respectively. We can also see that multiple frontends have been deployed. Each of these frontend nodes will connect to the Zabbix backend database, read the address of the currently active node, and proceed to connect to that node.

Zabbix HA node types

Zabbix server high availability cluster nodes can have one of the following statuses:

  • Active – The currently active node. Only one node can be active at a time
  • Standby – The node is currently running in standby mode. Multiple nodes can have this status
  • Shutdown – The node was previously detected, but it has been gracefully shut down
  • Unreachable – The node was previously detected but was unexpectedly lost without a graceful shutdown. This can have many different causes – for example, the node crashing or experiencing network issues

In normal circumstances, you will have an active node and one or more standby nodes. Nodes in shutdown mode are also expected if, for example, you’re performing some maintenance tasks on these nodes. On the other hand, if an active node becomes unreachable, this is when one of the standby nodes will take over.

Zabbix HA Manager

How can we check which node is currently active and which nodes are running in standby mode? First off, we can see this in the Zabbix frontend – we will take a look at this a bit later. We can also check the node status from the command line. On every node – whether active or standby – you will see that the zabbix_server and ha manager processes have been started. The ha manager process checks the high availability node status in the database every 5 seconds and is responsible for taking over if the active node fails.

On the other hand, the currently active Zabbix server node will have many other processes – data collector processes such as pollers and trappers, history and configuration syncers, and many other Zabbix child processes.
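As a hedged sketch – on a released Zabbix 6.0 LTS node, the cluster status can typically be queried with the ha_status runtime command, and the ha manager process is visible in the process list (the exact output format may vary between versions):

  # list the cluster nodes with their status, address, and last access time
  zabbix_server -R ha_status

  # the ha manager process runs on every node, active or standby
  ps aux | grep "ha manager"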

Zabbix HA node status

The System information widget has received some changes in Zabbix 6.0 LTS. It is now capable of displaying the status of your Zabbix server high availability cluster and its individual nodes.

The widget displays the current cluster mode (enabled in our example) and provides a list of all cluster nodes. In our example, we can see that we have 3 nodes – 1 active node, 1 stopped node, and 1 node running in standby mode. This way we can see not only the status of our nodes but also their names, addresses, and last access times.

Switching Zabbix HA node

Switching between nodes is done manually – once you stop the currently active Zabbix server node, another node will automatically take over. Of course, you need to have at least one more node running in standby status, so it can take over from the stopped or failed active node.

How does failover work?

All nodes report their status every 5 seconds. Whenever you shut down a node, it goes into a shutdown state and within 5 seconds another node will take over. But if a node fails, the workflow is a bit different. This is where the failover delay comes into play. By default, the failover delay is 1 minute. The standby node will wait one minute for the failed active node to update its status, and if the active node is still not visible after that minute, the standby node will take over.

Zabbix cluster tuning

It is possible to adjust the failover delay by using the ha_set_failover_delay runtime command. The supported range of the failover delay is from 10 seconds to 15 minutes. In most cases the default value of 1 minute will work just fine, but there could be some exceptions and it very much depends on the specifics of your environment.

We can also remove a node by using the ha_remove_node runtime command. This command requires us to specify the ID of the node that we wish to remove.
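Both runtime commands are issued through the Zabbix server binary, for example (the node ID below is a placeholder taken from the node status output):

  # set the failover delay to 30 seconds (supported range: 10 seconds to 15 minutes)
  zabbix_server -R ha_set_failover_delay=30s

  # remove a node by its ID
  zabbix_server -R ha_remove_node=<node ID>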

Connecting agents and proxies

Connecting Zabbix agents to your cluster

Now let’s talk about how we can connect Zabbix agents and proxies to your Zabbix cluster. First, let’s take a look at the passive Zabbix agent configuration.

  • Passive Zabbix agents require all nodes to be written in the configuration file under the Server parameter
  • Nodes are specified in a comma-separated list

Once you specify the list of all nodes, the passive Zabbix agent will accept connections from all of the specified nodes.

What about the active Zabbix agents?

  • Active Zabbix agents require all nodes to be written in the configuration file under the ServerActive parameter
  • Nodes need to be separated by semicolons

Notice the difference – comma-separated list for passive Zabbix agents and nodes separated by semicolons for active Zabbix agents!
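A brief sketch of the corresponding zabbix_agentd.conf lines, using the example node addresses from above:

  # passive checks - comma-separated list of all cluster nodes
  Server=node1.example.com,node2.example.com

  # active checks - nodes separated by semicolons
  ServerActive=node1.example.com;node2.example.com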

Connecting Zabbix proxies to your cluster

Proxy configuration is very similar to the agent configuration. Once again – we can have a proxy running either in passive mode or active mode.

For the passive Zabbix proxies, we need to list our cluster nodes under the Server parameter in the proxy configuration file. These nodes should be specified in a comma-separated list. This way the proxies will accept connections from any Zabbix server node. As for the active Zabbix proxies – we once again need to list our nodes under the Server parameter, but this time the node names are separated by semicolons.
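The corresponding zabbix_proxy.conf sketch, again using the example addresses – pick the line that matches your proxy mode:

  # passive proxy - comma-separated list, accepts connections from any node
  Server=node1.example.com,node2.example.com

  # active proxy - semicolon-separated list, the proxy connects to the nodes in turn
  Server=node1.example.com;node2.example.com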

Conclusion – Setting up Zabbix HA cluster

Let’s conclude by going through all of the steps that are required to set up a Zabbix server HA cluster.

  • Start Zabbix server in high availability mode on all of your Zabbix server cluster nodes – this can be done by providing the HANodeName parameter in the Zabbix server configuration file
  • Comment out the $ZBX_SERVER and $ZBX_SERVER_PORT in the frontend configuration file
  • List your cluster nodes in the Server and/or ServerActive parameters in the Zabbix agent configuration file for all of the Zabbix agents
  • List your cluster nodes in the Server parameter for all of your Zabbix proxies
  • For other monitoring types, such as SNMP – make sure your endpoints accept connections from all of the Zabbix server cluster nodes
  • And that’s it – Enjoy!

Zabbix HA workshop and training

Wish to learn more about the Zabbix server high availability cluster and get some hands-on experience with the guidance of a Zabbix certified trainer? Take a look at the following options!

  • The Zabbix server high availability workshop will be hosted shortly after the release of Zabbix 6.0 LTS, which is currently planned for January 2022. One of the workshop sessions will be focused specifically on Zabbix server high availability cluster configuration and troubleshooting.
  • Zabbix Certified professional training course covers the Zabbix server HA cluster configuration and troubleshooting. This is also a great opportunity to discuss your own Zabbix use cases and infrastructure with a Zabbix certified trainer. Feel free to check out our Zabbix training page to learn more!

Questions

Q: What about the high availability for the Zabbix frontend? Is it possible to set it up?
A: This is already supported since Zabbix 5.2. All you have to do is deploy as many Zabbix frontend nodes as you require and properly configure the external addresses so that the Zabbix frontends are able to connect to the active Zabbix server – and that's all!

Q: Does high availability cause a performance impact on the network or the Zabbix backend database?
A: No, this should not be the case. The heartbeats that the cluster nodes send to the database backend are extremely small messages that get recorded in one of the smaller Zabbix database tables, so the performance impact should be negligible.

Q: What is the best practice when it comes to migrating from a 3rd party solution such as PCS/Corosync/Pacemaker to the native Zabbix server high availability cluster? Any suggestions on how that can be achieved?
A: The most complex part here is removing the existing high availability solution without breaking anything in the existing environment. Once that is done, all you have to do is upgrade your Zabbix instance to Zabbix 6.0 LTS and follow the configuration steps described in this post. Remember that if you're performing an upgrade instead of a fresh install, the configuration files will not contain the new configuration parameters, so they will have to be added manually.

Gaining new insights with Business service monitoring by Aleksandrs Petrovs-Gavrilovs / Zabbix Summit Online 2021

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/gaining-new-insights-with-business-service-monitoring-by-aleksandrs-petrovs-gavrilovs-zabbix-summit-online-2021/17973/

Zabbix 6.0 LTS comes with a complete redesign of service monitoring – from improved business service scalability to advanced service status calculation logic and alerting. Let's take a look at the Business Service monitoring feature and how you can use it to ensure full transparency for your business services.

The full recording of the speech is available on the official Zabbix Youtube channel.

Business services can be quite complex. They tend to consist of many different moving parts with redundancy and failover mechanisms in place, all of which need to be taken into consideration when we wish to analyze the current status of our services.

BSM Checklist

Let’s take a look at what needs to be done so we can successfully define and monitor our business service:

  • First, we have to define what exactly our business service is and what components it consists of
  • We need to understand our expectations when it comes to service uptime. When should the service be up and running? What are the acceptable downtimes? Should it run 24/7/365, or is it critical only during working hours?
  • Once we know what needs to be monitored, we need to make sure that we are collecting the data that reflects the status of different service components.
  • Finally – we have to find a suitable tool to track and measure our service.

Define your business

Let’s take a look at how a business may look like. As I mentioned before – business services can consist of many different components. Let’s take a look at an example of how business services may look like:

The tree structure here represents our Business services. We can see that we have classified the services into two branches – Internal services and User services. The User services consist of components such as Websites, Helpdesk services, Phones. These general services are based on lower-level components such as the actual physical phones for the phone service, underlying software for the Website and Helpdesk services, and so on.

This can make things quite complicated, since organizations will usually have many more components to take care of. That's why we should see how we can simplify this tree and define our services in a simpler manner, like the service tree below:

Now we are left with only 3 levels for our services. Let’s take a look at how we can move this to Zabbix:

Here we can see a high-level view of our services. Once again we have our Internal services and User services. These high-level services consist of child services, which define what these components consist of and what their SLAs should be. We can also define tags to provide additional details about our services – which customer uses the service, the type of service, maybe even the location where the service is used – this part is completely up to your imagination.

Once you have defined the services, their respective components and have linked them to the problems by using tags, you will finally be able to see the full picture. Zabbix will display not only the status of the service but also the root cause of the problem. This way we can provide service status information not only on the service owner level but also provide information that your technical staff can use to fix the issue.

Configuring SLAs

Configuring Business Service monitoring can be done from the Monitoring – Services section. In Zabbix 6.0 LTS you are not required to start defining the service tree from the root service. Now you can define your own root-level services. To create a service, all we have to do is switch to the Edit mode by clicking the Edit button in the upper right corner of the services screen and click the Create service button right next to it. We have also made some additional changes to the service section UI/UX. Now you also have multiple quick edit buttons next to each service. You can use them to add a child service, edit an existing service, or delete an existing service.

Next, let’s take a look at the actual service creation steps.

  • We need to provide a name for our service
  • If the service is not a top-level service you have to select a parent service
  • Define problem tags. Problems tagged with the matching tags will affect the service status
  • Define the status calculation rule

Major improvements have been made to the status calculation rules. We still support the old logic – Use the most critical of child services, Most critical if all children have problems, and Set status to OK – but there are also many new advanced service status calculation rules.

  • Now we have the ability to select a specific status (Warning, Average, High, and so on) for our service in case of a problem
  • Select how many child services (more than/less than N children or a percentage of children) must be affected for the parent service status change to take place
  • Define weights for child services and perform status changes based on the weight of the affected child services

Child services can also apply different propagation rules to the parent service:

  • A child service can increase or decrease the parent service status by N severities, be ignored, apply a fixed status, or apply a status depending on the problem severity

For our example let’s use an HA cluster use case. HA clusters consist of multiple nodes – for our example, we will use 3 nodes.

  • First, we define that the HA cluster consists of 3 nodes – 3 child services.
  • Each node will have equal weight – 1
  • On the parent service, we will define multiple status rules
    • If the weight of the child services is 1 (1 node is down) – the parent service will change its status to Warning
    • If the weight of the child services is 2 (2 nodes are down) – the parent service will change its status to Average
    • If the weight of the child services is 3 (all nodes are down) – the parent service will change its status to Disaster

In the above image, we can see how the corresponding status change will look in the Services section. Note that we can also see the root cause of the parent service status change in the Root cause column.

We also have the ability to define acceptable SLAs as well as SLA calculation uptime and downtime periods for our services. We have the option to define scheduled uptimes and downtimes, during which the SLA should or shouldn't be calculated (such as weekends, for example), as well as one-time downtimes for one-off maintenance purposes.

Services can utilize tags to provide additional information about your services, such as the service type, service customer, service location, and more. On top of that, tags can also be used in the Service action condition logic, so you can define granular alerting logic for your service status changes.

The Child services tab allows you to quickly look at the related child services, their problem tags, and status calculation rules.

Child services can also be crosslinked between multiple parent services. This means that you don’t have to duplicate and recreate child services if they are used as a component of multiple parent services.

Track, solve and measure

Once we have configured our service, what remains is keeping track of our service statuses, SLAs and staying notified about service status changes and their root cause.

For this purpose, it is vital to secure access to our services. This is especially critical for MSPs, which may have multiple customers and each customer should have access only to the services related to that particular customer. To that end, the Roles section has also received an update related to the Service permissions. We can now define Read-Write and Read access to either specific services or services marked with a particular tag.

The Root cause section displays the root cause problems that affected the service status change. You will be able to click on the root cause problem and open it in the Problems section for further analysis of what caused your services to change their status and which host has been affected by it.

Previously I mentioned alerting on service status change, so let’s dig deeper into that. In Zabbix 6.0 LTS we have added a new type of action – Service actions. Zabbix can now react to service status changes and notify you when a service changes its status. The Service action conditions can analyze if a status has been changed on a particular service, a service that matches or contains a specific string in its name, tag, or tag value. If the conditions are true, Zabbix can send out an email, deliver a phone call or an SMS, create a ticket in your helpdesk system or perform any other alerting and notification workflow.

Many other BSM features are coming as we continue the development of Zabbix 6.0 LTS:

  • SLA graphical visualizations with support for over 100k services
  • Daily, Monthly, Weekly SLA reports
  • New service tree and SLA reporting widgets available from the dashboard
  • Service tree import and export
  • Impact analysis – see which service affects other related services in what way.

Questions

Q: Will the existing services be migrated to Zabbix 6.0 LTS?
A: The existing services will be migrated to Zabbix 6.0 LTS during the upgrade. All of the configuration for the existing services will stay intact after the migration.

Q: Does host maintenance suppress service calculation in Zabbix 6.0 LTS?
A: Host maintenance will not affect the service calculation. If you wish to define maintenance periods for your services – use the scheduled or one-time downtime options when configuring an individual service.

Q: How are the Fixed status and Ignore this service calculation rules going to work?
A: Fixed status services will not change their status no matter what happens to the child services – the service status will remain fixed. As for Ignore this service – the service status change will be ignored and will not affect the parent services.

What’s new in Zabbix 6.0 LTS by Artūrs Lontons / Zabbix Summit Online 2021

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/whats-new-in-zabbix-6-0-lts-by-arturs-lontons-zabbix-summit-online-2021/17761/

Zabbix 6.0 LTS comes packed with many new enterprise-level features and improvements. Join Artūrs Lontons and take a look at some of the major features that will be available with the release of Zabbix 6.0 LTS.

The full recording of the speech is available on the official Zabbix Youtube channel.

If we look at the Zabbix roadmap and Zabbix 6.0 LTS release in particular, we can see that one of the main focuses of Zabbix development is releasing features that solve many complex enterprise-grade problems and use cases. Zabbix 6.0 LTS aims to:

  • Solve enterprise-level security and redundancy requirements
  • Improve performance for large Zabbix instances
  • Provide additional value to different types of Zabbix users – DevOps and ITOps teams, business process owners, managers
  • Continue to extend Zabbix monitoring and data collection capabilities
  • Provide continued delivery of official integrations with 3rd party systems

Let’s take a look at the specific Zabbix 6.0 LTS features that can guide us towards achieving these goals.

Zabbix server High Availability cluster

With the release of Zabbix 6.0 LTS, Zabbix administrators will now have the ability to deploy a Zabbix server HA cluster out of the box. No additional tools are required to achieve this.

Zabbix server HA cluster supports an unlimited number of Zabbix server nodes. All nodes will use the same database backend – this is where the status of all nodes will be stored in the ha_node table. Nodes will report their status every 5 seconds by updating the corresponding record in the ha_node table.
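As an illustrative sketch only, the node records can be inspected directly in the backend database – the column names below are assumptions and may differ between versions, so the frontend and the runtime commands remain the supported way to check node status:

  # assumed column names - verify against your schema (MySQL example)
  mysql zabbix -e "SELECT name, address, status, lastaccess FROM ha_node;"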

To enable High availability, you will first have to define a new parameter in the Zabbix server configuration file: HANodeName

  • Empty by default
  • This parameter should contain an arbitrary name of the HA node
  • Providing value to this parameter will enable Zabbix server cluster mode

Standby nodes monitor the last access time of the active node from the ha_node table.

  • If the difference between last access time and current time reaches the failover delay, the cluster fails over to the standby node
  • Failover operation is logged in the Zabbix server log

It is possible to define a custom failover delay – a time window after which an unreachable active node is considered lost and failover to one of the standby nodes takes place.

As for the Zabbix proxies, the Server parameter in the Zabbix proxy configuration file now supports multiple addresses separated by a semicolon. The proxy will attempt to connect to each of the nodes until it succeeds.

Other HA cluster related features:

  • New command-line options to check HA cluster status
  • hanode.get API method to obtain the list of HA nodes
  • The new internal check provides LLD information to discover Zabbix server HA nodes
  • HA Failover event logged in the Zabbix Audit log
  • Zabbix Frontend will automatically switch to the active Zabbix server node

You can find a more detailed look at the Zabbix Server HA cluster feature in the Zabbix Summit Online 2021 speech dedicated to the topic.
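For reference, the hanode.get API method listed above can be called like any other Zabbix API method. A minimal curl sketch – the frontend URL and API token are placeholders:

  curl -s -X POST -H 'Content-Type: application/json' \
    -d '{"jsonrpc":"2.0","method":"hanode.get","params":{"output":"extend"},"auth":"<api token>","id":1}' \
    https://zabbix.example.com/api_jsonrpc.php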

Business service monitoring

The Services section has received a complete redesign in Zabbix 6.0 LTS. Business Service Monitoring (BSM) enables Zabbix administrators to define services of varying complexity and monitor their status.

BSM provides added value in a multitude of use cases, where we wish to define and monitor services based on:

  • Server clusters
  • Services that utilize load balancing
  • Services that consist of a complex IT stack
  • Systems with redundant components in place
  • And more

Business Service monitoring has been designed with scalability in mind. Zabbix is capable of monitoring over 100k services on a single Zabbix instance.

For our Business Service example, we used a website, which depends on multiple components such as the network connection, DB backend, Application server, and more. We can see that the service status calculation is done by utilizing tags and deciding if the existing problems will affect the service based on the problem tags.

In Zabbix 6.0 LTS there are many ways in which service status calculations can be performed. In case of a problem, the service state can be changed to:

  • The most critical problem severity, based on the child service problem severities
  • The most critical problem severity, based on the child service problem severities, only if all child services are in a problem state
  • The service is set to constantly be in an OK state

Changing the service status to a specific problem severity if:

  • At least N or N% of child services have a specific status
  • Define service weights and calculate the service status based on the service weights

There are many other additional features, all of which are covered in our Zabbix Summit Online 2021 speech dedicated to Business Service monitoring:

  • Ability to define permissions on specific services
  • SLA monitoring
  • Business Service root cause analysis
  • Receive alerts and react on Business Service status change
  • Define Business Service permissions for multi-tenant environments

New Audit log schema

The existing audit log has been redesigned from scratch and now supports detailed logging for both Zabbix server and Zabbix frontend operations:

  • Zabbix 6.0 LTS introduces a new database structure for the Audit log
  • Collision resistant IDs (CUID) will be used for ID generation to prevent audit log row locks
  • Audit log records will be added in bulk SQL requests
  • Introducing Recordset ID column. This will help users recognize which changes have been made in a particular operation

The goal of the Zabbix 6.0 LTS audit log redesign is to provide reliable and detailed audit logging while minimizing the potential performance impact on large Zabbix instances:

  • Detailed logging of both Zabbix frontend and Zabbix server records
  • Designed with minimal performance impact in mind
  • Accessible via Zabbix API

Implementing the new audit log schema is an ongoing effort – further improvements will be done throughout the Zabbix update life cycle.

Machine learning

New trend functions have been added which utilize machine learning to perform anomaly detection and baseline monitoring:

  • New trend function – trendstl, allows you to detect anomalous metric behavior
  • New trend function – baselinewma, returns baseline by averaging data periods in seasons
  • New trend function – baselinedev, returns the number of standard deviations

An in-depth look into Machine learning in Zabbix 6.0 LTS is covered in our Zabbix Summit Online 2021 speech dedicated to machine learning, anomaly detection, and baseline monitoring.

New ways to visualize your data

Collecting and processing metrics is just a part of the monitoring equation. Visualization and the ability to display our infrastructure status in a single pane of glass are also vital to large environments. Zabbix 6.0 LTS adds multiple new visualization options while also improving the existing features.

  • The data table widget allows you to create a summary view for the related metric status on your hosts
  • The Top N and Bottom N functions of the data table widget allow you to have an overview of your highest or lowest item values
  • The single item widget allows you to display values for a single metric
  • Improvements to the existing vector graphs such as the ability to reference individual items and more
  • The SLA report widget displays the current SLA for services filtered by service tags

We are proud to announce that Zabbix 6.0 LTS will provide a native Geomap widget. Now you can take a look at the current status of your IT infrastructure on a geographic map:

  • The host coordinates are provided in the host inventory fields
  • Users will be able to filter the map by host groups and tags
  • Depending on the map zoom level – the hosts will be grouped into a single object
  • Support of multiple Geomap providers, such as OpenStreetMap, OpenTopoMap, Stamen Terrain, USGS US Topo, and others

Zabbix agent – improvements and new items

Zabbix agent and Zabbix agent 2 have also received some improvements. From new items to improved usability – both Zabbix agents are now more flexible than ever. The improvements include such features as:

  • New items to obtain additional file information such as file owner and file permissions
  • New item which can collect agent host metadata as a metric
  • New item with which you can count matching TCP/UDP sockets
  • It is now possible to natively monitor your SSL/TLS certificates with a new Zabbix agent 2 item. The item can be used to validate a TLS/SSL certificate and provide you with additional certificate details
  • User parameters can now be reloaded without having to restart the Zabbix agent

In addition, a major improvement to introducing new Zabbix agent 2 plugins has been made. Zabbix agent 2 now supports loading stand-alone plugins without having to recompile the Zabbix agent 2.

Custom Zabbix password complexity requirements

One of the main improvements to Zabbix security is the ability to define flexible password complexity requirements. Zabbix Super admins can now define the following password complexity requirements:

  • Set the minimum password length
  • Define password character requirements
  • Mitigate the risk of a dictionary attack by prohibiting the usage of the most common password strings

UI/UX improvements

Improving and simplifying existing workflows is always a priority for every major Zabbix release. In Zabbix 6.0 LTS we've added many seemingly simple improvements that have a major impact on the "feel" of the product and can make your day-to-day workflows even smoother:

  • It is now possible to create hosts directly from Monitoring – Hosts
  • Removed the Monitoring – Overview section. For improved user experience, the trigger and data overview functionality can now be accessed only via dashboard widgets.
  • The default type of information for items will now be selected automatically depending on the item key.
  • The simple macros in map labels and graph names have been replaced with expression macros to ensure consistency with the new trigger expression syntax

New templates and integrations

Adding new official templates and integrations is an ongoing process, and Zabbix 6.0 LTS is no exception. Here's a preview of some of the new templates and integrations that you can expect in Zabbix 6.0 LTS:

  • f5 BIG-IP
  • Cisco ASAv
  • HPE ProLiant servers
  • Cloudflare
  • InfluxDB
  • Travis CI
  • Dell PowerEdge

Zabbix 6.0 also brings a new GitHub webhook integration which allows you to generate GitHub issues based on Zabbix events!

Other changes and improvements

But that’s not all! There are more features and improvements that await you in Zabbix 6.0 LTS. From overall performance improvements on specific Zabbix components, to brand new history functions and command-line tool parameters:

  • Detect continuous increase or decrease of values with new monotonic history functions
  • Added utf8mb4 as a supported MySQL character set and collation
  • Added the support of additional HTTP methods for webhooks
  • Timeout settings for Zabbix command-line tools
  • Performance improvements for Zabbix Server, Frontend, and Proxy

Questions and answers

Q: How can you configure geographical maps? Are they similar to regular maps?

A: Geomaps can be used as a Dashboard widget. First, you have to select a Geomap provider in the Administration – General – Geographical maps section. You can either use the pre-defined Geomap providers or define a custom one. Then, you need to make sure that the Location latitude and Location longitude fields are configured in the Inventory section of the hosts which you wish to display on your map. Once that is done, simply deploy a new Geomap widget, filter the required hosts and you’re all set. Geomaps are currently available in the latest alpha release, so you can get some hands-on experience right now.

Q: Any specific performance improvements that we can discuss at this point for Zabbix 6.0 LTS?

A: There have been quite a few. From the frontend side – we have improved the underlying queries that are related to linking new templates, therefore the template linkage performance has increased. This will be very noticeable in large instances, especially when linking or unlinking many templates in a single go.
There have also been improvements to Server – Proxy communication. Specifically – the logic of how the proxy frees up uncompressed data. We've also introduced improvements on the DB backend side of things – from general improvements to existing queries/logic, to the introduction of primary keys for history tables, which we are still extensively testing at this point.

Q: Will you still be able to change the type of information manually, in case you have some advanced preprocessing rules?

A: In Zabbix 6.0 LTS Zabbix will try and automatically pick the corresponding type of information for your item. This is a great UX improvement since you don’t have to refer to the documentation every time you are defining a new item. And, yes, you will still be able to change the type of information manually – either because of preprocessing rules or if you’re simply doing some troubleshooting.

Zabbix 6.0 LTS – The next great leap in monitoring by Alexei Vladishev / Zabbix Summit Online 2021

Post Syndicated from Alexei Vladishev original https://blog.zabbix.com/zabbix-6-0-lts-the-next-great-leap-in-monitoring-by-alexei-vladishev-zabbix-summit-online-2021/17683/

The Zabbix Summit Online 2021 keynote speech by Zabbix founder and CEO Alexei Vladishev focuses on the role of Zabbix in modern, dynamic IT infrastructures. The keynote speech also highlights the major milestones leading up to Zabbix 6.0 LTS and together we take a look at the future of Zabbix.

The full recording of the speech is available on the official Zabbix Youtube channel.

Digital transformation journey
Infrastructure monitoring challenges
Zabbix – Universal Open Source enterprise-level monitoring solution
Cost-Effectiveness
Deploy Anywhere
Monitor Anything
Monitoring of Kubernetes and Hybrid Clouds
Data collection and Aggregation
Security on all levels
Powerful Solution for MSPs
Scalability and High Availability
Machine learning and Statistical analysis
More value to users
New visualization capabilities
IoT monitoring
Infrastructure as a code
Tags for classification
What’s next?
Advanced event correlation engine
Multi DC Monitoring
Zabbix Release Schedule
Zabbix Roadmap
Questions

Digital transformation journey

First, let’s talk about how Zabbix plays a role as a part of the Digital Transformation journey for many companies.

As IT infrastructures evolve, there are many ongoing challenges. Most larger companies, for example, have a set of legacy systems that need to be integrated with more modern systems. This results in a mix of legacy and new technologies and protocols, which means that most management and monitoring tools need to support all of these technologies – Zabbix is no exception here.

Hybrid clouds, containers, and container orchestration systems such as K8S and OpenShift have also played an immense part in the digital transformation of enterprises. It has been a major paradigm shift – from physical machines to virtual machines, to containers and hybrid clouds. We certainly must provide the required set of technologies to monitor such environments and the monitoring endpoints unique to them.

The rapid increase in the complexity of IT infrastructures caused by the two previous points requires our tools to be a lot more scalable than before. We have many more moving parts, likely located in different locations that we need to stay aware of. This also means that any downtime is not acceptable – this is why the high availability of our tools is also vital to us.

Let’s not forget that with increased complexity, many new potential security attack vectors arise and our tools need to support features that can help us with minimizing the security risks.

But making our infrastructures more agile usually comes at a very real financial cost. We must not forget that most of the time we are working with a dedicated budget for our tools and procedures.

Infrastructure monitoring challenges

The increase in the complexity of IT infrastructures also poses multiple monitoring challenges that we have to strive to overcome:

  • Requirements for scalability and high availability for our tools
    • The growing number of devices and networks as well as the increased complexity of IT infrastructures
  • Increasingly complex infrastructures often force us to utilize multiple tools to obtain the required metrics
    • This leads to a requirement for a single pane of glass to enable centralized monitoring
  • Collecting values is often not enough – we need to be able to leverage the collected data to gain the most value out of it
  • We need a solution that can deliver centralized visualization and reporting based on the obtained data
  • Our tools need to be hand-picked so that they can deliver the best ROI in an already complex infrastructure

Zabbix – Universal Open Source enterprise-level monitoring solution

Zabbix is a Universal free and Open Source enterprise-level monitoring solution. The tool comes at absolutely no cost and is available for everyone to try out and use. Zabbix provides the monitoring of modern IT infrastructures on multiple levels.

Universal is the term that we are focusing on. Given the open-source nature of the product, Zabbix can be used in infrastructures of different sizes – from small and medium organizations to large, globe-spanning enterprises. Zabbix is also capable of delivering monitoring of the whole IT stack – from hardware and network monitoring to high-level monitoring such as Business Service monitoring and more.

Cost-Effectiveness

Zabbix delivers a large set of enterprise-grade features at no cost – features such as 2FA and single sign-on, with no restrictions when it comes to data collection methods, the number of monitored devices and services, or database size.

  • Exceptionally low total cost of ownership
    • Free and Open Source solution with quality and security in mind
    • Backed by reliable vendors, a global partner network, and commercial services, such as the 24/7 support
    • No limitations regarding how you use the software
    • Free and readily available documentation, HOWTOs, community resources, videos, and more.
    • Zabbix engineers are easy to find and hire for your organization
    • Cost is fully under your control – Zabbix Commercial services are under fixed-price agreements

Deploy Anywhere

Our users always have the choice of where and how they wish to deploy Zabbix, with official packages for the most popular operating systems such as RHEL, Oracle Linux, Ubuntu, Raspberry Pi OS, and more. With official Helm charts, you can also quickly deploy Zabbix in a Kubernetes cluster or in your OpenShift instance. We also provide official Docker container images with pre-installed Zabbix components that you can deploy in your environment.

We also provide one-click deployment options for multiple cloud service providers, such as Amazon AWS, Microsoft Azure, Google Cloud, Openstack, and many other cloud service providers.

Monitor Anything

With Zabbix, you can monitor anything – from legacy solutions to modern systems. With a large selection of official solutions and substantial community backing our users can be sure that they can find a suitable approach to monitor their IT infrastructure components. There are hundreds of ready-to-use monitoring solutions by Zabbix.

Whenever you deploy a new IT solution in your enterprise, you will want to tie it together with your existing toolset. Zabbix provides many out-of-the-box integrations for the most popular ticketing and alerting systems.

Recently we have introduced advanced search capabilities for the Zabbix integrations page, which allows you to quickly lookup the integrations that currently exist on the market. If you visit the Zabbix integrations page and look up a specific vendor or tool, you will see a list of both the official solutions supported by Zabbix and also a long list of community solutions backed by our users, partners, and customers.

Monitoring of Kubernetes and Hybrid Clouds

Nowadays many existing companies are considering migrating their existing infrastructure to either solutions such as Kubernetes or OpenShift, or utilizing cloud service providers such as Amazon AWS or Microsoft Azure.

I am proud to announce that, with the release of Zabbix 6.0 LTS, Zabbix will officially support out-of-the-box monitoring of OpenShift and Kubernetes clusters.

Data collection and Aggregation

Let’s cover a few recent features that improve the out-of-the-box flexibility of Zabbix by a large margin.

Synthetic monitoring is a feature that was introduced a year ago in Zabbix version 5.2 and it has already become quite popular with our user base. The feature enables monitoring of different devices and solutions over the HTTP protocol. By using synthetic monitoring Zabbix can connect to your HTTP endpoints, such as cloud APIs, Kubernetes, and OpenShift APIs, and other HTTP endpoints, collect the metrics and then process them to extract the required information. Synthetic monitoring is extremely transparent and flexible – it can be fine-tuned to communicate with any HTTP endpoints.

Another major feature introduced in Zabbix 5.4 is the new trigger syntax. It enables our users to define much more flexible trigger expressions, supporting many new problem detection use cases. In addition, we can use this syntax to perform flexible data aggregation operations. For example, we can now aggregate data filtered by wildcards, tags, and host groups, instead of specifying individual items. This is extremely valuable for monitoring complex infrastructures, such as Kubernetes or cloud environments. At the same time, the new syntax is a lot simpler to learn and understand than the old trigger syntax.
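As a brief sketch of the new syntax – with a hypothetical host group name – an aggregate calculated item can now filter by a host group instead of listing individual items:

  avg(last_foreach(/*/system.cpu.util?[group="Linux servers"]))

The same filter section also accepts tags and wildcards, as described above.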

Security on all levels

Many companies are concerned about security and data protection when it comes to the tools that they are using in their day-to-day tasks. I’m happy to tell you that Zabbix follows the highest security standards when it comes to the development and usage of the product.

Zabbix is secure by design. In the diagram below you can see all of the Zabbix components, all of which are interconnected, like Zabbix Agent, Server, Proxy, Database, and Frontend. All of the communication between different Zabbix components can be encrypted by using strong encryption protocols like TLS.

If you’re using Zabbix Agent, the agent does not require root privileges. You can run Zabbix Agent under a normal user with all of the necessary user level restrictions in place. Zabbix agent can also be restricted with metric allow and deny lists, so it has access only to the metrics which are permitted for collection by your company policies.

The connections between the Zabbix database backend and the Zabbix Frontend and Zabbix Server also support encryption as of version 5.0 LTS.

As for the frontend component – users can add an additional security layer for their Zabbix frontends by configuring 2FA and SSO logins. Zabbix 6.0 LTS also introduces flexible login password complexity requirements, which can reduce the security breach risk if your frontend is exposed to the internet. To ensure that Zabbix meets the highest standards of the company security compliance, the new Audit log, introduced in Zabbix 6.0 LTS, is capable of logging all of the Zabbix Frontend and Zabbix Server operations.

For an additional security layer – sensitive information like Usernames, Passwords, API keys can be stored in an external vault. Currently, Zabbix supports secret storage in the HashiCorp Vault. Support for the CyberArk vault will be added in the Zabbix 6.2 release.

Another Zabbix feature – the Zabbix API, is often used for the automation of day-to-day configuration workflows, as well as custom integrations and data migration tasks. Zabbix 5.4 added the ability to create API tokens for particular frontend users with pre-defined token expiration dates.

In Zabbix 5.2 we added another layer for the Zabbix Frontend user permissions – User Roles. Now it is possible to define granular user roles with different types of rights and privileges, assigned to specific types of users in your organization. With User Roles, we can define which parts of the Zabbix UI the specific user role has access to and which UI actions the members of this role can perform. This can be combined with API method restrictions which can also be defined for a particular role.

Powerful Solution for MSPs

When we combine all of these features, we can see how Zabbix becomes a powerful solution for MSP customers. MSPs can use Zabbix as an added-value service – they can provide a monitoring service for their customers and generate additional revenue from it. It is possible to build a customer portal by combining User Roles for read-only access to dashboards and a customized UI, the rebranding option – which was just introduced in Zabbix 6.0 LTS – and SLA reporting together with scheduled PDF reports, so the customers can receive reports on a daily, weekly, or monthly basis.

Scalability and High Availability

With a growing number of devices and ever-increasing network complexity, Scalability and High availability are extremely important requirements.

Zabbix provides Load balancing options for Zabbix UI and Zabbix API. In order to scale the Zabbix Frontend and Zabbix API, we can simply deploy additional Zabbix Frontend nodes, thus introducing redundancy and high availability.

Zabbix 6.0 LTS comes with out-of-the-box support for the Zabbix Server High Availability cluster. If one of the Zabbix Server nodes goes down, Zabbix will automatically switch to one of the standby nodes. And the best thing about the Zabbix Server High Availability cluster – it takes only 5 minutes to get it up and running. The HA cluster is very easy to configure and use.

One of the features in our future roadmap is introducing support for the History API to work with different time-series DB backends for extra efficiency and scalability. Another feature that we would like to implement in the future is load balancing for Zabbix Servers and Zabbix Proxies. Combining all of these features would truly make Zabbix a cloud-native application with unlimited horizontal scalability.

Machine learning and Statistical analysis

Defining static trigger thresholds is a relatively simple task, but it doesn’t scale too well in dynamic environments. With Machine Learning and Statistical Analysis, we can analyze our data trends and perform anomaly detection. This has been greatly extended in Zabbix 6.0 LTS with Anomaly Detection and Baseline Monitoring functionality.

Zabbix 6.0 adds an extended set of functions for trend analysis and trend prediction. These support multiple flexible parameters, such as the ability to define seasonality for your data analysis. This is another way to get additional insights out of the data collected by Zabbix.

More value to users

When I think about the direction that Zabbix is headed in, and look at the Zabbix roadmap, one of the main questions I ask is “How can we deliver more value to our enterprise users?”

In Zabbix 6.0 LTS we made some major steps to make Zabbix fit not only for infrastructure monitoring but also fit for Business Service monitoring – the monitoring of services that we provide for our end-users or internal company users. Zabbix 6.0 LTS comes with complex service level object definitions, real-time SLA reporting, multi-tenancy options, Business Service alerting options, and root cause and Impact analysis.

New visualization capabilities

It is important to present the collected data in a human-readable way. That’s why we invest a lot of time and effort in order to improve the native visualization capabilities. In Zabbix 6.0 LTS we have introduced Geographical Maps together with additional widgets for TOP N reporting and templated and multi-page dashboards.

The introduction of reports in Zabbix 5.2 allowed our users to leverage their Zabbix Dashboards to generate scheduled PDF reports with respect to user permissions. Our users can generate daily, weekly, monthly or yearly reports and send them to their infrastructure administrators or customers.

IoT monitoring

With the introduction of support for Modbus and MQTT protocols, Zabbix can be used to monitor IoT devices and obtain environmental information from different sensors such as temperature, humidity, and more. In addition, Zabbix can now be used to monitor factory equipment, building management systems, IoT gateways, and more.

Infrastructure as a code

With IT infrastructures growing in scale, automation is more important than ever. For this reason, many companies prefer preserving and deploying their infrastructure as code. With support for the YAML format for our templates, you can now keep them in a git repository and deploy them automatically by utilizing CI/CD tools.

This enables our users to manage their templates in a central location – the git repository, which helps users to perform change management and versioning and then deploy the template to Zabbix by using CI/CD tools.
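As an illustration only – the exact field layout depends on the Zabbix version, so treat this as a rough skeleton of an exported template rather than a definitive format:

  zabbix_export:
    version: '6.0'
    templates:
      - template: 'My custom template'
        name: 'My custom template'
        groups:
          - name: Templates

Such a file can be versioned in git and pushed to Zabbix through the API or the frontend import as part of a CI/CD pipeline.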

Tags for classification

Over the past few versions, we have made a major push to support tags for most Zabbix entities. The switch from applications to tags in Zabbix 5.4 made the tool much more flexible. Tags can now be used for the classification of items, triggers, hosts, business services. The tags that the users define can also be used in alerting, filtering, and reporting.

What’s next?

You’re probably wondering – what’s coming next? What are the main vectors for the future development of Zabbix?

First off – we will continue to invest in usability. While the tool is made by professionals for professionals, it is important for us to make using the tool as easy as possible. Improvements to the Zabbix Frontend, general usability, and UX can be expected very soon.

We plan to continue investing in the visualization and reporting capabilities of Zabbix. We want all data collected by our monitoring tool to be presented in a single pane of glass, so our users can see the full picture of their environment along with the root cause analysis for ongoing problems. This way we can get the most out of the data that Zabbix collects.

Extending the scope of monitoring is an ongoing process for us. We would like to implement additional features for compliance monitoring, and I think that we will be able to introduce a solution for application performance monitoring very soon. We'd also like to make log monitoring more powerful and comprehensive. Monitoring of public and private clouds is also very important for us, given the current IT paradigms.

We’d like to make sure that Zabbix is absolutely extendable on all levels. While we can already extend Zabbix with different types of plugins, webhooks, and UI modules there’s more to come in the near future.

The topic of high availability, scalability, and load balancing is extremely important to us. We will continue building on the existing foundations to make Zabbix a truly cloud-native solution.

Advanced event correlation engine

Advanced event processing is a really important topic. When we talk about a monitoring solution, we pay a lot of attention to the number of metrics that we are collecting. We mustn't forget that, for large-scale environments, the number of events that we generate based on those metrics is also extremely important. We need to keep control of and manage the ever-growing number of different events coming from different sources. This is why we would like to focus on noise reduction, specifically – root cause analysis.

For this reason, we can expect Zabbix to introduce an advanced event correlation model in the future. This model should have the ability to filter and deduplicate the events as well as perform event enrichment, thus leading to a much better root cause analysis.

Multi DC Monitoring

Currently, Multi DC monitoring can be done with Zabbix by deploying a distributed Zabbix instance that utilizes Zabbix proxies. But there are use cases, where it would be more beneficial to have multiple Zabbix servers deployed across different datacenters – all reporting to a single location for centralized event processing, centralized visualization, and reporting as well as centralized dashboards. This is something that is coming soon to Zabbix.

Zabbix Release Schedule

Of course, the burning question is – when is Zabbix 6.0 LTS going to be released? And we are very close to finalizing the next LTS release. I would expect Zabbix 6.0 LTS to be officially released in January 2022.

As for Zabbix 6.2 and 6.4 – these releases are still planned for Q2 and Q4, 2022. The next LTS release – Zabbix 7.0 LTS is planned to be released in Q2, 2023.

Zabbix Roadmap

If you want to follow the development of Zabbix – we have a special page just for that – the Zabbix Roadmap. Here you can find up-to-date information about the development plans for Zabbix 6.2, 6.4, and 7.0 LTS. The Roadmap also represents the current development status of Zabbix 6.0 LTS.

Questions

Q: What would you say is the main benefit of why users should migrate from Zabbix 5.0/4.0 or older versions to 6.0 LTS?

A: I think that Zabbix 6.0 LTS is a very different product – even when you compare it with the relatively recent Zabbix 5.0 LTS. It comes with many improvements, some of which I mentioned here in my keynote. For example, Business Service monitoring provides huge added value to enterprise customers.

With the new trigger syntax and the new functions related to anomaly detection and baseline monitoring our users can get much more out of the data that they already have in their monitoring tool.

The new visualization options – multiple new widgets, geographical maps, scheduled PDF reporting provide a lot of added value to our end-users and to their customers as well.

Q: Any plans to make changes on the Zabbix DB backend level – make it more scalable or completely redesign it?

A: Right now we keep all of our information in a relational database such as MySQL or PostgreSQL. We have added the support for TimescaleDB which brings some huge advantages to our users, thanks to improved data storage and performance efficiency.

But we still have users who wish to connect different storage engines to Zabbix – perhaps ones specifically optimized for time-series data. This is already on our roadmap: our plan is to introduce a unified API for historical data, so that if you wish to attach your own storage, you just have to deploy a plugin that communicates with our historical API on one side and with the storage engine of your choice on the other.

Q: What is your personal favorite feature? Something that you 100% wanted to see implemented in Zabbix 6.0 LTS?

A: I see Zabbix 6.0 LTS as a combination of Zabbix 5.2, 5.4, and the features introduced directly in Zabbix 6.0 LTS. Personally, my favorite features in Zabbix 6.0 LTS are the ones that make up the latest implementation of anomaly detection.

We may only be at the very beginning of exploring more advanced machine learning and statistical analysis capabilities, but I’m pretty sure that with every new release of Zabbix there will be new features related to machine learning, anomaly detection, and trend prediction.

This could provide a way for Zabbix to generate and share insights with our users – an analysis of what’s happening with your system and how the metrics in it behave.

Summary of Zabbix Summit Online 2021, Zabbix 6.0 LTS release date and Zabbix Workshops

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/summary-of-zabbix-summit-online-2021-zabbix-6-0-lts-release-date-and-zabbix-workshops/17155/

Now that the Zabbix Summit Online 2021 has concluded, we are thrilled to report we hosted attendees from over 3000 organizations from more than 130 countries all across the globe.

This year, the main focus of the speeches was the upcoming Zabbix 6.0 LTS release, along with talks on automating Zabbix data collection and configuration, integrating Zabbix within existing company infrastructures, and migrating from legacy tools to Zabbix. In total, 21 speakers presented their use cases and talked about new Zabbix features during the Summit, with over 8 hours of content.

In case you missed the Summit or wish to come back to some of the speeches – both the presentations (in PDF format) and the videos of the speeches are available on the Zabbix Summit Online 2021 Event page.

Zabbix 6.0 LTS release date

As for Zabbix 6.0 LTS – as per our statement during the event, you can expect Zabbix 6.0 LTS to be released in early 2022. At the time of this post, the latest pre-release version is Zabbix 6.0 Alpha 7, with the first Beta version scheduled for release very soon. Feel free to deploy the latest pre-release version and take a look at features such as Geomaps, Business Service monitoring, the improved Audit log, UX improvements, Anomaly detection with Machine Learning, and more! The list of the latest released Zabbix 6.0 versions, as well as the improvements and fixes they contain, is available in the Release notes section of our website.

Zabbix 6.0 LTS Workshops

The workshops will focus on particular Zabbix 6.0 LTS features and will be available once the Zabbix 6.0 LTS is released. The workshops will provide a unique chance to learn and practice the configuration of specific Zabbix 6.0 LTS features under the guidance of a certified Zabbix trainer at absolutely no cost! Some of the topics covered in the workshops will include – Deploying Zabbix server HA cluster, Creating triggers for Baseline monitoring and Anomaly detection, Displaying your infrastructure status on Geomaps, Deploying Business Service monitoring with root cause analysis, and more!

Upcoming events

But there’s more! On December 9, 2021, Zabbix will host PostgreSQL Monitoring Day with Zabbix & Postgres Pro. The speeches will focus on monitoring PostgreSQL databases, running Zabbix on PostgreSQL DB backends with TimescaleDB, and securing your Zabbix + PostgreSQL instances. If you’re currently using PostgreSQL DB backends or plan to do so in the future – you definitely don’t want to miss out!

As for 2022 – you can expect multiple meetups regarding Zabbix 6.0 LTS features and use cases, as well as events focused on specific monitoring use cases. More information will be publicly available with the release of Zabbix 6.0 LTS.

Zabbix 6.0 LTS at Zabbix Summit Online 2021

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/zabbix-6-0-lts-at-zabbix-summit-online-2021/16115/

With Zabbix Summit Online 2021 just around the corner, it’s time to have a quick overview of the 6.0 LTS features that we can expect to see featured during the event. The Zabbix 6.0 LTS release aims to deliver some of the long-awaited enterprise-level features while also improving the general user experience, performance, scalability, and many other aspects of Zabbix.

Native Zabbix server cluster

Many of you will be extremely happy to hear that Zabbix 6.0 LTS release comes with out-of-the-box High availability for Zabbix Server. This means that HA will now be supported natively, without having to use external tools to create Zabbix Server clusters.

The native Zabbix Server cluster will have a speech dedicated to it during the Zabbix Summit Online 2021. You can expect to learn about the inner workings of the HA solution, its configuration, and of course the main benefits of using the native HA solution. You can also take a look at the in-development version of the native Zabbix server cluster in the latest Zabbix 6.0 LTS alpha release.

Business service monitoring and root cause analysis

Service monitoring is also about to go through a significant redesign, focusing on delivering additional value by providing robust Business service monitoring (BSM) features. This is achieved by delivering significant additions to the existing service status calculation logic. With features such as service weights, service status analysis based on child problem severities, and the ability to calculate service status based on the number or percentage of children in a problem state, users will be able to implement BSM on a whole new level. BSM will also support root cause analysis – users will be informed about the root cause problem of the service status change.

All of this and more, together with examples and use cases will be covered during a separate speech dedicated to BSM. In addition, some of the BSM features are available in the latest Zabbix 6.0 LTS alpha release – with more to come as we continue working on the Zabbix 6.0 release.

Audit log redesign

The Audit log is another existing feature that has received a complete redesign. With the ability to log each and every change performed by both the Zabbix Server and the Zabbix Frontend, the Audit log will become an invaluable source of audit information. Of course, the redesign also takes performance into consideration – it was developed with the least possible performance impact in mind.

The audit log is constantly in development and the current Zabbix 6.0 LTS alpha release offers you an early look at the feature. We will also be covering the technical details of the new audit log implementation during the Summit and will explain how we are able to achieve minimal performance impact with major improvements to Zabbix audit logging.

Geographical maps

With Geographical maps, our users can finally display their entities on a geographical map based on the coordinates of the entity. Geographical maps can be used with multiple map providers and display your hosts with their most severe problems. In addition, geographical maps react dynamically to zoom levels and support filtering.

The latest Zabbix 6.0 Alpha release includes the Geomap widget – feel free to deploy it in your QA environment, check out the different map providers, filter options and other great features that come with this widget.

Machine learning

When it comes to problem detection, Zabbix 6.0 LTS will deliver multiple new trend functions. A specific set of these functions provides machine learning functionality for Anomaly detection and Baseline monitoring.

The topic will be covered in-depth during the Zabbix Summit Online 2021. We will look at the configuration of the new functions and also take a deeper dive into the logic and algorithms used under the hood.

During the Zabbix Summit Online 2021, we will also cover many other new features, such as:

  • New Dashboard widgets
  • New items for Zabbix Agent
  • New templates and integrations
  • Zabbix login password complexity settings
  • Performance improvements for Zabbix Server, Zabbix Proxy, and Zabbix Frontend
  • UI and UX improvements
  • New history and trend functions
  • And more!

Not only will you get the chance to have an early look at many new features not yet available in the latest alpha release, but you will also have a great chance to learn the inner workings of the new features, the upgrade and migration process to Zabbix 6.0 LTS, and much more!

We are extremely excited to share all of the new features with our community, so don’t miss out – take a look at the full Zabbix Summit Online 2021 agenda and register for the event by visiting our Zabbix Summit page. We will see you at the Zabbix Summit Online 2021 on November 25!

Interview with Zabbix Summit Online 2021 speaker: Brian van Baekel

Post Syndicated from Jekaterina Sizova original https://blog.zabbix.com/interview-with-zabbix-summit-online-2021-speaker-brian-van-baekel/16174/

We continue to introduce you to the speakers of Zabbix Summit Online 2021. Our next guest is Brian van Baekel – a well-known Zabbix evangelist and trainer who has educated hundreds of students on all the nuances of working with our monitoring system.

Hi Brian, you are often seen on Zabbix Blog – mainly as the author of technical posts. Tell us, where did you get so much practical experience in using Zabbix?

Well, I started with Zabbix in 2013 and have been working with it ever since. During those years I’ve seen so many different environments, each with their own challenges. Every challenge forces you to find a new creative solution, and if you come across them often enough, you gain experience! I started Opensource ICT Solutions in early 2018, and the only thing we do is Zabbix – either consultancy, training, or support worldwide, as long as it is Zabbix related. So that is even more experience that’s growing day by day as we’re serving exciting customers around the globe!

Is it true that you started using Zabbix with version 1.8? In your opinion, what are the most significant changes in the Zabbix functionality when comparing Zabbix then and now?

Yes, that’s true, the first Zabbix version I started with was 1.8, and at that point I was not impressed with it, to say the least. After a few weeks, I saw the potential and started to enjoy the product more and more. After some time, we upgraded to a newer version of Zabbix where Low-Level Discovery was introduced… and wow! That was (and still is) one of the most powerful things.

In general, it’s not just one significant change that impresses me most, as there are countless small and major improvements. In my opinion, it’s better to look at the product as a whole, and the most significant change is how mature the product has become over those years. If you really want me to name some significant changes, I would pick these three:

  • Low-Level Discovery
  • Tags
  • Dashboards

At the Zabbix Summit 2021, we will be introducing version 6.0 to the community in detail. Have you checked out the roadmap yet? Please tell us, what improvements are you most excited about?

To be honest I am not only checking the roadmap on a weekly basis but following the development rather closely to make sure we can anticipate what’s going to be introduced so we know what to advise customers.

As our customers are mainly on the Long Term Support versions, I am extremely happy that what was introduced in 5.2 and 5.4 will be available in a Long Term Support release. Regarding the new features, I am really excited about BSM (Business Service Monitoring), which will give us a comprehensive look at services instead of hosts and their metrics. That’s extremely valuable. The second thing that seems promising is HA. Although we’re building HA setups ourselves on a weekly basis, it’s nice to have something natively available in the product.

Can you tell us about the speech you are going to give at the Zabbix Summit 2021?

Yes, of course! As I mentioned, working as a Zabbix consultant and trainer, I see a lot of different environments where we have to solve various challenges. One of those challenges I came across is SNMP monitoring where the devices that had to be monitored were simply not able to handle all the SNMP requests. Luckily, they do send traps. The art is to receive the traps and utilize them in such a way that you know the status of that device within seconds, without relying solely on the received trap. If you’re creative enough, Zabbix allows you to cater for that. So I’m going to explain why you want SNMP polling combined with SNMP traps and how to react to those traps so that you know the complete status of that device. The higher-level message of this talk is: “Zabbix is flexible enough. The product is not the limit, your creativity is! Think outside the box and the sky is the limit.”

 

Revealing Zabbix Summit Online 2021 Agenda. Interview with Jacob Robinson

Post Syndicated from Jekaterina Sizova original https://blog.zabbix.com/revealing-zabbix-summit-online-2021-agenda-interview-with-jacob-robinson/15803/

This year’s Zabbix Summit 2021 will take place online on October 21. We decided to introduce you to the speakers and reveal some of the topics that will be covered during the event. Our first interviewee will be Jacob Robinson, who is speaking at this year’s Zabbix summit for the first time.

Hi Jacob, we are pleased to welcome you! We mentioned above that this year would be your debut speaking at our big event about monitoring. What was your primary motivation for giving the talk?

Hi, thank you, that is correct – it is my first time speaking at any Zabbix event. My goal is to let others know about the unique way I am using Zabbix so they may benefit from it. I would really like to expand the system I have built, find others who will use it, and potentially get contributions in the form of requests and development. I hope that the talk sparks some interest in the community and that I can connect with some people to discuss it further.

Would you please tell us a bit more about your speech? What should summit attendees expect?

Sure, my speech explains a project I developed that automatically detects, identifies, and creates hosts in Zabbix so that users never need to manually create any hosts. It also obtains MAC addresses, switch port configurations, and many other host details that are automatically entered into Zabbix even over large corporate networks.

Can you tell us about yourself and your experience with Zabbix? What exciting projects have you worked on?

I have been using Zabbix for around 3 years now to provide global monitoring of AV, networking, and security for WeWork. Everything I have done at WeWork has been very exciting: integrations with several APIs, developing a custom Okta integration with Zabbix, controlling thousands of televisions and tracking electricity cost savings with Zabbix, and the challenges involved in monitoring WeWork’s network of over 150,000 active hosts. I also run a small blog, monitoreverything.net, where I try to write detailed documentation of things that I have done in Zabbix.

Can you tell us about your professional plans? In addition, should we wait for you at the Zabbix Summit in the following years with new and insightful cases (maybe even offline in Riga, who knows)?

My last job was working as an Audio-Visual engineer, and I transitioned into a Systems Infrastructure engineer so I’m not sure what I will do next. I have been enjoying developing and supporting Zabbix so I will likely continue to do that. I plan to be in-person at Zabbix Summit when it is possible!

Deploying and configuring Zabbix 5.4 in a multi-tenant environment

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/deploying-and-configuring-zabbix-5-4-in-a-multi-tenant-environment/15109/

In this post and the video, we will discuss deploying and configuring Zabbix 5.4 in a multi-tenant environment and how Zabbix is finally ready for real multi-tenant use cases thanks to multiple features.

Contents

I. Monitoring requirements of multi-tenant environments (0:30)
II. Supported monitoring approaches (2:32)
III. Zabbix and multi-tenant environment (5:56)
IV. To-do list (21:36)
V. Questions & Answers (23:32)

Monitoring requirements of multi-tenant environments

Before talking about Zabbix, let us first analyze the core requirements behind multi-tenant environments. Such environments can be quite complex, with a particular set of prerequisites that we have to be sure we can satisfy before continuing further.

  • The core idea behind multi-tenancy is support for multiple customers. Therefore we need to support a granular role/permission schema. The ability to define different roles for different customers and limit what they can access is key to the success of such deployments.
  • Multiple customers means a lot of data. No matter if we’re talking about a single Zabbix instance or scaling out by deploying multiple Zabbix instances (say, for different regions), we need to have the ability to process large amounts of data.
  • On top of that, we must be able to scale upwards, ideally both horizontally and vertically. More customers, different requirements, varying amounts of data to process – all of this needs to be accounted for in advance.
  • Redundancy is another key factor for us. As service providers, we absolutely cannot afford any downtime or data loss. While this may be acceptable in our own home labs or classrooms, it is not the case here. Unscheduled downtime could potentially result in the loss of a customer.

Supported monitoring approaches

Now that we have covered the architectural requirements, let’s focus on data collection. No matter the monitoring solution, the easiest approach in most cases would simply be to tell our customer to deploy an agent and be done with it. Unfortunately, this often doesn’t sit well with the end user and their security team. Let us also not forget that it is simply not possible to deploy an agent on some devices or environments – what then? Having a vast selection of monitoring methods is key for a successful deployment of a multi-tenant monitoring service.

Let us take a look at this in the context of Zabbix.

Agent

  • With Zabbix agents we can obtain data in two ways – in passive (polling) and active (trapping) mode. This is extremely useful while working with multiple customers, since each of them will have different internal network security policies. I have personally seen cases where only one of these approaches is supported, while the other is restricted by the security policies.
  • Agents also support deployment on different platforms (Windows, Unix, etc.), as well as execution of external third-party scripts, either by way of user parameters or the system.run item key.
  • Active agents are also capable of reading log files and event logs on Windows environments. This can be extremely useful, since many applications, even in-house ones, can provide a lot of monitoring data by logging it.

Agent-supported deployment on various platforms

 

Since we need to stay flexible, there are many other monitoring approaches supported by Zabbix that we can utilize:

  • SNMP, HTTP, IPMI and SSH agentless monitoring.
  • Simple checks (ICMP pings, port statuses).
  • Database and Java application monitoring.
  • External scripts (executed by the Zabbix server, Zabbix proxy or Zabbix agent).
  • Aggregations and calculations of existing data.
  • VMware monitoring and integration.
  • Web monitoring by creating web scenarios.
  • Synthetic monitoring for simulating real life user transactions.

Latest improvements

Why are we putting emphasis on multi-tenancy just now? The reason is a couple of great features added in the last few releases. These features finally allow us to utilize Zabbix in a truly multi-tenant environment:

Added in Zabbix 5.2:

  • Ability to create customizable user roles based on user types;
  • Secrets can now be stored in a highly secure external vault;
  • Improvements to frontend configuration were also added. For example, each user can now select their own time zone for frontend data display. This is relevant for users in different geographical locations.

Added in Zabbix 5.4:

  • Users now have the ability to send scheduled reports. This is extremely useful for customers who may wish to receive scheduled reports about their environments. Now, instead of utilizing third-party scripts to export data and generate reports, you can use the native Zabbix functionality.
  • Major performance improvements have also been added, especially for really large instances with tens of thousands of new incoming values per second.

Zabbix and multi-tenant environment

How do we use Zabbix in a multi-tenant environment? Essentially, we provide Zabbix as a service. We use the Zabbix monitoring tool to monitor our clients (ABC and BCD in the image). We monitor their network traffic, their operating system statistics, application statistics, log files, etc. For each tenant, these monitoring requirements are going to be different.

Multi-tenant environment

Zabbix proxy

Multi-tenancy would not be possible without Zabbix proxies. We can deploy Zabbix proxies in customer offices, data centers, and organization branches, and collect data locally. Since proxies also perform preprocessing, we can even utilize them to transform and normalize metrics, or even discard some of the collected metrics before forwarding them to the central Zabbix backend server.

  • Proxies have been capable of performing preprocessing ever since Zabbix version 5.0. This allows us to normalize and transform data – for example, change textual data to numeric data, use throttling, and apply other preprocessing approaches. Even custom JavaScript is supported nowadays to format or normalize the data before we send it to our central Zabbix backend server. So, instead of the server being responsible for all of the preprocessing and carrying quite a large preprocessing overhead, the proxy can now do it and then forward the data to the server.
  • In addition, the data gets compressed on the proxy before being forwarded to the server, thus reducing the network traffic overhead.
  • The proxy continues collecting data and storing it in its own database even in case of a network outage on the customer’s site.
  • Once the proxy has collected the data, it gets sent to the server via a single connection, which is a lot more manageable from the network security perspective. In this case we need to create only a single firewall rule, as opposed to a wide array of rules if we were to monitor the customer’s site directly from the central Zabbix backend server.
  • We can execute remote scripts on the proxy.
  • We can also deploy multiple proxies to improve scalability. If a single proxy cannot handle the amount of data that we are gathering or preprocessing, we can always deploy an extra proxy. They are easy to deploy, and can even use out-of-the-box SQLite databases.

Passive and active proxies

With proxies we can also select the direction of the connection. We can deploy passive proxies, which get polled by the Zabbix backend server – in that case, the Zabbix server pulls the data from the proxy and is the one responsible for establishing the connection, which adds a minor performance overhead to the Zabbix backend server. On the other hand, we can deploy active proxies, where we remove that overhead from the server and the proxy sends the data to the server autonomously.

At the end of the day, similar to how it goes with agent requirements, the proxy mode will depend on the security policies of the customer. Don’t forget that we aren’t restricted to a single type of proxy – we can have both of these proxy types running at the same time.

Selecting the connection direction

Data preprocessing — throttling

Preprocessing can help us not only normalize our data, but we can also utilize it to save up on storage and performance overhead, which is vital in large environments.

When monitoring a service or an application state, we are going to be obtaining discrete values such as 1, 2, or 3. These numbers have a tendency to repeat – if our server stays up, we are going to keep receiving the number which represents “Up”. By using the preprocessing method called throttling, we can decrease the number of stored values by discarding repeating ones. Only status changes are stored, so we can potentially save some database space and remove unneeded data processing overhead.

Discarding unchanged values

 

At this point in time, this feature sees more and more usage in many Zabbix environments, though it was severely underutilized when Zabbix 5.0 initially came out. So, if you aren’t using throttling yet and you’re running 5.0 or newer, I definitely suggest trying to implement it to some extent. It is available in the Preprocessing section of the item configuration, and can also be set via the API, as sketched below.
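As a rough illustration of how this could be automated, here is a minimal Python sketch that adds a “Discard unchanged with heartbeat” preprocessing step to an existing item through the standard Zabbix JSON-RPC API. The frontend URL, credentials, and item ID are placeholders, and the snippet assumes the requests library and a Zabbix 5.x-style auth token – treat it as a sketch rather than a ready-made script.

```python
import requests

ZABBIX_URL = "https://zabbix.example.com/api_jsonrpc.php"  # placeholder frontend URL


def api_call(method, params, auth=None):
    """Send one JSON-RPC request to the Zabbix API and return its result."""
    payload = {"jsonrpc": "2.0", "method": method, "params": params, "id": 1, "auth": auth}
    response = requests.post(ZABBIX_URL, json=payload, timeout=10)
    response.raise_for_status()
    body = response.json()
    if "error" in body:
        raise RuntimeError(body["error"])
    return body["result"]


# Log in with a placeholder API user to obtain a session token
# (newer Zabbix versions expect "username" instead of "user").
token = api_call("user.login", {"user": "api-user", "password": "********"})

# Add a throttling step to an existing item. Preprocessing type 20 is
# "Discard unchanged with heartbeat"; note that item.update replaces the
# item's whole preprocessing list with the one given here.
api_call("item.update", {
    "itemid": "10600",  # placeholder ID of the service-state item
    "preprocessing": [{
        "type": 20,
        "params": "1h",            # keep at least one value per hour even if unchanged
        "error_handler": 0,
        "error_handler_params": ""
    }]
}, auth=token)
```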

Permissions

Robust permission design is essential to a multi-tenant environment. Even though the permission logic has seen the addition of roles, the user group to host group relations haven’t been abandoned and still play a vital role in the overall permission schema.

With roles we still have to utilize the three user types – Users, Admins, and Super admins.

User role overview

Here you can see the user role and the UI elements the user has access to together with API restrictions and the actions the user can perform.

Roles grant the ability to configure access to specific UI elements and actions, and to restrict API calls in a granular fashion. So, when you’re configuring a role, you will see a screen similar to the one below:

Configuring user roles

User roles

Here you can select the user type. The user type restrictions still apply: Users can get access only to the Monitoring and Inventory sections, Admins can get access to every section except Administration, and Super admins can get access to every section, including Administration.

With roles we can further restrict these user types. You can have Super admins with some limitations, so that they could only do specific actions and access specific UI sections.

This option has two core benefits. The first one is security, as we can limit what our customers can do and what they can access. The other benefit is UX, as we can simplify the UI for our users, especially people not experienced with Zabbix. We can hide the sections that the end users don’t have access to, so they will not have to navigate through multiple sections and subsections that they are not familiar with.
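To make this more concrete, here is a hedged sketch of creating such a restricted role via role.create, reusing the api_call() helper and token from the throttling example above. The role name, the exact UI element names, and the denied API methods are illustrative assumptions – check the role API documentation for your Zabbix version before using anything like this.

```python
# Create a limited "Customer user" role: only the dashboard and problem views
# are visible, and a couple of API methods are denied on top of the user-type
# restrictions. All names below are placeholders.
api_call("role.create", {
    "name": "Customer user",
    "type": 1,  # 1 = User, 2 = Admin, 3 = Super admin
    "rules": {
        "ui": [
            {"name": "monitoring.dashboard", "status": 1},
            {"name": "monitoring.problems", "status": 1}
        ],
        "ui.default_access": 0,   # hide UI sections not listed above
        "api.mode": 0,            # 0 = deny list, 1 = allow list
        "api": ["host.create", "host.delete"]
    }
}, auth=token)
```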

User groups

We still have user groups and the user group to host group relations, which we have to take into consideration. Access to hosts is defined on the user group level. So, we have to define our user groups and assign Full/Read only/Deny permissions on particular host groups. This is how we limit what specific customers can access.

User groups

In addition, we can have host groups defined in a hierarchical manner. For instance, if you have two customers, each of them having a “Network Devices” subgroup, we can choose to include the root group and all of its subgroups when assigning user group to host group permissions. This is a really elegant and quick way to give a User or an Admin on the customer’s side access to all of their hosts, or to limit a specific organizational unit to only what they need – e.g. only permit access to network devices for network administrators.

Using group hierarchy
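A matching user group can likewise be created through the API. The sketch below (again reusing api_call() and the token from earlier, with placeholder group IDs) grants read-only access to a hypothetical customer’s root host group and its “Network Devices” subgroup; unlike the frontend’s “include subgroups” option, each group is listed explicitly here.

```python
# Per-customer user group with read-only access to that customer's host groups.
# Permission values: 0 = deny, 2 = read-only, 3 = read-write.
api_call("usergroup.create", {
    "name": "Customer ABC - read only",
    "rights": [
        {"id": "101", "permission": 2},  # "Customer ABC" root host group (placeholder ID)
        {"id": "102", "permission": 2}   # "Customer ABC/Network Devices" subgroup (placeholder ID)
    ]
}, auth=token)
```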

High availability

The next important decision is the HA implementation. Going without some sort of HA solution is simply too risky and therefore not an option in such environments.

  • HA can be used to minimize downtime and add redundancy.
  • Zabbix supports Linux HA tools – PCS, Corosync, Pacemaker – which are used to enable HA. You are also welcome to try other third-party HA tools.
  • Out-of-the-box HA is planned for Zabbix 6.0.

HA setup

To achieve a quorum in our HA environment, we will require an odd number of nodes. For Zabbix backend HA it is very much recommended to have at least three nodes. Does that mean that you have to deploy 3 Zabbix servers? Not really – our third node is going to be a really small arbitration node, which is simply going to be checking connections to the two other Zabbix nodes and giving a vote to achieve quorum in case of issues with one of the nodes.

In the end we will have three nodes: Zabbix server A, Zabbix server B, and the Arbiter node.

  • An odd number of nodes is recommended to achieve quorum.
  • Only Active/Passive cluster architecture is supported.
  • We cannot have two active Zabbix nodes running at the same time and talking to the same database. It is important to use a “shoot the other node in the head” (STONITH) mechanism to avoid such split-brain scenarios.

Failure to abide by these requirements can result in database consistency issues and problems with the underlying queries that clean up or insert data. This can cause unexpected Zabbix backend server crashes down the line.

In addition, it is very common to have a requirement for proxies to be deployed with HA. Before implementing HA for proxies, we need to decide if we really do need it, because HA adds a significant configuration management overhead. We can have hundreds if not thousands of proxies, and managing HA for each of those adds up. Of course, the more comfortable you feel with the HA tools, the easier the deployment and the management of the environment.

Another approach to Zabbix proxy HA can be implemented by using Zabbix API scripts. We can essentially have two proxies running without the need for an HA suite. In this case, if proxy A is down, we can use the Zabbix API to move a host from proxy A to proxy B.

Using Zabbix API script to change the proxy

Here, host.massupdate is used to change the proxy on the hosts. Combine this with robust scripting logic and you end up with a very viable approach for moving your hosts between proxies in failover scenarios.
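Below is a hedged sketch of what such a failover script could look like, built on host.get and host.massupdate and reusing the api_call() helper from earlier. The proxy names are placeholders, and the actual failure detection (for example, checking the proxy’s last access time) is deliberately left out.

```python
def move_hosts_between_proxies(failed_proxy, standby_proxy, auth):
    """Re-point every host monitored by failed_proxy to standby_proxy."""
    # Resolve both proxy names (placeholders) to their internal IDs.
    proxies = api_call("proxy.get", {
        "output": ["proxyid", "host"],
        "filter": {"host": [failed_proxy, standby_proxy]}
    }, auth=auth)
    ids = {p["host"]: p["proxyid"] for p in proxies}

    # Find all hosts currently monitored by the failed proxy.
    hosts = api_call("host.get", {
        "output": ["hostid"],
        "proxyids": ids[failed_proxy]
    }, auth=auth)
    if not hosts:
        return 0

    # Move them to the standby proxy in a single mass update.
    api_call("host.massupdate", {
        "hosts": [{"hostid": h["hostid"]} for h in hosts],
        "proxy_hostid": ids[standby_proxy]
    }, auth=auth)
    return len(hosts)


moved = move_hosts_between_proxies("proxy-a", "proxy-b", token)
print(f"Moved {moved} hosts to the standby proxy")
```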

Database replication

We have covered HA for the Zabbix server backend, and let’s remember that for the frontend we can simply bring up additional frontend servers, for instance by utilizing Docker containers. But what about DB redundancy?

  • Database replication can be used as a form of redundancy for the Zabbix DB. No matter the DB backend – Postgres, MySQL, or Oracle – we can deploy multiple DB nodes and utilize the native DB replication, or use third-party replication tools, for instance Galera Cluster.
  • I personally prefer using native replication tools, as it is a bit simpler and you don’t have to concern yourself with another configuration and management layer that could potentially fail and be a bother to troubleshoot. But this will depend on your requirements, design, and skill set.

Let’s look at an example with MySQL replication. You can set it up in many different ways, as multiple replication approaches are supported: master/slave, master/master, or even multiple masters replicating to one another. It is completely up to you how to implement replication, especially if you are already experienced with such deployments.

Which approach is best? At the end of the day it will all depend on your company policies, database backend, and a compromise between simplicity and extra redundancy. I definitely suggest delving deeper and studying use cases and articles for the DB backend of your choice before you decide to go with any particular approach.

Database replication

Database performance tuning

Database tuning is vital for the long-term stability of your Zabbix instance. The database defaults might be sufficient for your home office, but for large multi-tenant environments with tens of thousands of new values per second they will not suffice. The defaults depend on the database backend and the database version used, but ideally these settings should be tuned and tested, preferably during the design stage, before you have deployed your Zabbix instance in production.

After installing the database backend we need to take a look at the hardware resources available. Ideally, you have already estimated the hardware resources required for your instance and ensured that DB hosts have sufficient memory, CPU resources and storage has been selected according to the I/O requirements. Now, you can move on to tuning your database backend.

As an example with Postgres I used PGTune — an online database tuning tool. This is a simple estimate that should still provide you with a somewhat adequate configuration. Though ideally, you should have a DBA on board that is aware of what kind of data loads you will be dealing with to help you with an optimal database configuration.

Database performance tuning

History table partitioning

In such large environments, you will most likely see that the housekeeper cannot keep up with the amount of data stored and is unable to clean it up in a timely fashion, with the housekeeper process utilization reaching 100 percent for 20–30 minutes at a time. This will have a negative effect on the overall database performance for the duration of housekeeping.

At this point, it is recommended to implement partitioning for the history/trend tables. We can use Postgres with the TimescaleDB extension for this. Partitioning is then supported out of the box, and the related housekeeping and compression options can be configured in Administration > General > Housekeeping.

For MySQL and Oracle backends we would have to rely on custom partitioning scripts or procedures. In addition, community-provided partitioning scripts are publicly available.

As always – don’t forget to test third-party scripts in a test environment before deploying them in production!

Community partitioning solution for MySQL

You can always create your own partitioning script, but you should be aware of what you’re doing and how things should be partitioned. Only the history and trend tables should be partitioned.

History table partitioning with TimescaleDB

  • The TimescaleDB extension for PostgreSQL DB backends supports partitioning out of the box. You don’t have to rely on community scripts.
  • On TimescaleDB, we need to specify the chunk_time_interval parameter, which defines the partition chunk size (see the sketch after this list).
  • In addition, we can also enable compression of history/trends, which helps to reduce the history table size by 60–80 percent. Again, in such scenarios your database is going to be huge – terabytes in size, with hundreds of customers, each having thousands of metrics per second. So, compression is a really valuable asset.
  • The only thing we have to take into account is that compressed chunks are read-only – no further changes or inserts are possible once a chunk has been compressed.
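For illustration, the sketch below shows what the chunk_time_interval setting actually does by turning the history table into a hypertable with daily chunks. Note that Zabbix ships its own timescaledb.sql schema script that performs this setup for all history and trend tables, so this is only a hedged example (with placeholder connection details, using psycopg2), not a replacement for the official script.

```python
import psycopg2

# Placeholder connection details for the Zabbix database.
conn = psycopg2.connect(host="db.example.com", dbname="zabbix",
                        user="zabbix", password="********")
conn.autocommit = True
with conn.cursor() as cur:
    # Partition the history table on its integer "clock" column with one
    # chunk per day (chunk_time_interval is given in seconds here).
    cur.execute(
        "SELECT create_hypertable('history', 'clock', "
        "chunk_time_interval => 86400, migrate_data => true);"
    )
conn.close()
```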

History and trends compression

To-do list

  • Deploy the latest available Zabbix version. Ideally stick with an LTS version.
  • Deploy proxy servers, define and configure HA/Replication on Zabbix proxies, as well as on Zabbix servers and databases.
  • Implement partitioning to improve database performance.
  • Implement throttling to reduce the volume of the incoming data.
  • Tune your database! Either use online guides or consult with your DBA.

With our to-do list completed, we can have our Zabbix environment deployed with redundancy in mind, providing monitoring as a service for hundreds of customers, with multiple proxies running for each of the customers, HA in place, and Zabbix performing up to our expectations.

Questions & Answers

Question. Will Zabbix have its native HA solution? Will it be the whole package or does it involve installing individual components and maintaining them?

Answer. A native HA solution is planned on the roadmap for Zabbix 6.0. You should be able to get your hands on it when the 6.0 beta version gets released – hopefully you’ll test it out yourselves and give us feedback. From the looks of it, it should be very much plug-and-play and will remove a lot of management overhead compared with the current HA implementations. Right now this is being developed only for the Zabbix backend server. As for the frontend – nothing is stopping you from having multiple frontends pointed at the same DB/Zabbix backend server.

Question. Can I run Zabbix on a single server and sell monitoring service to several customers with fully isolated environments, not just GUI, but also items, triggers, etc.

Answer. Yes, you can. You can have a single Zabbix instance and multiple customers being monitored by this instance. The only extra step that might be required is deploying proxies on the customer’s side. By using permission restrictions, proxy servers, roles, etc., we can then monitor multiple customers from a single Zabbix instance.

Question. When we change proxy, the agent configuration has to be updated. What about HA configuration on the proxy?

Answer. That really depends on the approach. If the agent is pointed at a virtual IP address and HA is managed by PCS, Corosync, or Pacemaker, then it should be fine as is – the VIP will simply be on the currently active host, so you’ll essentially be rerouted. With the HA-by-way-of-API approach, you can simply allow your agent to accept connections from both proxies. With ServerActive we can also specify multiple endpoints, so agents can be prepared for such an environment.

Question. How to merge two different instances into a single monitoring instance?

Answer. This is a complex task. First off, both instances need to be on the same major Zabbix version. You might simply migrate the history from one instance to another, but then you will have problems with the underlying element IDs: each instance has its own set of items, triggers, users, etc. with its own set of IDs, and these will most likely conflict with the IDs on the other instance.

You can do a partial migration or use the export function to export your templates, hosts, value maps, and network maps. I would try to export as much as I can, as migration on the SQL level will be a real pain. It is possible if you’re stubborn enough, but it can end up being a really complex task that takes days if not weeks to fully implement and test.

Question. Do subgroups relate to templates as well?

Answer. Subgroups relate to templates in the sense that we can also define permissions for reading and modifying templates. You can create per-customer templates and assign them to host groups. Users that have access to these host groups can then read or modify those templates.