Tag Archives: conferences

Managing complexity in Zabbix installations with Splunk

Post Syndicated from Christian Anton original https://blog.zabbix.com/managing-complexity-in-zabbix-installations-with-splunk/13053/

A big data analytics engine can be used to optimize large and complex Zabbix installations: keeping track of the amount and kind of problems over time, top alert producers, and much more. You can employ Splunk to optimize and analyze vital Zabbix runtime parameters, such as 'unsupported items', recurring host availability issues, misconfigured agents, and Zabbix queue entries.

Contents

I. Complexity (1:15)
II. Zabbix entity inventory (8:28)
III. Use cases (15:16)
IV. Conclusion (20:09)
V. Questions & Answers (21:41)

secadm GmbH is a service provider located in the south of Germany. The company with a strong background in monitoring and automation, network infrastructure, and security software development supports customers of all sizes to manage their IT infrastructures. secadm GmbH is a Zabbix partner and also a Zabbix training partner.

Complexity

Operating a Zabbix deployment of a specific size comes with some challenges:

  • A huge number of hosts, templates, items, host groups, macros, and configuration elements inside your Zabbix instance.
  • LLD rules/unsupported items — items that are unable to fetch information because of, for example, a wrong password or a wrong path in an external check. It is often hard to keep track of how many of those you have and which error states they are in, and therefore also difficult to fix them.
  • Host availability/network issues — errors that you see only in the logs: things going up and down or losing connectivity, but recovering before an alert is issued.
  • Queue entries. In larger Zabbix installations, you might have tens of thousands of items in the queue. Zabbix tells you that some items do not receive their data in time — a sign that something is really wrong — though it doesn't give a hint about what exactly is wrong.
  • Problems and alerts. Zabbix as a monitoring tool is there to generate problems and to turn them into alerts. Too many of them often cause 'alert fatigue', when people start ignoring monitoring results because of the sheer number of alerts.

Therefore, we receive a lot of questions from our customers, such as:

  • Where do all these problems come from?
  • What are the hosts generating most of the problems, at what times, and generated by what templates?
  • Did the latest change/upgrade have any negative impact on our monitoring?
  • Can you get rid of unsupported items?
  • How many hosts have specific problems (for instance, caused by a known bug in an old version of an agent that behaves strangely with a specific version of the Zabbix server), and what would be the effect if we fixed those problems?
  • Where do all these queue entries come from?

Zabbix is a transparent and predictable monitoring tool that offers great ways to organize the monitored elements with templates and macros, and it also offers excellent visualization capabilities. However, Zabbix is not an analytical utility: it does not offer a flexible query language to gather the required information in the required format, on-demand statistical functions, or a way to enrich and correlate data with data from arbitrary sources. So, extra tools have to do the extra work.

secadm GmbH, being a partner of both Zabbix and Splunk, concluded that Splunk is the obvious choice for such extra work. Splunk offers many ways to onboard data into the platform far beyond simple indexing of log data: the Key-Value store, scripts and programs running inside the Splunk platform that fetch data in real time and on demand from other systems without having to store or index any information, and custom search commands.

Zabbix entity inventory

The most important Zabbix data used for analysis is the inventory of all elements inside Zabbix that do not change often, such as:

  • Hosts,
  • Items,
  • Proxies,
  • Templates,
  • Triggers,
  • Discovery rules (LLD),
  • Item Prototypes, and
  • Trigger Prototypes.

As this data is not changing constantly, we fetch it from Splunk with a scheduled search and a custom search command directly from the Zabbix API endpoints. Then we store this information in the Splunk KV Store, which is, in fact, MongoDB, allowing us to run searches that return results in milliseconds without having to index any data.
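For illustration, the kind of request such a scheduled search issues against the Zabbix API looks something like this (the host name and API token are placeholders):

curl -s -X POST https://zabbix.example.com/api_jsonrpc.php \
  -H 'Content-Type: application/json-rpc' \
  -d '{"jsonrpc": "2.0", "method": "host.get",
       "params": {"output": ["hostid", "host", "status"]},
       "auth": "<api-token>", "id": 1}'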

Zabbix entity inventory

So, from a list of all items, you can get statistics on status and state and drill down on the unsupported items. You can further correlate host IDs, which are not human-readable, with hostnames: the KV Store keeps the hosts together with their metadata. You can also see how many unsupported items there are on each host.

You can also get information on the hostnames, hosts, item names, item types, and errors. You can categorize the problems as SNMP problems or shell problems (such as wrong paths), see how often certain problems happen, see which hosts are assigned to which templates and host groups, and so on. This information may also be aggregated or correlated with information from UCMDB.
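A sketch of such a drill-down in SPL (the KV Store lookups zabbix_items and zabbix_hosts are illustrative names; item state 1 means 'not supported'):

| inputlookup zabbix_items where state=1
| lookup zabbix_hosts hostid OUTPUT host
| stats count as unsupported_items by host
| sort - unsupported_items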

More data

The only thing more fun than having data in a data analytics platform is having more data.

  • Indexing the Zabbix server/proxy logs and categorizing events to identify availability issues, item problems, preprocessing problems, housekeeper statistics, etc.

  • A module to fetch information from Zabbix (item, host, trigger) in real-time.

  • Gathering metrics (History / Trends data) directly from Zabbix in real-time without the need to store these metrics in any place other than the Zabbix database. We can still use the data for graphing, correlations, calculations, etc.

  • Onboarding Zabbix problems into Splunk by using the new custom Webhook media type.

Custom Media type

  • Correlation of the alert logs, which are available through the API since Zabbix 5.0.
  • Working with queue items to answer the questions above.

Use cases

Zabbix queue

The Zabbix queue can be a real headache: in a Zabbix installation, you can have 20,000-50,000 items waiting in the queue for 5 or 10 minutes or even longer.

This dashboard displays the same view as in Zabbix: items categorized by how overdue they are, item type, proxy, etc. Splunk offers here what Zabbix lacks — history — so that you can see the spikes when things change dramatically. For instance, when bigger network changes happen, the network slows down and the queues grow dramatically. You can see whether these queues have gone back down or remained up. This information is complicated to analyze in Zabbix itself.

You can also drill down to see the items correlated with their actual status and the status of their host inside Zabbix. So, you can clearly see, for instance, that an item is in the queue because its host is down, or because the item is unsupported and doesn't get any data.

Here, there is also an Ignore list. So, you can get statistics for the remaining items and group them, for instance, by item type, and then go on to analyze and fix the problems.

Zabbix problems analytics

Zabbix problems dashboard

In this dashboard, Zabbix problems are displayed by system categories. For instance, we can see that over the last 24 hours, Windows caused most of the problems.

Here, we can also drill down to see, for instance, whether there are many similar problems, and identify a single issue that has caused many alerts. You may see that one host is creating almost all of the problems, so switching this one host off would leave you with far fewer problems.

Zabbix data for management visibility

We can use Zabbix data for greater management visibility, such as:

  • Correlation of data to generate meaningful dashboards:

— Zabbix (metrics, status, problems, etc.),
— application logs,
— other data sources,
— inventory (CMDB, …)

  • Business-level visualization.

Conclusion

The integration is distributed for free. We are currently in the process of finalizing the integration of Splunk with Zabbix.

If you are interested in Splunk, you can send a request to [email protected]  or look for Christian Anton on LinkedIn or Instagram.

Questions & Answers

Question. If we use this kind of integration, are there any performance issues caused by Splunk or some misconfiguration?

Answer. We have been using Splunk for installations with several tens of thousands of monitored hosts and from hundreds of thousands up to millions of items and have not seen any performance implications.

Question. How does this connector work under the hood? Does it use the API or direct queries to the database?

Answer. We rely on the API, though we can also fetch data directly from the database.

 

Setting up Zabbix Agent 2 for PostgreSQL monitoring and revealing how it works

Post Syndicated from Daria Vilkova original https://blog.zabbix.com/setting-up-zabbix-agent-2-for-postgresql-monitoring-and-revealing-how-it-works/13208/

This article will recall the most important points about the PostgreSQL monitoring plugin for Zabbix Agent 2. Here you'll find an explanation of how the plugin works under the hood, illustrated with a simple example. You will also get familiar with a new mechanism of custom queries that lets you collect metrics from separate SQL files on the local machine.

Contents

I. Zabbix Agent plugin (2:40)

    1. Implementation (3:10)
    2. Basic features (4:24)
    3. How to get a simple metric? (11:07)

II. Custom metrics (14:05)
III. Conclusion (17:58)
IV. Questions & Answers (19:20)

 

Zabbix Agent plugin

As a rule, Zabbix Agent 2 is installed on the machine to be monitored. It gathers data, which is later collected by the Zabbix Server, and the user has full access to it via the web interface.

Implementation

  • The plugin uses github.com/jackc/pgx — a PG driver and toolkit for Go — to connect to Postgres. The plugin supports the database/sql interface, a universal interface in Golang for SQL-like databases. In the upcoming version, connections to the database are made via this database/sql interface.
  • The handler is the basic unit of the plugin; all queries are executed in separate handlers, and the results are then sent to the Zabbix Server. We have made an effort to create an efficient connection to the database and to optimize its operations.
  • Some metrics are generated in JSON and grouped as dependency items and discovery rules.

Basic features

  • Zabbix Agent 2 keeps a permanent connection to the PostgreSQL database. In earlier versions, connecting to PostgreSQL required psql calls, which affected the server load.
  • Zabbix Agent 2 provides for flexible polling intervals, which can be customized in templates.
  • The plugin is compatible with PostgreSQL 10+ and Zabbix Server 4.4+.
  • In the latest plugin release, a new feature is introduced to allow for monitoring several PostgreSQL instances by one Agent using sessions.

Plugin connection parameters

There are three levels of the plugin connection parameters:

  • Global (global for all Zabbix Agent plugins).
  • Macros.
  • Sessions.

Macros and Sessions parameters are used to define a connection to the database.

Macros level

Macros should be familiar to all users of the first Zabbix Agent. In the template, we can define macros for the user, database, etc.

Filling in the template

Then we need to pass these macros as parameters in the Key definition.

Key definition as a parameter

Here, the sequence is important — URI, USER, and PASSWORD. The first two parameters are mandatory. If no password is given, an empty string is used as the password. If there is no database name, the default database name is used — 'postgres'.

NOTE. There may be parameters No. 5, 6, 7, etc., which can be used as parameters for dynamic queries in the handler.

This way of connecting to the database is the default. In the official template for PostgreSQL monitoring on the Zabbix website, macros and keys are already specified, so the setup can be done in no time.
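For illustration, a key using these macros might look like this (a sketch following the parameter order above; the macro names are the ones commonly used in the official template):

pgsql.ping["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]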

Sessions level

Each session has its own connection parameters. So, by creating multiple sessions, we can create multiple connections to several databases.

Sessions are defined in the Zabbix Agent configuration file — zabbix_agent2.conf.

Defining four parameters for session ‘Test’

  • To define the session ‘Test’, in the configuration file, you need to go to:
# Plugins.Postgres.Sessions.
  • Then, you fill in the name of the session:
# Plugins.Postgres.Sessions.Test.Uri=tcp://localhost:5432
  • Then, you do the same for the other three parameters and define macros for the session in the template:

Defining connection parameters and the name of the session in {$PG.SESSION}.

  • You need to fill in the session Name as the only parameter for the Key:

Now the agent will automatically pick up the connection parameters for this session name from the configuration file and start running.
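Put together, a session definition and the corresponding key might look something like this (a sketch; parameter names may differ slightly between plugin versions, and the values are illustrative):

Plugins.Postgres.Sessions.Test.Uri=tcp://localhost:5432
Plugins.Postgres.Sessions.Test.User=zbx_monitor
Plugins.Postgres.Sessions.Test.Password=<password>
Plugins.Postgres.Sessions.Test.Database=zabbix

pgsql.ping[Test]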

Metrics monitored by the plugin

In the upcoming release, the plugin will be able to gather more than 98 metrics, covering almost all the important parameters of the database, including:

  • number of connections,
  • database size,
  • info about archive files,
  • number of ‘bloating’ tables,
  • replication status,
  • background writer processes activity, etc.

Some of these metrics are not very informative without the operating system parameters. However, Zabbix Agent 2 can already gather all these metrics using its operating system plugins, and all the needed templates are available to get a full picture of the database health.

 

How to get a simple metric?

1. Create a handler file for the new metric, for instance, the uptime metric: zabbix/src/go/plugins/postgres/handler_uptime.go.

NOTE. The handler definitions for the current and the upcoming version are available in the article on the PostgreSQL monitoring plugin.

2. Import package to work with Postgres and specify the unique key for the new metric:

package postgres

const (
    keyPostgresUptime = "pgsql.uptime"
)

3. Define the handler function with the following query:

func uptimeHandler(ctx context.Context, conn PostgresClient, _ string,
    _ map[string]string, _ ...string) (interface{}, error) {
    var uptime float64
    query := `SELECT date_part('epoch', now() - pg_postmaster_start_time());`

4. Define the variable, which will hold the result.

NOTE. The matching between the Golang variables and the Postgres variables can be found on the pgx documentation page.

5. Define the query for the new metric:

    // Perform the query
    row, err := conn.QueryRow(ctx, query)
    if err != nil {
        ...
    }
    // Scan the result into the Golang variable
    err = row.Scan(&uptime)
    if err != nil {
        ...
    }
    return uptime, nil
}

Here, we:

  • perform the query,
  • check if there are any errors,
  • scan the results for the Golang variable,
  • scan for errors again, and
  • finally, return the results.

6. Register the key of your new metric in metrics.go:

var metrics = metric.MetricSet{
    ...,
    keyPostgresUptime: metric.New("Returns uptime.",
        []*metric.Param{paramURI, paramUsername, paramPassword, paramDatabase},
        false),
}

In the metrics variable, all the metrics in the plugin are defined. Here, we need to add the description of the new metric.

Now, we need to recompile the agent and start it, and we'll have the new metric on board.

Custom metrics

In the upcoming version, the agent will be able to execute queries in separate SQL files located on your local machine and return the result to the Zabbix Server alongside the default metrics. To use an SQL file with a query:

  • in zabbix_agent2.conf, specify the path to the directory with the SQL files in the Plugins.Postgres.CustomQueriesPath parameter;
  • in the template, provide the name of the SQL file as the 5th parameter of the new key — pgsql.query.custom — and specify additional parameters for this query if needed.

Custom metric example

1. Let’s consider a simple table containing three rows.

  • # CREATE TABLE example (phrase text, year int);
  • # SELECT * FROM example;

2. I have created two files retrieving data from this table:

  • $ touch custom2.sql
    — $ echo 'SELECT * FROM example;' > custom2.sql
  • $ touch custom1.sql
    — $ echo 'SELECT phrase FROM example WHERE year=$1;' > custom1.sql

The first file requires no parameters, while a 'WHERE' clause is specified in the second file, so it needs one additional parameter.

3. I have added the path to the sql files in zabbix_agent2.conf:

Plugins.Postgres.CustomQueriesPath=/path/to/file

4. In the template, I need to create the key — pgsql.query.custom. Here, the first four parameters are connection parameters, and the name of the file containing the query is given as the fifth parameter (in this case, custom2).

Then, it is necessary to do the same for the second file. However, the second query requires an additional parameter, which is specified as parameter 6. Here, for the custom1 file, the '2021' parameter will be used in the query.
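For illustration, the two keys might look like this (macro names as in the macros-level example above):

pgsql.query.custom["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}",custom2]
pgsql.query.custom["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}",custom1,2021]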

After these two keys are created, Zabbix Agent will automatically pick them up and execute the queries, and soon the results will appear in the Latest data.

The result for each query appears in text format

The first result contains rows starting from 2020, while the second contains only those for 2021, since the '2021' parameter has been used for the second key.

Conclusion

The new version of the plugin with custom metrics will hopefully become available with the next Zabbix Server release.

Questions & Answers

Question. What is the point of specifying the database name in that key? Are any metrics stored there? Should we create a separate database for Zabbix?

Answer. You can use the Postgres default database, but it is recommended to create a separate database as it is more secure to get monitoring metrics from a separate database. 

Question. Does the Zabbix user both in the OS and in the database need any special permissions to get this going? 

Answer. Two permissions should be defined. These permissions are specified in the instructions for the PostgreSQL monitoring plugin for Zabbix.

Question. Will Zabbix work independently of the pg_stat_statements module? 

Answer. It gathers some data from the pg_stat_statements module. Without this module installed, we will not be able to get some crucial metrics, though the plugin itself will keep running.

Question. Can the plugin work in the passive mode or in the active mode only?

Answer. The plugin works similarly to the Zabbix Agent — it pushes the data.

Question. Does this Postgres plugin work automatically against the Zabbix backend if we use Postgres as Zabbix backend?

Answer. If you use Agent 2 with this plugin, it will work out of the box, though you'll have to apply templates, create items, etc. Otherwise, you'll have to update it.

Question. What is the advantage of using the plugin over Zabbix user parameters, which are custom scripts that the agent can execute?

Answer. If you use user parameters, connections to Postgres are established through psql calls, which can create additional server load. The plugin establishes a permanent connection, entailing less overhead.

Supercharge Zabbix with powerful insights

Post Syndicated from alexk original https://blog.zabbix.com/supercharge-zabbix-with-powerful-insights/12841/

A new set of trigger functions for long-term analysis of trend data will allow Zabbix to analyze historical data and generate alerts on detected anomalies.

Contents

I. Types of monitoring (0:39)

II. Zabbix 5.2 new functions (5:34)

III. In a nutshell (13:28)
IV. Questions & Answers (14:17)

Types of monitoring

Let’s start with a philosophical observation. In many cases, configuring monitoring entities is a pretty straightforward exercise. For instance, we know that computers should have some free disk space as applications won’t work otherwise; that the CPU should not run at 100°C; that a user-facing application should respond in less than a couple of seconds, otherwise users will notice and complain. To be alerted when any of these expectations fail, we use triggers. A trigger can be as simple as {Host:cpu.temp.avg(5m)} > 100.

However, in some situations, it is difficult to tell right from wrong. Some cases can’t be evaluated without a proper context. For instance, is it OK if RAM is 70% full? The answer is our favorite ‘it depends’. If RAM was just 20% full a week ago, chances are high that some application is leaking and your memory usage will continue growing. But if your RAM usage has stayed at 70% for three years in a row, there are even better chances that it stays so for another three years.

Another it-depends example is web traffic monitoring. Intuitively, we know that it’s perfectly normal to have uneven traffic distribution across days of week or months. But every website has its usage patterns, so even when we figure out what is normal and what’s not for one specific website, it’s difficult to scale this knowledge to other websites.

Web traffic monitoring

So, in the grand scheme of things, it all boils down to finding a good baseline for parameters we want to monitor. And baselines are usually defined by previous knowledge.

So, in such cases, instead of figuring out a fixed threshold (some fixed value or percentage), we need to figure out data points in the past that we want to compare to our current data points.

  • Compare values to known thresholds.
{Host:cpu.temp.avg(5m)} > 100
  • Baseline — compare to unknown thresholds.

Finding the right points in the past (or rather, finding a good interval to look back to) is still something that the user must supply manually, even though we are also working on automating this in the future. But Zabbix 5.2 gives you some tools to make comparisons to baseline way easier.

Web traffic monitoring example

Let’s consider a history of website visits for an imaginary commerce site — shop.example.com.

Commercial site web traffic monitoring

The numbers are different at any given point in time, yet all these are normal in a certain context. Overall, we see a growing trend in 2020 as compared to 2019. But there are seasonal traffic spikes. The biggest ones are around Christmas.

Site administrators like to be informed of any traffic anomalies (such as fraud traffic, for example), but hate false positives caused by seasonal spikes.

If we want to detect anomalies here, we can get an average for some period and compare it to an average for the same period a year before.

If we know that our organic year-to-year growth is not likely to exceed, for instance, 15 %, then it’s seemingly easy to do this in virtually any version of Zabbix: we take the average traffic over 30 days and check if it exceeds the same period a year ago by more than 15 %.
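Such a trigger might look something like this in pre-5.2 syntax (a sketch; the item key visits is hypothetical):

{shop.example.com:visits.avg(30d)} > 1.15 * {shop.example.com:visits.avg(30d,365d)}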

However, there are a few problems with this trigger expression.

1. First, we look 1 year back in history. But the Zabbix 5.0 documentation about triggers warns that trigger functions operate on history data only, not on trends.

This means that we need to keep a full and detailed history for at least 1 year (13 months, in this specific case). It is a passable solution if we ingest the traffic data daily. But what if we do it every minute? What if we do it every minute for a thousand websites?

2. In Zabbix, we specify time as 30d and 365d. As you may know, in Zabbix, this is just a fancy way to specify 2,592,000 and 31,536,000 seconds. Zabbix 5.0 doesn’t have time suffixes for a month and a year simply because these cannot be translated to a fixed number of seconds: 30d may be close to a calendar month, but months have 28 to 31 days, so it’s still not the same.

3. The result of the avg() function, with or without the second time shift parameter, always depends on the specific time of the calculation. This is because Zabbix calculates time shifts by subtracting the interval from the current time. This makes it impossible to calculate aggregates aligned between, for instance, the first and the last day of a week, a month, or a year.

Zabbix 5.2 new functions

That is why we have introduced new trigger functions, which address all the issues above. We have also added a few other trigger features that improve event presentation. These functions are similar to their non-trend counterparts but are optimized for baseline monitoring use cases.

trendavg(period, period_shift)
trendcount(period, period_shift)
trenddelta(period, period_shift)
trendmax(period, period_shift)
trendmin(period, period_shift)
trendsum(period, period_shift)

  • The new functions use the trends tables instead of history (do not forget to set a proper trend storage period).

  • period and period_shift parameters use the Gregorian calendar instead of the number of seconds.

h (hour), d (day), w (week), M (month), and y (year).

  • These functions are easy on system resources because they do calculations only when a period ends.

In addition to the new trigger functions, we have also added the ability to set a customized event name.

The customized event name lets you fine-tune how the event looks in the Zabbix UI (on screens like Problems and the Problem widget) and include trigger expression calculation results.

This field is optional; you can continue using the trigger Name field instead.

There is also a new macro {?…}, which can be used for expressions inside the event name.

Triggers

Let’s reconfigure our trigger in the Zabbix 5.2 style.

Zabbix 5.2-style triggers

Let’s see what the arguments of the trendavg() function are: 1M and now/M.

  • The first argument means that we use a calendar month as the aggregation period. So, depending on the month trendavg() is doing calculations for, it will pick the first and the last date of that month. The same goes for the other possible interval suffixes — h for hour, d for day, w for week, and y for year.
  • The second parameter, as in regular aggregate functions, means a time shift. But to distinguish between old and new types of shifts, we call them period shifts. The period shift denotes the last point in the timeline for our aggregation.

For instance, on October 13, 2020, trendavg(1M,now/M) will calculate the value for the period from September 1, 2020, to September 30, 2020, and trendavg(1M,now/M-1y) will calculate the value for the period from September 1, 2019, to September 30, 2019.
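With these functions, the baseline trigger from our example can be written like this (a sketch; the item key is hypothetical):

{shop.example.com:visits.trendavg(1M,now/M)} > 1.15 * {shop.example.com:visits.trendavg(1M,now/M-1y)}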

Event name field

In Zabbix 5.2, you can continue using the Name field, with its content copied to the Event name field. But if you specify the Event name, it will be used for all corresponding events instead.

The Event name supports the new macro {?…}, so you can put another trigger expression inside this macro to show some related calculations. We call it the expression macro. For instance, the Event name will be displayed on the Problem screen as follows:

Formatting functions

This trigger generates problems like this:

It’s already very useful, but this percentage would look better if we could round it. It also wouldn’t hurt to show which month we compare our traffic against. For that, we have added two formatting functions:

  • fmtnum(digits)

— applicable to ITEM.VALUE, ITEM.LASTVALUE, and expression macros.
fmtnum(2) gives 14.85 instead of 14.8512345.

  • fmttime(format, time_shift)

— applicable to {TIME}.
— uses strftime format codes.
— formats time, for instance, {TIME}.fmttime(“%B,%Y”) gives October,2020.

Let’s see how we can improve our Event name with new formatting functions:
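For illustration, an Event name combining both functions might look something like this (host and item names are hypothetical, and the exact nesting of the macros may differ slightly in your version):

Traffic for {TIME}.fmttime("%B,%Y") exceeds last year's value by {{?100*{shop.example.com:visits.trendavg(1M,now/M)}/{shop.example.com:visits.trendavg(1M,now/M-1y)}-100}.fmtnum(2)}%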

It looks somewhat scary on the trigger configuration screen, but Zabbix will reward us by generating events like this:

But the new functions are not limited to a single use case of comparing some data from a recent period to some past period.

Cloud budget monitoring example

Let’s consider another real-world example. Imagine that your IT department runs some very important services in the cloud. And, of course, your finance department sets a monthly budget you don’t want to overrun. You receive cloud usage records from one or more cloud providers and periodically ingest this data into monitoring.

You could set up a trigger with trendsum() over one month to check whether you exceeded your fixed budget in the previous month. But you want to know about a budget overrun ASAP: if you exceed your monthly budget in the middle of a month, a quick reaction might save the company money.

In the chart, we see the even distribution of cloud usage costs up to the last dates of September. Then the usage starts going up. When should you start worrying?

Again, the new trend functions come to the rescue.

The solution is to use the period_shift parameter, just not in the past, but rather in the future. For instance, if today’s date is October 22, this expression will calculate the sum() from October 1 to October 31.

  • trendsum(1M,now/M+1M)

There is one problem, though. To save precious computing resources, Zabbix evaluates these functions in triggers only when the period is over. However, these functions are also available in calculated items, where we can use arbitrary calculation intervals.

So, the solution is to set up a calculated item, use trendsum() in the formula, and specify some reasonable update interval (for instance, one hour or one day).

Here, on the right-hand side of the chart, we see the current period, which is not over yet. Let’s take a look at the item definition.

This is the formula to calculate the current calendar period. Then, we can add a simple trigger referencing this calculated item:

Formula to calculate the current calendar period
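A sketch of what the calculated item formula and the trigger could look like (item keys and the threshold are illustrative, using pre-5.4 calculated item syntax):

trendsum("cloud.usage.cost",1M,now/M+1M)

{cloud-billing:cloud.usage.cost.month.last()} > 10000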

You can also use the new expression macro in this trigger. You don’t need to have trend functions anywhere in the formula for this.

 

Once the trigger fires, you will see the following problem on the Problem screen — a nice and clean message containing all the information we need.

Use cases

There are many more possible applications for the new functions besides the examples above. Generally, these trend functions can be applied not only to IT metrics but also to many other real-world KPIs, for example:

— Business performance (to calculate annual revenue, profitability, etc.).
— Sales and marketing (for instance, monthly average, customer acquisition costs, sales target rate).
— Warehousing (such as weekly shipments, return rates, etc.).
— Human resources (for instance, annual training costs, overtime hours, etc.).
— Customer support (such as average response time or the number of issues per month).

We expect these functions to pave the way for Zabbix into new territories that have previously been occupied by CRMs and other business analytics systems.

In a nutshell

  • Zabbix trend functions — a new way to analyze history without storing historical data.
  • Zabbix trend functions support calendar hours, days, weeks, months, and years.
  • New trigger field Event name – lets us display events with context.
  • New formatting functions let us present numbers and dates in a flexible manner.
  • Long-term data analysis just got easier and better with the new Zabbix 5.2.

Questions & Answers

Question. What’s the maximum time period for these new trigger functions? For how long can we analyze the data?

Answer. The maximum time period is not limited by any hardcoded values. The only limit you should keep in mind is the size of your trend data history. Also, keep in mind that the longer the period, the bigger the database load — that’s another factor to consider.

Question. Is this trend data that we’re analyzing also going to be stored in the value cache or some other place?

Answer. It’s not stored in the value cache at the moment. These trigger functions recalculate their values only after the period is over, so the value cache is not of much use here. But if this is required by some demanding applications, we’ll add it in later versions.

Zabbix migration in a mid-sized bank environment

Post Syndicated from Angelo Porta original https://blog.zabbix.com/zabbix-migration-in-a-mid-sized-bank-environment/13040/

A real CheckMK/LibreNMS-to-Zabbix migration for a mid-sized Italian bank (1,700 branches, many thousands of servers and switches). The customer needed a very robust architecture and ancillary services around the Zabbix engine to manage a reliable and error-free configuration.

Content

I. Bank monitoring landscape (1:45)
II. Zabbix monitoring project
III. Questions & Answers (19:40)

Bank monitoring landscape

The bank is one of the 25 largest European banks by market capitalization and one of the 10 largest banks in Italy by:

  • branch network,
  • loans to customers,
  • direct funding from customers,
  • total assets.

At the end of 2019, some 20 different monitoring tools were used by the bank:

  • LibreNMS for networking,
  • CheckMK for non-Microsoft servers,
  • Zabbix for some limited areas inside DCs,
  • Oracle Enterprise Monitor,
  • Microsoft SCCM,
  • custom monitoring tools (periodic plain counters, direct HTML page access, complex dashboards, etc.)

For each alert, hundreds of emails were sent to different people, which made it impossible to really monitor the environment. There was no central monitoring, and monitoring efforts were fragmented.

The bank requirements:

  • Single pane of glass for two Data Centers and branches.
  • Increased monitoring capabilities.
  • Secured environment (end-to-end encryption).
  • More automation and audit features.
  • Separate monitoring of two DCs and branches.
  • No direct monitoring: all traffic via Zabbix Proxy.
  • Revised and improved alerting schema/escalation.
  • Parallel with CheckMK and LibreNMS for a certain period of time.

Why Zabbix?

The bank has chosen Zabbix among its competitors for many reasons:

  • better cross feature on the network/server/software environment;
  • opportunity to integrate with other internal bank software;
  • continuous enhancements on every Zabbix release;
  • the best integration with automation software (Ansible); and
  • personnel previous experience and skills.

Zabbix central infrastructure — DCs

First, we had to design an infrastructure able to monitor many thousands of devices in the two data centers and the branches, handling many items and thousands of values per second.

The architecture is now based on two database servers clustered using Patroni and etcd, many Zabbix proxies (one for each environment — preproduction, production, test, and so on), and two Zabbix servers, one for the DCs and another for the branches. We also suggested deploying a third Zabbix server to monitor the two main Zabbix servers. The DC database is replicated on the branches DB server, while the branches DB is replicated on the server handling the DCs, so thanks to Patroni two copies of each database are available at any point in time. The two data centers are located more than 50 kilometers apart. In this picture, the focus is on DC monitoring:

Zabbix central infrastructure — DCs

Zabbix central infrastructure — branches

In this picture the focus is on branches.

Before starting the project, we had planned one proxy for each branch, that is, more or less 1,500 proxies. We changed this initial choice during implementation, reducing the branch proxies to four.

Zabbix central infrastructure — branches

Zabbix monitoring project

New infrastructure

Hardware

  • Two-node bare-metal cluster for the PostgreSQL DB.
  • Two bare-metal Zabbix engines — each with two Intel Xeon Gold 5120 2.2GHz 14C/28T processors, 4 NVMe disks, and 256GB RAM.
  • A single VM for the Zabbix MoM.
  • Another bare-metal server for database backups.

Software

  • OS RHEL 7.
  • PostgreSQL 12 with TimeScaleDB 1.6 extension.
  • Patroni Cluster 1.6.5 for managing Postgres/TimeScaleDB.
  • Zabbix Server 5.0.
  • Proxies for metrics collection (five for each DC and four for the branches).

Zabbix templates customization

We started with the Zabbix 5.0 official templates, deleted many metrics, and made changes to the templates keeping in mind the large number of servers and devices to monitor. We have:

  • added throttling and keepalive tuning for massive monitoring;
  • relaxed some triggers and related recovery to have no false positives and false negatives;
  • developed a new Custom templates module for Linux Multipath monitoring;
  • developed a new custom template for NFS/CIFS monitoring (ZBXNEXT-6257);
  • developed a new custom Webhook for event ingestion on third-party software (CMS/Ticketing).

Zabbix configuration and provisioning

  • An essential part of the project was Zabbix configuration and provisioning, which was handled using Ansible tasks and playbooks (see the sketch after this list). This allowed us to distribute and automate agent installation and to associate templates with hosts according to their role in the environment and their host groups using the CMDB.
  • We have also developed some custom scripts, for instance, to have user alignment with the Active Directory.
  • We developed single sign-on functionality using Active Directory Federation Services and Zabbix SAML 2.0 support in order to integrate with Microsoft Active Directory.
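As a sketch of what such provisioning can look like (using the community.zabbix Ansible collection; the group, template, and variable names are illustrative):

- name: Register a host in Zabbix and link templates according to its role
  community.zabbix.zabbix_host:
    server_url: "https://zabbix.example.local/"
    login_user: "{{ zabbix_api_user }}"
    login_password: "{{ zabbix_api_password }}"
    host_name: "{{ inventory_hostname }}"
    host_groups:
      - "Linux servers"
    link_templates:
      - "Template OS Linux by Zabbix agent"
    interfaces:
      - type: 1        # Zabbix agent interface
        main: 1
        useip: 1
        ip: "{{ ansible_default_ipv4.address }}"
        dns: ""
        port: "10050"
    state: present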

 

Issues found and solved

During the implementation, we found and solved many issues.

  • A dedicated proxy for each of the 1,500 branches turned out to be too expensive to maintain and support. So, it was decided to deploy fewer proxies, and we managed to connect all the devices in the branches using only four proxies.
  • After deploying all the metrics and templates associated with over 10,000 devices, the data center database exceeded 3.5TB. To decrease the size of the database, we worked on throttling and keep-alive: we increased the keep-alive from 15 to 60 minutes and lowered the sample interval to 5 minutes.
  • There is no official Zabbix Agent for Solaris 10 operating system. So, we needed to recompile and test this agent extensively.
  • The preprocessing step is not available for NFS stale status (ZBXNEXT-6257).
  • We needed to increase the maximum length of user macro to 2,048 characters on the server-side (ZBXNEXT-2603).
  • We needed to ask for JavaScript preprocessing user macros support (ZBXNEXT-5185).

Project deliverables

  • The project was started in April 2020, and massive deployment followed in July/August.
  • At the moment, we have over 5,000 monitored servers in two data centers and over 8,000 monitored devices in branches — servers, ATMs, switches, etc.
  • Currently, each data center database is less than 3.5TB, and the branches’ database is about 0.5TB.
  • We monitor the two data centers with over 3,800 NVPS (new values per second).
  • Decommissioning of LibreNMS and CheckMK is planned for the end of 2020.

Next steps

  • To complete the data center monitoring for other devices — to expand monitoring to networking equipment.
  • To complete branch monitoring for switches and Wi-Fi AP.
  • To implement Custom Periodic reporting.
  • To integrate with C-level dashboard.
  • To tune alerting and escalation to send the right messages to the right people so that messages will not be discarded.

Questions & Answers

Question. Have you considered upgrading to Zabbix 5.0 and using TimeScaleDB compression? What TimeScaleDB features are you interested in the most — partitioning or compression?

Answer. We plan to upgrade to Zabbix 5.0 later. First, we need to run stress tests on our infrastructure. So, we might wait for some minor release and then activate compression.

We use Postgres solutions for database, backup, and cluster management (Patroni), and TimeScaleDB is important to manage all this data efficiently.

Question. What is the expected NVPS for this environment?

Answer. Nearly 4,000 for the main DC and about 500 for the branches — a medium-large instance.

Question. What methods did you use to migrate from your numerous different solutions to Zabbix?

Answer. We used the easy method — we installed everything from scratch, as it was too complex to migrate from so many different solutions. Most of the time, we ran all the monitoring solutions in parallel to check whether Zabbix could collect the same monitoring information.

Scaling Zabbix with containers

Post Syndicated from Robert Silva original https://blog.zabbix.com/scaling-zabbix-with-containers/13155/

In this post, a new approach to running Zabbix in high availability is explained, along with the challenges of implementing it using containers, Docker Swarm, GitLab, and CI/CD.

Contents

I. Zabbix project requirements (0:33)
II. New approach (3:06)

III. Compose file and Deploy (8:08)
IV. Notes (16:32)
V. Gitlab CI/CD (20:34)
VI. Benefits of the architecture (24:57)
VII. Questions & Answers (25:53)

Zabbix project requirements

Using Docker for the first time was a challenge. The Zabbix environment needed to meet the following requirements:

  • to monitor more than 3,000 NVPS;
  • to be fault-tolerant;
  • to be resilient;
  • to scale the environment horizontally.

There are five ways to install Zabbix — using packages, compiling, Docker, cloud, or appliance.

We used to install Zabbix directly on the operating system of virtual machines or physical servers. In this scenario, it is necessary to install the operating system and tune it for performance, then install Zabbix and configure backups of the configuration files and the database.

However, with such an installation, when a service such as the Zabbix Server or the Zabbix frontend goes down, the usual solution is human intervention: restart the service or the server, create a new instance, or restore a backup.

Still, we shouldn’t need to assign a specialist to solve such issues manually. The services must be able to restore themselves.

To create a more intelligent environment, we could use standard solutions such as Corosync and Pacemaker. However, there are better options for high availability.

New approach

Zabbix can be deployed using advanced technologies, such as:

  • Docker,
  • Docker Swarm,
  • Reverse Proxy,
  • GIT,
  • CI/CD.

Initially, the instance was divided into various components.

Initial architecture

HAProxy

HAProxy is responsible for receiving incoming connections and directing them to the nodes of the Docker Swarm cluster. So, with each attempt to access the Zabbix frontend, the request is sent to HAProxy, which detects which node runs the listening service and redirects the request to it.

Accessing the frontend.domain

We send the request to the HAProxy address, and HAProxy checks which nodes are available. If a node is unavailable, HAProxy will not send requests to that node anymore.

HAProxy configuration file (haproxy.cfg)

When you configure load balancing using HAProxy, two types of nodes need to be defined: frontend and backend. Here, the traefik service is used as an example.

HAProxy listens for connections on the frontend. In the frontend, we configure the port that receives connections and associate a backend with it.

frontend traefik
    mode http
    bind 0.0.0.0:80
    option forwardfor
    monitor-uri /health
    default_backend backend_traefik

HAProxy forwards requests to the backend nodes. In the backend, we define which servers run the application, the check mode, and the port to listen on.

backend backend_traefik
    mode http
    cookie Zabbix prefix
    server DOCKERHOST1 10.250.6.52:8080 cookie DOCKERHOST1 check
    server DOCKERHOST2 10.250.6.53:8080 cookie DOCKERHOST2 check
    server DOCKERHOST3 10.250.6.54:8080 cookie DOCKERHOST3 check
    stats admin if TRUE
    option tcp-check

We can also define where the Zabbix Server can run. Here, we have only one Zabbix Server container running.

frontend zabbix_server
    mode tcp
    bind 0.0.0.0:10051
    default_backend backend_zabbix_server

backend backend_zabbix_server
    mode tcp
    server DOCKERHOST1 10.250.6.52:10051 check
    server DOCKERHOST2 10.250.6.53:10051 check
    server DOCKERHOST3 10.250.6.54:10051 check
    stats admin if TRUE
    option tcp-check

NFS Server

The NFS Server is responsible for storing the files mapped into the containers.

NFS Server

After installing the packages, you need to run the following commands to configure the NFS Server and NFS Client:

NFS Server

mkdir /data/data-docker
vim /etc/exports
/data/data-docker/ *(rw,sync,no_root_squash,no_subtree_check)

NFS Client

vim /etc/fstab
# <nfs-server> is the address of your NFS Server
<nfs-server>:/data/data-docker /mnt/data-docker nfs defaults 0 0

Hosts Docker and Docker Swarm

The Docker and Docker Swarm hosts are responsible for running and orchestrating the containers.

A Swarm consists of one or more nodes of two types:

  • Managers, responsible for managing the cluster; they can also run workloads.
  • Workers, responsible for running the services or workloads.

Reverse Proxy

The Reverse Proxy, another essential component of this architecture, is responsible for receiving HTTP and HTTPS connections, identifying the destination, and redirecting them to the responsible containers.

The Reverse Proxy can be implemented using nginx or traefik.

In this example, we have three containers running traefik. After receiving a connection from HAProxy, traefik searches for the destination container and sends the packet to it.

Compose file and Deploy

The Compose file — ./docker-compose.yml — is a YAML file defining services, networks, and volumes. In this file, we determine which Zabbix Server image is used, which network the container connects to, the service names, and other necessary service settings.

Reverse Proxy

Here is the example of configuring Reverse Proxy using traefik.

traefik:
  image: traefik:v2.2.8
  deploy:
    placement:
      constraints:
        - node.role == manager
    replicas: 1
    restart_policy:
      condition: on-failure
    labels:
      # Dashboard traefik
      - "traefik.enable=true"
      - "traefik.http.services.justAdummyService.loadbalancer.server.port=1337"
      - "traefik.http.routers.traefik.tls=true"
      - "traefik.http.routers.traefik.rule=Host(`zabbix-traefik.mydomain`)"
      - "traefik.http.routers.traefik.service=api@internal"

where:

traefik: — the name of the service (in the first line).
image: — which image to use.
deploy: — rules for creating the deployment.
constraints: — where the service may be placed.
replicas: — how many replicas to create for this service.
restart_policy: — which policy to use if the service has a problem.
labels: — labels for traefik, including the rules for calling the service.

Then we can define how to configure authentication for the dashboard and how to redirect all HTTP connections to HTTPS.

# Auth Dashboard
- "traefik.http.routers.traefik.middlewares=traefik-auth"
- "traefik.http.middlewares.traefik-auth.basicauth.users=admin:"

# Redirect all HTTP to HTTPS permanently
- "traefik.http.routers.http_catchall.rule=HostRegexp(`{any:.+}`)"
- "traefik.http.routers.http_catchall.entrypoints=web"
- "traefik.http.routers.http_catchall.middlewares=https_redirect"
- "traefik.http.middlewares.https_redirect.redirectscheme.scheme=https"
- "traefik.http.middlewares.https_redirect.redirectscheme.permanent=true"

Finally, we define the command to be executed after the container is started.

command:
  - "--api=true"
  - "--log.level=INFO"
  - "--providers.docker.endpoint=unix:///var/run/docker.sock"
  - "--providers.docker.swarmMode=true"
  - "--providers.docker.exposedbydefault=false"
  - "--providers.file.directory=/etc/traefik/dynamic"
  - "--entrypoints.web.address=:80"
  - "--entrypoints.websecure.address=:443"

Zabbix Server

The Zabbix Server service is defined in a similar manner — the name of the service, image, networks, volumes, etc.

zabbix-server:
  image: zabbix/zabbix-server-mysql:centos-5.0-latest
  env_file:
    - ./envs/zabbix-server/common.env
  networks:
    - "monitoring-network"
  volumes:
    - /mnt/data-docker/zabbix-server/externalscripts:/usr/lib/zabbix/externalscripts:ro
    - /mnt/data-docker/zabbix-server/alertscripts:/usr/lib/zabbix/alertscripts:ro
  ports:
    - "10051:10051"
  deploy:
    <<: *template-deploy
    labels:
      - "traefik.enable=false"

In this case, we use the Zabbix 5.0 image. In the environment file, we can define, for instance, the database address, the database username, the number of pollers to start, the paths for external and alert scripts, and other options.

In this example, we use two volumes — for external scripts and for alert scripts — that must be stored on the NFS Server.

For the Zabbix Server service, traefik is not enabled.

Frontend

For the frontend, we have another option, for instance, using the Zabbix image.

zabbix-frontend:
  image: zabbix/zabbix-web-nginx-mysql:alpine-5.0.1
  env_file:
    - ./envs/zabbix-frontend/common.env
  networks:
    - "monitoring-network"
  deploy:
    <<: *template-deploy
    replicas: 5
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.zabbix-frontend.tls=true"
      - "traefik.http.routers.zabbix-frontend.rule=Host(`frontend.domain`)"
      - "traefik.http.routers.zabbix-frontend.entrypoints=web"
      - "traefik.http.routers.zabbix-frontend.entrypoints=websecure"
      - "traefik.http.services.zabbix-frontend.loadbalancer.server.port=8080"

Here, 5 replicas mean that 5 Zabbix frontend containers are started. This can be used for more extensive environments, and it also means that we have 5 containers serving connections.

Here, to access the frontend, we use the ‘frontend.domain‘ name. If we use a different name, the frontend will not be reachable.

The load balancer server port defines the port the container listens on — the port the official Zabbix frontend image exposes.

Deploy

Up to now, deployment has been done manually: you needed to connect to one of the nodes with the Docker Swarm manager role, enter the NFS directory, and deploy the stack:

# docker stack deploy -c docker-compose.yaml zabbix

where -c defines the compose file’s name and ‘zabbix‘ is the name of the stack.

Notes

Docker Image

Typically, the official Docker images from Zabbix are used. However, for the Zabbix Server and Zabbix Proxy, they are not enough: in production environments, additional customizations are needed — scripts, ODBC drivers to monitor databases. You should learn to work with Docker and to create custom images.

Networks

When creating environments using Docker, you should be careful: the Docker environment has some internal networks, which can conflict with the physical network. So, it is necessary to change the defaults for the Docker overlay and Docker bridge networks.

Custom image

An example of customizing the Zabbix image to install ODBC drivers.

ARG ZABBIX_BASE=centos 
ARG ZABBIX_VERSION=5.0.3 
FROM zabbix/zabbix-proxy-sqlite3:${ZABBIX_BASE}-${ZABBIX_VERSION}
ENV ORACLE_HOME=/usr/lib/oracle/12.2/client64
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/oracle/12.2/client64/lib
ENV PATH=$PATH:/usr/lib/oracle/12.2/client64/lib

Then we install ODBC drivers. This script allows for using ODBC drivers for Oracle, MySQL, etc.

# Install ODBC 
COPY ./drivers-oracle-12.2.0.1.0 /root/ 
COPY odbc.sh /root 
RUN chmod +x /root/odbc.sh && \ 
/root/odbc.sh

Then we install Python packages.

# Install Python3 
COPY requirements.txt /requirements.txt
WORKDIR /
RUN yum install -y epel-release && \ 
yum search python3 && \ 
yum install -y python36 python36-pip && \ 
python3 -m pip install -r requirements.txt
# Install SNMP 
RUN yum install -y net-snmp-utils net-snmp wget vim telnet traceroute

With this image, we can monitor databases, network devices, HTTP connections, etc.

To complete the image customization, we need to:

  1. build the image,
  2. push to the registry,
  3. deploy the services.

This process is performed manually and should be automated.

Gitlab CI/CD

With CI/CD, you don’t need to run the process manually to create the image and deploy the services.

1. Create a repository for each component.

  • Zabbix Server
  • Frontend
  • Zabbix Proxy

2. Enable pipelines.
3. Create .gitlab-ci.yml.

Creating .gitlab-ci.yml file
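A minimal sketch of what such a .gitlab-ci.yml can look like (the stage and job names are illustrative; $CI_REGISTRY_IMAGE and $CI_COMMIT_SHORT_SHA are GitLab’s predefined variables):

stages:
  - build
  - deploy

build-image:
  stage: build
  script:
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA

deploy-stack:
  stage: deploy
  script:
    - docker stack deploy -c docker-compose.yaml zabbix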

Benefits of the architecture

  • If any Zabbix component stops, Docker Swarm will automatically start a new service/container.
  • We don’t need to connect to the terminal to start the environment.
  • Simple deployment.
  • Simple administration.

Questions & Answers

Question. Can such a Docker approach be used in extremely large environments?

Answer. Docker Swarm is already used to monitor extremely large environments with over 90,000 and over 50 proxies.

Question. Do you think it’s possible to set up a similar environment with Kubernetes?

Answer. I think it is possible, though scaling Zabbix with Kubernetes is more complex than with Docker Swarm. 

User roles for the enterprise

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/user-roles-for-the-enterprise/12887/

In this post, we’ll talk about granular user roles introduced in Zabbix 5.2 and some scenarios where user roles should be used and where they give a great benefit to these specific environments.

Contents

I. Permissions granularity (0:40)
II. User Roles in 5.2 (5:16)
III. Example use cases (16:16)
IV. Questions &amp; Answers (h2)

Permissions granularity

Permissions granularity

Let’s consider two roles: the NOC Team role and the Network Administrator role. These are quite different roles requiring different permission levels. The people working in these roles usually have different skill sets, so the user experience is quite important for both: the NOC team probably wants to see only the most important, most vital data, while the network administrators usually require permissions to view data in more detail and need access to more detailed and granular overviews of what’s going on in the environment.

For our example, let’s first define the requirements for these roles.

NOC Team role:

  • They will definitely require access to dashboards and maps.
  • We will want to remove unnecessary UI elements for them just to improve the UX. In this case, less is more: removing the unused UI elements will make the day-to-day workflow easier for the NOC team members, who aren’t as proficient with Zabbix as our monitoring team members.
  • For security reasons we need to restrict API access because NOC team members will either use API very rarely or not at all. With roles we can restrict the API access either partially or completely.
  • The ability to modify the existing configuration will be restricted, as the NOC team will not be responsible for changing the Zabbix configuration.
  • The ability to close problems manually will be restricted, since the network admin team will be responsible for that.

Network Administrator role:

  • Similar to the NOC team, the Network Administrators also require access to dashboards and maps to see what’s going on in the environment and how healthy it is.
  • They need to have access to configuration, since members of this team are responsible for making configuration changes.
  • Most likely, instead of disabling API access for our Network Administrator role, we would want to restrict it in some way. They might still need access to get or create methods, while access to everything else should be restricted.
  • For each of our roles, we will clean up the UI by restricting UI elements — hiding the functionality that we have opted out of using.

Roles and multi-tenancy

Granular permissions are one of the key factors in multi-tenant environments. We could use permissions to segregate our environment per tenant, but in 5.2 that’s not the end of it:

  • Imagine multiple tenants where each has different monitoring requirements. Some want to use the services function for SLA calculation, others want to use inventory, or need the maps and the dashboards.
  • Restricting access to elements and actions per tenant is important. For example, some tenants wish to be able to close problems manually, while others need restrictions on map or dashboard creation for a specific user group.
  • Permissions are still used to enable isolation between tenants on the host group level.

User Roles in 5.2

With Zabbix 5.2 these use cases, which require additional permission granularity, are now fully supported.

So, let’s take a look at how the User Role feature looks in a real environment.

User role

User roles in Zabbix 5.2 are something completely new. Each user will have a role assigned to them on top of their User Type:

User permissions

We end up with User types linked to User roles, and User roles linked to Users. This means that User types are linked to Users indirectly, through the User roles.

User types

The User, Admin, and Super admin types are still in use. Each role is linked to one of these three user types.

User roles

Note that User type restrictions still apply.

  • Super admin has access to every section: Administration, Configuration, Reports, Inventory, and Monitoring.
  • Admin has access to Configuration, Reports, Inventory, and Monitoring.
  • User has access to Reports, Inventory, and Monitoring.

Frontend sections restricted by User type

Default User roles

Once we upgrade to 5.2 or install a fresh 5.2 instance, we will have a set of default user roles. The 4 pre-configured user roles are available under Administration > User roles:

  • Super admin,
  • Admin,
  • User, and
  • Guest.

Super admin role

  • The default Super admin role is static. It is set up by default once you upgrade or install a fresh instance. Users cannot modify this role.

All of the other default roles can be modified. In a Zabbix environment, we must have at least one user with this Super admin role that has access to all of the Zabbix functionality, similar to the root user in Linux.

Newly created roles of the Super admin, Admin, or User types can be modified. For example, we can create another Super admin role and change its permissions; for instance, a Super admin that doesn’t have access to Administration > General, but has access to everything else.

User role section

Once we open the User roles section, we will see a list of features and functions that we can restrict per user role.

When we create a new role or open a pre-created one, it will have the maximum allowed permissions for the User type used by the role.

Each of the default roles contains the maximum allowed permissions per user type

UI element restriction

We can restrict access to UI elements for each role. If we wish to create a NOC role, we can restrict it to have access only to Dashboards and maps. When we open the user and go to Permissions, we will see the available sections highlighted in green.

NOC user role that has access only to Dashboards and maps

Once we open up the Dashboards or the Monitoring section, we will see in the navigation menu only the UI sections that have been permitted for this specific user.

Global view: NOC user role that has access only to Dashboards and maps

Host group permissions

Note that User Group access to Host Groups still has to be properly assigned. For instance, when the user opens a dashboard, Zabbix still checks whether this user belongs to a user group that has access to a specific host group, and will display or hide the corresponding data accordingly.

User Group access to Host Group

Access to API

API access can also be restricted for each role. Depending on the state of the ‘Enabled’ checkbox under Access to API, users with this specific role will be permitted or denied access to the API.

Used when creating API specific user roles

In addition to that, we can allow or restrict the execution of specific API methods using an Allow or Deny list. For instance, we could create a user that has access only to get methods: they can read the data, but they cannot modify it.

Restricting API method

Let’s use the host.create method as an example. If I don’t have permission to call it, I will see the error message ‘no permissions to call’ followed by the name of the call, host.create in this case.
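As a rough illustration of what this looks like over the JSON-RPC API (the URL and auth token are placeholders, and the exact error payload may vary between Zabbix versions):

curl -s -X POST https://zabbix.example.com/api_jsonrpc.php \
  -H 'Content-Type: application/json-rpc' \
  -d '{"jsonrpc":"2.0","method":"host.create","params":{"host":"demo-host","groups":[{"groupid":"2"}]},"auth":"<api-token>","id":1}'

# A role that denies host.create returns an error along these lines:
# {"jsonrpc":"2.0","error":{"code":-32602,"message":"Invalid params.","data":"No permissions to call \"host.create\"."},"id":1}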

Access to actions

Each role can have a specific list of actions that it can perform with respect to the role User type.

In this context, ‘Actions’ mean what this user can do within the UI: whether we wish for the user to be able to close problems, acknowledge them, or create and edit maps.

Defining access to actions

NOTE. For a role of type ‘User’, the ‘Create and edit maintenance’ option will be grayed out, because the User type by default doesn’t have access to the Maintenance section. You cannot enable it for a role of the User type, but you can enable or disable it for a role of the Admin type.

Restricting Actions example

Let’s restrict the role from acknowledging and closing problems. Once we define the restriction, acknowledging and closing problems will be grayed out in the frontend.

If we enable it (the checkboxes are editable), we can acknowledge and close problems.

Restricted role

Unrestricted role

Default access

We can also modify the Default access section, which defines whether a role has default access to new actions, modules, and UI elements. For instance, if we import a new frontend module, or upgrade from version 5.2 to version 6.0 in the future and new UI elements, modules, or action types appear, do we want this specific role to have access to them by default, or should access to all of these new elements be restricted?

This allows us to give access to any new UI elements to our Super admin users while disabling them for all other User roles.

Default access for new elements of different types can be enabled or disabled for user roles

If Default access is enabled, whenever a new element is added, the user belonging to this role will automatically have access to it.

Role assignment post-upgrade

How are these roles going to be assigned after migration to 5.2? I have my users of a specific User type, but what’s going to happen with roles? Will I have to assign them manually?

When you upgrade to 5.2 from, for example, 5.0, the users will have the pre-created default roles for Admin, User, and Super admin assigned to them based on their types.

Pre-created roles after migration

This allows us to keep things as they were before 5.2 or go ahead with creating new User roles.

Example use cases

The following example use cases will give you an idea of how you can implement this in your environment.

Read-only role

A NOC Team User role, with no ability to create or modify any elements:

  • read-only access to dashboards,
  • no access to problems,
  • no access to API, and
  • no permissions to execute frontend scripts.

When we are defining this new role, we will mark the corresponding checkboxes in the Monitoring section. The User type for this role is going to be ‘User’ because they don’t need to have access to Administration or Configuration.

User type and sections the role has access to

We will also restrict access to actions and to the API, and decide on the permission logic for new UI elements and modules: default access to new actions and modules will be restricted. Read the Zabbix release notes to see if any new UI elements are added in future releases!

Read-only role

When we log in with this user and go to Dashboards, we will see that this user has no option to create or edit a dashboard, because we have restricted that action. Access is still granted based on the dashboard permissions, depending on whether it is a public or a private dashboard. When they open it, the data that they see will depend on the User group to Host group relationship.

When this user opens the frontend, they will see that access to the unnecessary UI elements is restricted (the restricted UI elements are hidden). Even though they have access to the Problem widget on the dashboard, they are unable to acknowledge or close the problem, as we have restricted those actions.

Restricted UI elements hidden and ‘Acknowledge’ button unclickable for this Role

Restrict access to Administration section

Another very interesting use case is restricting access to Administration sections. Administration sections are available only to our Super admins, but, in this case, we want to have a separate role of type Super admin that has some restrictions.

Our Super admin type role that has no access to User configuration and General Zabbix settings will need to be able to:

  • create and manage proxies,
  • define media types and frontend scripts, and
  • access the queue to check the health of our Zabbix instance.

But they won’t be able to create new User groups, Users, and so on.

So, we open our Administration > User roles section, create a new role of type Super admin, restrict all of the user-related sections, and also restrict access to Administration > General.

User type – Super admin. General and User sections are restricted for this role

When we log in, we can see that there is no access to the Administration > General section, so this role cannot change housekeeper settings, trigger severities, and the other settings available in Administration > General.

But the Monitoring Super admin user still has the ability to create new Proxies, Media Types, and Scripts, and has access to the Queue section. This is a nice way to create different types of Super admins, which was not possible before 5.2.

Access to Administration section elements

Roles for multi-tenant environment

Zabbix dashboards and maps can be used to provide monitoring data to multiple tenants.

In our example, we will imagine a customer portal that different tenants can access. They log in to Zabbix and, based on their roles and permissions, can access different elements. One of our tenants requires a NOC role:

  • read-only access to dashboards,
  • read-only access to maps,
  • no access to API,
  • no access to configuration,
  • isolation per tenant, so that one tenant cannot see the host status of other tenants.

We will create a new role of type User in Administration > User roles, leaving access only to the UI elements that need to be visible to the users belonging to this role.

User type role with very limited access to UI

Since we need isolation, we will also use tag-based permissions to isolate our hosts per tenant. We’ll go to the Permissions section and add read-only or read-write permissions for a User group on a specific Host group. Then we will also define tag-based permissions so that these users have access only to problems tagged with a specific tag.

Tag-based permissions to isolate our Hosts per tenant

Don’t forget to actually tag those problems by defining the tags either on the trigger level or on the host level.

Tagging on the host level

Once we have implemented this, we can open the UI and go to Monitoring > Dashboards to see that:

  • The UI is restricted to only the required monitoring sections.
  • Tag-based permissions ensure that we see only the problems related to our specific tenant.

Isolation and role restriction have been implemented, and we can successfully have our multi-tenant environment.

Roles for multi-tenant environments

What’s next?

How would you proceed with upgrading to Zabbix 5.2 and implementing this? At the design stage, you need to understand what User roles can help you with, and estimate and assign value to these capabilities if you want to implement them in your environment.

  1. User roles can improve auditing. Since each user has a restricted role, it’s easier to audit who did what in your environment.
  2. Restricting API access. We can not only enable or disable API access, but also restrict users to specific methods. From the security and auditing perspective, this adds a lot of flexibility.
  3. Restricting configuration. We can restrict users to specific actions or limit their access to specific Configuration sections, as in the example with the custom Super admin role. This allows us to have multiple tiers of admins in our environment.
  4. Removing unwanted UI elements. By restricting access to only the necessary UI elements, we can give Zabbix a much cleaner look and improve the UX for your users.

Thank you! I hope I gave you some insight into how roles can be used and how they will be implemented in Zabbix 5.2. I hope you aren’t too afraid to play around with this new set of features and implement them in your environment.

Questions & Answers

Question. Can we have a limited read-only user that will have access to all the hosts that are already in Zabbix and will be added in the future?

Answer. Yes, we can grant access to all of the existing Host groups. But when you add a new Host group, you will have to go to the Permissions section and assign User group to Host group permissions for the newly added group.

Question. So that means that now we can have a fully customizable multi-tenant environment?

Answer. Definitely. It is fully customizable, based both on User group to Host group permissions and on roles that make actions and different UI sections available as per the requirements of our tenants.

Question. I want to create a user with only API access. Is that possible in 5.0 or 5.2?

Answer. It’s been possible for a while now. You can just disable frontend access and leave the user with the respective permissions on specific Host groups. But with 5.2 you can make the API limitations more granular: you can say that this API-only user has access only to specific API methods.

Question. Can we make a user who can see but cannot edit the configuration?

Answer. Partially. For read-only users, read-only access still works for the Monitoring section. But if we want to see anything in the Configuration section, we need write access. You can use the Monitoring > Hosts section, where you can see partial configuration. Unfortunately, the Configuration section is still not available for read-only access.

Data solution for solar energy application

Post Syndicated from Brad Berwald original https://blog.zabbix.com/data-solution-for-solar-energy-application/13005/

Morningstar, the world’s leading supplier of solar controllers for remote solar power systems, has partnered with Zabbix to provide pre-configured integration of their data-enabled solar power products with the Zabbix network monitoring solution. Now, both power system data and network performance metrics integrate seamlessly, allowing remote solar systems to be monitored and managed from a single software platform, on premises or in the cloud, using solutions from Zabbix.

Contents

I. About Morningstar (1:43)
II. Products and technology (6:01)
III. Data solution for solar application (10:28)
IV. Zabbix solution (15:09)
V. Conclusion (21:40)
VI. Questions and Answers (23:09)

This post presents an overview of Morningstar’s diverse product line and the industry applications it supports, as well as how the Zabbix network monitor can be used to manage and log time-series data using SNMP and Modbus protocols for powerful, scalable system oversight and trend analysis.

Morningstar has been working in partnership with Zabbix to provide integration for Morningstar products, including easy-to-use templates and pre-formatted data sets, in order to speed up getting these products online so that customers can monitor both their network data and solar power system data at remote sites.

About Morningstar

Morningstar is the leading supplier of charge controllers and inverters generally used in remote power systems around the world.

Morningstar, located in Newtown, Pennsylvania, USA, has sold over 4 million products deployed into the field since the company’s inception in 1993. Morningstar currently works in over 100 countries and provides reliable remote power for mission-critical applications.

We’d like to think of ourselves as the ‘charging experts’ because of our focus on battery life and many years of charging innovation. We have a diverse product line and many models designed for application-specific needs, such as solar lighting and telecommunications. Morningstar has one of the lowest hardware failure rates in the industry.

Some of these mission-critical applications include:

— Residential and rural electrification.

— Commercial systems.

— Industrial products, including telecommunications, oil and gas, security applications.

— Mobile and marine application, which generally includes boats, RVs, and caravans, agricultural applications, etc.

Overview of Morningstar solar applications

— The railroad industry, where remote signaling and track management are often solar-powered because of their critical nature and the absence of a readily available electric grid.

— Traffic applications: early warning systems, signaling and messaging systems, and traffic and speed monitoring equipment can also be easily powered for mobile deployment with a battery-based system.

— The oil and gas industry is a specific and notable market for Morningstar, because oil field automation (measurement of gas flow and pressure with RTUs, as well as methane injection points used to keep the gas flowing and avoid well freeze-ups) can be powered by solar with a very modest amount of power for data monitoring. Since the pipelines often traverse very remote regions, this is highly advantageous for getting power where it’s needed.

— In telecommunications, cellular base stations and backhaul links providing the data for the sites, land mobile radio applications, and satellite-based infrastructure benefit from remote solar power. In these applications, the loads can be modest or quite significant. In the latter case, several controllers can be combined to charge a very large battery bank, often with a hybrid diesel gen-set in conjunction with other renewable energy sources to form a hybrid power system. This increases reliability, provides diversity during inclement weather, and maintains the high integrity of the site link.

— For rural electrification, small amounts of DC power can be provided in remote locations with no grid access in countries with a large populace and huge needs for lighting and cell phone charging.

Recently, we did a notable project in Peru, where nearly 1 million Peruvians were provided remote power access using 200,000 DC energy boxes. They provided basic 12V DC power and USB charging, and were distributed over the country in some of the most remote locations. These home systems easily met the needs for lighting, device charging, and other small equipment power needs. In addition, 3,000 integrated power systems for community centers were deployed to provide 230V AC power for more critical loads, including more substantial lighting, communication and, in some cases, health equipment for the benefit of the local population.

So, we’re very proud to have deployed probably one of the largest and most ambitious rural electrification projects in the history of the off-grid industry. The project was completed last year with our partner Tozzi Green of Italy.

Products and technology

Morningstar has a diverse product set covering all power levels, anywhere from modest 50W needs up to models that handle 3.2kW per device and can be paralleled for even greater capacity. We also provide inverter systems: the SureSine and the MultiWave, which will be coming to market in the near future. These inverters provide AC power and enable hybrid system charging (combining both solar and AC sources), meeting more demanding load needs and adding robust high-current charging capabilities from the grid or from diesel generators.

Morningstar products and technologies

So, together all these product lines make up a diverse set of products that fulfill the variety of needs in an off-grid remote power system. Many of our products include open communication protocols, which can be used for remote management.

Charge controllers

A charge controller is installed between the PV modules and the battery. It monitors various system power and voltage readings and temperatures. The charge controller also manages the batteries: it provides long-life, adequate charging, takes the batteries through their various charging stages, and manages the DC loads connected to the device. Charge controllers can, of course, extend battery life significantly if the battery setpoints are configured correctly and the right choice for the battery model is made. That depends a lot on the battery chemistry, the temperatures it will experience, and how deeply it will be cycled or discharged each day while powering the system.

Our charge controller line covers both PWM and MPPT topologies.

An MPPT controller converts DC power from the PV array to the proper battery voltage: it has an integrated DC-to-DC converter that controls the charge of the battery, preventing overcharging and extending battery life.

It can also be paired with one of our inverter products to power small AC loads, such as equipment that requires 120 or 230-volt power remotely in the field.

Our product line covers a variety of PWM charge controllers.

PWM charge controllers

  • Pulse width modulation (PWM) products are more cost-effective, simpler in design, and provide direct charging from an equally sized nominal solar array.
  • The MPPT charge controller line tracks the maximum power point (MPPT) of the array, optimizing power harvest. The modules can span a much wider range of (much higher) voltages, which the controller monitors and tracks to provide the optimal operating point for the system.
  • The SunSaver and ProStar MPPT lines are used extensively in smaller systems under a thousand watts.
  • Our TriStar family is used for 3kW or greater and can be paralleled. A notable product is our 600V controller, which allows the PV modules to be wired in series for very high voltage input, providing advantages in efficiency and PV array distance from the controller. All MPPT controllers convert the input voltage to the expected output to support 12V, 24V, or 48V battery systems.
  • The Morningstar inverter line includes the SureSine and MultiWave inverter chargers. We also have a very extensive line of accessories used with each of these controllers. These generally provide protocol conversion hardware, interface adapters, and other items that can control relays to support system control or actuate additional components in remote off-grid systems.

EMC-1: Morningstar’s Ethernet MeterBus converter

EMC-1 is a simple serial-to-Ethernet converter that also runs a real-time operating system and supports a variety of protocols, so Morningstar products can be connected to industry-specific applications using standard protocols: Modbus over IP or SNMP. It can also serve a simple HTML web GUI, allowing a direct one-to-one connection with the product for simple status monitoring from any type of device, including mobile devices such as phones or tablets.

Data solution for solar application

Challenges to remote monitoring of solar power sources

Power for wireless ISP infrastructure is a common off-grid application requiring network traffic and power to be monitored together. Customer access in the field using Wi-Fi or LTE communications should also be enabled.

When these systems are deployed, their network equipment clearly has to be monitored with a network management system (NMS). With the EMC-1 and SNMP, Zabbix makes it far easier to integrate the power systems into the same monitoring system: you have a single point of software and data collection, and both power and network bandwidth and status can be monitored at the same time.

What this monitoring can help achieve:

  • Measurement of the true load consumption in the field. The power levels will vary depending on the type and amount of usage and on the technology and frequencies used, so the load in the field throughout the day, during peak and off-peak hours, can be directly monitored in real time.
  • Detection and root cause analysis of network outages. We need to minimize network outages and ensure that the site is reliable and the network is up at all times, avoiding customer dissatisfaction and frustration for the operating carrier. The ability to monitor both power and network allows the root cause of network outages to be determined, whether it’s a system configuration issue, bandwidth restrictions, or something wrong with the power system itself, such as a depleted battery, insufficient solar, electrical faults, or possibly tampering with the system.
  • Ensuring sufficient power at the site to prevent deep battery discharge. Monitoring also ensures that you have adequate PV to cycle the battery properly: when the battery is depleted each day from powering the loads, it can be fully recharged the next day when PV power is available again. This balance is difficult to manage, because you always have to ensure power for the battery and protect the loads, but you may or may not have adequate sun each day. So, reserve power is often provided in the system to ensure that the site will remain up during lower-than-average or uneven periods of PV supply.

  • Measurement of the current system status, as well as historical data. The network monitoring software captures all this data, sometimes with a high level of granularity, so that you can see what is happening in the system on a minute-by-minute or hour-by-hour basis. This helps detect system faults that would otherwise be missed.

  • Ensuring system resiliency during low periods of production. In peak times, you usually have more than adequate power. However, off-grid systems must be sized to handle the worst-case scenario: the winter months or off-peak months with fewer sun hours and lower levels of solar insolation, which cannot provide as much power as you would expect during the summer. During these worst-case periods of the year, monitoring is critical, because that is when you are most likely to experience an outage due to inadequate solar.

  • When long-term data is available, you can compare, for instance, month-to-month or season-to-season power output, as well as look at trends over the system’s lifetime of operation to detect anomalies and negative trends that may indicate pending battery failure.

For most lead-acid batteries, a minimum of five years is generally expected. With newer lithium technologies, battery life is extended to 10 or more years when adequately sized. So, the batteries can have a robust life as long as they are sized correctly and given adequate power.

But monitoring the end stage of a battery’s life, which will most likely occur at some point in the system, is really critical. Many remote sites deployed for an extended period of time can go through one or two battery replacement cycles. So, a downward trend, with power declining over time and the batteries beginning to show signs that their health is no longer adequate to support the system, can be detected with this long-term data analysis.

Zabbix solution

Remote monitoring of a Morningstar EMC-1 adapter’s IP connection through Zabbix monitoring platform

A typical system in this diagram shows one of the ProStar MPPT controllers connected to a solar array and a battery storage system, with loads typical of many of the applications. An EMC-1 can be connected to the device to provide IP connectivity, which enables a variety of services:

  • The Modbus protocol to connect to SCADA or other HMI data-viewing solutions, which are common in automation and oil and gas.
  • Simple HTTP or HTML web pages to get a quick look at a dashboard and understand what’s going on in the system.
  • SNMP to be used with network monitoring software: Zabbix can monitor the entire site with just one tool (a quick SNMP sanity check is sketched below).
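Before building templates, basic SNMP reachability of an EMC-1 can be verified with a standard Net-SNMP query (a sketch; the address and community string are placeholders, and the real Morningstar OIDs come from the provided MIB files):

# Query the standard system description object to confirm the device answers SNMP
snmpget -v2c -c public 192.0.2.10 SNMPv2-MIB::sysDescr.0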

A cloud-based or server-based deployment works well for data storage, logging, notifications, and alarms. Native or external databases in the cloud can be used to archive the large amounts of data that will accumulate. Deployments can grow to hundreds or even thousands of systems, so the software tool and the server must be scalable to grow over time and keep up with the needs of the data.

Advantages:

  • The benefits of an IP-based solution include compatibility with any network transport layer. In addition, there is a variety of wireless options in the field, including point-to-point, licensed, unlicensed, Wi-Fi, proprietary wireless protocols, and cellular.
  • Recently, notable gains in the satellite industry have provided lower latency and higher bandwidth. With satellite, you can often reach almost any part of the world, which gives it great benefits for solar power applications.
  • SNMP is a very lightweight protocol. On a metered or wireless connection, especially in these hard-to-reach locations, low-overhead UDP packets and minimal monitoring infrastructure make sure that monitoring has minimal impact on the system itself.

Zabbix dashboards for Morningstar solar systems

  • Native Morningstar SNMP support is provided by these tools. We worked to review use cases, system needs for solar applications, and data sets. The MIB files have already been imported and device templates pre-configured for a variety of Morningstar product solutions that support the EMC-1.
  • Dedicated templates allow you to easily connect the hardware to an existing system and monitor Morningstar products using your existing Zabbix instance.
  • Performance visibility of solar-powered systems is available:

— on a very high level, to see if there are any systems that need attention or are in a fault state, or

— in greater detail, to analyze the time-series data and correlate it with other aspects of the system, to determine the root cause of a problem and see how the data is trending. Time-series data correlation provides for accelerated troubleshooting.

  • An at-a-glance dashboard management tool makes it easy to monitor the status of all Morningstar devices on the network and to scale to hundreds of sites.
  • Active, advanced, and custom alerts sent out by the Zabbix system and triggered by power system events ensure proactive notification of a pending issue at the site, hopefully before critical loads drop. Once notified, using the bi-directional nature of some of the other protocols, system changes, corrections, extended runtimes, or even auxiliary charging systems, such as generators, can be activated to prevent an outage. Such proactive monitoring can only take place at scale when using a tool such as Zabbix.

Advantages of monitoring with Zabbix

  • Morningstar provides some simple PC-based utilities that run on Windows and offer direct Modbus communication and very simple data logging for a small number of devices. The Morningstar MSView utility allows configuration files for the products to be uploaded and deployed to the controllers in the field, as well as basic troubleshooting.
  • Morningstar Live View is our built-in web dashboard that runs on the EMC-1. It serves a simple web page with everything displayed in HTML, so that it can be viewed on any device regardless of the operating system.

These two products are meant for troubleshooting, site deployment, and configuration of small-scale systems; they are not designed to scale.

  • With Zabbix, an almost unlimited number of devices can be connected depending on your computing resources and power.
  • Zabbix supports SNMP and Modbus, which is beneficial for both telecom and industrial automation or smart city applications.
  • Zabbix gives you a real-time data display, as well as custom alerts and notifications. You can set up custom log intervals, downloading of extensive amounts of log data, etc. Reports can be generated based on custom filtering, as well as long-term historical data, which becomes more critical to understanding the site’s longevity.
  • Where cloud-based systems with APIs are available, cloud-to-cloud integrations can be utilized, and advanced data management, analysis, or intelligence can be added on top of existing servers using additional third-party tools.

So, it’s really the only way to manage data of this scale and size.

Conclusion

Zabbix adds a great deal of value and capabilities to Morningstar products used in the field. When access is provided via satellite, cellular, or fixed wireless technologies, the charge controllers can perform their duty of providing remote power for these systems while integrating easily, via the existing protocols, into monitoring across the entire system deployment.

As solar equipment is often used to power remote network infrastructure, integrating data from the network components and power systems into a centralized NMS provides an essential management tool to optimize system health and increase uptime. Zabbix also adds configuration options and valuable data analytics to ensure full system visibility. More information on the Zabbix network monitoring tools or Morningstar data-enabled remote power products is available at https://www.zabbix.com and www.morningstarcorp.com or can be requested from [email protected] and [email protected], respectively.

Questions and Answers

Question. Are Morningstar templates shared somewhere? Are they available to the public?

Answer. Part of the partnership with Zabbix is to get all this integrated. We’re putting the finishing touches on how that will be made available and easily downloadable as part of our SNMP support documentation. In addition, we can do some cross-referencing to help get our products online and plugged into the major network monitoring platforms. All of that will be available probably within the next month.

Question.  Zabbix starting from 5.2 natively supports Modbus and MQTT. Do you plan on using that in your environment?

Answer. Yes. MQTT has come up quite recently and is an ideal solution where IP addressing challenges exist and pub/sub-style data reporting from sessions initiated within the network is preferred. Currently, we support Modbus and SNMP, though we are considering other protocols. Modbus has been used within the solar industry for a long time for automation and control. We also have an extensive market in the oil and gas industry, where Modbus is utilized both for polling the data and for real-time control, actually making configuration changes to the product remotely.

SNMP is a more recent development, and it helps to get on the bandwagon with the telecommunications and IT-related markets. It’s an easy transition using a protocol that customers are already familiar with.

Question. How do you use report generation? How do you enable it and implement it in Zabbix?

Answer. A lot of our customers are looking for trending data over a certain period of time. They set up regular intervals for the data to be collected and reported, because long-term trending is about looking at the same site during different periods of time, or next to its peers, to see how the power system varies from what is expected. So, reports can be executed at regular intervals and filtered based on certain conditions.

There are really just a few key parameters of a solar site to look at to understand the health of the system: the battery levels, the maximum power of the solar panel, and a quick diagnostic check for any controller faults or alarms. With a simple report, you can quickly be sure that hundreds of sites are in good shape; if one of them isn’t, you can drill down into more detail on just that specific site.

Aranet — a wireless IoT sensor platform

Post Syndicated from Toms Reksna original https://blog.zabbix.com/aranet-a-wireless-iot-sensor-platform/12953/

Aranet — wireless IoT sensor platform. Wherever you need to measure anything – temperature, air quality, light, or any other physical parameter – Aranet’s main mission is to deliver these measurements simply, easily, and above all – wirelessly. Aranet is manufactured by SAF Tehnika — a company with over 20 years of experience in the telecom industry, microwave radio, and test & measurement equipment manufacturing, and a certified partner of Zabbix.

Contents

I. Aranet wireless sensor network (1:41)
II. Aranet in retail (5:53)
III. Indoor air quality and COVID19 (8:20)
IV. Partnership with Aranet (12:11)
V. Questions & Answers (13:52)

Aranet wireless sensor network

Aranet is a wireless sensor network consisting of the Aranet PRO base station and sensors that transmit data to it over the 868 MHz frequency in Europe and the 920 MHz frequency in the United States. This frequency allows a very large distance between the sensor and the base station: up to 3 kilometers line of sight and a couple of hundred meters indoors.

Sensors are intended to measure different environmental parameters. You can connect up to a hundred sensors per base station. Sensors can be configured to send the data over different intervals — once every minute, two minutes, five, or 10 minutes. Sensors are very power efficient — with a regular AA battery, they will last up to 10 years.

Aranet ecosystem

Aranet technology is based on the LoRa physical layer. We have built our proprietary LPWAN protocol with XXTEA encryption on top of LoRa to improve the radio parameters and increase battery life.

Aranet technology

The brain of the system is the Aranet PRO base station – the radio receiver with a built-in web server housing SensorHUB software and internal memory for local data storage. It is made with ease of use in mind – you can connect directly to the base station with your PC, laptop, or phone over Ethernet or Wi-Fi, open up your web browser and access the free SensorHUB software. You don’t even need to install anything.

Aranet PRO base station offers a lot of features such as graphing, exporting data, etc. In addition, its internal memory allows for storing 10 years of readings even if the Internet goes down.

The sensors send data to the base station. Several such base stations can be aggregated into the Aranet Cloud solution, which collects data from all of them and allows you to access the data from anywhere.

Aranet architecture

With over 20 years of experience in radio manufacturing, we believe that we’ve created one of the best-in-class systems in terms of wireless connectivity with our base stations and in-house cloud. However, we are looking for strategic partnerships where the Aranet system can become part of a larger system. This brought us to the partnership with Zabbix, so that we can integrate our cloud solution with the Zabbix monitoring system.

Aranet philosophy

Aranet Example Use Cases

Aranet for retail

Rimi

Aranet is actively used in retail, for instance by Rimi, a chain of Latvian supermarkets, where 6,500 sensors have been installed in 125 stores. Aranet is planning to expand to the other Baltic states.

Aranet equipment is primarily used for:

  • Monitoring of freezer temperatures. Earlier, the temperature had to be checked manually: somebody had to walk around with a legal pad to make sure that the freezers were working properly and to report to the relevant government agencies. Aranet allowed this process to be automated.
  • Alarms in case of malfunction. In the case of a malfunction, an alarm can be sent to avoid product spoilage.
  • Predictive maintenance, including machine learning algorithms that locate anomalies in the defrost cycle temperature data, helping to prevent breakages.

Aranet in retail

Benefits

  • Even the largest supermarkets (8,800 m² / 94,000 ft²) can be covered with a single base station.
  • Manual data collection can be avoided.
  • Freezer temperature operating costs can be optimized (20% energy cost reduction).
  • Product spoilage can be avoided.
  • Litigation/fines for slip-and-fall accidents can be avoided.

Aranet for indoor air quality and COVID-19 safety

Due to COVID-19, many governments and health agencies, including the Centers for Disease Control and Prevention in the United States, have changed their guidelines and now state that COVID-19 can be transmitted through aerosols. Aerosols are small droplets that are released when we cough, sneeze, or talk. As these droplets are small (about five microns), they linger in the air for nine or more minutes. This means that you don’t even have to be in contact with an infected person to catch the disease.

This requires proper ventilation practices, which can decrease the time aerosol particles stay in the air tenfold.

Aranet4 PRO – a wireless COVID-19 safety network

One way to estimate whether ventilation is sufficient is to measure CO2. The amount of CO2 (air exhaled by other people) in a room is a measure of the risk of contagion. The recommended air circulation per person is 60 m³/h, which corresponds to approximately 800 ppm CO2 concentration, almost twice the outdoor value.

Aranet wireless CO2 sensor

Aranet offers a wireless CO2 sensor that also measures temperature, relative humidity, and air pressure. It comes with a useful Bluetooth application, which allows you to easily get the latest readings. Most importantly, this sensor can generate alerts: whenever the value exceeds a critical level, there is a visual indication (green, yellow, or red), as well as an audible alert prompting you to manually increase the ventilation, for instance, by opening windows.

Lately, these sensors have been gaining popularity, especially in schools, universities, and offices as they offer:

  • Simple plug-and-play setup with the Aranet base station.
  • Updating information available locally on each sensor, as well as centrally on the base station, so that you can see what spaces need additional ventilation.
  • Free software – graphs, reports, centralized alarms.
  • Control of airborne COVID-19 spread in schools, offices, and other indoor facilities.

Partnership with Aranet

Aranet wireless network can be implemented in many other industries:

  • Horticulture,
  • Livestock,
  • Building Management,
  • Warehousing,
  • Data Centres,
  • Pharma,
  • Medical,
  • Retail.

So, Aranet is looking for integration and distribution partners who are interested in wireless monitoring. Details of the partnership are available on aranet.com or can be requested from [email protected].

Aranet’s core value is the wisdom of Lord Kelvin: “you can only improve what you can measure”. So, we strive to deliver these measurements in the easiest and most straightforward way possible, so that you can improve whatever you wish.

Questions & Answers

Question. Is there some way or some benefit to integrating Aranet with Zabbix?

Answer. Aranet has many diverse applications, and so does Zabbix. Adding physical parameters on top of the monitored network parameters would help: for data centers or retail stores, in addition to alerts about something being wrong with the network, alarms about something physical happening would be useful. It might be useful to be alerted, for instance, if it’s too hot.
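As a rough sketch of such an integration point, a classic Zabbix trigger (pre-5.4 expression syntax; the host name and item key are hypothetical) could fire when a monitored room gets too hot:

{Aranet-ServerRoom:temperature.last()}>30

The same pattern applies to CO2, humidity, or any other physical parameter the sensors report.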

Question. Is it possible to switch your sensors to LoRaWAN so that we can use existing networks?

Answer. We have decided to have our proprietary network based on the LoRa physical layer with proprietary communication software. This decision was made for several reasons:

  • Ease of use, the main thing that our customers value. An Aranet system can be set up in a couple of minutes: you just place the sensors and they start working. With LoRaWAN, you have the base station from one provider and sensors from another, so it takes time to make the system work; Aranet works out of the box.
  • Improved battery life due to our protocol.
  • Improved security, as with Aranet you control the whole ecosystem from the base station to the sensors. In addition, with Aranet you won’t face dependencies, password management, or communication issues.
  • A private network.

Question. Are there any electrical sensors — volts, amps, power, or anything like that?

Answer. We can monitor voltage, but that is mostly for third-party integrations. We also have pulse output sensors, which you can connect to electricity meters, for instance, so this can be monitored.


Let me subscribe – Zabbix masters IoT topics

Post Syndicated from Wolfgang Alper original https://blog.zabbix.com/let-me-subscribe-zabbix-masters-iot-topics/12710/

Zabbix 5.2 supports two important protocols used in the world of the Internet of Things — MQTT and Modbus. Now we can benefit from the newest Zabbix features and integrate Zabbix network monitoring into the world of IoT.

Contents

I. What is MQTT? (3:32:13)
II. MQTT and Zabbix integration (3:39:48)

1. MQTT setup (3:40:03)
2. Node-RED (3:42:12)
3. Splitting data (3:45:45)
4. Publishing data from Zabbix (3:52:23)

III. Questions & Answers (3:55:42)

What is MQTT?

MQTT, the Message Queuing Telemetry Transport, was invented in 1999 and designed to be bandwidth-efficient and lightweight, and thus battery-efficient. It was initially developed for monitoring oil pipelines.

It is a well-defined ISO standard (ISO/IEC 20922), and it is getting increasingly adopted due to its suitability for the Internet of Things (IoT), sensor networks, home automation, machine-to-machine (M2M), and mobile applications. MQTT usually uses TCP/IP as the transport protocol, over port 1883, and can be encrypted using TLS, with 8883 as the default port for encrypted connections.

There is a variation of MQTT available — MQTT-SN (MQTT for Sensor Networks), used for non-TCP/IP networks such as Zigbee (the IEEE 802.15.4 radio-based protocol) or other UDP/Bluetooth-based implementations.

There are two types of network entities: the ‘Message broker‘ and ‘Clients‘.

MQTT supports 3 Quality-of Service levels:

— 0: At most once – “Fire and forget” where you might or might not receive the message.
— 1: At least once – The message can be sent/delivered multiple times.
— 2: Exactly once – Safest and slowest service.

MQTT is based on a ‘publish’ / ‘subscribe-to-topic’ mechanism:

1. Publish/subscribe.

Publish/subscribe pattern

The MQTT Message Broker consumes messages published by clients (on the left) using two-level ‘Topics‘ (such as, for instance, office temperature, office humidity, or indoor air quality). The clients on the right side act as subscribers, receiving any information published on a particular topic. Every time a message is published to the broker, the broker notifies all of the subscribers (Clients 3 and 4), and these clients get the sensor value.
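As a minimal sketch of this pattern with the Mosquitto command-line clients (the broker host and topic are examples):

# Terminal 1: subscribe to the office temperature topic
mosquitto_sub -h broker.example.com -t "office/temperature"

# Terminal 2: publish a reading; every subscriber above receives it
mosquitto_pub -h broker.example.com -t "office/temperature" -m "21.5"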

2. Combined publishing/subscribing

Combined pub/sub

A client can be a subscriber and a publisher at the same time. In this example, Client 1 is publishing a brightness value, and Client 3 has a subscription for that value. Client 3 may decide that a brightness of, say, 1,500 is too low, so it can publish a new message to the topic ‘office’ to let the light controller know that it should increase the brightness. Client 2, the light controller with a subscription, may then change the brightness level on receipt of the message.

3. Wildcards subs

+ = single-level, # = multi-level

Wildcards in MQTT are easy. You can subscribe, for instance, to ‘office/+/brightness’, where the ‘+’ sign can be substituted by any topic name at that level. If the ‘+’ sign substitutes just one level in the topic, it is a single-level wildcard, while the pound sign (‘#’) works as a multi-level wildcard.
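A couple of sketched subscriptions with the Mosquitto client (the topic hierarchy is made up):

# '+' matches exactly one level: office/room1/brightness, office/room2/brightness, ...
mosquitto_sub -h broker.example.com -t "office/+/brightness"

# '#' matches any number of trailing levels: office/room1/temperature, office/room2/sensor3/battery, ...
mosquitto_sub -h broker.example.com -t "office/#"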

MQTT features:

  • Clients can publish and subscribe to one or more topics.
  • One client can publish and subscribe at the same time.
  • Clients can subscribe using single/multi-level wildcards.
  • Clients can choose between three different QoS levels.

MQTT advanced features:

  • Messages can be retained by the broker for new subscribers. A publisher can mark its messages as ‘Retained‘ so that a client that subscribes to the topic later still gets the last retained message.
  • Clients can provide a “last will and testament” that will be published by the broker when the client “dies”.
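Both features map to simple options on the Mosquitto command-line clients (again a sketch; the host and topics are placeholders):

# Publish a retained message: late subscribers immediately receive the last value
mosquitto_pub -h broker.example.com -t "office/temperature" -m "21.5" -r

# Register a last will: the broker publishes it if this client disconnects unexpectedly
mosquitto_sub -h broker.example.com -t "office/#" --will-topic "clients/client1/status" --will-payload "offline"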

MQTT and Zabbix integration

MQTT setup

Integrating Zabbix into the multiple-client mix

Integrated structure:

1. Four sensors:

    • Server room.
    • Training room.
    • Sales room.
    • Support room.

2. Four different topics:

    • office
    • bielefeld (home town)
    • serverroom
    • trainingroom

3. Mosquitto MQTT Message Broker, which is one of the well-known message brokers.

So, the sensors are publishing the data to the Mosquitto Message Broker, where any MQTT-enabled device or system can pick those values up. In our case, it’s the home automation system, which subscribes to the Message Broker and has access to all of the values published by the sensors.

Thanks to MQTT support in Zabbix 5.2, Zabbix can now subscribe to the Mosquitto Message Broker and immediately get access to all of the sensors publishing their values to the broker.

As multiple clients can subscribe to one topic on the Message Broker, the home automation system can subscribe to the same values published to the Message Broker, and so can Zabbix.

Node-RED

Sooner or later, you will need Node-RED, a flow-based programming tool that acts as an MQTT client, allowing you to subscribe to the broker and publish messages to it, as well as to work with the data.

Data Processing in Node-RED

This setup might be useful if, for instance, a Zabbix trigger fires and the outcome of the trigger is passed over to MQTT and published to the Message Broker, where it can be picked up by the home automation system.

Zabbix publishes data to the broker

You can have two different Zabbix instances subscribing to the same Message Broker acting just as two different clients.

Multiple Zabbix servers sharing the same data

Node-RED:

    • Construction kit for the Internet of Things and home automation.
    • Acts as MQTT client able to publish and subscribe.
    • Flow-based tool for visual programming based on Node.js.
    • Graphical web editor.
    • Supports input, processing, and output nodes.
    • Extensible with plugins and custom function nodes.

Different types of nodes can be connected in the workspace. For instance, the nodes subscribing to a topic and transforming the data, or the nodes writing the data to a log file.

Node-RED

We can get the data from the sensors as a raw JSON string containing 20-30 metrics in one payload, and as a parsed JSON object in the Node-RED Debug node with easy-to-read metrics, such as temperature, humidity, WiFi quality, and indoor air quality.

Multiple metrics in one message

Splitting data

We have different options for data splitting available:

  • Split on the MQTT level: use Node-RED to split metrics and then publish them in their own topics (a good setup when other clients can handle only a single metric at a time).

Splitting data in Node-RED

  • Split on the Zabbix level: set up an MQTT item as a master item and use Zabbix JSON preprocessing with corresponding dependent items. It’s more efficient because Zabbix needs only one subscription.

We can get the data with the brand-new mqtt.get item in Zabbix 5.2:

— Requires Agent 2.
— Requires active checks. Every time a client publishes a message to the topic, we need the broker to push that data to us; mqtt.get must listen on the subscription and get notified when new data comes in.
— Broker URL default is localhost.
— User name and password are optional.
— Uses Eclipse Paho Go client library.
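For reference, a master item key for this setup might look like the following (the broker URL and topic are examples):

mqtt.get["tcp://broker.example.com:1883","office/serverroom"]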

One Zabbix agent in active mode sending data to multiple hosts

For our setup with four sensors (in the Sales Room, Server Room, Support Room, and Training Room), we need four hosts in Zabbix. Traditionally, you would need four different agents to handle them, as each agent running in active mode has its own configured hostname. In our setup, however, just one installed agent handles the different hosts by subscribing to multiple topics.

This is possible because of the new feature, running active agent checks from multiple hosts, which is now available in Zabbix 5.2. All we need is:

—  to set up hosts in Zabbix (as usual),
—  to define our MQTT items (as usual),
—  to set up just one agent with all of the hostnames the agent should be responsible for (the new feature),
—  to set up the master item, which is our mqtt.get item,
—  to define several dependent items and preprocessing for each of the dependent items, and
—  to start preprocessing with JSONPath.
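A minimal sketch of the relevant zabbix_agent2.conf fragment for such a setup (the server address and host names are examples):

# One active agent serving four Zabbix hosts
ServerActive=zabbix.example.com
Hostname=SalesRoom,ServerRoom,SupportRoom,TrainingRoom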

NOTE. Every time the master item gets an update, so do all of the dependent items in Zabbix.
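For each dependent item, the preprocessing step is a single JSONPath expression that picks one metric out of the master item’s JSON payload. Hypothetically, for a payload like the sensor JSON above:

$.temperature     # dependent item for temperature
$.humidity        # dependent item for humidity
$.wifiquality     # dependent item for WiFi quality

The exact paths depend on the field names your sensors actually publish.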

Master item and dependent items

  • Combine both methods: let other clients subscribe to a single metric using their specific topics, but publish all sensor data for Zabbix in one topic.

NOTE. The data received and displayed on the dashboard is based on the MQTT item, the payload, and the MQTT messages received from the Message Broker.

Sensor data dashboard

Publishing data from Zabbix

Now you want to publish the outcome of a Zabbix trigger, so it can be consumed by other MQTT-enabled devices. Any MQTT subscriber, like Node-RED, should receive the alert. To do that, you need:

  • to define a new media type to send problems to the topic, that is, to pass the data over to the Message Broker:
  • to use the command-line tool for Mosquitto — mosquitto_pub allowing us to publish the message.
#!/bin/sh
# $1 = message body (the rendered JSON template), $2 = subtopic appended to zabbix/problems/ (e.g. a host or event identifier)
mosquitto_pub -h yourbroker.io -m "$1" -t "zabbix/problems/$2"

  • to make sure that the data is sent to the broker in the right format. In this case, we use JSON as transport and define a JSON problem template and a JSON problem recovery template (a sketch follows below).
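A minimal sketch of such a JSON problem template, built from standard Zabbix macros (the field names are our own choice):

{
  "host": "{HOST.NAME}",
  "problem": "{EVENT.NAME}",
  "severity": "{EVENT.SEVERITY}",
  "status": "{EVENT.STATUS}",
  "time": "{EVENT.DATE} {EVENT.TIME}"
}

The recovery template can look the same, using the recovery macros ({EVENT.RECOVERY.DATE}, {EVENT.RECOVERY.TIME}) where appropriate.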

In Zabbix, you’ll see the problem, the actions, and the media type firing using the subscription, and in the Debug node of Node-RED, you’ll see that the data is received from Zabbix.

Zabbix problems  published via MQTT

This model with Node-RED can be used to create sophisticated setups. For instance, you can take the data from Zabbix, forward it via actions and media types, preprocess it in Node-RED, and transform the data in many different ways.

IoT devices and other subscribers can react to issues detected by Zabbix using Node-RED

NOTE. To try out the MQTT setup and new Zabbix features, you can use the live broker available on IntelliTrend’s new GitHub account, which gets data from Zabbix sensors every 10 minutes. You’ll also find templates, access data, the address of the broker, etc. — everything you need to get started.

Questions & Answers

Question. If the MQTT client gets overloaded due to high message frequency on subscribe topics, how will that affect Zabbix?

Answer. Here either the broker might be overloaded or the Zabbix agent might not be able to keep up. As for the broker, quality-of-service levels are defined in the MQTT protocol, specifically QoS level 2, which guarantees delivery. So if QoS 2 is used, the messages won’t get lost but will be resent in case of failure.

Question. What else would you expect from the IoT side of Zabbix? What kind of protocols or things would get added? 

Answer. There’s always room for improvement. You can use third-party tools, custom scripts, or any tools to enhance Zabbix. I’m sure that support for user parameters and custom scripts was an excellent design decision. But the official support of MQTT is a quantum leap for Zabbix, because it opens the door to most IoT infrastructures, as MQTT is the most important IoT protocol so far.

For instance, one of our customers is monitoring the infrastructure of electricity generators, production systems, etc. They use their own monitoring platform provided by vendors, and the request was to integrate alerts and some metrics into Zabbix. The customer’s monitoring platform used the MQTT protocol, so all we had to do was integrate it with Zabbix using external scripts and the MQTT support.

Lift and shift your Zabbix to Oracle Cloud with MySQL database service

Post Syndicated from Vittorio Cioe original https://blog.zabbix.com/lift-and-shift-your-zabbix-to-oracle-cloud-with-mysql-database-service/12792/

If you are tired of administering the infrastructure on your own and would prefer to gain time to focus on real monitoring activities rather than costly platform upgrades, you can easily lift and shift your MySQL-based Zabbix installation stack to Oracle Cloud.

Contents

I. Moving to the Cloud (1:46)
II. Moving Zabbix to Oracle Cloud (2:41)

1. Planning migration (3:22)
2. Migrating Zabbix to Oracle Cloud (6:17)
3. Migrating the database to MySQL Database Service (8:47)

III. Questions & Answers (15:12)

Moving to the Cloud

Data is increasingly moving to the cloud — consumer data first, followed by enterprise data, as enterprises are always a bit slower to adopt new technologies.

Data moving to the cloud

Oracle Cloud Infrastructure, OCI, is the 4th cloud provider in the Cloud Infrastructure Ranking of the Gartner Magic Quadrant based on ‘Completeness of Vision’ and ‘Ability to Execute’.

OCI is available in 26 regions and has 26 data centers across the world with 12 more planned.

26 Regions Live, 12+ Planned

24+ Industry and Regional Certifications

Moving Zabbix to Oracle Cloud

With Zabbix in the Oracle Cloud you can:

  1. get the latest updates on the technology stack, minimizing downtime and service windows.
  2. convert the time you spend managing your monitoring platform into the time you spend monitoring your platforms.
  3. leverage the most secure and cost-effective cloud platform in the market, including security information and security updates made available by OCI.

Planning migration

To plan an effective migration of an on-premise Zabbix instance consisting of clients, proxies, management server, interface, and database, we need to migrate the last three of these components. Basically, we need:

  • the server configuration;
  • the on-premise network topology of clients and proxies, to understand what can communicate with the outside directly and what would have to go over a VPN; and
  • the database.

Migration requirements

We also need to set up the following in the OCI tenancy:

  • MySQL Database System,
  • Compute instance for the Zabbix Server,
  • storage for database and backup,
  • networking/load balancing.

The target architecture involves setting up the VPN from your data center to the Oracle cloud tenancy and deploying the load balancer, the Zabbix server in redundancy over availability domains, and the MySQL database in a separate subnet.

Required Components:
• Cloud Networking,
• Zabbix Cloud Image,
• MySQL Database Service,
• VPN Connection for client/proxies.

Oracle Cloud target architecture for Zabbix

You can also have a lighter setup, for instance, with proxies communicating over TLS connections over the Internet or communicating directly with the Zabbix Server in the Oracle Cloud, and the Zabbix server interfacing with the database. Here, you will need fewer elements: server, database, and VCN.

Oracle Cloud target architecture for Zabbix — a simpler solution

Migrating Zabbix to Oracle Cloud

Zabbix migration to the Oracle Cloud is straightforward.

1. Before you begin:

  • set up tenancy and compartments,
  • set up cloud networking — public and private VCN.

2. Zabbix deployment on the VM:

  • select one-click deployment or DIY — use the official Zabbix OCI Marketplace Image or deploy an OCI Compute Instance and install manually,
  • choose the desired Compute ‘shape’ during deployment.

3. Configuration:

  • start the instance,
  • edit the config file,
  • point to the database with the IP address, username, and password (to do that, you’ll need to open several ports in the cloud network via the GUI).
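For illustration, the database-related part of /etc/zabbix/zabbix_server.conf might look like this (all values are placeholders for your MySQL endpoint and credentials):

DBHost=10.0.1.12
DBName=zabbix
DBUser=zabbix
DBPassword=<your-password>
DBPort=3306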

The OCI infrastructure allows for multiple choices. The Zabbix server is lightweight software that requires few resources; in the majority of cases, a powerful VM will be enough. Otherwise, you have the rest of the Oracle Cloud portfolio available.

Compute services for any enterprise use case

In the Oracle Cloud you’ll have the bare metal option — the physical machines dedicated to a single customer, Kubernetes container engine, and a lot of fast storage possibilities, which end up being quite cheap.

Migrating the database to MySQL Database Service

MySQL Database Service is the managed offer for MySQL in Oracle Cloud, fully developed, managed, and supported by the MySQL team. It is secure and provides the latest features as it leverages the Oracle Cloud, which has been rated by various sources as one of the most secure cloud platforms.

In addition, the platform is built on the MySQL Enterprise Edition binaries, so it is fully compatible with the platform you might be using. Finally, it costs way less on a yearly basis than a full-blown on-premise MySQL Enterprise subscription.

MySQL Database Service — 100% developed, managed, and supported by the MySQL team

Considerations before migration

Before you begin:

  • check your MySQL 8.0 compatibility,
  • check your database size (to assess the time needed to migrate), and
  • plan a service window.

High-level migration plan

  1. Set up cloud networking.
  2. Set up your (on-premise) networking secure connection (to communicate with the cloud).
  3. Create MySQL Database Service DB System with storage.
  4. Move the data using MySQL Shell Dump & Load utility.

Creating MySQL DB system with just a few clicks

  • Create a customized configuration.
  • Start the wizard to create DB system.
  • Select Virtual Cloud Network (VCN).
  • Select subnet to place your MySQL endpoint.
  • Select MySQL configuration (or create customized instances for your workload).
  • The shape for the DB System (CPU and RAM) will be set automatically.
  • Select the size of the storage for data and backup.
  • Create a backup policy or accept the default.

Creating MySQL instances

You can use the MySQL Shell Upgrade Checker Utility to check compatibility with MySQL 8.0.

util.checkForServerUpgrade()
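For example, assuming MySQL Shell is installed on a host that can reach the on-premise database, the checker can be run non-interactively (the connection string is a placeholder):

mysqlsh root@onprem-db:3306 -e "util.checkForServerUpgrade()"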

Loading the data

To move the data, you can use the MySQL Shell Dump & Load utility, which is capable of multi-threading and is callable with the JavaScript methods from MySQL Shell.

So, you can dump to what can be a bastion machine and load your instance into the cloud. Loading a database of several gigabytes takes several minutes, so it is necessary to plan the service maintenance window accordingly.

In addition, the utility is easy to use. You just need to connect to an instance and dump.
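A minimal sketch, assuming MySQL Shell 8.0.21 or later on a bastion host (user names, hosts, paths, and thread counts are placeholders):

# Dump the on-premise instance to local storage on the bastion:
mysqlsh zabbix@onprem-db -e "util.dumpInstance('/backups/zabbix', {threads: 8})"

# Load the dump into the MySQL Database Service endpoint:
mysqlsh admin@10.0.2.10 -e "util.loadDump('/backups/zabbix', {threads: 8})"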

MySQL Shell Dump & Load

The operation is pretty straightforward and the migration time will depend on the size of the database.

Free trial

You can have a test drive of the MySQL Database Service with $300 in cloud credits, which you can spend in the Oracle Cloud on MySQL Database Service or other cloud services.


Questions & Answers

Question. Do you help with migrating the databases from older versions to MySQL 8.0?

Answer. Yes, this is the thing we normally do for our customers — providing guidance, though data migration is normally straightforward.

Question. Does the database size matter? How efficient is MySQL Shell Dump? What if my database is terabytes in size?

Answer. The MySQL Shell Dump & Load utility is much more efficient than the old mysqldump. The database size still matters: a terabyte-scale database will require more time, though still way less than it used to take.


What’s new in Zabbix 5.2

Post Syndicated from Alexei Vladishev original https://blog.zabbix.com/whats-new-in-zabbix-5-2/12550/

Zabbix is a universal open-source enterprise-level monitoring solution; therefore, it includes all the enterprise-grade features: SSO, distributed monitoring, Zabbix Insights, advanced security, no data storage limits, and much more. Zabbix 5.2 offers over 35 new features and functional improvements.

Contents

I. Introduction
II. New features and functional improvements

1. Synthetic monitoring
2. Keep secrets in the external vault
3. Zabbix insights
4. User roles
5. IoT Monitoring
6. Load balancing
7. User Timezones
8. YAML for import/export
9. Template improvements
10. Discovery and cloud monitoring
11. Usability improvements
12. Preprocessing improvements
13. Other improvements

III. Questions & Answers

Introduction

Zabbix gives you freedom, as it offers:

  • no per-metric fees,
  • no license fees,
  • deployment anywhere, and
  • easy migration from on-premise to the cloud and vice versa.

Zabbix also offers business benefits for companies that need centralized monitoring and data collection across their IT infrastructure and other sources:

  • Umbrella monitoring, as Zabbix is flexible to replace most of the monitoring solutions already in use.
  • Free and open-source solution with 24×7 vendor support worldwide.
  • Technical Support at fixed prices for unlimited monitoring regardless of the number of devices monitored and extremely low TCO.

Business benefits

New features and functional improvements

Zabbix 5.2 offers over 35 new features and functional improvements.

Synthetic monitoring

1. Zabbix 5.2 supports complex multi-step scripted data collection, advanced availability checks, and complex interaction with different HTTP APIs.

Multi-step data collection

  • Multiple steps to get data.

Multi-step data collection is needed if, for instance, you need to authenticate and then retrieve data from different APIs.

Authentication and retrieving data from different APIs.

  • Check whether the whole service works, for example, the Zabbix API.

Advanced availability checks and APIs

  • Calculate the sum of unknown parts.

For a list of customers retrieved from an API with URLs behind each customer, Zabbix allows for checking the availability of all URLs.

2. New item-type script

Now the process of data collection can be scripted. It's no longer a one-step process, so we can take advantage of loops, conditional statements, and all the power of JavaScript to retrieve the data.

New item-type scripts

Keep secrets in the external vault

The ability to store secrets in an external vault is valuable wherever sensitive information is handled, for instance, in the financial, military, or government sectors, as:

  • all sensitive information is kept outside of Zabbix in a secure place: HashiCorp Vault,
  • no secret data is stored in the Zabbix DB, and
  • all sensitive data, such as passwords, API tokens, user names, etc., is secured.

So, Zabbix 5.2 introduces a new user macro type — Vault secret. Zabbix 5.0 introduced secret text macros, which are stored in the Zabbix DB but never exposed to end users. With the Vault secret macro type, the data is stored externally.
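As an illustration, the value of a Vault secret macro is a reference (path and key) to a secret inside HashiCorp Vault rather than the secret itself; the names below are placeholders:

Macro: {$DB.PASSWORD}
Value: secret/zabbix:database_password
Type: Vault secret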

Vault secret macro

Now security measures in Zabbix comply with the best security standards possible:

  • all communications with Zabbix Agent or Zabbix Proxy are encrypted using HTTPS, TLS, or PSK;
  • Agent key restrictions can be used on the Zabbix Agent side;
  • communication with the Zabbix Web Interface is encrypted using HTTPS; and
  • integration with HashiCorp Vault is now possible to keep secrets externally.

Security enhancements

NOTE. The one-day security training course is now held by Zabbix with no prerequisites and simple signup.

  • Recommended for experienced Zabbix users.
  • Does not require existing Zabbix certification.
  • Will cover security options on an expert level.
  • Secret macros and Vault.
  • Securing connections using PSK or certificates.
  • Restricting Agent keys.
  • Granular user permissions.


Zabbix insights

  • Ability to analyze long-term data efficiently using new trigger functions.
  • Zabbix will provide you with information about anomalies.
  • More value out of Zabbix trend data, which is kept longer.

This new feature allows Zabbix to generate alerts, for instance, “Average number of transactions increased by 24% in September”.

Zabbix 5.2 new functions

  • The new functions allow you to specify which trend data is needed and then compare it with the data for another period.
trendavg(period, period_shift)
trendcount(period, period_shift)
trenddelta(period, period_shift)
trendmax(period, period_shift)
trendmin(period, period_shift)
trendsum(period, period_shift)
  • Trends tables instead of history. The time shift function is already available in Zabbix, but it has significant limitations, as it works only with history tables, that is, involves heavy processing, and doesn’t allow for specifying an absolute period of time.

  • Use the Gregorian calendar for period and period_shift.

— h (hour), d (day), w (week), M (month), and y (year).

  • Calculate upon the end of a period.
  • Customized event name — a new field in the trigger definition, which is:

— optional, the Trigger Name is used instead if empty,
— displays the problem with its context,
— supports a new macro {? … } (“Expression macro”):

      • fmtnum(digits)

— applicable to ITEM.VALUE, ITEM.LASTVALUE and expression macros;
fmtnum(2) gives 14.85 instead of 14.8512345.

      • fmttime(format, time_shift)

— applicable to {TIME};
— uses strftime format codes;
{TIME}.fmttime(“%B,%Y”) gives October 2020.

For instance, to detect abnormal traffic, we can define the expression to compare traffic for different periods. If the difference exceeds the abnormality factor defined by the user macro, Zabbix will generate the event defined by the user.
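A sketch of such an expression in Zabbix 5.2 trigger syntax, comparing last month's average inbound traffic with the same month a year earlier; the host, item key, and the {$TRAFFIC.FACTOR} abnormality macro are placeholders:

{gateway:net.if.in[eth0].trendavg(1M,now/M-1M)} > {$TRAFFIC.FACTOR} * {gateway:net.if.in[eth0].trendavg(1M,now/M-1y)}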

Triggers

Then Zabbix will generate the following message:

Problems

Use cases
  • Trend functions can be used to detect abnormal behavior of IT metrics and non-IT KPIs.
  • Real-world applications:
    — business performance,
    — sales and marketing,
    — warehousing,
    — human resources,
    — customer support.

User roles

Granular control of user permissions

  • Customer portal, read-only users.
  • Different parts of UI can be made accessible for different user roles.
  • Control what user operations are accessible: maintenance, editing of dashboards, etc.
  • Fine-grained control access to API and its methods for extra security.

In Zabbix 5.2, the ability to define user roles is introduced. It is possible to define as many user roles as you need. Here, it's necessary to specify:

  • User type (User, Admin, Super Admin),
  • Access to UI elements (what the user can do),
  • Access to API (if enabled, we may filter by API methods),
  • Access to actions (define, what user actions are available to different users).

User roles defined

IoT Monitoring

Zabbix is a universal solution used to monitor not only IT infrastructure, so the capacity to monitor factory equipment or sensors is really important. Zabbix 5.2 now offers out-of-the-box support of Modbus and MQTT — the most important IoT protocols. It is now possible to monitor sensors and hardware equipment, and integration with building management systems, factory equipment, and IoT gateways is available without external scripts.

Modbus

Modbus has become a de facto standard communication protocol — a commonly available means of connecting industrial electronic devices. Zabbix supports it via Agent and Agent 2 over TCP or serial connections.

The new item key has the following syntax:

modbus.get[endpoint,<slave id>,<function>,<address>,<count>,<type>,<endianness>,<offset>]

where:

modbus.get — new item key,
endpoint — endpoint defined as protocol://connection_string,
slave id — slave ID,
function — Modbus function,
address — address of the first registry, coil, or input,
count — number of records to read,
type — type of data,
endianness — endianness configuration,
offset — number of registers, starting from 'address', the results of which will be discarded.

modbus.get is made to get information out of Modbus and returns JSON:

modbus.get[“tcp://192.168.6.1:511”]
modbus.get["rtu://COM1:9600:8n"]
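A hypothetical call with every parameter filled in, following the order above (slave 1, function 3 to read holding registers, two uint16 registers starting at address 0, big-endian, no offset):

modbus.get["tcp://192.168.6.1:511",1,3,0,2,uint16,be,0]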
MQTT
  • MQTT is a standard messaging protocol for the Internet of Things (IoT) among others.
  • Native solution for monitoring messages published by MQTT brokers.
  • Supported by Agent 2 Active Check only.

broker_url — MQTT broker URL (if empty, localhost with port 1883 is used),
topic — MQTT topic (mandatory; wildcards + and # are supported),
username, password — authentication credentials (if required).

  • mqtt.get subscribes to the specified topic or topics (with wildcards) of the provided broker and waits for publications.

mqtt.get["tcp://host:1883","path/to/topic"]

Load balancing

Starting from Zabbix 5.2, horizontal scaling of the Zabbix UI and API components has become easy. You just need to set up HAProxy or another load-balancing solution plus a few cluster nodes running as containers, on physical or virtual machines, or in the cloud, and you get redundancy, high availability, and load balancing out of the box for the Zabbix UI and API components.
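A minimal HAProxy sketch for two UI nodes; all names, addresses, and ports are placeholders:

frontend zabbix_ui
    bind *:80
    default_backend zabbix_ui_nodes

backend zabbix_ui_nodes
    balance roundrobin
    server ui1 10.0.0.11:8080 check
    server ui2 10.0.0.12:8080 check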

Horizontal scaling for Zabbix UI and API

User Timezones

Zabbix 5.2 supports a timezone setting for each user. This feature is appreciated by larger companies whose users connect to the Zabbix UI from different countries or continents.

User timezones for each user

YAML for import/export

For import and export operations, Zabbix now uses YAML by default, though JSON and XML are still supported.

YAML is more user-friendly and easier to edit manually, while JSON and XML are excessive in their use of special characters. So if you keep your templates in a repository, you can modify them using a text editor. All official Zabbix templates have already been converted to YAML.

YAML for import/export
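A trimmed sketch of what an exported template can look like in YAML (structure abbreviated for illustration; names are placeholders):

zabbix_export:
  version: '5.2'
  templates:
    - template: 'Template App Example'
      name: 'Template App Example'
      groups:
        - name: Templates
      items:
        - name: 'HTTP service status'
          key: 'net.tcp.service[http]'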

Template improvements

  • Simpler template names, which are also easier to search for.

  • Templated screens converted to template dashboards. When modifying dashboards, you now deal with dashboard widgets, not screen elements.

  • See all hosts linked to a specific template.

  • The number of templates in System information.

Discovery and cloud monitoring

  • Host interfaces can be discovered from LLD. It is now possible to define how host interfaces are discovered when a host prototype is created. This feature is especially useful for discovering cloud resources.

  • Hosts without interfaces. We can create virtual hosts or hosts with no interface for service checks, for instance.
  • Tags on host prototypes from any discovery macro. Tags play an increasingly important role in Zabbix, and now in addition to tags on the template level, on the host level, and on the trigger level, it is possible to define tags on the host prototype level as well.

Usability improvements

  • Save filters. This feature is implemented for monitoring problems and hosts. In Zabbix 5.2, you can name filters. The functionality is similar to tabs in modern browsers, such as Firefox or Safari: each tab displays the number of problems in real time, and you can easily switch from one filter to another.

Filter tabs

  • Show clearly that a tab in the Zabbix UI contains a non-empty list by displaying, for instance, the number of preprocessing rules. This is implemented for all tabs in the Zabbix UI.

Number of lists displayed in the tabs

  • The default language can now be defined for the system.

Defining system default language

  • Essential configuration parameters moved from defines.inc.php to Zabbix UI, which allows for finer tuning.

Finer tuning

  • SNMP settings in the test item window, for instance, before adding an item to a template.

Testing SNMP parameters

  • Filters and additional details in the list of dashboards.

Additional information in dashboards

Preprocessing improvements

  • Macros in JavaScript preprocessing (also backported to 5.0).
  • 'Check for not supported value' preprocessing allows overriding items that became unsupported for any reason, which is useful for advanced availability checks: any problem -> service is down.

Other improvements

  • In larger environments, there may be performance issues, and understanding what's happening inside Zabbix is vitally important. It is now possible to specify what diagnostic information should be retrieved from Zabbix; this information can also be retrieved via the Zabbix API.

Retrieving diagnostic information from the value cache log file

  • The UI is protected from probing for the existence of user names.
  • Simpler schedule for unsupported items.
  • Ability to mass-update item Timeout.
  • Ability to retrieve HTTP response headers in Webhooks.
  • Ability to specify the default search path for user parameters.
  • Max length of user macro values increased to 2048 characters.
  • Active Agent can work as multiple hosts (Hostname=host1,host2,host3), which might be useful if you run different services on one host and need to split them.
  • Official support of Docker images.
  • Eventlog-related macros in operational data.
  • Support of user macros in the item description.

Out of the box monitoring and alerting

We have increased the number of integrations supported in Zabbix out-of-the-box and the number of officially supported monitoring templates and plugins for Zabbix Agent 2.

  • Ticketing.

  • Alerting.

  • Monitoring.

Deployment

You can deploy Zabbix anywhere: on-premise or in the cloud.

Deploy on-premise

Deploy in the cloud

How to upgrade

The procedure for upgrading from Zabbix 5.0 is the same as for any other Zabbix release (an illustrative command sequence follows the list):

  • Back up the DB.
  • Upgrade packages (Zabbix Server, Frontend).
  • Restart zabbix_server.
  • Watch the log file; Zabbix will start the DB schema upgrade automatically.
  • Upgrade all proxies.
  • Update agents (optional).
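For illustration, on a package-based installation the core steps might look like this; the package manager, package names, and paths depend on your distribution and setup:

mysqldump --single-transaction zabbix > /backup/zabbix-before-upgrade.sql
dnf upgrade zabbix-server-mysql zabbix-web-mysql
systemctl restart zabbix-server
tail -f /var/log/zabbix/zabbix_server.log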

Otherwise, contact Zabbix engineers, order an upgrade to the new release, and enjoy the new features effortlessly.

Questions & Answers

Question. Does Zabbix plan to support other scripting languages?

Answer. No, we don't have such plans. We analyzed other languages but selected JavaScript as Zabbix's embedded language. However, you can now use any scripting language in Zabbix via external scripts, including PowerShell, Python, etc.

Question. Does Zabbix plan to support other vaults besides HashiCorp?

Answer. We might support other solutions. This new Zabbix functionality allows for implementing other vaults. If you need some other vault to be supported, you need to register the respective Zabbix feature request.

Question. Does Zabbix plan to improve the existing graphs and provide official Grafana integration?

Answer. We do plan to provide more advanced visualization options for dashboards. Now we are merging the screens and dashboard functionality, and we plan to release new widgets for more advanced visualization in Zabbix 5.4.

The existing Zabbix plugin for Grafana works smoothly, and we don’t plan to introduce another solution.

Question. Does Zabbix plan to support another database backend, for instance, time-series databases?

Answer. According to the Zabbix roadmap, in Zabbix 5.4 we plan to introduce a generic API allowing connection to any storage for time-series data, that is, to create official connectors to storage solutions.

Question. Does Zabbix plan to natively support integration with LDAP? At the moment Zabbix provides LDAP support, but we still have to manually create users and so on. Does Zabbix plan to automate it in some way?

Answer. We created this functionality a couple of years ago, but we designed it in a complex way and decided not to implement it yet. It’s not on our shortlist, but we plan to implement it, as it is one of the top-voted features.

Question. Can Agent secrets be stored in the vault?

Answer. At this moment we don’t support this feature. In a highly-distributed environment, where agents are distributed all across your IT infrastructure, you’ll have to maintain a connection between Zabbix Agents and the vault. Still, if you feel the feature should be in Zabbix, feel free to register the respective Zabbix feature request.

Question. Does Zabbix have a Kubernetes operator?

Answer. We don’t have the Kubernetes operator officially supported yet, but there are a few operators available from our community.

Question. Do we plan to improve our report functionality?

Answer. Absolutely. This is the primary focus of Zabbix 5.4 and Zabbix 6.0. We are exploring two directions: improving the widgets to enrich visualization in Zabbix and supporting scheduled report generation so that Zabbix would generate PDF reports and send them out on a regular basis.

Question. Do we plan to enable changing server configuration parameters without the need to restart the server?

Answer. That depends on the configuration parameters. It can be implemented for some configuration parameters. What is really needed is the ability to change parameters related to performance in real-time, the ability to change the number of pollers, trappers, escalators, etc. I think this functionality will be implemented soon.

Question. Can we create a Zabbix instance as a code via JSON, XML, or some other way?

Answer. We are moving in this direction. For instance, the transition to the YAML format is a step along this way, so you will be able to keep your templates in a git repository. The missing pieces are versioning for templates and the ability to export the whole Zabbix configuration to YAML. Versioning is on the roadmap for Zabbix 5.4.

Question. Do we plan to support metric gathering from Spring Actuator and Spring Boot? As at the moment, Prometheus is to be used to gather metrics.

Answer. If Prometheus can be used to gather metrics from these systems, Zabbix can do it as well, as Zabbix supports data collection from Prometheus out of the box.

Question. How can someone become a partner of Zabbix?

Answer. The best way to become a partner is to contact Zabbix by email at [email protected]

Question. How does Zabbix see interaction with Grafana? As that of competitors or friendly entities?

Answer. Grafana focuses on the visualization of data coming from different sources. Though Grafana provides some monitoring options, I see Grafana as an add-on to Zabbix. If you need a better visualization from Zabbix or Zabbix doesn’t deliver the visualization you expect, you are free to use Grafana.

Zabbix Summit Online 2020: Remote Experience of Sharing Knowledge and Being Together

Post Syndicated from Jekaterina Petruhina original https://blog.zabbix.com/zabbix-summit-online-2020-remote-experience-of-sharing-knowledge-and-being-together/12526/

Zabbix Summit 2020 was supposed to be the greatest Zabbix event of the decade – we planned to celebrate the 10th anniversary Zabbix Summit. But the very different 2020 circumstances intervened, and we had to adjust to the new reality and shift our focus. It so happened that in 2020 we held not the tenth anniversary but the first online Zabbix summit.


Available for everyone

When it was evident that an on-site event was not an option, we made a decision: the event should be available for everyone, so we made it free of charge. The other task was to manage the timing so that the Summit would be convenient for attendees from all over the globe. We handled it by making the event long enough to suit users from Japan and China, Europe, and the USA and the Latin American region. Yes, it was quite a long day for the Zabbix team. However, we achieved what we were aiming for – about 8,000 Zabbix enthusiasts from all over the world joined the Zabbix Summit live stream. This year, it also became possible to have speakers from all regions, because the traveling issue was solved.

We made the most of the focus on the recently released Zabbix 5.2, and, of course, we left enough room for use cases and professional tips as well. If, for some reason, you couldn't join us on October 30, you are always welcome to watch the recorded speeches.


Traditionally, every Zabbix Summit offers the option to attend hands-on workshops. Thanks to the online format, it was possible to run more workshop sessions than in previous years, and attendees could join as many sessions as they wanted. Moreover, the workshops were recorded and are now available on the Zabbix website.

Summit fun

Every Zabbix Summit is all about networking and fun. Let's be honest, this unofficial part means a lot to the community, along with the agenda. Unfortunately, we couldn't meet in person this time and have fun together at parties. Still, the community chat in Telegram made it clear: there are no boundaries for you guys to keep in touch, discuss ideas, and communicate. You made the networking part happen this year, and we are delighted and grateful to see such enthusiasm, activity, and interest in Zabbix. We gave Summit attendees the opportunity to communicate with the Zabbix sales and technical teams, get acquainted with the event's sponsors, and ask questions via dedicated Zoom rooms, and it worked well. Even though there were hundreds or even thousands of kilometers between the visitors of the event, it felt like everything was happening here and now, with an audience full of interested people looking for opportunities to learn new things and help others.

What about next year?

Well, we think positively but stay realistic; thus, Zabbix Summit 2021 will also be held online.

If you want future Zabbix events to be even better, we encourage you to fill out this post-Summit survey. It will help us understand what we have to improve to make Zabbix Summit Online 2021 even better.

PS: Take a break and look at some behind-the-scenes photos – how Zabbix Summit 2020 looked from the inside.

Security and Human Behavior (SHB) 2020

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2020/06/security_and_hu_9.html

Today is the second day of the thirteenth Workshop on Security and Human Behavior. It’s being hosted by the University of Cambridge, which in today’s world means we’re all meeting on Zoom.

SHB is a small, annual, invitational workshop of people studying various aspects of the human side of security, organized each year by Alessandro Acquisti, Ross Anderson, and myself. The forty or so attendees include psychologists, economists, computer security researchers, sociologists, political scientists, criminologists, neuroscientists, designers, lawyers, philosophers, anthropologists, business school professors, and a smattering of others. It’s not just an interdisciplinary event; most of the people here are individually interdisciplinary.

Our goal is always to maximize discussion and interaction. We do that by putting everyone on panels, and limiting talks to six to eight minutes, with the rest of the time for open discussion. We’ve done pretty well translating this format to video chat, including using the random breakout feature to put people into small groups.

I invariably find this to be the most intellectually stimulating two days of my professional year. It influences my thinking in many different, and sometimes surprising, ways.

This year’s schedule is here. This page lists the participants and includes links to some of their work. As he does every year, Ross Anderson is liveblogging the talks.

Here are my posts on the first, second, third, fourth, fifth, sixth, seventh, eighth, ninth, tenth, eleventh, and twelfth SHB workshops. Follow those links to find summaries, papers, and occasionally audio recordings of the various workshops. Ross also maintains a good webpage of psychology and security resources.

Speakers Censored at AISA Conference in Melbourne

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2019/10/speakers_censor.html

Two speakers were censored at the Australian Information Security Association’s annual conference this week in Melbourne. Thomas Drake, former NSA employee and whistleblower, was scheduled to give a talk on the golden age of surveillance, both government and corporate. Suelette Dreyfus, lecturer at the University of Melbourne, was scheduled to give a talk on her work — funded by the EU government — on anonymous whistleblowing technologies like SecureDrop and how they reduce corruption in countries where that is a problem.

Both were put on the program months ago. But just before the event, the Australian government’s ACSC (the Australian Cyber Security Centre) demanded they both be removed from the program.

It’s really kind of stupid. Australia has been benefiting a lot from whistleblowers in recent years — exposing corruption and bad behavior on the part of the government — and the government doesn’t like it. It’s cracking down on the whistleblowers and reporters who write their stories. My guess is that someone high up in ACSC saw the word “whistleblower” in the descriptions of those two speakers and talks and panicked.

You can read details of their talks, including abstracts and slides, here. Of course, now everyone is writing about the story. The two censored speakers spent a lot of the day yesterday on the phone with reporters, and they have a bunch of TV and radio interviews today.

I am at this conference, speaking on Wednesday morning (today in Australia, as I write this). ACSC used to have its own government cybersecurity conference. This is the first year it combined with AISA. I hope it’s the last. And that AISA invites the two speakers back next year to give their censored talks.

EDITED TO ADD (10/9): More on the censored talks, and my comments from the stage at the conference.

Slashdot thread.

Security and Human Behavior (SHB) 2019

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2019/06/security_and_hu_8.html

Today is the second day of the twelfth Workshop on Security and Human Behavior, which I am hosting at Harvard University.

SHB is a small, annual, invitational workshop of people studying various aspects of the human side of security, organized each year by Alessandro Acquisti, Ross Anderson, and myself. The 50 or so people in the room include psychologists, economists, computer security researchers, sociologists, political scientists, criminologists, neuroscientists, designers, lawyers, philosophers, anthropologists, business school professors, and a smattering of others. It’s not just an interdisciplinary event; most of the people here are individually interdisciplinary.

The goal is to maximize discussion and interaction. We do that by putting everyone on panels, and limiting talks to 7-10 minutes. The rest of the time is left to open discussion. Four hour-and-a-half panels per day over two days equals eight panels; six people per panel means that 48 people get to speak. We also have lunches, dinners, and receptions — all designed so people from different disciplines talk to each other.

I invariably find this to be the most intellectually stimulating two days of my professional year. It influences my thinking in many different, and sometimes surprising, ways.

This year’s program is here. This page lists the participants and includes links to some of their work. As he does every year, Ross Anderson is liveblogging the talks — remotely, because he was denied a visa earlier this year.

Here are my posts on the first, second, third, fourth, fifth, sixth, seventh, eighth, ninth, tenth, and eleventh SHB workshops. Follow those links to find summaries, papers, and occasionally audio recordings of the various workshops. Ross also maintains a good webpage of psychology and security resources.