All posts by Arturs Lontons

Handy Tips #16: Automating Zabbix host deployment with autoregistration

2021-12-16 Arturs Lontons

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/handy-tips-16-automating-zabbix-host-deployment-with-autoregistration/18232/

Configure Zabbix agent autoregistration to automatically deploy and start monitoring Zabbix agent hosts.

As your IT infrastructure scales up, there comes a point where manual host creation is simply not feasible. At this point, the preferable approach is to find a way to automate host deployment.

Deploy and manage hosts automatically with Zabbix active agent autoregistration:

Automatically deploy and start monitoring hosts
Assign different templates depending on hostname and host metadata

Receive notifications whenever a new host is deployed
Automatically make changes or perform offboarding of hosts

Check out the video to learn how to configure Zabbix agent autoregistration.

How to configure Zabbix agent autoregistration:

Navigate to Configuration → Actions → Autoregistration actions
Click the Create action button
Define action conditions
Navigate to the Operations tab and define the action operations
Connect to your host and open the Zabbix agent configuration file
Populate the Hostname, HostMetadata and HostInterface fields
Restart the Zabbix agent
Navigate to Configuration → Hosts and make sure that the host has been created
Navigate to Monitoring → Latest Data and make sure that the host is receiving metrics

Tips and best practices:

Host name and host metadata values are case-sensitive
Agent restart is required to apply host name and host metadata changes
HostnameItem can be used to populate the host name with a Zabbix agent item value
HostMetadataItem can be used to populate the host metadata with a Zabbix agent item value

Gaining new insights with Business service monitoring by Aleksandrs Petrovs-Gavrilovs / Zabbix Summit Online 2021

2021-12-14 Arturs Lontons

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/gaining-new-insights-with-business-service-monitoring-by-aleksandrs-petrovs-gavrilovs-zabbix-summit-online-2021/17973/

Zabbix 6.0 LTS comes with a complete redesign of the service monitoring. From improved business service scalability to advanced service status calculation logic and alerting. Let’s take a look at the Business Service monitoring feature and how you can use it to ensure full transparency for your business services.

The full recording of the speech is available on the official Zabbix Youtube channel.

Business services can be quite complex. They tend to consist of many different moving parts with redundancy and failover mechanism in place, all of which need to be taken into consideration when we wish to analyze the current status of our services.

BSM Checklist

Let’s take a look at what needs to be done so we can successfully define and monitor our business service:

First, we have to define what exactly is our business service and what components does it consist of?
We need to understand what are our expectations when it comes to service uptime. When should the service be up and running? What are the acceptable downtimes? Should it run 24/7/365 or maybe it’s a service that is critical only during our working hours?
Once we know what needs to be monitored, we need to make sure that we are collecting the data that reflects the status of different service components.
Finally – we have to find a suitable tool to track and measure our service.

Define your business

Let’s take a look at how a business may look like. As I mentioned before – business services can consist of many different components. Let’s take a look at an example of how business services may look like:

The tree structure here represents our Business services. We can see that we have classified the services into two branches – Internal services and User services. The User services consist of components such as Websites, Helpdesk services, Phones. These general services are based on lower-level components such as the actual physical phones for the phone service, underlying software for the Website and Helpdesk services, and so on.

This can make things quite complicated since usually, organizations will have many more components to take care of. That’s why, let’s see how we can simplify this tree and define our services in a more simple manner, like the service tree below:

Now we are left with only 3 levels for our services. Let’s take a look at how we can move this to Zabbix:

Here we can see a high-level view of our services. Once again we have our Internal services and User services. These here high-level services consist of child services and define what these components consist of and what their SLAs should be. We can also define tags to provide additional details to our services – which customer uses the service, the type of service, maybe even the location that the service is used in – this part is completely up to your imagination.

Once you have defined the services, their respective components and have linked them to the problems by using tags, you will finally be able to see the full picture. Zabbix will display not only the status of the service but also the root cause of the problem. This way we can provide service status information not only on the service owner level but also provide information that your technical staff can use to fix the issue.

Configuring SLAs

Configuring Business Service monitoring can be done from the Monitoring – Services section. In Zabbix 6.0 LTS you are not required to start defining the service tree from the root service. Now you can define your own root level services. To create a service, all we have to do is switch to the Edit mode by clicking the Edit button in the upper right corner of the services screen and click the Create service button right next to it. We have also made some additional changes to the service section UI/UX. Now you also have multiple fast edit buttons next to each service. You can use them to Add a child service, edit an existing service, or delete an existing service.

Next, let’s take a look at the actual service creation steps.

We need to provide a name for our service
If the service is not a top-level service you have to select a parent service
Define problem tags. Problems tagged with the matching tags will affect the service status
Define the status calculation rule

Major improvements have been made to status calculation rules. We still support the old logic of the Use the most critical of child services / Most critical if all children have problems / Set status to ok, but there are also many advanced service status calculation rules.

Now we have the ability to select a specific status (Warning, Average, High, and so on) for our service in case of a problem
Select the number of children, More than/Less than N children, Percentage of children that should be affected for the parent service status change to take place
Define weights for child services and perform status changes based on the weight of the affected child services

Child services can also apply different propagation rules for the parent service

Child services can Increase or decrease the parent status service status by N severities, ignore the child service, apply a fixed status or apply the status depending on the problem severity

For our example let’s use an HA cluster use case. HA clusters consist of multiple nodes – for our example, we will use 3 nodes.

First, we define that the HA cluster consists of 3 nodes – 3 child services.
Each node will have equal weight – 1
On the parent service, we will define multiple status rules
- If the weight of the child services is 1 (1 node is down) – the parent service will change its status to Warning
- If the weight of the child services is 2 (2 nodes are down) – the parent service will change its status to Average
- If the weight of the child services is 3 (all nodes are down) – the parent service will change its status to Disaster

In the above image, we can see how the corresponding status change will look like in the Services section. Note that we can also see the root cause of the parent service status change in the Root cause column.

We also have the ability to define the acceptable SLAs as well as SLA calculation uptime and downtime periods for our services. We have the option to define scheduled uptimes and downtimes, during which SLA should or shouldn’t be calculated (Such as weekends, for example), as well as one-time downtimes for one-time maintenance purposes.

Services can utilize tags to provide additional information about your services, such as the service type, service customer, service location, and more. On top of that, tags can also be used in the Service action condition logic, so you can define granular alerting logic for your service status changes.

The Child services tab allows you to quickly look at the related child services, their problem tags, and status calculation rules.

Child services can also be crosslinked between multiple parent services. This means that you don’t have to duplicate and recreate child services if they are used as a component of multiple parent services.

Track, solve and measure

Once we have configured our service, what remains is keeping track of our service statuses, SLAs and staying notified about service status changes and their root cause.

For this purpose, it is vital to secure access to our services. This is especially critical for MSPs, which may have multiple customers and each customer should have access only to the services related to that particular customer. To that end, the Roles section has also received an update related to the Service permissions. We can now define Read-Write and Read access to either specific services or services marked with a particular tag.

The Root cause section displays the root cause problems that affected the service status change. You will be able to click on the root cause problem and open it in the Problems section for further analysis of what caused your services to change their status and which host has been affected by it.

Previously I mentioned alerting on service status change, so let’s dig deeper into that. In Zabbix 6.0 LTS we have added a new type of action – Service actions. Zabbix can now react to service status changes and notify you when a service changes its status. The Service action conditions can analyze if a status has been changed on a particular service, a service that matches or contains a specific string in its name, tag, or tag value. If the conditions are true, Zabbix can send out an email, deliver a phone call or an SMS, create a ticket in your helpdesk system or perform any other alerting and notification workflow.

Many other BSM features are coming as we continue the development of Zabbix 6.0 LTS:

SLA graphical visualizations with support for over 100k services
Daily, Monthly, Weekly SLA reports
New service tree and SLA reporting widgets available from the dashboard
Service tree import and export
Impact analysis – see which service affects other related services in what way.

Questions

Q: Will the existing services be migrated to Zabbix 6.0 LTS?
A: The existing services will be migrated to Zabbix 6.0 LTS during the upgrade. All of the configuration for the existing services will stay intact after the migration.

Q: Does host maintenance suppress service calculation in Zabbix 6.0 LTS?
A: Host maintenance will not affect the service calculation. If you wish to define maintenance periods for your services – use scheduled or one-time downtime options when configuring an individual service.

Q: How are the Fixed status and Ignore this service calculation rules going to work?
A: Fixed status services will not change their status no matter what happens to the child services – the service status will remain fixed. As for Ignore this service – the service status change will be ignored and will not affect the parent services.

Zabbix NOT AFFECTED by the Log4j exploit

2021-12-13 Arturs Lontons

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/zabbix-not-affected-by-the-log4j-exploit/17873/

A newly revealed vulnerability impacting Apache Log4j 2 versions 2.0 to 2.14.1 was disclosed on GitHub on 9 December 2021 and registered as CVE-2021-44228 with the highest severity rating. Log4j is an open-source, Java-based logging utility widely used by enterprise applications and cloud services. By utilizing this vulnerability, a remote attacker could take control of the affected system.

Zabbix is aware of this vulnerability, has completed verification, and can conclude that the only product where we use Java is Zabbix Java Gateway, which does not utilize the log4j library, thereby is not impacted by this vulnerability.

For customers, who use the log4j library with other Java applications, here are some proactive measures, which they can take to reduce the risk posed by CVE-2021-44228:

Upgrade to Apache log4j-2.1.50.rc2, as all prior 2.x versions are vulnerable.
For Log4j version 2.10.0 or later, block JNDI from making requests to untrusted servers by setting the configuration value log4j2.formatMsgNoLookups to “TRUE” to prevent LDAP and other queries.
Default both com.sun.jndi.rmi.object.trustURLCodebase and com.sun.jndi.cosnaming.object.trustURLCodebase to “FALSE” to prevent Remote Code Execution attacks in Java 8u121.

What’s new in Zabbix 6.0 LTS by Artūrs Lontons / Zabbix Summit Online 2021

2021-12-10 Arturs Lontons

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/whats-new-in-zabbix-6-0-lts-by-arturs-lontons-zabbix-summit-online-2021/17761/

Zabbix 6.0 LTS comes packed with many new enterprise-level features and improvements. Join Artūrs Lontons and take a look at some of the major features that will be available with the release of Zabbix 6.0 LTS.

The full recording of the speech is available on the official Zabbix Youtube channel.

If we look at the Zabbix roadmap and Zabbix 6.0 LTS release in particular, we can see that one of the main focuses of Zabbix development is releasing features that solve many complex enterprise-grade problems and use cases. Zabbix 6.0 LTS aims to:

Solve enterprise-level security and redundancy requirements
Improve performance for large Zabbix instances
Provide additional value to different types of Zabbix users – DevOPS and ITOps teams, Business process owner, Managers
Continue to extend Zabbix monitoring and data collection capabilities
Provide continued delivery of official integrations with 3rd party systems

Let’s take a look at the specific Zabbix 6.0 LTS features that can guide us towards achieving these goals.

Zabbix server High Availability cluster

With the release of Zabbix 6.0 LTS, Zabbix administrators will now have the ability to deploy Zabbix server HA cluster out-of-the-box. No additional tools are required to achieve this.

Zabbix server HA cluster supports an unlimited number of Zabbix server nodes. All nodes will use the same database backend – this is where the status of all nodes will be stored in the ha_node table. Nodes will report their status every 5 seconds by updating the corresponding record in the ha_node table.

To enable High availability, you will first have to define a new parameter in the Zabbix server configuration file: HANodeName

Empty by default
This parameter should contain an arbitrary name of the HA node
Providing value to this parameter will enable Zabbix server cluster mode

Standby nodes monitor the last access time of the active node from the ha_node table.

If the difference between last access time and current time reaches the failover delay, the cluster fails over to the standby node
Failover operation is logged in the Zabbix server log

It is possible to define a custom failover delay – a time window after which an unreachable active node is considered lost and failover to one of the standby nodes takes place.

As for the Zabbix proxies, the Server parameter in the Zabbix proxy configuration file now supports multiple addresses separated by a semicolon. The proxy will attempt to connect to each of the nodes until it succeeds.

Other HA cluster related features:

New command-line options to check HA cluster status
hanode.get API method to obtain the list of HA nodes
The new internal check provides LLD information to discover Zabbix server HA nodes
HA Failover event logged in the Zabbix Audit log
Zabbix Frontend will automatically switch to the active Zabbix server node

You can find a more detailed look at the Zabbix Server HA cluster feature in the Zabbix Summit Online 2021 speech dedicated to the topic.

Business service monitoring

The Services section has received a complete redesign in Zabbix 6.0 LTS. Business Service Monitoring (BSM) enables Zabbix administrators to define services of varying complexity and monitor their status.

BSM provides added value in a multitude of use cases, where we wish to define and monitor services based on:

Server clusters
Services that utilize load balancing
Services that consist of a complex IT stack
Systems with redundant components in place
And more

Business Service monitoring has been designed with scalability in mind. Zabbix is capable of monitoring over 100k services on a single Zabbix instance.

For our Business Service example, we used a website, which depends on multiple components such as the network connection, DB backend, Application server, and more. We can see that the service status calculation is done by utilizing tags and deciding if the existing problems will affect the service based on the problem tags.

In Zabbix 6.0 LTS there are many ways how service status calculations can be performed. In case of a problem, the service state can be changed to:

The most critical problem severity, based on the child service problem severities
The most critical problem severity, based on the child service problem severities, only if all child services are in a problem state
The service is set to constantly be in an OK state

Changing the service status to a specific problem severity if:

At least N or N% of child services have a specific status
Define service weights and calculate the service status based on the service weights

There are many other additional features, all of which are covered in our Zabbix Summit Online 2021 speech dedicated to Business Service monitoring:

Ability to define permissions on specific services
SLA monitoring
Business Service root cause analysis
Receive alerts and react on Business Service status change
Define Business Service permissions for multi-tenant environments

New Audit log schema

The existing audit log has been redesigned from scratch and now supports detailed logging for both Zabbix server and Zabbix frontend operations:

Zabbix 6.0 LTS introduces a new database structure for the Audit log
Collision resistant IDs (CUID) will be used for ID generation to prevent audit log row locks
Audit log records will be added in bulk SQL requests
Introducing Recordset ID column. This will help users recognize which changes have been made in a particular operation

The goal of the Zabbix 6.0 LTS audit log redesign is to provide reliable and detailed audit logging while minimizing the potential performance impact on large Zabbix instances:

Detailed logging of both Zabbix frontend and Zabbix server records
Designed with minimal performance impact in mind
Accessible via Zabbix API

Implementing the new audit log schema is an ongoing effort – further improvements will be done throughout the Zabbix update life cycle.

Machine learning

New trend functions have been added which utilize machine learning to perform anomaly detection and baseline monitoring:

New trend function – trendstl, allows you to detect anomalous metric behavior
New trend function – baselinewma, returns baseline by averaging data periods in seasons
New trend function – baselinedev, returns the number of standard deviations

An in-depth look into Machine learning in Zabbix 6.0 LTS is covered in our Zabbix Summit Online 2021 speech dedicated to machine learning, anomaly detection, and baseline monitoring.

New ways to visualize your data

Collecting and processing metrics is just a part of the monitoring equation. Visualization and the ability to display our infrastructure status in a single pane of glass are also vital to large environments. Zabbix 6.0 LTS adds multiple new visualization options while also improving the existing features.

The data table widget allows you to create a summary view for the related metric status on your hosts
The Top N and Bottom N functions of the data table widget allow you to have an overview of your highest or lowest item values
The single item widget allows you to display values for a single metric
Improvements to the existing vector graphs such as the ability to reference individual items and more
The SLA report widget displays the current SLA for services filtered by service tags

We are proud to announce that Zabbix 6.0 LTS will provide a native Geomap widget. Now you can take a look at the current status of your IT infrastructure on a geographic map:

The host coordinates are provided in the host inventory fields
Users will be able to filter the map by host groups and tags
Depending on the map zoom level – the hosts will be grouped into a single object
Support of multiple Geomap providers, such as OpenStreetMap, OpenTopoMap, Stamen Terrain, USGS US Topo, and others

Zabbix agent – improvements and new items

Zabbix agent and Zabbix agent 2 have also received some improvements. From new items to improved usability – both Zabbix agents are now more flexible than ever. The improvements include such features as:

New items to obtain additional file information such as file owner and file permissions
New item which can collect agent host metadata as a metric
New item with which you can count matching TCP/UDP sockets
It is now possible to natively monitor your SSL/TLS certificates with a new Zabbix agent2 item. The item can be used to validate a TLS/SSL certificate and provide you additional certificate details
User parameters can now be reloaded without having to restart the Zabbix agent

In addition, a major improvement to introducing new Zabbix agent 2 plugins has been made. Zabbix agent 2 now supports loading stand-alone plugins without having to recompile the Zabbix agent 2.

Custom Zabbix password complexity requirements

One of the main improvements to Zabbix security is the ability to define flexible password complexity requirements. Zabbix Super admins can now define the following password complexity requirements:

Set the minimum password length
Define password character requirements
Mitigate the risk of a dictionary attack by prohibiting the usage of the most common password strings

UI/UX improvements

Improving and simplifying the existing workflows is always a priority for every major Zabbix release. In Zabbix 6.0 LTS we’ve added many seemingly simple improvements, that have major impacts related to the “feel” of the product and can make your day-to-day workflows even smoother:

It is now possible to create hosts directly from Monitoring – Hosts
Removed Monitoring – Overview section. For improved user experience, the trigger and data overview functionality can now be accessed only via dashboard widgets.
The default type of information for items will now be selected automatically depending on the item key.
The simple macros in map labels and graph names have been replaced with expression macros to ensure consistency with the new trigger expression syntax

New templates and integrations

Adding new official templates and integrations is an ongoing process and Zabbix 6.0 LTS is no exception here’s a preview for some of the new templates and integrations that you can expect in Zabbix 6.0 LTS:

f5 BIG-IP
Cisco ASAv
HPE ProLiant servers
Cloudflare
InfluxDB
Travis CI
Dell PowerEdge

Zabbix 6.0 also brings a new GitHub webhook integration which allows you to generate GitHub issues based on Zabbix events!

Other changes and improvements

But that’s not all! There are more features and improvements that await you in Zabbix 6.0 LTS. From overall performance improvements on specific Zabbix components, to brand new history functions and command-line tool parameters:

Detect continuous increase or decrease of values with new monotonic history functions
Added utf8mb4 as a supported MySQL character set and collation
Added the support of additional HTTP methods for webhooks
Timeout settings for Zabbix command-line tools
Performance improvements for Zabbix Server, Frontend, and Proxy

Questions and answers

Q: How can you configure geographical maps? Are they similar to regular maps?

A: Geomaps can be used as a Dashboard widget. First, you have to select a Geomap provider in the Administration – General – Geographical maps section. You can either use the pre-defined Geomap providers or define a custom one. Then, you need to make sure that the Location latitude and Location longitude fields are configured in the Inventory section of the hosts which you wish to display on your map. Once that is done, simply deploy a new Geomap widget, filter the required hosts and you’re all set. Geomaps are currently available in the latest alpha release, so you can get some hands-on experience right now.

Q: Any specific performance improvements that we can discuss at this point for Zabbix 6.0 LTS?

A: There have been quite a few. From the frontend side – we have improved the underlying queries that are related to linking new templates, therefore the template linkage performance has increased. This will be very noticeable in large instances, especially when linking or unlinking many templates in a single go.
There have also been improvements to Server – Proxy communication. Specifically – the logic of how proxy frees up uncompressed data. We’ve also introduced improvements on the DB backend side of things – from general improvements to existing queries/logic, to the introduction of primary keys for history tables, which we are still extensively testing at this point.

Q: Will you still be able to change the type of information manually, in case you have some advanced preprocessing rules?

A: In Zabbix 6.0 LTS Zabbix will try and automatically pick the corresponding type of information for your item. This is a great UX improvement since you don’t have to refer to the documentation every time you are defining a new item. And, yes, you will still be able to change the type of information manually – either because of preprocessing rules or if you’re simply doing some troubleshooting.

Handy Tips #15: Deploying Zabbix passive and active agents

2021-12-09 Arturs Lontons

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/handy-tips-15-deploying-zabbix-passive-and-active-agents/17696/

Deploy Zabbix Agent to collect OS-level metrics.

There are multiple approaches to collecting OS-level metrics – from collecting only simple ping statistics to deploying an SNMP agent or using custom scripts. In many cases, this is either not sufficient or too time-consuming and not flexible enough. By deploying the Zabbix Agent you can enable a quick and easy way to collect OS-level metrics.

Deploy Zabbix Agent and collect metrics by polling the agent or autonomously sending data to your Zabbix Server:

Zabbix Agent supports polling (passive) and trapping (active)
Negligible performance overhead
Can be installed from packages on Linux or the MSI installer on Windows

Supports the most popular Unix-like operating systems
Zabbix Agent can be extended with custom User Parameters

Check out the video to learn how to deploy the Zabbix Agent in passive and active modes

How to configure and deploy Zabbix passive and active agents:

Install the Zabbix repository and the Zabbix Agent on your host
Open the zabbix_agentd.conf configuration file
Specify your Zabbix Server address in the Server and ServerActive parameters
Define the name of your host in the Hostname parameter
Restart the Zabbix Agent
Navigate to Configuration → Hosts
Create two hosts in Zabbix frontend – one for passive and one for active checks
For passive Agent host – define an Agent interface containing the address of your Zabbix Server
The active agent Host name should match the Hostname parameter value in the Agent configuration file
Apply the corresponding Linux by Zabbix Agent/Linux by Zabbix Agent active template
Navigate to Monitoring → Latest data and check if you have received metrics from the hosts

Tips and best practices:

A single Zabbix Agent can collect metrics in both passive and active modes
Zabbix Agent is backward compatible
The frontend ZBX interface icon is related only to passive checks
Active Zabbix Agent doesn’t require an interface configuration in Zabbix frontend
For active checks, the Zabbix Agent Hostname in the configuration file must match the Host name in the frontend

Handy Tips #14: Gain new insights into your metrics with the Graph widget

2021-12-02 Arturs Lontons

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/handy-tips-13-gain-new-insights-into-your-metrics-with-the-graph-widget/17416/

Group, aggregate, and visualize your collected metrics with the Graph widget to obtain additional insights.

Sometimes simple graphs may not be sufficient – we may need to visualize data trends, display all metrics matching a pattern, or define custom draw styles for specific metrics.

The Graph widget allows you to visualize your metrics in an advanced fashion:

Customize your graph draw styles
Select between displaying History or Trend values
Calculate and display aggregated metric values on the fly

Define custom aggregation intervals
Define flexible problem display logic
Visually group metrics by defining item and host data sets

Check out the video to learn how to visualize your metrics by using the Graph widget.

How to visualize data with the Graph widget:

Navigate to Monitoring → Dashboards – All Dashboards
Click Create Dashboard to create a new Dashboard
Left-click on an empty space and select Add Widget
Select Widget type – Graph
Type in the data set host and item patterns
Use wildcard symbol ‘*‘ to match multiple hosts/items
Click Add new data set to add a second data set
Fill in the host pattern and item pattern to match a different set of metrics
For both data sets, select an Aggregation function, Aggregation interval, and Aggregate – Data set
Observe how metrics get aggregated in the graph on the fly

Tips and best practices:

Wildcards can be used in host and item pattern matching
The Overrides section can be used to further customize a particular set of items
Items within a single data set will use the same base color
Up to 50 items may be displayed in the graph

Deploying Zabbix in Amazon Web Services cloud platform

2021-12-02 Arturs Lontons

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/deploying-zabbix-in-amazon-web-services-cloud-platform/17283/

With the rapid evolution and proliferation of different cloud services, many organizations have decided to move parts of their infrastructures from on-prem to cloud. As an essential part of your infrastructure, Zabbix is no exception – you always have the option to either deploy Zabbix on-prem or select from one of the many supported cloud service providers to deploy your Zabbix Server or Zabbix Proxy on.

In this blog post, let’s look at how we can quickly deploy Zabbix Server and Zabbix Proxy nodes in Amazon Web Services cloud platform.

Deploying the Zabbix Server in AWS

Let’s begin with the Zabbix download page. Under the Zabbix Cloud Images section, select the AWS cloud vendor and then the Cloud Image you wish to deploy. Let’s start with Zabbix Server 5.0 with MySQL DB backend and Nginx Web server backend for our frontend.

Next, we will be redirected to the AWS marketplace, where we will have to subscribe to the Zabbix Server 5.0 image.

Once we have subscribed to the Zabbix Server image, we can continue with the deployment configuration.

Next, we must select our Region, Zabbix minor version (usually the latest available), and the Fulfillment option. Once that is done, we can finalize the launch configuration.

Select the preferred Launch option, EC2 Instance Type, VPC, and Subent settings on the Launch page.

Next – We have to select or create a security group.

We also have to select or generate EC 2 Key pair – make sure to save your private key in a safe location!

Note that creating a security group based on seller settings does not guarantee that the group will have an inbound SSH access rule! Make sure to double-check the security group and manually add the SSH inbound rule if it hasn’t yet been added. We will need to access this instance via SSH to obtain the initial frontend login credentials!

Once you click on the Launch button, the deployment process for your Zabbix application will be initiated.

Accessing the application

Let’s open up the Instances section and open our newly deployed Zabbix instance

We can access the Zabbix Frontend by opening the Public IPv4 address or Public IPv4 DNS of the Zabbix instance

Note that the Zabbix frontend password is still unknown to us. Recall how I mention that we will need to access the instance via SSH to obtain the frontend password. Let’s do so now.

Write down the login credentials and use them to log in to the Zabbix instance.

Accessing the database

In case we wish to access the Zabbix database backend, we can do so from the command line. Zabbix database can be accessed by using the root user. By default, it can be used without a password.

The MySQL root password is stored in /root/.my.cnf configuration file.

Modifying the Zabbix Frontend timezone

By default, the Zabbix frontend uses the “UTC” timezone. If you need to change it, edit php_value[date.timezone] PHP variable in /etc/php-fpm.d/zabbix.conf and restart php-fpm process:

systemctl restart php-fpm

Zabbix proxy

If you wish to deploy a Zabbix proxy instance in your AWS cloud, the deployment steps are very much the same. Most likely, you will still require SSH access if you wish to perform some configuration changes in the Zabbix proxy configuration file.

Note, that by default, the SQLite proxy database is stored in /tmp/zabbix_proxy.sqlite3

As always, don’t forget the point the proxy at your Zabbix server instance by modifying the Server parameter in the Zabbix proxy configuration file, located in /etc/zabbix/zabbix_proxy.conf

And that’s all! With just a few clicks, we are able to deploy a fully functional Zabbix instance or a small Zabbix proxy to distribute or scale our monitoring. Don’t forget that AWS is just one of the many cloud service providers you can use with Official Zabbix images. If you have any questions about the AWS deployment – you are very much encouraged to leave a comment under this blog post.

If you wish to learn more about the Zabbix Monitoring solution, check out the official documentation https://www.zabbix.com/documentation/current/manual/quickstart.

Summary of Zabbix Summit Online 2021, Zabbix 6.0 LTS release date and Zabbix Workshops

2021-12-01 Arturs Lontons

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/summary-of-zabbix-summit-online-2021-zabbix-6-0-lts-release-date-and-zabbix-workshops/17155/

Now that the Zabbix Summit Online 2021 has concluded, we are thrilled to report we hosted attendees from over 3000 organizations from more than 130 countries all across the globe.

This year, the main focus of the speeches was the upcoming Zabbix 6.0 LTS release, as well as speeches focused on automating Zabbix data collection and configuration, Integrating Zabbix within existing company infrastructures, and migrating from legacy tools to Zabbix. 21 speakers in total presented their use cases and talked about new Zabbix features during the Summit with over 8 hours of content.

In case you missed the Summit or wish to come back to some of the speeches – both the presentations (in PDF format) and the videos of the speeches are available on the Zabbix Summit Online 2021 Event page.

Zabbix 6.0 LTS release date

As for Zabbix 6.0 LTS – as per our statement during the event, you can expect Zabbix 6.0 LTS to release in early 2022. At the time of this post, the latest pre-release version is Zabbix 6.0 Alpha 7, with the first Beta version scheduled for release VERY soon. Feel free to deploy the latest pre-release version and take a look at features such as Geomaps, Business Service monitoring, improved Audit log, UX improvements, Anomaly detection with Machine Learning, and more! The list of the latest released Zabbix 6.0 versions as well as the improvements and fixes they contain is available in the Release notes section of our website.

Zabbix 6.0 LTS Workshops

The workshops will focus on particular Zabbix 6.0 LTS features and will be available once the Zabbix 6.0 LTS is released. The workshops will provide a unique chance to learn and practice the configuration of specific Zabbix 6.0 LTS features under the guidance of a certified Zabbix trainer at absolutely no cost! Some of the topics covered in the workshops will include – Deploying Zabbix server HA cluster, Creating triggers for Baseline monitoring and Anomaly detection, Displaying your infrastructure status on Geomaps, Deploying Business Service monitoring with root cause analysis, and more!

Upcoming events

But there’s more! On December 9 2021 Zabbix will host PostgreSQL Monitoring Day with Zabbix & Postgres Pro. The speeches will focus on monitoring PostgreSQL databases, running Zabbix on PostgreSQL DB backends with TimescaleDB, and securing your Zabbix + PostgreSQL instances. If you’re currently using PostgreSQL DB backends r plan to do so in the future – you definitely don’t want to miss out!

As for 2022 – you can expect multiple meetups regarding Zabbix 6.0 LTS features and use cases, as well as events focused on specific monitoring use cases. More information will be publicly available with the release of Zabbix 6.0 LTS.

Handy Tips #13: Automate inventory information collection with Zabbix

2021-11-24 Arturs Lontons

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/handy-tips-13-automate-inventory-information-collection-with-zabbix/17367/

Metrics collected by Zabbix can be used not only for historic data analysis and problem detection but they can also be utilized to populate your host inventory.

Keeping track of your host inventory is a necessity in organizations of every size. Using another tool to do that can be costly and cause software bloat and sometimes we may have to resort to populating inventory data manually.

(more…)

Handy Tips #12: Optimizing Zabbix database size with custom data storage periods

2021-11-18 Arturs Lontons

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/handy-tips-12-optimizing-zabbix-database-size-with-custom-data-storage-periods/17396/

Zabbix allows its users to configure custom data retention periods for different types of data – from history and trend storage periods to user session storage periods.

Data retention requirements can vary a lot between different environments. With considerations to data storage footprint and company policies, some environments might require storing months of historical data, while others are fine with storing mostly trends.

Use housekeeping settings to define custom data storage periods:

Storage periods can be defined for history, trends, events, and more
Unique storage periods can be defined for each individual item

TimescaleDB backends support native data partitioning and compression
Housekeeping for individual data types can be disabled – not recommended in production environments

Check out the video to learn how to define data storage periods on your Zabbix instance.

How to define data storage periods on your Zabbix instance:

Navigate to Configuration → Hosts and click on the Items button next to an existing host
Select any integer or float item
Set the History storage period to 30d, Trend storage period to 180d
Save the item
Navigate to Administration → General → Housekeeping
Set the Trigger data storage period to 90d
Tick the checkbox next to the Override item history period option
Set the History storage period to 90d
Navigate back to Configuration → Hosts and click on your host
Click on the Items next to your host and find the previously modified item
Click on the green i next to the History storage period
Read the override notification

Tips and best practices::

Usually, long term storage of internal, network discovery, and autoregistration events is not required
Item and trend storage periods can be overridden by global settings
Storage period will not be overridden for items that have Do not keep history or Do not keep trends enabled
An event will not be removed until the associated problem is resolved

Advanced Zabbix API – 5 API use cases to improve your API workfows

2021-11-18 Arturs Lontons

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/advanced-zabbix-api-5-api-use-cases-to-improve-your-api-workfows/16801/

As your monitoring infrastructures evolve, you might hit a point when there’s no avoiding using the Zabbix API. The Zabbix API can be used to automate a particular part of your day-to-day workflow, troubleshoot your monitoring or to simply analyze or get statistics about a specific set of entities.

In this blog post, we will take a look at some of the more advanced API methods and specific method parameters and learn how they can be used to improve your API workflows.

1. Count entities with CountOutput

Let’s start with gathering some statistics. Let’s say you have to count the number of some matching entities – here we can use the CountOutput parameter. For a more advanced use case – what if we have to count the number of events for some time period? Let’s combine countOutput with time_from and time_till (in unixtime) and get the number of events created for the month of November. Let’s get all of the events for the month of November that have the Disaster severity:

{
"jsonrpc": "2.0",
"method": "event.get",
"params": {
"output": "extend",
"time_from": "1635717600",
"time_till": "1638223200",
"severities": "5",
"countOutput": "true"
},
"auth": "xxxxxx",
"id": 1
}

2. Use API to perform Configuration export/import

Next, let’s take a look at how we can use the configuration.export method to export one of our templates in yaml:

{
"jsonrpc": "2.0",
"method": "configuration.export",
"params": {
"options": {
"templates": [
"10001"
]
},
"format": "yaml"
},
"auth": "xxxxxx",
"id": 1
}

Now let’s copy and paste the result of the export and import the template into another environment. It’s extremely important to remember that for this method to work exactly as we intend to, we need to include the parameters that specify the behavior of particular entities contained in the configuration string, such as items/value maps/templates, etc. For example, if I exclude the templates parameter here, no templates will be imported.

{
"jsonrpc": "2.0",
"method": "configuration.import",
"params": {
"format": "yaml",
"rules": {
"valueMaps": {
"createMissing": true,
"updateExisting": true
},
"items": {
"createMissing": true,
"updateExisting": true,
"deleteMissing": true
},
"templates": {
"createMissing": true,
"updateExisting": true
},

"templateLinkage": {
"createMissing": true
}
},
"source": "zabbix_export:\n version: '5.4'\n date: '2021-11-13T09:31:29Z'\n groups:\n -\n uuid: 846977d1dfed4968bc5f8bdb363285bc\n name: 'Templates/Operating systems'\n templates:\n -\n uuid: e2307c94f1744af7a8f1f458a67af424\n template: 'Linux by Zabbix agent active'\n name: 'Linux by Zabbix agent active'\n 
...
},
"auth": "xxxxxx",
"id": 1
}

3. Expand trigger functions and macros with expand parameters

Using trigger.get to obtain information about a particular set of triggers is a relatively common practice. One particular caveat that we have to consider is that by default macros in trigger name, expression or descriptions are not expanded. To expand the available macros we need to use the expand parameters:

{
"jsonrpc": "2.0",
"method": "trigger.get",
"params": {
"triggerids": "18135",
"output": "extend",
"expandExpression":"1",
"selectFunctions": "extend"
},
"auth": "xxxxxx",
"id": 1
}

4. Obtaining additional LLD information for a discovered item

If we wish to display additional LLD information for a discovered entity, in this case – an item, we can use the selectDiscoveryRule and selectItemDiscovery parameters.
While selectDiscoveryRule will provide the ID of the LLD rule that created the item, selectItemDiscovery can point us at the parent item prototype id from which the item was created, last discovery time, item prototype key, and more.

The example below will return the item details and will also provide the LLD rule and Item prototype IDs, the time when the lost item will be deleted and the last time the item was discovered:

{
"jsonrpc": "2.0",
"method": "item.get",
"params": {
"itemids":"36717",
"selectDiscoveryRule":"1",
"selectItemDiscovery":["lastcheck","ts_delete","parent_itemid"]
}, "auth":"xxxxxx",
"id": 1
}

5. Searching through the matched entities with search parameters

Zabbix API provides a couple of standard parameters for performing a search. With search parameter, we can search string or text fields and try to find objects based on a single or multiple entries. searchByAny parameter is capable of extending the search – if you set this as true, we will search by ANY of the criteria in the search array, instead of trying to find an entity that matches ALL of them (default behavior).

The following API call will find items that match agent and Zabbix keys on a particular template:

{
"jsonrpc": "2.0",
"method": "item.get",
"params": {
"output": "extend",
"templateids": "10001",
"search": {
"key_": ["agent.","zabbix"]
},
"searchByAny":"true",
"sortfield": "name"
},
"auth": "xxxxxx",
"id": 1
}

Feel free to take the above examples, change them around so they fit your use case and you should be able to quite easily implement them in your environment. There are many other use cases that we might potentially cover down the line – if you have a specific API use case that you wish for us to cover, feel free to leave a comment under this post and we just might cover it in one of the upcoming blog posts!

Zabbix 6.0 LTS at Zabbix Summit Online 2021

2021-11-16 Arturs Lontons

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/zabbix-6-0-lts-at-zabbix-summit-online-2021/16115/

With Zabbix Summit Online 2021 just around the corner, it’s time to have a quick overview of the 6.0 LTS features that we can expect to see featured during the event. The Zabbix 6.0 LTS release aims to deliver some of the long-awaited enterprise-level features while also improving the general user experience, performance, scalability, and many other aspects of Zabbix.

Native Zabbix server cluster

Many of you will be extremely happy to hear that Zabbix 6.0 LTS release comes with out-of-the-box High availability for Zabbix Server. This means that HA will now be supported natively, without having to use external tools to create Zabbix Server clusters.

The native Zabbix Server cluster will have a speech dedicated to it during the Zabbix Summit Online 2021. You can expect to learn both the inner workings of the HA solution, the configuration and of course the main benefits of using the native HA solution. You can also take a look at the in-development version of the native Zabbix server cluster in the latest Zabbix 6.0 LTS alpha release.

Business service monitoring and root cause analysis

Service monitoring is also about to go through a significant redesign, focusing on delivering additional value by providing robust Business service monitoring (BSM) features. This is achieved by delivering significant additions to the existing service status calculation logic. With features such as service weights, service status analysis based on child problem severities, ability to calculate service status based on the number or percentage of children in a problem state, users will be able to implement BSM on a whole new level. BSM will also support root cause analysis – users will be informed about the root cause problem of the service status change.

All of this and more, together with examples and use cases will be covered during a separate speech dedicated to BSM. In addition, some of the BSM features are available in the latest Zabbix 6.0 LTS alpha release – with more to come as we continue working on the Zabbix 6.0 release.

Audit log redesign

The Audit log is another existing feature that has received a complete redesign. With the ability to log each and every change performed both by the Zabbix Server and Zabbix Frontend, the Audit log will become an invaluable source of audit information. Of course, the redesign also takes performance into consideration – the redesign was developed with the least possible performance impact in mind.

The audit log is constantly in development and the current Zabbix 6.0 LTS alpha release offers you an early look at the feature. We will also be covering the technical details of the new audit log implementation during the Summit and will explain how we are able to achieve minimal performance impact with major improvements to Zabbix audit logging.

Geographical maps

With Geographical maps, our users can finally display their entities on a geographical map based on the coordinates of the entity. Geographical maps can be used with multiple geographical map providers and display your hosts with their most severe problems. In addition, geographical maps will react dynamically to Zoom levels and support filtering.

The latest Zabbix 6.0 Alpha release includes the Geomap widget – feel free to deploy it in your QA environment, check out the different map providers, filter options and other great features that come with this widget.

Machine learning

When it comes to problem detection, Zabbix 6.0 LTS will deliver multiple trend new functions. A specific set of functions provides machine learning functionality for Anomaly detection and Baseline monitoring.

The topic will be covered in-depth during the Zabbix Summit Online 2021. We will look at the configuration of the new functions and also take a deeper dive at the logic and algorithms used under the hood.

During the Zabbix Summit Online 2021, we will also cover many other new features, such as:

New Dashboard widgets
New items for Zabbix Agent
New templates and integrations
Zabbix login password complexity settings
Performance improvements for Zabbix Server, Zabbix Proxy, and Zabbix Frontend
UI and UX improvements
Zabbix login password complexity requirements
New history and trend functions
And more!

Not only will you get the chance to have an early look at many new features not yet available in the latest alpha release, but also you will have a great chance to learn the inner workings of the new features, the upgrade and migration process to Zabbix 6.0 LTS and much more!

We are extremely excited to share all of the new features with our community, so don’t miss out – take a look at the full Zabbix Summit online 2021 agenda and register for the event by visiting our Zabbix Summit page, and we will see you at the Zabbix Summit Online 2021 on November 25!

Simplifying Zabbix API workflows with named Zabbix API tokens

2021-11-11 Arturs Lontons

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/simplifying-zabbix-api-workflows-with-named-zabbix-api-tokens/16653/

Zabbix API enables you to collect any and all information from your Zabbix instance by using a multitude of API methods. You can even utilize Zabbix API calls in your HTTP items. For example, this can be used to monitor the number of particular sets of metrics and visualize their growth over time. With named Zabbix API tokens, such use cases are a lot more simple to implement.

Before Zabbix 5.4 we had to perform the user.login API call to obtain the authentication token. Once the user session was closed, we had to relog, obtain the new authentication token and use it in the subsequent API calls.

With the pre-defined named Zabbix API tokens, you don’t have to constantly check if the authentication token needs to be updated. Starting from Zabbix 5.4 you can simply create a new named Zabbix API token with an expiration date and use it in your API calls.

Creating a new named Zabbix API token

The Zabbix API token creation process is extremely simple. All you have to do is navigate to Administration – General – API tokens and create a new API token. The named API tokens are created for a particular user and can have an optional expiration date and time – otherwise, the tokens are defined without an expiry date.

You can create a named API token in the API tokens section, under Administration – General

Once the Token has been created, make sure to store it somewhere safe, since you won’t be able to recover it afterward. If the token is lost – you will have to recreate it.

Make sure to store the auth token!

Don’t forget, that when defining a role for the particular API user, we can restrict which API methods this user has access to.

Simplifying API tasks with the named API token

There are many different use cases where you could implement Zabbix API calls to collect some additional information. For this blog post, I will create an HTTP item that uses item.get API call to monitor the number of unsupported items.

To achieve that, I will create an HTTP item on a host (This can be the default Zabbix server host or a host dedicated to collecting metrics via Zabbix API calls) and provide the API call in the request body. Since the named API token now provides a static authentication token until it expires, I can simply use it in my API call without the need to constantly keep it updated.

An HTTP agent item that uses a Zabbix API call in its request body

{
    "jsonrpc": "2.0",
    "method": "item.get",
    "params": {
			"countOutput":"1",
			 "filter": {
 "state": "1"
 }
    },
    "id": 2,
    "auth": "b72be8cf163438aacc5afa40a112155e307c3548ae63bd97b87ff4e98b1f7657"
}

HTTP item request body, which returns a count of unsupported items

I will also use regular expression preprocessing to obtain the numeric value from the API call result – otherwise, we won’t be able to graph our value or calculate trends for it.

Regular expression preprocessing step to obtain a numeric value from our Zabbix API call result

Utilizing Zabbix API scripts in Actions

In one of our previous blog posts, we covered resolving problems automatically with the event.acknowledge API method. The logic defined in the blog post was quite complex since we needed to keep an eye out for the authentication tokens and use a custom script to keep them up to date. With named Zabbix API tokens, this use case is a lot more simple.

All I have to do is create an Action operation script containing my API call and pass it to an action operation.

Action operation script that invokes Zabbix event.acknowledge API method

curl -sk -X POST -H "Content-Type: application/json" -d "
{
\"jsonrpc\": \"2.0\",
\"method\": \"event.acknowledge\",
\"params\": {
\"eventids\": \"{EVENT.ID}\",
\"action\": 1,
\"message\": \"Problem resolved.\"
},
\"auth\": \"<Place your authentication token here>",
\"id\": 2
}" <Place your Zabbix API endpoint URL here>

Problem remediation script example

Now my problems will get closed automatically after the time period which I have defined in my action.

Action operation which runs our event.acknowledge Zabbix API script

These are but a few examples that we can now achieve by using API tokens. A lot of information can be obtained and filtered in a unique way via Zabbix API, thus providing you with a granular analysis of your monitored environment. If you have recently upgraded to Zabbix 5.4 or plan to upgrade to Zabbix 6.0 LTS in the future, I would recommend implementing named Zabbix API tokens to simplify your day-to-day workflow and consider the possibilities that this new feature opens up for you.

If you have any questions or if you wish to share your particular use case for data collection or task automation with Zabbix API – feel free to share them in the comments section below!

Combining preprocessing with storing only trend data for high-frequency monitoring

2021-11-04 Arturs Lontons

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/combining-preprocessing-with-storing-only-trend-data-for-high-frequency-monitoring/16568/

There are many design choices to consider when we build our monitoring environment for high-frequency monitoring. How to minimize performance impact? What are the data retention policies with storage space in mind? What are the available out-of-the-box features to solve these potential problems?

In this blog post, we will discuss when you should use preprocessing and when it is better to use the “Do not keep history” option for your metrics, and what are the pros and cons for both of these approaches.

Throttling and other preprocessing steps

We’ve discussed throttling previously as the go-to approach for high-frequency monitoring. Indeed, with throttling, you can discard repeated values and do so with a heartbeat. This is extremely useful with metrics that come as discreet values – services states, network port statuses, and so on.

Example of throttling with and without heartbeat

In addition, since starting from Zabbix 4.2 all preprocessing is also performed by Zabbix proxies. This means we can discard the repeated values before they reach the Zabbix server. This can help us both with the performance (fewer metrics to insert in the Zabbix server DB) and reduce the DB size (Fewer metrics stored in the DB. This also helps with improving overall Zabbix performance)

There are a few caveats with this approach – since metrics get discarded before they reach the Zabbix server, the triggers will not react on these metrics (This is where having a heartbeat is useful) and, since trends are calculated by Zabbix server based on the received history data, there could be a lack of trend information for these metrics. Keep in mind that this applies not only to throttling preprocessing rules – any preprocessing can be done on the proxy and any preprocessing rules can be used to transform your data.

Understanding “Do not keep history” option

The behavior of “Do not keep history” which we can define when configuring an item is a bit different though. If we collect an item by a Proxy and configure the item with “Do not keep history”, the history won’t always get discarded! There are a couple of reasons for this.

First off, let’s not forget that some of our values can populate host inventory! If the particular item is configured to populate an inventory field – it will be forwarded to the Zabbix server, but it will not get stored in the history tables.
If the item does not populate an inventory field – the text data such as character, log and text will indeed get discarded before reaching the Zabbix server, but Numeric values – both float and integer, will get forwarded to the server. The reason for that is deriving trend information from the numeric values. Mind that the numeric data will still not get stored in the history tables, only trends will be available for these items.

Note: This behavior has been properly implemented starting from Zabbix 5.2. See ZBX-17548

Setting the “Do not keep history” option for an item

Using trend functions with high-frequency monitoring

With the specifics of “Do not keep history” in mind, we should now recall that starting from Zabbix 5.2 we have trend functions available at our disposal!

History functions such as trendavg, trendcount, trendmax, trendmin, trendsum allow us to perform different kinds of trend calculations – from counting the number of trend values to retrieving min/max/avg trend values for a time period.

This means, that if we require only the metric trend for specific time periods (hours, days, weeks, etc) we can use these trend functions together with “Do not keep history” option, thus discarding unnecessary data and improving our Zabbix server performance!

There are two approaches two using trend functions:

If you wish to collect and display the trend data, you need to create the item which will collect the metrics (say, a net.if.in Agent item for collecting incoming network traffic) and then create a separate calculated item that uses the trend function to calculate the avg/min/max value for the trend over a time period. The original item can then have “Do not keep history” option selected for it.

trendavg item for calculating hourly trends from the net.if.in[ifHCInOctets.5] item

If you wish to simply define triggers and react on long-term trends and are not required to collect the trend values, then we can skip the creation of the calculated item and simply use the trend function on the original item in the trigger.

This trigger fires if the hourly average trend value exceeds 100M.
Note: In this case only the original item is required.

By combining these approaches in our environment – using preprocessing when we wish to discard or transform the data and also implementing opting out of storing the history data, whenever this is appropriate, we can minimize the performance impact on our Zabbix instance. Add a layer of distributed Zabbix proxies on top of this and you can truly achieve a large, scalable Zabbix infrastructure optimized for high-frequency ingestion and processing of your data.

Keeping your Zabbix templates up to date

2021-10-28 Arturs Lontons

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/keeping-your-zabbix-templates-up-to-date/16412/

Have you recently updated your Zabbix environment but are still wondering – why haven’t the templates been updated? Where can I obtain the latest official Zabbix templates, and how should I update them? In this blog post, we will discuss why it is vital to keep your templates up to date and how we the template update process looks like.

Updating your templates

“Will updating Zabbix also update my templates?” is a question that I receive quite often. The answer to that question is – no changes are made to your templates whenever you update your Zabbix instance – be it a minor or a major update. The reasoning behind that is quite simple – we always recommend that you tune the out-of-the-box templates as per your particular requirements. That may consist of changing update intervals, disabling items/triggers, or even changing the existing trigger expressions or adding whole new entities to the template.

This is where the current behavior with template updates starts to make more sense. If Zabbix were to automatically update your templates, there could be a chance of overwriting your custom changes and could potentially disrupt the monitoring of your environment. That is something that we definitely wish to avoid.

The question still stands – Then how am I supposed to update my templates?

The answer – you can find the latest official Zabbix templates on our official git page – https://git.zabbix.com/

First, navigate to the Zabbix repository and open the Templates folder. Then, select the release branch that matches your Zabbix instance version. Here you can find all of our official templates and also the official media types under the media folder. All you have to do now is open the template up and download the raw template file.

Zabbix 5.4 release git templates/db folder

Once that is done, we can import the template into our Zabbix environment

Template template_db_oracle_agent2 import

Don’t forget to back up your existing templates, especially if you have made some custom changes to them! Ideally – add a prefix to their names, so the new and old templates can live side by side, and you can then manually copy over the changes from the latest official template to your custom template.

The benefits of keeping templates up to date

But what is the point of updating your templates – what do you get out of it? Well, that varies on the specific fixes or improvements we make to the particular template over time. Sometimes the updated template will provide improved trigger expressions or preprocessing logic. Other times the updated template will provide extra value to your monitoring with completely new items and triggers. In the case of Webhook media types – the updates usually contain fixes or improvements for some particular use cases, for example – fixing a compatibility issue for a specific OS.

You can always track these changes either in the release notes of a particular Zabbix version or by looking up a specific bug or a feature request in our bug tracker – https://support.zabbix.com

Some of the template changes in Zabbix 5.4 major update

Zabbix self-monitoring templates

Another key aspect of why it’s important to keep your templates up to date is so you can implement the changes made to the Zabbix self-monitoring templates. For example, if we compare Zabbix 5.0 to Zabbix 5.4, there are multiple new Zabbix processes and caches added to Zabbix 5.4, such as report writer/manager process, availability manager process, trend function caches, and other new components.

Zabbix server health template version 5.0 and 5.4 difference in the number of entities

So, if you update from Zabbix 5.0 to 5.4 (or Zabbix 6.0 if you’re sticking with LTS versions), you WILL NOT be monitoring these processes and caches if you don’t update your Zabbix server and Zabbix proxy templates to the current Zabbix versions. This means that you will be completely unaware of any potential performance issues related to these processes or caches.

Tracking template changes

With Zabbix 5.4 and later, you will notice some great improvements to the template import process. If you’re wondering what has changed when comparing an older template version with a newer one, you will now be able to see the changes during the import process. The added and removed elements will be highlighted in red or green accordingly.

Preview of the changes made during the template import process

How often should you update the templates? Ideally, you would follow the Zabbix update release notes and take note of any changes made to the templates that are of use in your environment. At the very least – definitely check for changes in the self-monitoring templates when moving to a newer major version of Zabbix. Otherwise, you risk losing track of potential issues in your Zabbix environment.

Now that you know the answer to the question “How can I update my Zabbix templates?” try and think back to when you last updated your Zabbix instance to a new version – did you also check the official templates for updates? If not, then don’t hesitate and visit https://git.zabbix.com/ to find the latest templates for your Zabbix version. Chances are that you will be pleasantly surprised with a set of new and updated templates for your monitoring endpoints and new webhook media types to help you integrate Zabbix with your existing systems.

Extend your Low-level discovery rules with overrides

2021-09-07 Arturs Lontons

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/extend-your-low-level-discovery-rules-with-overrides/15496/

Overrides is an often overlooked feature of Low-level discovery that makes the discovery of different entities in your environment so much more flexible. In this blog post, we will take a look at how overrides work and how we can use them to extend our Low-level discovery rules with additional logic.

When it comes to Zabbix, Low-level discovery (also known as LLD) is a vital part of the Zabbix feature set. Automated creation of items, triggers, graphs, and hosts, based on existing entities is invaluable in larger environments, where creating these entities by hand is simply not feasible.

But what about use cases where some of your entities need to be created in a somewhat unique fashion compared to the rest of the discovered entities? Say, one of the items needs to have a unique update interval because it’s more critical than the rest. Or what if some of my network interfaces need to have lower trigger severities since they are not mission-critical?

Before Zabbix 5.0 this used to be a common question – how can I work around use cases like that? The only solution used to be creating separate discovery rules and utilizing LLD filters, for example:

LLD rule A discovers one set of entities with their own properties, and filters out everything else, while LLD rule B discovers the few unique entities that require different configuration parameters while filtering out the entities already discovered by the LLD rule A.

Not very elegant, is it? You can also imagine the LLD rule bloat that would occur in large infrastructures with many unique entities. Thankfully, overrides introduced in Zabbix 5.0 address the question in a very efficient manner.

Low-level discovery overrides

Let’s recall how LLD works. Under the hood LLD is a JSON array with LLD macro and macro value pairs:

[{"{#IFNAME}":"eth0"},{"{#IFNAME}":"lo"},{"{#IFNAME}":"eth2"},{"{#IFNAME}":"eth1"}]

In this small example of an LLD JSON, we can see each of our interfaces next to the LLD macro – these are our discovered entities. With overrides, we can define modify objects related to each of these discovered entities.

Overrides are available for each of your LLD rules. Once you open the particular LLD rule, navigate to the “Overrides” tab and add a new override. There are three parts to Overrides:

Filters define which of the discovered entities we are going to modify with this override. This is done by matching against the contents of a specific LLD macro or simply checking if a specific LLD macro exists for a discovered entity.
Operation conditions define which of the objects (items, triggers, graphs, hosts) we are going to override for the discovered entity.
Lastly, we have to define which of the object attributes we are going to change (item update intervals, trigger severity, item history storage period, etc.)

Multiple overrides in a single Low-level discovery rule

In use cases where we have defined multiple overrides in a single LLD rule, there are few things that we need to consider. First off, the override order does matter! The overrides are displayed in a reorderable drag-and-drop list, so we can change the ordering of the overrides by dragging them around.

Secondly, when defining configuring an LLD override we can select from two behaviors if the override matches the discovered entity:

Continue overrides – subsequent overrides for the current entity will be processed.
Stop processing – subsequent overrides for the current entity will be ignored.

For this reason, the order of our overrides can have a significant impact, especially if we have selected to Stop processing subsequent overrides if one of our overrides matches a discovered entity.

Network interface discovery override

Let’s take a look at an example of how we can use overrides and define unique settings for some of the discovered network interfaces.

Without any overrides, we can see that we are discovering interfaces eth0, eth1, eth2, and lo. All of them have the same update interval and history/trend retention settings. When we open up the trigger section, we can also see that the triggers for all of the interfaces have the same severity settings, all are enabled and discovered.

Discovered triggers without using any overrides

Now let’s implement a few overrides.

Change severity to high for the eth1 interface down trigger
Change the history/trend storage period for the items created for the lo interface

Let’s define our first override. We are going to be overriding a trigger prototype on {#IFNAME} matches eth1.
Override only interfaces that match eth1 in the {#IFNAME} macro value

For this entity, we will be changing the severity of the Trigger prototype containing the string “Link down” in the trigger prototype name. Change trigger severity only for triggers that contain Link down in their name

For our second override let’s change the history and trend storage period on items for the entity where {#IFNAME} matches lo, since storing history data for the loopback interface isn’t critical for us.

Override only interfaces that match lo in the {#IFNAME} macro value

To apply this override on the items created from any item prototype in this discovery rule, my operation condition is going to contain an empty matches pattern.

Change item History/Trend storage period for any item created for the lo interface

Once we re-run the discovery rule, we will see that the changes have been applied to our items and triggers created from the corresponding prototypes.

Note the lo interface History/Trend storage periods – they have been changed as per our override

Note the eth1 trigger severity – it’s now High, as per our override

This way we can finally create any discovered object with a set of unique properties, despite our object prototypes having static settings. The example above is just a general use case of how we can utilize overrides – there can be many complex scenarios for utilizing overrides, especially if we take the execution order and stop/continue processing settings into account. If you are interested in the full list of changes that we can perform on different objects by using overrides, feel free to take a look at the Overrides section in our documentation.

Hopefully, the blog post gave you a glimpse of the flexibility that the override feature is capable of delivering. If you have an interesting use case for overrides or any questions/comments – you are very much welcome to share those in the comments section below the post!

Deploying and configuring Zabbix 5.4 in a multi-tenant environment

2021-08-10 Arturs Lontons

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/deploying-and-configuring-zabbix-5-4-in-a-multi-tenant-environment/15109/

In this post and the video, we will discuss deploying and configuring Zabbix 5.4 in a multi-tenant environment and how Zabbix is finally ready for real multi-tenant use cases thanks to multiple features.

I. Monitoring requirements of multi-tenant environments (0:30)
II. Supported monitoring approaches (2:32)

- - Agent (2:32)
  - Latest improvements (4:53)

III. Zabbix and multi-tenant environment (5:56)

- - Zabbix proxy (6:25)
  - Data preprocessing — throttling (8:45)
  - Permissions (10:08)
  - High availability (13:27)
  - Database replication (17:04)
  - Database performance tuning (18:10)
  - History table partitioning (19:23)

IV. To-do list (21:36)
V. Questions & Answers (23:32)

Monitoring requirements of multi-tenant environments

Before talking Zabbix, let us first analyze the core requirements behind multi-tenant environments. Such environments can be quite complex with a particular set of prerequisites that we have to be sure we can satisfy before continuing further.

The core idea behind multi-tenancy is support for multiple customers. Therefore we need to support granular role/permission schema. The ability to define different roles for different customers and limit what they can access is key to success for such deployments.
Multiple customers means a lot of data. No matter if we’re talking about a single Zabbix instance or scaling by deploying multiple Zabbix instances (say, for different regions) we need to have the ability to process large amounts of data.
On top of that, we must be able to scale upwards, ideally – both horizontally and vertically. More customers, different requirements, varying amounts of data to process – all of this needs to be accounted for in advance.
Redundancy is another key factor for us. As service providers, we absolutely cannot afford any downtime or data loss. While this may be acceptable in our own home labs or classrooms, this is not the case here. Unscheduled downtime could potentially result in a loss of a customer.

Supported monitoring approaches

Now that we have covered the architectural requirements, let’s focus on data collection. No matter the monitoring solution, the easiest approach in most cases would simply be to tell our customer to deploy an agent and be done with it. Unfortunately, this often doesn’t sit will with the end user and their security team. Let us also not forget that it is simply not possible to deploy an agent on some devices or environments – what then? Having a vast selection of monitoring methods is key for successful deployment of a multi-tenant monitoring service.

Let us take a look at this in the context of Zabbix.

Agent

With Zabbix agents we can obtain data in two ways – in passive (polling) and active modes (trapping). This is extremely useful while working with multiple customers, since each of them will have different internal network security policies. I have personally seen cases where only one of these approaches is supported, while the other is restricted by the security policies.
Agents also support deployment on different platforms (Windows, Unix, etc.), as well as execution of external third-party scripts either by way of User parameters or system.run item key.
Active agents are also capable of reading log files and event logs on Windows environments. This can be extremely useful, since many applications, even in-house ones, can provide a lot of monitoring data by logging it.

Agent-supported deployment on various platforms

Since we need to stay flexible, there are many other monitoring approaches supported by Zabbix that we can utilize:

SNMP, HTTP, IPMI and SSH agentless monitoring.
Simple checks (ICMP pings, port statuses).
Database and Java application monitoring.
External scripts (executed by the Zabbix server, Zabbix proxy or Zabbix agent).
Aggregations and calculations of existing data.
VMware monitoring and integration.
Web monitoring by creating web scenarios.
Synthetic monitoring for simulating real life user transactions.

Latest improvements

Why are we putting emphasis on multi tenancy just now? The reason is a couple of great features added in the last few releases. These features can finally allow us to utilize Zabbix in a truly multitenant environment:

Added in Zabbix 5.2:

Ability to create customizable user roles based on user types;
Secrets can now be stored in an, highly secure external vault;
Improvements in configuring frontend were also added. For example, each user can now select their time zone for frontend data display. This will be relevant for users in different geographical locations.

Added in Zabbix 5.4:

Users now have the ability to send scheduled reports. This is extremely useful for customers who may wish to receive scheduled reports about their environments. Now, instead of utilizing third-party scripts to export data and generate reports, you can use the native Zabbix functionality.
Major performance improvements have also been added, especially for really large instances with tens of thousands of new incoming values per second.

Zabbix and multi-tenant environment

How do we use Zabbix in a multi-tenant environment? Essentially, we provide Zabbix as a service. We use the Zabbix monitoring tool to monitor our clients (ABC and BCD in the image). We monitor their network traffic, their operating system statistics, application statistics, log files, etc. For each tenant, these monitoring requirements are going to be different.

Multi-tenant environment

Zabbix proxy

Multi-tenancy would not be possible without Zabbix proxies. With Zabbix proxies we can deploy them in customer offices, data centers, organization branches and collect data locally. Since proxies also perform preprocessing, we can even utilize them to transform and normalize metrics or even discard some of the collected metrics before forwarding them to the central Zabbix backend server.

Proxies are capable of performing preprocessing ever since Zabbix version 5.0. This allows us to normalize and transform data, for example – change our textual data to numeric data, use throttling and other pre-processing approaches. Even custom JavaScript is supported nowadays to format or normalize the data before we send it to our central Zabbix backend server. So, instead of the server being responsible for all of the preprocessing and having quite a large preprocessing overhead, now the proxy can do it and then forward the data to the server.
In addition, on the proxy, the data gets compressed before forwarding to the server thus saving some network traffic overhead.
The proxy still continues collecting data and storing it in its own database even in case of a network outage on the customer’s site.
Once we collect the data by the proxy, it gets sent to the server via a single connection, which is a lot more feasible from the network security perspective. In this case we need to create only a single firewall rule as opposed to a wide array of rules if we were to monitor the customer’s site directly from the central Zabbix backend server.
We can execute remote scripts on the proxy.
We can also deploy multiple proxies to improve scalability. If a single proxy cannot handle the amount of data that we are gathering or preprocessing, we can always deploy an extra proxy. They are easy to deploy, and can even use out-of-the-box SQLite databases.

Passive and active proxies

With proxies can also select the direction of the connection. We can deploy passive proxies, which get polled by the Zabbix backend server. In that case, the Zabbix server pulls the data from the proxy. In this scenario the Zabbix server is the one responsible for establishing the connection to the proxy. This adds a minor performance overhead to the Zabbix backend server. On the other hand, we can also deploy active proxies, where we remove that overhead from the server and proxy sends the data autonomously to the server.

At the end of the day, similar to how it goes with agent requirements, the proxy mode will depend on the security policies of the customer. Don’t forget that we aren’t restricted to a single type of proxy – we can have both of these proxy types running at the same time.

Selecting the connection direction

Data preprocessing — throttling

Preprocessing can help us not only normalize our data, but we can also utilize it to save up on storage and performance overhead, which is vital in large environments.

When monitoring a service or an application state, we are going to be obtaining discrete values such as 1, 2, or 3, or any number. These numbers have a tendency to repeat – if our server stays up, we are going to continue receiving a number which represents “Up”. By using the preprocessing method called throttling, we can decrease the amount of these numbers stored by discarding repeating values. Only status changes are stored, therefore we can potentially save some database space and remove unneeded data processing overhead.

Discarding unchanged values

At this point in time, this feature sees more and more usage in many Zabbix environments, though it was severely underutilized initially when Zabbix 5.0 came out. So, if you aren’t using throttling yet and you’re running on 5.0 or newer, I definitely suggest trying to implement it to some extent. It is available in Preprocessing section of the item configuration.

Permissions

Robust permission design is essential to a multi-tenant environment. Even though permission logic has seen an addition of roles, the user group to host group relations haven’t been abandoned and still play vital role in overall permission schema.

With roles we still have to utilize the three user types – Users, Admin, and Super admins.

User role overview

Here you can see the user role and the UI elements the user has access to together with API restrictions and the actions the user can perform.

Roles grant the ability to configure access to specific UI elements, actions and restrict API calls in a granular fashion. So, when you’re configuring a role, you will see a screen similar to the one below:

Configuring user roles

User roles

Here you can select User type. The user type restrictions still apply. Users can get access only to Monitoring and Inventory, Admins can get access to except the Administration section, and Super admins can get access to every section, including Administration.

With roles we can further restrict these user types. You can have Super admins with some limitations, so that they could only do specific actions and access specific UI sections.

This option has two core benefits. The first one is security as we can limit what our customers can do and what they can access. The other benefit is in the UX, as we can simplify the UI for our users, especially people not experienced with Zabbix. We can restrict the visibility of the sections that the end users don’t have access to, so they will not be concerned with navigating through multiple sections and subsections that they are not familiar with.

User groups

We still have user groups and user groups to host groups relations, which we have to take into consideration. Access to hosts is defined on User groups. So, we have to define our user groups and assign Full/Read only/Deny permissions on particular host groups. This is how we limit what specific customers can access.

User groups

In addition, we can have host groups defined in a hierarchical manner. For instance, if you have two customers each of then having a “Network Devices” subgroup, we can select to include the root group and all of its subgroups when assigning user group to host group permissions. This is a really elegant and quick way to give a User or an Admin on the customer’s side access to all of their hosts or limit a specific organizational unit to only access what they need, e.g.: only permit access to network devices for network administrators.

Using group hierarchy

High availability

The next important decision is the HA implementation. Going without some sort of HA solution is simply too risky and therefore is not an option with such environments.

HA can be used to minimize downtime and add redundancy.
Zabbix supporst Linux HA tools – PCS, Corosync, Pacemaker, which are used to enable HA. You are also welcome to try and use other third-party tools for HA.
Out-of-the-box HA is planned for Zabbix 6.0.

HA setup

To achieve a quorum in our HA environment, we will require an odd number of nodes. For Zabbix backend HA it is very much recommended to have at least three nodes. Does that mean that you have to deploy 3 Zabbix servers? Not really – our third node is going to be a really small arbitration node, which is simply going to be checking connections to the two other Zabbix nodes and giving a vote to achieve quorum in case of issues with one of the nodes.

In the end we will have three nodes: Zabbix server A, Zabbix server B, and the Arbiter node

An odd number of nodes is recommended to achieve quorum.
Only Active/Passive cluster architecture is supported.
We cannot have two Zabbix nodes active running at the same time and talking to the same database. It is important to use some ‘shoot the other node in the head’ mechanism — STONITH to avoid such split-brain scenarios.

Failure to abide by these requirements can result in issues with database consistency, issues with underlying queries and cleaning up or inserting data. This can cause unexpected Zabbix backend server crashes down the line.

In addition, it is very common to have a requirement for proxies to be deployed with HA. Before implementing HA for proxies, we need to decide if we really do need it. HA adds a significant configuration management overhead. We can have hundreds if not thousands of proxies, and managing HA for each of those can add a significant overhead. Of course, the more comfortable you feel with the HA tools, the easier the deployment and the management of the environment.

Another approach for Zabbix proxy HA can also be implemented by using Zabbix API scripts. We can essentially have two proxies running without the need to have the HA suite. In this case, if proxy A is down, we can use Zabbix API to move a host from proxy A to proxy B.

Using Zabbix API script to change the proxy

Here, host.massupdate is used to change the proxy on the hosts. Combine this with a robust scripting logic and you end up with a very viable approach to move your hosts between proxies in failover scenarios.

Database replication

We have covered the HA for Zabbix server backend and let’s remember that with frontend servers, we can simply bring up additional frontends, for instance, by utilizing Docker containers. But what about the DB redundancy?

Database replication can be used as a form of redundancy for the Zabbix DB. No matter the DB backend – Postgres, MySQL, Oracle, we can deploy multiple DB nodes and utilize the native DB replication or use third-party tools for replication, for instance, Galera Cluster.
I personally prefer using native replication tools as it is a bit more simple and you don’t have to concern yourself with another configuration and management layer that could potentially fail and be a bother to troubleshoot. But this will depend on your requirements, design and skillset.

Let’s look at an example with MySQL replication. You can set it up in many different ways as multiple replication approaches are supported: master/slave, master/master, or even have multiple masters replicating to one another. It is completely up to you how to implement replication, especially if you are already experienced with such deployments.

Which approach is best? At the end of the day it will all depend on your company policies, database backend and a compromise between simplicty and extra redundancy. I definitely suggest delving deeper and studying use cases and articles for the DB backend of your choice, before you decide to go with any particular approach.

Database replication

Database performance tuning

Database tuning is vital for the long term stability of your Zabbix instance. The database defaults might be sufficient for your home office, but for large multi-tenant environments with tens of thousands new values per second they will not suffice. The database defaults depend on the database backend and the database version used, but ideally, these should be tuned and tested, preferably during the design stage, before you have deployed your Zabbix instance in production.

After installing the database backend we need to take a look at the hardware resources available. Ideally, you have already estimated the hardware resources required for your instance and ensured that DB hosts have sufficient memory, CPU resources and storage has been selected according to the I/O requirements. Now, you can move on to tuning your database backend.

As an example with Postgres I used PGTune — an online database tuning tool. This is a simple estimate that should still provide you with a somewhat adequate configuration. Though ideally, you should have a DBA on board that is aware of what kind of data loads you will be dealing with to help you with an optimal database configuration.

Database performance tuning

History table partitioning

In such large environments, you will most likely see that housekeeper cannot keep up with the amount of data stored, unable to clean it up in a timely fashion with the housekeeper processes utilization reaching reaching 100 percent for 20-30 minutes at a time. This will have a negative effect on the overall database performance for the duration of housekeeping.

At this point, it is recommended to implement partitioning for history/trend tables. We can use Postgres with TimescaleDB plugin for this. Partitioning is supported out of the box, and you can configure it in Administration > General > Housekeeping.

For MySQL and Oracle backends we would have to rely on custom partitioning scripts or procedures. In addition, community-provided partitioning scripts are publicly available.

As always – don’t forget to test 3rd party scripts in a test environment before deploying it in production!

Community partitioning solution for MySQL

You can always create your own partitioning script, but you should be aware of what you’re doing and how things should be partitioned. We should always be partitioning only history and trend tables.

History table partitioning with TimescaleDB

TimescaleDB plugin for PostgreSQL DB backends supports out-of-the-box partitioning. You don’t have to rely on community scripts.
On TimescaleDB, we need to specify the chunk_time_interval parameter, which will define the partition chunk size.
In addition, we can also add compression of history/trends, which helps to reduce the history table size by 60-80 percent. Again, in such scenarios, your database is going to be huge — terabytes in size with hundreds of customers, each having thousands of metrics per second. So, compression is a really valuable asset.
The only thing we have to take into account is that compressed data is read-only and cannot be changed post-compression. So, no more changes or inserts are possible for the compressed chunks.

History and trends compression

To-do list

Deploy the latest available Zabbix version. Ideally stick with an LTS version.
Deploy proxy servers, define and configure HA/Replication on Zabbix proxies, as well as on Zabbix servers and databases.
Implement partitioning to improve database performance.
Implement throttling to reduce the volume of the incoming data.
Tune your database! Either use online guides or consult with your DBA.

With our to-do list completed, we can have our Zabbix environment with deployed with redundancy in mind, providing monitoring as a service for hundreds of customers, multiple proxies running for each of the customers, HA in place, and Zabbix performing up to our expectations.

Questions & Answers

Question. Will Zabbix have its native HA solution? Will it be the whole package or does it involve installing individual components and maintaining them?

Answer. It’s planned on the roadmap to have a native HA solution in Zabbix 6.0. You should be able to get your hands on it when the 6.0 beta version gets released. Hopefully, you’ll be able to get your hands on it, test it out yourselves and give us feedback. From the looks of it it should be very much plug-and-play and will remove a lot of management overhead when comparing it with current HA implementation. Right now this is being developed only for the Zabbix backend server. As for the frontend – nothing is stopping you from having multiple frontends pointed at the same DB/Zabbix backend server.

Question. Can I run Zabbix on a single server and sell monitoring service to several customers with fully isolated environments, not just GUI, but also items, triggers, etc.

Answer. Yes, you can. You can have a single Zabbix instance and multiple customers being monitored by this instance. The only extra step that might be required is deploying proxies on the customer’s side. By using permission restrictions, proxy servers, roles, etc., we can then monitor multiple customers from a single Zabbix instance.

Question. When we change proxy, the agent configuration has to be updated. What about HA configuration on the proxy?

Answer. That really depends on the approach. If the agent is getting pointed at the virtual IP address and HA is managed by PCS, Corosync, or Pacemaker, then it should be fine as is and the VIP should just be on the currently active host. So, you’ll be essentially rerouted. With the HA by way of API approach, you can simply allow your agent to accept connections from both proxies. With ServerActive we can also specify multiple endpoints, so agents can actually be prepared for such an environment.

Question. How to merge two different instances into a single monitoring instance?

Answer. This is a complex task. First off, both instances need to have the same major Zabbix backend version. You might simply migrate the history from one instance to another, but then you will have some problems with underlying element IDs. So, in one instance you have your own set of items, triggers, users, etc. with your own set of IDs. These will most likely conflict with the set of IDs on the other instance.

You can do partial migration or use the export function to export your templates, hosts, value maps, network maps. I would try to export as much as I can as migration on SQL level will be a real pain. It is possible if you’re stubborn enough, but it can end up being a really complex task that can take days if not weeks to fully implement and test.

Question. Do subgroups relate to templates as well?

Answer. Subgroups relate to templates in a way where we can also define permissions to reading and modifying templates. For templates, you can also create per-customer templates and assign them to host groups. Users that have access to these host groups can then read or modify the templates.

Scheduled report generation in Zabbix 5.4

2021-07-13 Arturs Lontons

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/scheduled-report-generation-in-zabbix-5-4/14776/

The release of version 5.4 grants Zabbix users the ability to receive scheduled PDF reports in their mailbox, which is a very sought-after feature. This post and the video will cover all-new report-related configuration parameters and walk you through setting up scheduled report generation.

I. Reporting in Zabbix 5.4 (0:45)
II. Scheduled reports (2:26)

- - - Creating a report (2:58)
    - Receiving a report (3:36)
    - Permissions (4:29)
    - Recipients of scheduled reports (6:04)
    - Report prerequisites (7:04)
    - Configuring reports — Web service (8:08)
    - Configuring reports — Server (8:52)
    - Configuring reports — Frontend (9:37)
    - Reports — testing (9:59)
    - Common issues (10:41)

III. Questions & Answers (13:28)

Reporting in Zabbix 5.4

Zabbix 5.4 is our first big step in bringing out-of-the-box reporting for our end users. With this feature, we now have a foundation to build upon in the future and make reporting more robust and versatile over time. Since reports are 100% based on dashboard widgets, it’s only a matter of time until more report-focused widgets get released, thus enabling not only better dashboards, but also improving the reporting functionality.

We have implemented a new web service component responsible for generating reports — of course, you can install this server in a quick and easy fashion by using the provided packages.
Reporting works out of the box without the need to deploy or develop any custom scripts.
The initial configuration is easy to understand and implement.
Reporting will use the existing Email media types to send out these reports.
The reports do respect your permissions, as well as roles introduced in Zabbix 5.2.
You will be able to test the report before implementing it as per our schedules just by clicking the Test button.

Scheduled reports

We have added a new Scheduled report section, where the list of reports is available, displaying the report Name, their Owner, Repeats (daily, weekly, etc.), the Period for which the report is generated, and the Last sent date.

Scheduled reports

NOTE. When you configure new reports, and they have not been sent out yet, the Last sent date will be set to ‘Never.’

Creating a report

When you create a report, you will also have to fill in a couple of fields:

Owner,
Name of the report,
Dashboard, the report will be based on,
Period — if you send the report for the Previous day, Previous week, Previous month, of Previous year,
Cycle — how often you send the report Daily/Weekly/Monthly/ Yearly,
Start time (Zabbix server time is used here),
Start date and end date.

Creating reports

Receiving a report

When you receive a PDF report to your mailbox, you can also use the {TIME} macro to display server time both in the subject and the body of the message.

Receiving a report

In the PDF report, you can display any information from the included dashboard – Graphs, Problems, Latest data, and much more. Thanks to all of the available widgets, we will be able to customize our reports in a very granular fashion.

Receiving a report example

The report does respect user permissions. So, in the example above, the report shows only the data to which the user (either the recipient or the report creator) has access.

Permissions

After upgrading to Zabbix 5.4, you will see two new options in the User roles section:

Scheduled reports UI element. Under the UI elements, you can grant or deny access to the Scheduled reports section. This is accessible only to Super admin and Admin user Types.

Permissions

If the Scheduled reports UI element is unchecked for the role, the user won’t be able to access the Scheduled report section and will see an error message. The same behavior is true if you use a URL to access the Scheduled reports.

Access to scheduled reports denied message for the users of a user role

You can also manage scheduled report permissions in the Access to actions section by checking the Manage scheduled reports box. This action permission grants or denies the ability to create or edit scheduled reports and is also accessible to Admin and Super admin user types.

Manage scheduled reports

If this check box is unchecked, the users won’t be able to create new or edit existing reports, though they will be able to access the UI section and see the list of reports and how they are configured.

Access to Manage scheduled reports restricted

Recipients of scheduled reports

When you are defining a new report, you can select the recipient. Report subscription can contain a user or a user group.

When selecting a user, you can specify to include or exclude the user from the subscription.
User group to host group permissions still apply.
You can specify which user is going to be generating the report – recipient or the creator of the report.:

Report recipients

For example, if we need to send some extra information to our NOC team that might not be directly available to them, you can select Current user, and the report will be generated with the permissions of the report creator. Since it is the admin that is creating the report, you can add some extra information that wouldn’t be visible to your NOC team or other regular users. They still won’t be able to access it in Zabbix, but they’ll receive it in their mailbox if you configure the report for them.

Report prerequisites

Diving a bit deeper into the technical side of things, we need to set up two additional packages to enable the reports:

zabbix-web-service — the additional reporting service by default listening to port 10053. The service needs to be reachable from the Zabbix server and can be deployed on the same machine as our frontend or our server. We also have the option to deploy it on a completely separate machine. The zabbix-web-service package should be available if you have added the Zabbix repository.

#yum install zabbix-web-service

Google Chrome is required. However, on some distributions, Chromium is reported to also work, though this is not 100% tested. Note that Google Chrome packages are not included in Zabbix. The Google Chrome packages can simply be downloaded from the Google Chrome website and then installed on the zabbix-web-service host.

#wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm

#yum install google-chrome-stable_current_x86_64.rpm

Configuring reports — Web service

We have a whole new configuration file for the web service. Web service supports many different configuration options:

Logging — similar to that for server and for proxy. You can set up debug levels, select the log types, rotations, and so on.

### Option: LogType –system (syslog), file, console (standard output)
### Option: LogFile–Log file location
### Option: LogFileSize -Size in MB before rotation
### Option: DebugLevel –0 -5

List of allowed server addresses that can access this web service.

### Option: AllowedIP List of comma delimited IP addresses, optionally in CIDR notation, or DNS names of Zabbix servers

Timeout settings

### Option: Timeout -Spend no more than Timeout seconds on processing (Default –3)

Listen port

### Option: ListenPort -Service will listen on this port for connections from the server
(Default -ListenPort=10053)

Encryption settings by using certificates. This way the communication with the web service can be secured.

### Option: TLSAccept –unencrypted or cert
### Option: TLSCAFile–pathname of a file containing top level CA(s) certificates
### Option: TLSCertFile–pathname of a file containing the service certificate
### Option: TLSKeyFile–pathname of a file containing the service private key

Configuring reports — Server

In addition, the server settings now contain report-related parameters:

The number of report writer instances.

### Option: StartReportWriters -Number of pre-forked report writer instances.
(Default –0)

NOTE. You need to have at least one StartReportWriter

NOTE. The number of the necessary report writers will depend on the number of reports and how often you generate them.

Zabbix Web Service URL (to be passed on to the server)

### Option: WebServiceURL -URL to Zabbix web service, used to perform web-related tasks. (No default value)
#Example: http://192.168.1.156:10053/report

You need to make sure that we can communicate with the Zabbix Web Service URL and permit the incoming traffic through this port to the web service.

Configuring reports — Frontend

As the last step, you need to enable communication between the frontend and the web service.

In Administration > General > Other, we have a new configuration parameter where you need to specify your frontend URL that will be reachable by the web service.

Frontend URL

Once this is done, we can create a report.

Reports — testing

After you have created the report, you can test it. You can click the Test button and send out your test report to see if it works. The users to which we’re sending the report need to have an Email media assigned to them in the User settings.

NOTE. Currently, {TIME} macros are resolved only with the scheduled generation and are not available in test reports, though this might change in the future.

Testing reports

Common issues

Some parameters can certainly be misconfigured, so let’s look at the most common issues:

Make sure that you have a properly configured Email media assigned to the user that should be receiving the report. Otherwise, they will fail to receive it.

— Make sure that the Email media type settings are properly configured.
— Once you define the media type, if you’re creating it from scratch, make sure that you test the media type and generate a test report.

Media configuration failed

NOTE. Sending out the report failed in this example siince no media is configured for the report recipients.

Make sure that the correct Web service address is configured on the Zabbix server in the WebServiceURL parameter.

— Confirm that the Zabbix server can connect to the Zabbix web service and that so that we can connect to the specified port/IP address.
— Check your firewall settings if the web service is running on a dedicated machine.
— Make sure that third-party security software, such as SELinux or firewalls don’t block the communication.

Wrong WebServiceURL parameter

Otherwise, you will receive an error message on the Frontend. The error messages should be sufficient enough to point you in the right direction.

Make sure that the Web service URL is configured without any typos. Otherwise, you will reach the web service, but the report page will output an error — ‘404 page not found’.

WebServiceURL=http://192.168.1.156:10053/reportwrong

Typos in configuration error message

NOTE. If you see this error message, check for typos in the Zabbix server configuration file for WebServiceURL.

Don’t forget to assign the Frontend URL in Administration > General > Other.

— If a URL is misconfigured, you might start receiving empty reports.
— If the URL syntax is wrong, you will receive an error message about the malformed URL.

Malformed URL error message

Frontend URL configuration parameter

Google Chrome is not pre-packaged with Zabbix,

— You need to have Google Chrome package installed separately. You can download Google Chrome from the official Google Chrome website, for instance, by using wget.

— Make sure that Google Chrome is available via $PATH environmental variable. If you don’t have it configured, you will receive the error message, so you will need to modify the path variable and make sure the executable is available there.

$PATH environmental variable error

Questions & Answers

Question. What are the possibilities to customize the page size like A4, A3?

Answer. It will be based on how you customize your Dashboard. Currently, you cannot customize the page and select portrait or landscape, for instance.

Zabbix proxy performance tuning and troubleshooting

2021-05-04 Arturs Lontons

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/zabbix-proxy-performance-tuning-and-troubleshooting/14013/

Most Zabbix users use proxies, and those running medium to large instances might have encountered some performance issues. From this post and the video, you will learn more about the most common troubleshooting steps to resolve any proxy issues and to detect them as sometimes you might be unaware of an ongoing issue, as well as basic performance tuning to prevent such issues in the future.

I. Zabbix proxy (1:36)
II. Proxy performance issues (5:35)
III. Selecting and tuning the DB backend (13:27)
VI. General performance tuning (16:59)
V. Proxy network connectivity troubleshooting (20:43)

Zabbix proxy

Zabbix proxy can be deployed and most of the time is used to monitor distributed IT infrastructures, for instance, on a remote location to prevent data loss in case of network outages as the proxy collects the data locally and it is then pushed/pulled to/from Zabbix server.

Zabbix proxy supports active and passive modes, so we can push the data to the Zabbix server or have the Zabbix server pull the data from the proxy. Even if we don’t have any remote locations and have a single data center, it is still a good practice to delegate most of your data collection to a proxy running next to your server, especially in medium-sized and large instances. This allows for offloading our data collection and preprocessing performance overhead from the server to the proxy.

Active vs. passive

Whether an active or a passive mode is better for your company at the end of the day will depend on your security policies. We can use passive mode with the server pulling the data from the proxy or active mode with the proxy establishing the connection to the Zabbix server and pushing the data.

Active mode is the default configuration parameter as it is a bit more simple to configure — almost all of the configuration can be done only on the proxy side. Then, we need to add the proxy on the frontend and we’re good to go.

### Option: ProxyMode
#   Proxy operating mode.
#   0 - proxy in the active mode
#   1 - proxy in the passive mode
#
# Mandatory: no
# Default:
# ProxyMode=0

In the case of a passive proxy, we have to make some changes in the Zabbix server configuration file, which would involve a restart of the Zabbix server and, as a consequence, downtime.

Finally, it is all going to boil down to our networking team and the network and security policies, for instance, allowing for passive or active mode only. If both modes are supported, then the active mode is a bit more elegant.

Proxy versions

Another common question is about the proxy version to install and the database backend to use.

The main point here is that the major proxy version should match the major version of the Zabbix server, while minor versions can differ. For instance, Proxy 5.0.4 can be used with Server 5.0.3 and Web 5.0.9 (in this example, the first and the second number should match). Otherwise, the proxy won’t be able to send the data to the server and you will see some error messages in your log files about version mismatch and data formatting not fitting your server requirements.
Proxies support: SQLite / MySQL/ PostgreSQL/ Oracle backends. To install the proxy, we need to select the proper package for either SQLite3, MySQL, PostgreSQL, or just compile proxy with Oracle database backend support.

— SQLite proxy package:

# yum install zabbix-proxy-sqlite3

— MySQL proxy package:

# yum install zabbix-proxy-mysql

— PostgreSQL proxy package:

# yum install zabbix-proxy-pgsql

For instance, if we do # yum install zabbix-proxy-sqlite3 or copy and paste the instructions from the Zabbix website for SQLite, we will later wonder why it is not working for MySQL as there are some unique dependencies for each of these packages.

NOTE. Don’t forget to select the proper package in relation to the proxy DB backend

Proxy performance issues

After we have installed everything and covered the basics of what needs to be done and how to set things up, we can start tuning or proxy and try to detect any potential performance issues.

Detecting proxy performance issues

How can we find out what the root cause of performance issues is or if we are having them at all?

First, we need to make sure that we are actually monitoring our proxy. So, we need to:

Create a host in Zabbix,
Assign this host to be monitored by the proxy. If the host is monitored by the server, it will report the wrong metrics — the Zabbix server metrics, not the Zabbix proxy metrics.

So, we need to create a host and configure it to be monitored by the proxy itself. Then we can use an out-of-the-box proxy monitoring template — Template Apps Zabbix Proxy.

Template App Zabbix Proxy

NOTE. Template Apps Zabbix Proxy gets updated on the git.zabbix page, when we add new components to Zabbix, new internal processes, new gathering processes, and so on, to support these new components.

If you are running an older version of Zabbix, for example, all the way back from version 2.0, make sure that you download the newer template from our git page not to stay in the blind about the newer internal component performance.

Once we have applied the template, we will see performance graphs with information about gathering processes, internal processes, cache usage, and proxy performance, and both the queue and the new values received per second. So, we can actually react to predefined triggers provided by the template, if there is an issue.

Performance graphs

Then, we need to have a look at the administration queue. A large or growing proxy-specific queue can be a sign of performance issues or a misconfiguration. We might have failed to allow our agents to communicate with our proxies or we might have some network issue on our proxy preventing us from collecting data from the proxy.

An issue on the proxy

In this case:

Check the proxy status, graphs, and log files. In the example above, the proxy has been down for over a year, so it should be decommissioned and removed from the Zabbix environment.
Check the agent logs for issues related to connecting to the proxy. For instance, the proxy might be trying to pull the data but have no rights to do so due to no permissions in the agent configuration file.

Lack of server resources

In some cases, we might simply try to monitor way too much on a really small server, for instance, an older version of a Raspberry Pi device. So, we should use tools such as sar or top to identify resource bottlenecks on the proxy server coming, for example, from the storage performance.

sar -wdp 3 5 > disk.perf.txt

sar is a part of a sysstat package, and this command can provide us with information about our storage performance, serialization, wait times, queues, input/output operations per second, and so on. sar can tell us when something might be overloaded especially if we have longer wait times.

NOTE. Don’t get confused by high %util, which is relevant on hard drives, but on an SSD or a RAID setup the utilization is normally very high. While hard drives can handle only one operation at a time, SSD disks or RAID setups support parallel operations. This can cause SSD or RAID util% to skyrocket, which might not necessarily be a sign of an issue.

Proxy queue

Another useful, though a bit hackish, indicator of the proxy performance is the proxy queue on the proxy database — the count of the metrics pending but not yet sent to the server.

We can observe this in real-time by queueing the proxy DB.
A constantly growing number means that we cannot catch up with our backlog — the network is down or there are some performance issues on the server or the proxy, so more data is getting backlogged than sent.
The list of unsent metrics is stored in proxy_history table.
The last sent metric is marked in the IDs table.

select count(*) from proxy_history where id>(select nextid from ids where
table_name="proxy_history");

This value will keep growing if the proxy is unable to send the data at all or due to performance issues. If the network is down, this is to be expected between the proxy and the server. However, if everything is working but the count still keeps growing, we need to investigate for any spamming items, thousands of log lines coming per second, or other performance issues with our storage and/or our database. There might be performance problems on the server due to the server being unable to ingest all of this data in time after a restart, a long downtime, etc. Such a problem should get resolved over time on its own. Otherwise, if there are no significant factors regarding the performance or any recent changes, we need to investigate deeper.

If this value is steadily decreasing, the proxy is actually catching up with the backlog and the incoming data, and is sending data to the server faster than it is collecting new metrics. So, this backlog will get resolved over time.

Configuration frequency

Don’t forget about the configuration frequency. Any configuration changes will be applied on the proxy after ConfigFrequency interval. By default, these changes get applied once an hour, so ConfigFrequency is 3600.

### Option: ConfigFrequency
#   How often proxy retrieves configuration data from Zabbix Server in seconds.
#   For a proxy in the passive mode this parameter will be ignored.
#
# Mandatory: no
# Range: 1-3600*24*7
# Default:
# ConfigFrequency=3600

On active proxies, we can force configuration cache reload by executing config_cache_reload for Zabbix proxy.

#zabbix_proxy -R config_cache_reload
#zabbix_proxy [1972]: command sent successfully

This is another good reason to use active proxies to pick up all of the configuration changes from the server. However, on passive proxies, the only thing we can do is a proxy restart to force a reload of the configuration changes, which is not a good idea. Otherwise, we have to wait for an hour or some other configuration interval until the changes are picked up by the proxy.

Selecting and tuning the DB backend

The next important step is a selection of the database.

SQLite

A common question, which has no clear answer is when to use SQLite and when should we switch to a more robust DB backend.

SQLite is perfect for small instances as it supports embedded hardware. So, if I were to run a proxy on Raspberry or an older desktop machine, I might use SQLite. Even embedded hardware aside, on smaller instances with fewer than 1,000 new values per second, SQLite backend should feel quite comfortable, though a lot will depend on the underlying hardware.

### Option: ConfigFrequency
#   How often proxy retrieves configuration data from Zabbix Server in seconds.
#   For a proxy in the passive mode this parameter will be ignored.
#
# Mandatory: no
# Range: 1-3600*24*7
# Default:
# ConfigFrequency=3600

So, in most cases, when proxies collect less than 1,000 NVPS per second, SQLite proxy DB backends are sufficient. With SQLite, you don’t need to additionally configure the database.

#zabbix_proxy -R config_cache_reload

#zabbix_proxy [1972]: command sent successfully

With SQLite, there’s no need to have additional database configuration, preparation, or tuning. In the proxy configuration file, we just point at the location of the SQLite file.
A single file is created at the proxy startup, which can be deleted if data cleanup is necessary.

### Option: DBName
#   Database name.
#   For SQLite3 path to database file must be provided. DBUser and DBPassword are ignored.
#   Warning: do not attempt to use the same database Zabbix server is using.
#
# Mandatory: yes
# Default:
# DBName=
DBName=/tmp/zabbix_proxy

All in all the SQLite backend comparatively easy to manage However, it comes with a set of negatives. If we need something more robust that we can tune and tweak, then SQLite won’t do. Essentially, if we reach over 1,000 new values per second, I would consider deploying something more robust — MySQL, PostgreSQL, or Oracle.

Other proxy DB backends

Any of the supported DB backends can be used for a proxy. In addition, the Zabbix server and Zabbix proxy can use different DB backends. The DB configuration parameters are very similar in Zabbix server and Zabbix proxy configuration files, so users should feel right at home with configuring the proxy DB backend.
DB and DB user should be created beforehand with the proper collation and permissions.

shell> mysql -uroot -p<password>
mysql> create database zabbix_proxy character set utf8 collate utf8_bin; 
mysql> create user 'zabbix'@'localhost' identified by '<password>'; 
mysql> grant all privileges on zabbix_proxy.* to 'zabbix'@'localhost'; 
mysql> quit;

DB schema import is also a prerequisite. The command for proxy schema import is very similar to the server import.

zcat /usr/share/doc/zabbix-proxy-mysql*/schema.sql.gz | mysql -uzabbix -p zabbix_proxy

DB Tuning

Make sure to use the DB backend you are most familiar with.
The same tuning rules apply to the Zabbix proxy DB as to the Zabbix server DB.
Default configuration parameters of the backend will depend on the version of the backend used. For instance, different MySQL versions will have different default parameters, so we need to have a look at MySQL documentation, the default parameters, and the way to tune them.
For PostgreSQL, it is possible to use the online tuner — PGTune. Though it is not an ideal instrument, it is a good starting point not to leave the proxy hanging without any tuning as we might encounter issues sooner rather than later. With tuning, the database will be more robust and will last longer before we will have to add any resources and rescale the database config.

PGTune

General performance tuning

Proxy configuration tuning

Database aside, how we can tune the proxy itself?

Proxy configuration is similar to the configuration of the Zabbix server: we still need to take into account and tune our gathering processes, internal processes, such as preprocessors, and our cache sizes. So, we need to have a look at our gathering graphs, internal process graphs, and our cache graphs to see how busy the processes and how full the graphs are and adjust accordingly. This is a lot easier to do on the proxy than on the server since proxy restart is usually quicker and a lot less critical, and less impactful than the server downtime.

In addition, these will differ on each of the proxy servers depending on the proxy size and types of items. For instance, if on proxy A we are capturing SNMP traps, we need to enable the SNMP trapper process and configure our trap handler — Perl, snmptrapd, etc. If we are doing a lot of ICMP pings for another proxy, we’ll require many ICMP pingers. A really large proxy will need to have its History Syncers increased. So, each proxy will be different, and there is no one-fit-all configuration.

Since most of the time proxies handle fewer values since they are distributed and scaled out, we will have a lesser number of History Syncers on proxies in comparison with Zabbix server. In the vast majority of cases, the default number of History Syncers is more than sufficient. Though sometimes we might need to change the count of History Syncers on the proxy.

### Option: StartDBSyncers
#             Number of pre-forked instances of DB Syncers.
#
# Mandatory: no
# Range: 1-100
# Default:
# StartDBSyncers=4

There are always exceptions to the rule. For instance, we might want to have a single large-scale and robust proxy collecting the data from some very critical or very large location with many data points – such an infrastructure layout will still be supported.

If DB syncers do underperform on a seemingly small instance, chances are it is due to lack of hardware resources or, for SQLite, DB backend limitations

We need to monitor the resource usage via sar, or top, or any other tool to make sure that hardware resources aren’t overloaded.

Proxy data buffers

We also have the option to store the data on our proxies if the server is offline or store them even if the Zabbix server is reachable and the data has been sent to the server. we may want to keep our data in the proxy database and utilize it by other third-party tools or integrations.

On our proxies, we have a local buffer and an offline buffer, which determine for how long we can store the data. The size of Local and Offline buffers will affect the size and the performance of your database. The larger the time window for which we store the data, the larger the database is. So, the fewer resources we utilize, the better the performance is, the easier it is to scale up, etc.

Local buffer

### Option: ProxyLocalBuffer
#   Proxy will keep data locally for N hours, even if the data have already been synced with the server.
#
# Mandatory: no
# Range: 0-720
# Default:
# ProxyLocalBuffer=0

Offline buffer

### Option: ProxyOfflineBuffer
#   Proxy will keep data for N hours in case if no connectivity with Zabbix Server.
#   Older data will be lost.
#
# Mandatory: no
# Range: 1-720
# Default:
# ProxyOfflineBuffer=1

Proxy network connectivity troubleshooting

Detecting network issues

Sometimes we have network issues between proxies and the server: either the server cannot talk to proxies or proxies cannot talk to the server.

A good first step would be to test telnet connectivity to/from a proxy.

time telnet 192.168.1.101 10051

Another great method is to time your pings to see how long pinging takes or how long it takes to establish a telnet connection. This could point you towards network latency issues: slow networks, network outages, and so on.
Log file can help you figure out proxy connectivity issues.

125209:20210214:073505.803 cannot send proxy data to server at "192.168.1.101": ZBX_TCP_WRITE() timed out

Load balancers, Traffic inspectors, and other IDS/Firewall tools can hinder proxy traffic. Sometimes it can take hours troubleshooting an issue to find out that it boils down to a load balancer, a traffic inspector, or IDS/firewall tool.

Troubleshooting network issues

A great way to troubleshoot this would be to deploy a test proxy with a different firewall/load balancing configuration. From time to time, network connectivity drops seemingly for no reason. We can bring up another proxy with no load balancers or no traffic inspectors, and ideally, in the same network as the problematic proxies. We need to find out if the new proxy is experiencing the same problems, or if the issue is resolved after we remove the load balancers, IDS/firewall tools. If the problem gets resolved, then this might be a case of misconfigured firewall/IDS.
Another great approach of detecting networking issues due to transport problems, for instance, IDS/Firewalls cutting up our packets, is to perform a tcpdump on proxy and server to correlate network traffic with error messages in the log.

— tcpdump on the proxy:

tcpdump -ni any host -w /tmp/proxytoserver

— tcpdump on the server:

tcpdump -ni any host -w /tmp/servertoproxy

— Correlating retransmissions with errors in logs could signify a network issue.

Many retransmissions may be a sign of network issues. If there are a few of them, if we open Wireshark to find just a couple of retransmits, it might not be the root cause. However, if the majority of our packet capture result is read with duplicate packets, retransmits, acknowledges without data packets being received, etc., that can be a sign of an ongoing network issue.

Ideally, we could take a look at this packet capture and correlate it with our proxy log file to figure out if these error messages in our proxy logfile or server logfile (depending on the type of communication — active or passive) correlate with packet capture issues. If so, we can be quite sure that the networking issue is at fault and then we need to figure out what is causing it — IDS, load balancers, a shoddy network, or anything else.

User roles for the enterprise

2021-02-02 Arturs Lontons

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/user-roles-for-the-enterprise/12887/

In this post, we’ll talk about granular user roles introduced in Zabbix 5.2 and some scenarios where user roles should be used and where they give a great benefit to these specific environments.

I. Permissions granularity (0:40)
II. User Roles in 5.2 (5:16)
III. Example use cases (16:16)
IV. Questions & Answers (h2)

Permissions granularity

Let’s consider two roles: the NOC Team role and the Network Administrator role. These are quite different roles requiring different permission levels. Let’s not also forget that the people working in these roles usually have different skill sets, therefore the user experience is quite important for both of these roles: NOC Team probably wants to see only the most important, the most vital data, while the Network Administrators usually require permissions to view data in more detail and have access to more detailed and granular information overviews of what’s going on in your environment.

For our example, let’s first define the requirements for these roles.

NOC Team role:

They will definitely require access to dashboards and maps.
We will want to restrict unnecessary UI elements for them just to improve the UX. In this case – less is more. Removing the unused UI elements will make the day-to-day workflow easier for the NOC team members who aren’t as proficient with Zabbix as our Monitoring team members.
For security reasons we need to restrict API access because NOC team members will either use API very rarely or not at all. With roles we can restrict the API access either partially or completely.
The ability to modify the existing configuration will be restricted, as the NOC team will not be responsible for changing the Zabbix configuration.
The ability to close problems manually will be restricted, since the network admin team will be responsible for that.

Network Administrator role:

Similar to the NOC team, the Network Administrators also require access to dashboards and maps. what’s going on in your environment, the health of the environment.
They need to have access to configuration, since members of this team are responsible for making configuration changes.
Most likely, instead of disabling the API access for our network administrator role, we would want to restrict API access in some way. They might still need access to get or create methods, while access to everything else should be restricted.
For each of our roles we will be implementing a UI cleanup by restricting UI elements – we will hide the functionality that we have opted out of using.

Roles and multi-tenancy

Granular permissions are one of the key factors in multi-tenant environments. We could use permissions to segregate our environment per tenant, but in 5.2 that’s not the end of it:

Imagine multiple tenants where each has different monitoring requirements. Some want to use the services function for SLA calculation, others want to use inventory, or need the maps and the dashboards.
Restricting access to elements and actions per tenant is important. So, for example, some tenants wish to be able to close problems manually, others need to have restrictions on map or dashboard creations for a specific user group..
Permissions are still used to enable isolation between tenants on host group level

User Roles in 5.2

With Zabbix 5.2 these use cases, which require additional permission granularity, are now fully supported.

So, let’s take a look at how the User Role feature looks in a real environment.

User role

User roles in Zabbix 5.2 are something completely new. Each user will have a role assigned to them on top of their User Type:

User permissions

We end up having our User types being linked to User roles, and User roles linked to Users. This means that User types are linked to Users indirectly through the User roles.

User types

The User, Admin, and Super admin types are still in use. The role will be linked to one of these 3 user types.

User roles

Note that User type restrictions still apply.

Super admin has access to every section: Administration, Configuration, Reports, Inventory, and Monitoring.
Admin has access to Configuration, Reports, Inventory, and Monitoring.
User has access to Reports, Inventory, and Monitoring.

Frontend sections restricted by User type

Default User roles

Once we upgrade to 5.2 or install a fresh 5.2 instance, we will have a set of default user roles. The 4 pre-configured user roles are available under Administration > User roles:

Super admin,
Admin,
User, and
Guest.

Super admin role

The default Super admin role is static. It is set up by default once you upgrade or install a fresh instance. Users cannot modify this role.

All of the other default roles can be modified. In the Zabbix environment, we must have at least a single user with this Super admin role that has access to all of Zabbix functionality. This is similar to the root user in the Linux OS.

Newly created roles of either Super admin, Admin, or User types can be modified. For example, we can create another Super admin role, change the permissions. For instance, we can have a Super admin that doesn’t have access to Administration > General, but has access to everything else.

User role section

Once we open the User roles section, we will see a list of features and functions that we can restrict per user role.

When we create a new role or open a pre-created role they will have the maximum allowed permissions depending on the User type that is used for the role.

Each of the default roles contains the maximum allowed permissions per user type

UI element restriction

We can restrict access to UI elements for each role. If we wish to create a NOC role we can restrict them to have access only to Dashboards and maps. When we open the User up and go to Permissions we will see the available sections highlighted in green.

NOC user role that has access only to Dashboards and maps

Once we open up the Dashboards or the Monitoring section, we will see only the UI sections in our navigation menu that have been permitted for this specific user.

Global view: NOC user role that has access only to Dashboards and maps

Host group permissions

Note, that User Group access to Host Groups still has to be properly assigned. For instance, when we open the Dashboard, we still have to check if this user belongs to a user group, which has access to a specific host group. Then we will either display or hide the corresponding data.

User Group access to Host Group

Access to API

API access can also be restricted for each role. Depending on the Access to API “Enabled” checkbox the corresponding user of this specific role will be permitted or denied to access the API.

Used when creating API specific user roles

In addition to that, we can allow or restrict the execution of specific API methods. For this we can use an Allow or Deny list. For instance, we could create a user that has access only to get methods: they can read the data, but they cannot modify the data.

Restricting API method

Let’s use host.create method as an example. If I don’t have permission to do so, I will see an error message ‘no permissions to call’ and then the name of the call — host.create in this case.

Access to actions

Each role can have a specific list of actions that it can perform with respect to the role User type.

In this context, ‘Actions’ mean what this user can do within the UI: Do we wish for the user to be able to close problems, acknowledge them, create or edit maps.

Defining access to actions

NOTE. For the role of type ‘User’, the ‘Create and edit maintenance’ will be grayed out because the User type by default doesn’t have access to the Maintenance section. You cannot enable it for the role of User type, but you can enable or disable it for the Admin type role.

Restricting Actions example

Let’s restrict the role for acknowledging and closing problems. Once we define the restriction the acknowledgment and closing of problems will be grayed out in the frontend.

If we enable it (the checkboxes are editable), we can acknowledge and close problems.

Restricted role

Unrestricted role

Default access

We can also modify the Default access section. We can define that a role has default access to new actions, modules, and UI elements. For instance, if we are importing a new frontend module or upgrading our version 5.2 to version 6.0 in the future – if any new UI elements, modules or action types appear, do we want for this specific role to have access to it by default once it is created or should this role by default have restricted access to all of these new elements that we are creating?

This allows to give access to any new UI elements for our Super Admin users while disabling the for any other User roles.

Default access for new elements of different types can be enabled or disabled for user roles

If Default access is enabled, whenever a new element is added, the user belonging to this role will automatically have access to it.

Role assignment post-upgrade

How are these roles going to be assigned after migration to 5.2? I have my users of a specific User type, but what’s going to happen with roles? Will I have to assign them manually?

When you upgrade to 5.2 from, for example, 5.0, the users will have the pre-created default roles for Admin, User, and Super admin assigned for them based on their types.

Pre-created roles after migration

This allows us to keep things as they were before 5.2 or go ahead with creating new User roles.

Example use cases

The following example use cases will give you an idea of how you can implement this in your environment.

Read-only role

ANOC Team User role, with no ability to create or modify any elements:

read-only access to dashboards,
no access to problems,
no access to API, and
no permissions to execute frontend scripts.

When we are defining this new role, we will mark the corresponding checkboxes in the Monitoring section. The User type for this role is going to be ‘User’ because they don’t need to have access to Administration or Configuration.

User type and sections the role has access to

We will also restrict access to actions, the API, and decide on the new UI element and module permission logic. Default access to new actions and modules will be restricted. Read up on Zabbix release notes to see if any new UI elements have been added in future releases!

Read-only role

When we log in with this user and go to Dashboards, we will see that this user has no option to create or edit a dashboard because we have restricted such an action. The access is still granted based on the Dashboard permissions — depending on whether it is a public or a private dashboard. When they open it up, the data that they will see will depend on the User group to Host group relationship.

When this user opens up the frontend, he will see that access to the unnecessary UI elements is restricted (the restricted UI elements are hidden). Even though he has access to the Problem widget on the dashboard, they are unable to acknowledge or close the problem as we have restricted those actions.

Restricted UI elements hidden and ‘Acknowledge’ button unclickable for this Role

Restrict access to Administration section

Another very interesting use case — restricting access to Administration sections. Administration sections are available only for our Super admins, but, in this case, we want to have a separate role of type Super admin that has some restrictions.

Our Super admin type role that has no access to User сonfiguration and General Zabbix settings will need to be able to:

create and manage proxies,
define media types and frontend scripts, and
access the queue to check the health of our Zabbix instance.

But they won’t be able to create new User groups, Users, and so on.

So, we are opening our Administration > User roles section, creating a new role of type Super admin, and restricting all of the user-related sections, and also restricting access to Administration > General.

User type – Super admin. General and User sections are restricted for this role

When we log in, we can see that there is no access to Administration > General section because we have restricted the ability to change housekeeper settings, trigger severities, and other settings that are available in Administration > General.

But the Monitoring Super admin user still has the ability to create new Proxies, Media Types, Scripts and has access to the Queue section. This is a nice way to create different types of Super admins which was not possible before 5.2.

Access to Administration section elements

Roles for multi-tenant environment

Zabbix Dashboards and maps are used by multiple tenants to provide monitoring data.

In our example, we will imagine a customer portal that different tenants can access. They log in to Zabbix and based on their roles and permissions can access different elements. One of our Tenant requires a NOC role :

read-only access to dashboards,
read-only access to maps,
no access to API,
no access to configuration,
isolation per tenant so we won’t be able to see the host status of other tenants.

We will create a new role in Administration > User roles — new role of type User. We will restrict access only to the UI elements that need to be visible for the users belonging to this role.

User type role with very limited access to UI

Since we need to have isolation, we will also be using tag-based permissions to isolate our Hosts per tenant. We’ll go to Permissions section, add read-only or write permissions on a User group to a specific Host group. Then we will also define the tag-based permissions so that these users have access only to problems that are tagged with a specific tag.

Tag-based permissions to isolate our Hosts per tenant

Don’t forget to actually tag those problems and define these tags either on the trigger level or on the host level.

Tagging on the host level

Once we have implemented this, if we open up the UI, we go to Monitoring > Dashboards. We can see that:

The UI is restricted only to the required monitoring sections.
Tag-based permission ensure that we are seeing problems related to our specific tenant.

Isolation and role restriction have been implemented, and we can successfully have our multi-tenant environment.

Roles for multi-tenant environments

What’s next?

How would you proceed with upgrading to Zabbix 5.2 and implementing this? At the design stage, you need to understand that User roles can help you with a couple of things and you need to estimate and assign value to these things if you want to implement them in your environment.

User roles can improve auditing. Since you have restricted roles per each user it’s easier to audit who did what in your environment.
Restricting API access. We can not only enable or disable API access, but we can also restrict our users to specific methods. From the security and auditing perspective, this adds a lot of flexibility.
Restricting configuration. We can restrict users to specific actions or limit their access to specific Configuration sections as in the example with the custom Super admin role. This allows us to have multiple tiers of admins in our environment
Removing unwanted UI elements. By restricting access to only the necessary UI elements we can give Zabbix a much cleaner look and improve the UX of your users.

Thank you! I hope I gave you some insight into how roles can be used and how they will be implemented in Zabbix 5.2. I hope you aren’t too afraid to play around with this new set of features and implement them in your environment.

Questions & Answers

Question. Can we have a limited read-only user that will have access to all the hosts that are already in Zabbix and will be added in the future?

Answer. Yes, we can have access to all of the existing Host groups. But when you add a new Host Group, you will have to go to your Permissions section and assign User Group to Host Group permissions for the newly added group.

Question. So that means that now we can have a fully customizable multi-tenant environment?

Answer. Definitely. Fully customizable based both on our User group to Host group permissions and roles to make the actions and different UI sections available as per the requirements of our tenants.

Question. I want to create a user with only API access. Is that possible in 5.0 or 5.2?

Answer. It’s been possible for a while now. You can just disable the frontend access and leave the user with the respective permissions on specific Host groups. But with 5.2 you can make the API limitations more granular. So, you can say that this API-only user has access only to specific API methods

Question. Can we make a user who can see but cannot edit the configuration?

Answer. Partially. For read-only users, read-only access still works for the Monitoring section. But if we go to Configuration, if we want to see anything in the Configuration section, we need write access.You can use Monitoring > Hosts section, where you can see partial configuration. Configuration section unfortunately still is not available for read-only access.

Check out the video to learn how to configure Zabbix agent autoregistration.

Tips and best practices:

BSM Checklist

Define your business

Configuring SLAs

Track, solve and measure

Questions

Zabbix server High Availability cluster

Business service monitoring

New Audit log schema

Machine learning

New ways to visualize your data

Zabbix agent – improvements and new items

Custom Zabbix password complexity requirements

UI/UX improvements

New templates and integrations

Other changes and improvements

Questions and answers

Check out the video to learn how to deploy the Zabbix Agent in passive and active modes

Tips and best practices:

Check out the video to learn how to visualize your metrics by using the Graph widget.

Tips and best practices:

Deploying the Zabbix Server in AWS

Accessing the application

Accessing the database

Modifying the Zabbix Frontend timezone

Zabbix proxy

Zabbix 6.0 LTS release date

Zabbix 6.0 LTS Workshops

Upcoming events

Check out the video to learn how to define data storage periods on your Zabbix instance.

Tips and best practices::

1. Count entities with CountOutput

2. Use API to perform Configuration export/import

3. Expand trigger functions and macros with expand parameters

4. Obtaining additional LLD information for a discovered item

5. Searching through the matched entities with search parameters

Native Zabbix server cluster

Business service monitoring and root cause analysis

Audit log redesign

Geographical maps

Machine learning

Creating a new named Zabbix API token

Simplifying API tasks with the named API token

Utilizing Zabbix API scripts in Actions

Throttling and other preprocessing steps

Understanding “Do not keep history” option

Using trend functions with high-frequency monitoring

Updating your templates

The benefits of keeping templates up to date

Zabbix self-monitoring templates

Tracking template changes

Low-level discovery overrides

Multiple overrides in a single Low-level discovery rule

Network interface discovery override

Contents

Monitoring requirements of multi-tenant environments

Supported monitoring approaches

Agent

Latest improvements

Zabbix and multi-tenant environment

Zabbix proxy

Passive and active proxies

Data preprocessing — throttling

Permissions

User roles

User groups

High availability

Database replication

Database performance tuning

History table partitioning

History table partitioning with TimescaleDB

To-do list

Questions & Answers

Contents

Reporting in Zabbix 5.4

Scheduled reports

Creating a report

Receiving a report

Permissions