Tag Archives: scripts

Extending Zabbix: the power of scripting

Post Syndicated from Giedrius Stasiulionis original https://blog.zabbix.com/extending-zabbix-the-power-of-scripting/27401/

Scripts can extend Zabbix in various different aspects. If you know your ways around a CLI, you will be able to extend your monitoring capabilities and streamline workflows related to most Zabbix components.

What I like about Zabbix is that it is very flexible and powerful tool right out of the box. It has many different ways to collect, evaluate and visualize data, all implemented natively and ready to use.

However, in more complex environments or custom use cases, you will inevitably face situations when something can’t be collected (or displayed) in a way that you want. Luckily enough, Zabbix is flexible even here! It provides you with ways to apply your knowledge and imagination so that even most custom monitoring scenarios would be covered. Even though Zabbix is an open-source tool, in this article I will talk about extending it without changing its code, but rather by applying something on top, with the help of scripting. I will guide you through some examples, which will hopefully pique your curiosity and maybe you will find them interesting enough to experiment and create something similar for yourself.

Although first idea which comes to ones mind when talking about scripts in Zabbix is most likely data collection, it is not the only place where scripts can help. So I will divide those examples / ideas into three sub categories:

  • Data collection
  • Zabbix internals
  • Visualization

Data collection

First things first. Data collection is a starting point for any kind of monitoring. There are multiple ways how to collect data in “custom” ways, but the easiest one is to use UserParameter capabilities. Basics of it are very nicely covered by official documentation or in other sources, e.g. in this video by Dmitry Lambert, so I will skip the “Hello World” part and provide some more advanced ideas which might be useful to consider. Also, the provided examples use common scripting themes/scenarios and you can find many similar solutions in the community, so maybe this will serve better as a reminder or a showcase for someone who has never created any custom items before.

Data collection: DB checks

There is a lot of good information on how to setup DB checks for Zabbix, so this is just a reminder, that one of the ways to do it is via custom scripts. I personally have done it for various different databases: MySQL, Oracle, PostgreSQL, OpenEdge Progress. Thing is ODBC is not always a great or permitted way to go, since some security restrictions might be in place and you can’t get direct access to DB from just anywhere you want. Or you want to transform your retrieved data in a ways that are complex and could hardly be covered by preprocessing. Then you have to rely on Zabbix agent running those queries either from localhost where DB resides or from some other place which is allowed to connect to your DB. Here is an example how you can do it for PostgreSQL

#!/bin/bash

my_dir="$(dirname ${0})"
conf_file="${my_dir}/sms_queue.conf"

[[ ! -f $conf_file ]] && echo -1 && exit 1

. ${conf_file}

export PGPASSWORD="${db_pass}"

query="SELECT COUNT(*) FROM sms WHERE sms.status IN ('retriable', 'validated');"

psql -h "${db_host}" -p "${db_port}" -U "${db_user}" -d "${db}" -c "${query}" -At 2>/dev/null

[[ $? -ne 0 ]] && echo -1 && exit 1

exit 0

Now what’s left is to feed the output of this script into Zabbix via UserParameter. Similar approach can be applied to Oracle (via sqlplus) or MySQL.

Data collection: log delay statistics

I once faced a situation when some graphs which are based on log data started having gaps. It meant something was wrong either with data collection (Zabbix agent) or with data not being there at the moment of collection (so nothing to collect). Quick check suggested it was the second one, but I needed to prove it somehow.

Since these log lines had timestamps of creation, it was a logical step to try to measure, how much do they differ from “current time” of reading. And this is how I came up with the following custom script to implement such idea.

First of all, we need to read the file, say once each minute. We are talking about log with several hundreds of thousands lines per minute, so this script should be made efficient. It should read the file in portions created between two script runs. I have explained such reading in details here so now we will not focus on it.

Next what this script does is it greps timestamps only from each line and counts immediately number of unique lines with the same timestamp (degree of seconds). That is where it becomes fast – it doesn’t need to analyze each and every line individually but it can analyze already grouped content!

Finally, delay is calculated based on the difference between “now” and collected timestamps, and those counters are exactly what is then passed to Zabbix.

#!/bin/bash

my_log="${1}"

my_project="${my_log##*\/}"
my_project="${my_project%%.log}"

me="$(basename ${0})"
my_dir="/tmp/log_delays/${my_project}"

[[ ! -d ${my_dir} ]] && mkdir -p ${my_dir}

# only one instance of this script at single point of time
# this makes sure you don't damage temp files

me_running="${my_dir}/${me}.running"

# allow only one process
# but make it more sophisticated:
# script is being run each minute
# if .running file is here for more than 10 minutes, something is wrong
# delete .running and try to run once again

[[ -f $me_running && $(($(date +%s)-$(stat -c %Y $me_running))) -lt 600 ]] && exit 1

touch $me_running

[[ "${my_log}" == "" || ! -f "${my_log}" ]] && exit 1

log_read="${my_dir}/${me}.read"

# get current file size in bytes

current_size=$(wc -c < "${my_log}")

# remember how many bytes you have now for next read
# when run for first time, you don't know the previous

[[ ! -f "${log_read}" ]] && echo "${current_size}" > "${log_read}"

bytes_read=$(cat "${log_read}")
echo "${current_size}" > "${log_read}"

# if rotated, let's read from the beginning

if [[ ${bytes_read} -gt ${current_size} ]]; then
  bytes_read=0
fi



# get the portion

now=$(date +%s)

delay_1_min=0
delay_5_min=0
delay_10_min=0
delay_30_min=0
delay_45_min=0
delay_60_min=0
delay_rest=0

while read line; do

  [[ ${line} == "" ]] && continue

  line=(${line})

  ts=$(date -d "${line[1]}+00:00" +%s)

  delay=$((now-ts))

  if [[ ${delay} -lt 60 ]]; then
    delay_1_min=$((${delay_1_min}+${line[0]}))
  elif [[ ${delay} -lt 300 ]]; then
    delay_5_min=$((${delay_5_min}+${line[0]}))
  elif [[ ${delay} -lt 600 ]]; then
    delay_10_min=$((${delay_10_min}+${line[0]}))
  elif [[ ${delay} -lt 1800 ]]; then
    delay_30_min=$((${delay_30_min}+${line[0]}))
  elif [[ ${delay} -lt 2700 ]]; then
    delay_45_min=$((${delay_45_min}+${line[0]}))
  elif [[ ${delay} -lt 3600 ]]; then
    delay_60_min=$((${delay_60_min}+${line[0]}))
  else
    delay_rest=$((${delay_rest}+${line[0]}))
  fi

done <<< "$(tail -c +$((bytes_read+1)) "${my_log}" | head -c $((current_size-bytes_read)) | grep -Po "(?<=timestamp\":\")(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?=\.)" | sort | uniq -c | sort -k1nr)"

echo "delay_1_min=${delay_1_min}
delay_5_min=${delay_5_min}
delay_10_min=${delay_10_min}
delay_30_min=${delay_30_min}
delay_45_min=${delay_45_min}
delay_60_min=${delay_60_min}
delay_rest=${delay_rest}"



rm -f "${me_running}"

exit 0

Now on Zabbix side, there is an item running this script and 7 dependent items, representing the degree of delay. Since there are many logs for which this data is collected, it is all put into LLD based on contents of specific directory:

vfs.dir.get[/var/log/logs,".*log$",,,,,1000]

This LLD then provides two macros:

And item prototypes will look like:

Those dependent items have one simple preprocessing step which takes needed number out of the script output:

So the final result is the nice graph in dashboard, showing exactly when and what degree delays do appear:

So as you see, it is relatively easy to collect just about any data you wish, once you know how. As you can see from these examples, it might be something more complex but it can also be just a simple one-liner – in any case it should be obvious that possibilities are endless when talking about scripts in data collection. If something is executable from the CLI and has a valuable output, go ahead and collect it!

Zabbix internals

Another area where scripts can be really useful is adjusting how Zabbix behaves or controlling this behavior automatically. And in this case, we will employ Zabbix API, since it’s designed exactly for such or similar purposes.

Zabbix internals: automatically disabling problematic item

In our environment, we have many logs to be analyzed. And some of them sometimes go crazy – something that we intend to catch starts appearing there too often and requires attention – typically we would have to adjust the regexp, temporarily suppress some patterns and inform responsible teams about too extensive logging. If you don’t (or can’t) pay attention quick, it might kill Zabbix – history write cache starts filling up. So what we do is automatically detect such an item with most values received during some most recent short period of time and automatically disable it.

First of all there are two items – the one measuring history write cache and the other one extracting top item in the given table

[root@linux ~]# zabbix_agentd -t zabbix.db.max[history_log,30] 2>/dev/null
zabbix.db.max[history_log,30] [t|463 1997050]
[root@linux ~]#

First number here is values gathered during provided period, second one is item id. The script behind this item looks like this

[root@linux ~]# grep zabbix.db.max /etc/zabbix/zabbix_agentd.d/userparameter_mysql.conf
UserParameter=zabbix.db.max[*],HOME=/etc/zabbix mysql -BN -e "USE zabbix; SELECT count(*), itemid FROM $1 WHERE clock >= unix_timestamp(NOW() - INTERVAL $2 MINUTE) GROUP BY itemid ORDER BY count(*) DESC LIMIT 1;"
[root@linux ~]#

And now relying on the history write cache item values showing us drop, we construct a trigger:

And as a last step, such trigger invokes action, which is running the script that disables the item with given ID with the help of Zabbix API, method “item.update”

Now we are able to avoid unexpected behavior of our data sources affecting Zabbix performance, all done automatically – thanks to the scripts!

Zabbix internals: add host to group via frontend scripts

Zabbix maintenance mode is a great feature allowing us to reduce noise or avoid some false positive alerts once specific host is known to have issues. At some point we found it would be convenient to be able to add (or remove) specific host into (from) maintenance directly from “Problems” window. And that is possible and achieved via a frontend script, again with the help of Zabbix API, this time methods “host.get”, “hostgroup.get”, “hostgroup.massadd” and “hostgroup.massremove”

Data visualization

Zabbix has many different widgets that are able to cover various different ways of displaying your collected data. But in some cases, you might find yourself missing some small type of “something” which would allow your dashboards to shine even more – at least I constantly face it. Starting From version 6.4 Zabbix allows you to create your own widgets but it might be not such a straightforward procedure if you have little or no programming experience. However, you can employ two already existing widgets in order to customize your dashboard look in pretty easy way.

Data visualization: URL widget

First one example is done using the URL widget. You might feed just about any content there, so if you have any web development skills, you can easily create something which would look like custom widget. Here is an example. I need a clock but not the one already provided by Zabbix as a separate clock widget – I want to have a digital clock and I also want this clock to have a section, which would display the employee on duty now and in an upcoming shift. So with a little bit of HTML, CSS and JavaScript / AJAX, I have this

With styles properly chosen, such content can be smoothly integrated into dashboards, along with other widgets.

Data visualization: plain text widget with HTML formatting

Another useful widget which is often overlooked is the “Plain text” widget – in combination with the following parameters:

It becomes a very powerful tool to display nicely formatted data snapshots. Simple yet very good example here would be to display some content, which requires human readable structure – a table.

So again, integration with other dashboard widgets is so smooth – with just some custom HTML / CSS around your data you wrap it into something that looks like brand new “table” widget. Isn’t it awesome? And you are of course not limited to tables… Just use your imagination!

Conclusion

Although I personally prefer bash as the first option to solve things, there is no big difference regarding which scripting or programming languages to choose when extending Zabbix in these ways. Just try anything you feel most comfortable with.

I hope that examples shown here inspired you in some ways. Happy scripting!

The post Extending Zabbix: the power of scripting appeared first on Zabbix Blog.

Advanced Zabbix API – 5 API use cases to improve your API workfows

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/advanced-zabbix-api-5-api-use-cases-to-improve-your-api-workfows/16801/

As your monitoring infrastructures evolve, you might hit a point when there’s no avoiding using the Zabbix API. The Zabbix API can be used to automate a particular part of your day-to-day workflow, troubleshoot your monitoring or to simply analyze or get statistics about a specific set of entities.

In this blog post, we will take a look at some of the more advanced API methods and specific method parameters and learn how they can be used to improve your API workflows.

1. Count entities with CountOutput

Let’s start with gathering some statistics. Let’s say you have to count the number of some matching entities – here we can use the CountOutput parameter. For a more advanced use case – what if we have to count the number of events for some time period? Let’s combine countOutput with time_from and time_till (in unixtime) and get the number of events created for the month of November. Let’s get all of the events for the month of November that have the Disaster severity:

{
"jsonrpc": "2.0",
"method": "event.get",
"params": {
"output": "extend",
"time_from": "1635717600",
"time_till": "1638223200",
"severities": "5",
"countOutput": "true"
},
"auth": "xxxxxx",
"id": 1
}

2. Use API to perform Configuration export/import

Next, let’s take a look at how we can use the configuration.export method to export one of our templates in yaml:

{
"jsonrpc": "2.0",
"method": "configuration.export",
"params": {
"options": {
"templates": [
"10001"
]
},
"format": "yaml"
},
"auth": "xxxxxx",
"id": 1
}

Now let’s copy and paste the result of the export and import the template into another environment. It’s extremely important to remember that for this method to work exactly as we intend to, we need to include the parameters that specify the behavior of particular entities contained in the configuration string, such as items/value maps/templates, etc. For example, if I exclude the templates parameter here, no templates will be imported.

{
"jsonrpc": "2.0",
"method": "configuration.import",
"params": {
"format": "yaml",
"rules": {
"valueMaps": {
"createMissing": true,
"updateExisting": true
},
"items": {
"createMissing": true,
"updateExisting": true,
"deleteMissing": true
},
"templates": {
"createMissing": true,
"updateExisting": true
},

"templateLinkage": {
"createMissing": true
}
},
"source": "zabbix_export:\n version: '5.4'\n date: '2021-11-13T09:31:29Z'\n groups:\n -\n uuid: 846977d1dfed4968bc5f8bdb363285bc\n name: 'Templates/Operating systems'\n templates:\n -\n uuid: e2307c94f1744af7a8f1f458a67af424\n template: 'Linux by Zabbix agent active'\n name: 'Linux by Zabbix agent active'\n 
...
},
"auth": "xxxxxx",
"id": 1
}

3. Expand trigger functions and macros with expand parameters

Using trigger.get to obtain information about a particular set of triggers is a relatively common practice. One particular caveat that we have to consider is that by default macros in trigger name, expression or descriptions are not expanded. To expand the available macros we need to use the expand parameters:

{
"jsonrpc": "2.0",
"method": "trigger.get",
"params": {
"triggerids": "18135",
"output": "extend",
"expandExpression":"1",
"selectFunctions": "extend"
},
"auth": "xxxxxx",
"id": 1
}

4. Obtaining additional LLD information for a discovered item

If we wish to display additional LLD information for a discovered entity, in this case – an item, we can use the selectDiscoveryRule and selectItemDiscovery parameters.
While selectDiscoveryRule will provide the ID of the LLD rule that created the item, selectItemDiscovery can point us at the parent item prototype id from which the item was created, last discovery time, item prototype key, and more.

The example below will return the item details and will also provide the LLD rule and Item prototype IDs, the time when the lost item will be deleted and the last time the item was discovered:

{
"jsonrpc": "2.0",
"method": "item.get",
"params": {
"itemids":"36717",
"selectDiscoveryRule":"1",
"selectItemDiscovery":["lastcheck","ts_delete","parent_itemid"]
}, "auth":"xxxxxx",
"id": 1
}

5. Searching through the matched entities with search parameters

Zabbix API provides a couple of standard parameters for performing a search. With search parameter, we can search string or text fields and try to find objects based on a single or multiple entries. searchByAny parameter is capable of extending the search – if you set this as true, we will search by ANY of the criteria in the search array, instead of trying to find an entity that matches ALL of them (default behavior).

The following API call will find items that match agent and Zabbix keys on a particular template:

{
"jsonrpc": "2.0",
"method": "item.get",
"params": {
"output": "extend",
"templateids": "10001",
"search": {
"key_": ["agent.","zabbix"]
},
"searchByAny":"true",
"sortfield": "name"
},
"auth": "xxxxxx",
"id": 1
}

Feel free to take the above examples, change them around so they fit your use case and you should be able to quite easily implement them in your environment. There are many other use cases that we might potentially cover down the line – if you have a specific API use case that you wish for us to cover, feel free to leave a comment under this post and we just might cover it in one of the upcoming blog posts!