All posts by Brian van Baekel

Monitoring Juniper Mist wireless network

Post Syndicated from Brian van Baekel original https://blog.zabbix.com/monitoring-juniper-mist-wireless-network/19093/

As a Premium Zabbix partner, Opensource ICT Solutions builds Zabbix solutions all over the world. That means we have customers with a broad variety of requirements and ideas about how to monitor things, which metrics are important and how to alert on them. If one of those customers approaches us with a question about a task the likes of which we have never done before, it’s a challenge. And we love challenges! This blog post will cover one such challenge that we solved some time ago.

Quanza is a leading infrastructure operator offering a broad portfolio of services to completely take over the management of networks, data centers and cloud services. With more than 70 colleagues and at least as many specializations, everyone at Quanza works towards the same goal: designing, building, and operating an optimal IT infrastructure. Exactly like you would expect it… and then some. Quanza understands that you prefer to focus on your own innovation. By continuously mapping out your wishes, Quanza provides customized solutions that keep your network up and running 24×7. Today and in the future.

With a relentless focus on mission-critical environments, often of relevance to society, Quanza has an impressive line-up of customers. Some enterprises that chose to partner up with Quanza are SURF, Payvision, the Volksbank, and the Amsterdam Internet Exchange (AMS-IX), one of the world’s largest internet hubs.

Recently, customers started asking Quanza to embed Juniper MIST products for wired and wireless networks in their service portfolio. In order to fully support the network’s lifecycle (build, operate and innovate), the Juniper MIST products will need to be monitored by their 24×7 NOC. This is where we came into play, with our Zabbix knowledge.

We quickly decided to combine the knowledge Quanza has of the Juniper MIST equipment and API and our Zabbix knowledge to build the best possible monitoring solution.

SNMP or cloud?

The Juniper MIST solution is a cloud-based solution that provides a single pane of management for Juniper Networks products. As it’s cloud-based, it’s not a “traditional” network solution. As such, SNMP is not an option for device monitoring, as the devices communicate only with “the cloud” and we cannot query them directly the way we used to with traditional network equipment.

So, we started to investigate other options. One of the most common options right now is talking to some sort of API and pulling the metrics from that API. With the Zabbix “HTTP agent” item type, this is no problem at all. Unfortunately, that’s not how the MIST API works: it pushes data instead of letting you pull it (pulling is actually possible, but it doesn’t scale at all). Now, the Zabbix HTTP agent item type allows trapping, but only in the specific Zabbix sender format, and of course the MIST API does not push data in that format.

This means we have a problem. SNMP is not available. Pulling data is not a viable, scalable option. Pushing the data is an option, but Zabbix does not understand the format in which MIST pushes it.

Luckily, we are not talking about some proprietary monitoring tool that is completely closed and far too static: with Zabbix there is always a solution, as long as you’re creative enough.

Getting data into Zabbix

We needed some middleware. Something that was able to receive that data from MIST and convert it into something that we can push into Zabbix.

That’s exactly what we did. We, together with Quanza, built a middleware that uses an API token to authenticate against the MIST API endpoint. Once the authentication is successful, the middleware is allowed to subscribe to certain “channels”. These channels provide event and performance data. You can compare it with MQTT, where a subscription to channels/topics is needed to get the information you are interested in.

Mist Middleware explained

  • Step 1: Authenticate using an API token
  • Step 2: Subscribe to channels
  • Step 3: Receive performance and event data
  • Step 4: Filter out only the relevant (performance) data for Zabbix
  • Step 5: Push into Zabbix

Once we had this in place, the MIST part was finished. We had our data and were able to push it into the monitoring solution.
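
To give you an idea of what such a middleware could look like, here is a minimal sketch in Python. The websocket endpoint, channel name, message format and the mist.performance trapper key are assumptions for illustration only (check the MIST API documentation for the real values), and a production version obviously needs proper error handling, reconnects and filtering:

#!/usr/bin/env python3
# Minimal sketch of the MIST-to-Zabbix middleware flow (steps 1-5 above).
# The websocket endpoint, channel name, message format and item key are
# assumptions for illustration only - check the MIST API documentation.
import json
import subprocess

import websocket  # pip install websocket-client

MIST_WS_URL = "wss://api-ws.mist.com/api-ws/v1/stream"   # assumed endpoint
API_TOKEN = "<MIST API TOKEN>"
CHANNEL = "/sites/<SITEID>/stats/devices"                 # assumed channel
ZABBIX_SERVER = "<ZABBIX SERVER>"
ITEM_KEY = "mist.performance"                             # hypothetical trapper key

# Step 1: authenticate with the API token while opening the websocket
ws = websocket.create_connection(MIST_WS_URL,
                                 header=["Authorization: Token " + API_TOKEN])

# Step 2: subscribe to the channel we are interested in
ws.send(json.dumps({"subscribe": CHANNEL}))

while True:
    # Step 3: receive performance and event data
    message = json.loads(ws.recv())
    data = message.get("data", {})
    if isinstance(data, str):          # the payload may arrive as a JSON string
        data = json.loads(data)

    # Step 4: filter out only the relevant (performance) data for Zabbix
    if "mac" not in data:
        continue
    hostname = data.get("name", data["mac"])  # must match the discovered host

    # Step 5: push the filtered JSON into the trapper item on that host
    subprocess.run(["zabbix_sender", "-z", ZABBIX_SERVER, "-s", hostname,
                    "-k", ITEM_KEY, "-o", json.dumps(data)], check=False)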

Parsing in Zabbix

So, right now we have the data available for Zabbix. Time to find a neat way to use it. As the environments (both inventory and the types of equipment that are used) might be dynamic, we definitely do not want to apply any manual work to monitor newly added sites/equipment.

That means that low-level discovery rules are pretty much the only viable solution.

Here we go:

Describing host prototypes


Within Zabbix, we configure 1 host (the Discovery host) and apply a template on that host, with exactly 1 LLD rule: Query our middleware, and based on the information received, create new hosts (Host prototypes).

The data that is received looks like this:

{
  "NODEID":"<NODEID>",
  "NAME":"AP-<SITE>-<NUMBER>",
  "SITENAME":"<SITENAME>",
  "SITEID":"<SITEID>",
  "MAC":"<MAC ADDRESS>",
  "ORGNAME":"<ORG NAME>"
},

Those new/discovered hosts will have the names of the AP and the corresponding organization and location (in Mist: site). We also link a template to the discovered host and add it to a host group, using the variables we’ll need later, such as the organization, site name, site ID, etc.

So, we need to parse those JSON elements. Luckily, Zabbix provides, within the LLD rule configuration, the option to parse this into LLD macros. For example, the node ID is parsed into {#ID} with the JSONPath $.NODEID:

LLD macro configuration
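
For reference, the full mapping in the LLD rule looks roughly like this (apart from {#ID}, the macro names are illustrative; the JSONPath expressions simply follow the JSON keys shown above):

{#ID}        $.NODEID
{#NAME}      $.NAME
{#SITENAME}  $.SITENAME
{#SITEID}    $.SITEID
{#MAC}       $.MAC
{#ORGNAME}   $.ORGNAME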

Once this process is complete, we have a new host per AP. Of course, there is no data on that host and querying the middleware or Mist is a bad idea. Scalability will be extremely problematic with more than a few organizations and sites configured in the Mist environment. As we’re building this with a big network integrator, scalability is a thing and we do not want to risk having a noticeable performance impact by using polling.

How about pushing data from the middleware into Zabbix? Once the data is received from Mist by the middleware, it’s parsed and filtered, and whatever must be pushed out to Zabbix is pushed out. We decided the best option is to push per host, as those hosts are already available in Zabbix.

Now we should ensure two things:

    • Do not overwhelm Zabbix with the data being pushed in
    • Get all the data with the least number of ‘pushes’ into Zabbix

Again, the flexibility of Zabbix is extremely useful here. On the AP hosts, there is a template with exactly 1 trapper item: receive performance data. From there, everything will be handled by the Zabbix ‘Master/Dependent’ item concept. We then extract data like temperatures, CPU load, memory usage, etc.
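
As an illustration, the trapper item and the push from the middleware could look something like this (the item key and the way the hostname is built are assumptions, not necessarily the exact values we used):

Name: Receive performance data
Type: Zabbix trapper
Key:  mist.performance

zabbix_sender -z <ZABBIX SERVER> -s "AP-<SITE>-<NUMBER>" -k mist.performance -o '<JSON PAYLOAD>'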

At the same time, we receive data regarding network usage (interface statistics) and radio information. As we do not know upfront how many network interfaces and radios there are on a particular Access Point, we do not want to hard-code such information. Here we are combining the concept of low-level discovery with dependent items (the following blog post covers the logic behind such an approach: Low-Level Discovery with Dependent items – Zabbix Blog).

Using low-level discovery with dependent items, all relevant items are created dynamically, in such a way that a change on the MIST side (for example, a new type of Access Point) doesn’t require changes on the Zabbix side. Monitoring starts within minutes and you’ll never miss any problem that might arise!
Just to give you an idea of the flow:
The master item gets a JSON payload like this pushed into it from the middleware (we’ve included only a small portion here):

{
  "mac":"<MAC ADDRESS>",
  "model":"<MODEL>",
  "port_stat":{
    "eth0":{
      "up":true,
      "speed":1000,
      "full_duplex":true,
      "tx_bytes":37291245,
      "tx_pkts":169443,
      "rx_bytes":123742282,
      "rx_pkts":779249,
      "rx_errors":0,
      "rx_peak_bps":14184,
      "tx_peak_bps":5444
    }
  },
  "cpu_util":2,
  "cpu_user":652611,
  "cpu_system":901455,
  "radio_stat":{
    "band_5":{
      "num_clients":<CLIENTS>,
      "channel":<CHANNEL>,
      "bandwidth":0,
      "power":0,
      "tx_bytes":0,
      "tx_pkts":0,
      "rx_bytes":0,
      "rx_pkts":0,
      "noise_floor":<NOISE>,
      "disabled":true,
      "usage":"5",
      "util_all":0,
      "util_tx":0,
      "util_rx_in_bss":0,
      "util_rx_other_bss":0,
      "util_unknown_wifi":0,
      "util_non_wifi":0
    }
  },
  "env_stat":{
    "cpu_temp":<CPU TEMP>,
    "ambient_temp":<AMBIENT TEMP>,
    "humidity":0,
    "attitude":0,
    "pressure":0
  }
}

Within the master item, we’re basically not parsing anything; it’s just there to receive the values and push them into the dependent items. In the dependent items, we start “cherry-picking” only those metrics that we would like to see. As the data is in JSON format, the “JSONPath” preprocessing step comes in handy. At the same time, we’re keeping an eye on efficiency, so a second step is added: discard unchanged with heartbeat (1d):

Example: Getting out the statistics of the 2.4GHz band radio:

Item prototype preprocessing
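
In text form, the item prototype for that example would roughly contain the following (assuming the 2.4GHz radio is reported under a key like band_24, analogous to the band_5 key shown above):

Preprocessing step 1: JSONPath
  $.radio_stat.band_24.num_clients
Preprocessing step 2: Discard unchanged with heartbeat
  1d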

Of course, this has to be done with all items.

So far, we’ve heavily focused on the technical part, but Zabbix has quite a few options to visualize the data as well. As we’re waiting for the next LTS release, we have only set up a very small dashboard with a few widgets. One of the better ones:

number_clients

Here we’re using the new graph type widget, but instead of plotting the number of clients per AP, we’re plotting a dataset with an “aggregate” function. Of course, if we look at the dashboard widgets, there are many more things that can be visualized…

Efficiency and security considerations

As we were building this, we had 2 main considerations:

    • Efficiency
    • Security

Efficiency, because we anticipate that Quanza will be responsible for quite a few MIST environments on top of the current ones in the near future, combined with a strict limit on the number of API calls allowed against the MIST API. As such, it is really important to keep the number of API calls as low as possible. Next to that, every Access Point that is added increases the load on the Zabbix server. That is not really a problem, as Zabbix is perfectly capable of monitoring thousands of metrics simultaneously, but it has its limits, and you do not want to hit those limits in a production environment where the only solution is migrating to beefier hardware.

Security-wise, this challenge had a few things going on, since we’re talking to an externally exposed API. MIST can invoke webhooks, which might have been a bit easier (we explored it, but there were of course other things to keep in mind going down that road); the main concern, however, was that it would require Zabbix, or an interface to Zabbix, to be exposed to the internet. That didn’t look too appealing and would have required a bit more maintenance. The preferable solution was to create the middleware, where we have full control over which queries are executed, how the API token is protected, which connections are established, and so on.

Conclusion

Although this question was challenging, together with Quanza we created a scalable, secure, and dynamic solution. Zabbix is flexible enough to facilitate the tricks required to provide reliable monitoring and alerting in an efficient and secure manner. We strongly believe the only limitation is your own creativity and this case proves that once again.

Quanza can now ensure the availability of their customers’ Juniper MIST-based networks, and in case something breaks, their 24×7 manned NOC will be able to take whatever action is required to ensure the availability of the customers’ networks – all thanks to the flexibility of Zabbix.


Low-Level Discovery with Dependent items

Post Syndicated from Brian van Baekel original https://blog.zabbix.com/low-level-discovery-with-dependent-items/13634/

Low-level discovery (LLD) was introduced in Zabbix 2.0 and is still one of the all-time favorite features. Before LLD was available, adding items was all manual work: adding new disks, new interfaces, network ports on switches and everything else was manual labor. Then LLD came around, and suddenly we were able to ‘discover’ entities and, based on those discovered entities, add new items, triggers, and such automatically.

Contents

  • Low-Level Discovery setup
  • Dependent items
  • Combining Low-Level Discovery and Dependent items
  • Conclusion

For a video guide, check out the Zabbix YouTube here: Zabbix: Low Level Discovery with Dependent items – YouTube

Low-Level Discovery setup

Let’s go over the idea of Low-Level Discovery first.

For the sake of clarity, we will stick with a default Zabbix agent item. Of course, as we will discover, it’s only the format that matters for Zabbix to consider a response as LLD information. Let’s use the built-in agent key vfs.fs.discovery. Once we force the Zabbix agent to execute this item, it will reply with something like this:

[{"{#FSNAME}":"/sys","{#FSTYPE}":"sysfs"},{"{#FSNAME}":"/proc","{#FSTYPE}":"proc"},{"{#FSNAME}":"/dev","{#FSTYPE}":"devtmpfs"},{"{#FSNAME}":"/sys/kernel/security","{#FSTYPE}":"securityfs"},{"{#FSNAME}":"/dev/shm","{#FSTYPE}":"tmpfs"},{"{#FSNAME}":"/dev/pts","{#FSTYPE}":"devpts"},{"{#FSNAME}":"/run","{#FSTYPE}":"tmpfs"},{"{#FSNAME}":"/sys/fs/cgroup","{#FSTYPE}":"tmpfs"},{"{#FSNAME}":"/sys/fs/cgroup/systemd","{#FSTYPE}":"cgroup"},{"{#FSNAME}":"/sys/fs/pstore","{#FSTYPE}":"pstore"},{"{#FSNAME}":"/sys/firmware/efi/efivars","{#FSTYPE}":"efivarfs"},{"{#FSNAME}":"/sys/fs/bpf","{#FSTYPE}":"bpf"},{"{#FSNAME}":"/sys/fs/cgroup/net_cls,net_prio","{#FSTYPE}":"cgroup"},{"{#FSNAME}":"/sys/fs/cgroup/devices","{#FSTYPE}":"cgroup"},{"{#FSNAME}":"/sys/fs/cgroup/hugetlb","{#FSTYPE}":"cgroup"},{"{#FSNAME}":"/sys/fs/cgroup/memory","{#FSTYPE}":"cgroup"},{"{#FSNAME}":"/sys/fs/cgroup/rdma","{#FSTYPE}":"cgroup"},{"{#FSNAME}":"/sys/fs/cgroup/freezer","{#FSTYPE}":"cgroup"},{"{#FSNAME}":"/sys/fs/cgroup/cpu,cpuacct","{#FSTYPE}":"cgroup"},{"{#FSNAME}":"/sys/fs/cgroup/cpuset","{#FSTYPE}":"cgroup"},{"{#FSNAME}":"/sys/fs/cgroup/perf_event","{#FSTYPE}":"cgroup"},{"{#FSNAME}":"/sys/fs/cgroup/blkio","{#FSTYPE}":"cgroup"},{"{#FSNAME}":"/sys/fs/cgroup/pids","{#FSTYPE}":"cgroup"},{"{#FSNAME}":"/sys/kernel/tracing","{#FSTYPE}":"tracefs"},{"{#FSNAME}":"/sys/kernel/config","{#FSTYPE}":"configfs"},{"{#FSNAME}":"/","{#FSTYPE}":"xfs"},{"{#FSNAME}":"/sys/fs/selinux","{#FSTYPE}":"selinuxfs"},{"{#FSNAME}":"/proc/sys/fs/binfmt_misc","{#FSTYPE}":"autofs"},{"{#FSNAME}":"/dev/hugepages","{#FSTYPE}":"hugetlbfs"},{"{#FSNAME}":"/dev/mqueue","{#FSTYPE}":"mqueue"},{"{#FSNAME}":"/sys/kernel/debug","{#FSTYPE}":"debugfs"},{"{#FSNAME}":"/sys/fs/fuse/connections","{#FSTYPE}":"fusectl"},{"{#FSNAME}":"/boot","{#FSTYPE}":"ext4"},{"{#FSNAME}":"/boot/efi","{#FSTYPE}":"vfat"},{"{#FSNAME}":"/home","{#FSTYPE}":"xfs"},{"{#FSNAME}":"/run/user/0","{#FSTYPE}":"tmpfs"}]

When we put this in a more readable format (truncated) it will look like this:

[
  {
    "{#FSNAME}":"/sys",
    "{#FSTYPE}":"sysfs"
  },
  {
    "{#FSNAME}":"/proc",
    "{#FSTYPE}":"proc"
  },
  {
    "{#FSNAME}":"/dev",
    "{#FSTYPE}":"devtmpfs"
  },
  {
    "{#FSNAME}":"/sys/kernel/config",
    "{#FSTYPE}":"configfs"
  },
  {
    "{#FSNAME}":"/",
    "{#FSTYPE}":"xfs"
  },
  {
    "{#FSNAME}":"/boot",
    "{#FSTYPE}":"ext4"
  },
  {
    "{#FSNAME}":"/home",
    "{#FSTYPE}":"xfs"
  }
]

In this format, it suddenly becomes clear: we have the {#FSNAME} macro with the name of a filesystem, combined with the filesystem type, captured in {#FSTYPE}.

Perfect! We feed this information into Zabbix, and LLD magic will happen.
Based on the Item prototypes, new items per {#FSNAME} will be added, and monitoring will start on those items.

Looking at the Item prototypes, they look a lot like normal items:
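
For example, an item prototype for the free disk space could look like this (a sketch, using the standard Zabbix agent key):

Name: Free disk space on {#FSNAME}
Type: Zabbix agent
Key:  vfs.fs.size[{#FSNAME},free]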

So, we have one item (the discovery rule) that is responsible for providing the LLD information, and then the ‘normal’ items, created from the item prototypes, to query the filesystem statistics. As you can imagine, with just 5 filesystems and 1 metric per filesystem, queried once per minute, there is no problem. But what if we have 50 filesystems and 7 metrics per filesystem, and they get queried every 10 seconds? That’s 350 items and 35 queries per second against a single host! Not only does that add load to the Zabbix server, but obviously also to the monitored host. It works, but is it ideal? It certainly isn’t!

So we’ve basically just set up this:

Dependent items

But then Zabbix introduced dependent items. Let’s take a quick look at dependent items and what they are.

We have one master item that gathers all information (in bulk) and propagates that information to all the dependent items. On those dependent items we just do the cherry picking and filtering of the relevant metrics. Let’s put this to work and see how that goes.

So we create an item, in this case of the HTTP agent type, which will collect the following information regarding the server status in a single request:

ServerVersion: Apache/2.4.37 (centos)
ServerMPM: event
Server Built: Nov  4 2020 03:20:37
CurrentTime: Monday, 08-Mar-2021 14:35:20 CET
RestartTime: Monday, 08-Mar-2021 11:04:09 CET
ParentServerConfigGeneration: 1
ParentServerMPMGeneration: 0
ServerUptimeSeconds: 12671
ServerUptime: 3 hours 31 minutes 11 seconds
Load1: 0.01
Load5: 0.03
Load15: 0.00
Total Accesses: 1182
Total kBytes: 10829
Total Duration: 95552
CPUUser: 5.01
CPUSystem: 7.34
CPUChildrenUser: 0
CPUChildrenSystem: 0
CPULoad: .0974667
Uptime: 12671
ReqPerSec: .0932839
BytesPerSec: 875.14
BytesPerReq: 9381.47
DurationPerReq: 80.8393
BusyWorkers: 1
IdleWorkers: 99
Processes: 4
Stopping: 0
BusyWorkers: 1
IdleWorkers: 99
ConnsTotal: 4
ConnsAsyncWriting: 0
ConnsAsyncKeepAlive: 0
ConnsAsyncClosing: 0
Scoreboard: _________________________________________________________________________________________W__________............................................................................................................................................................................................................................................................................................................
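
This looks like the machine-readable output of Apache’s mod_status module (the /server-status?auto page), so the master item can simply be an HTTP agent item pointed at that URL, along these lines (URL and interval are examples):

Name: Apache: get server status
Type: HTTP agent
URL:  http://<your web server>/server-status?auto
Update interval: 1m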


Now we create some dependent items that depend on that first item (which we will call the master item). Every time the master item receives information, the complete reply is pushed to the dependent items without any alteration of that data. So the master and dependent items are identical when no preprocessing is applied. That’s why, on the dependent items, we apply preprocessing to filter out the relevant information, for example the BusyWorkers:
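
For example, a dependent item for BusyWorkers could use a preprocessing step along these lines (the exact pattern is just a sketch):

Type: Dependent item
Master item: Apache: get server status
Preprocessing step: Regular expression
  pattern: BusyWorkers: ([0-9]+)
  output:  \1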

Perfect. So we query the host once, get all the metrics in bulk, and then parse them in Zabbix using preprocessing. Say goodbye to excessive load on the monitored host… (and thanks to the dedicated preprocessing processes within Zabbix, no problem on the Zabbix server side either).

Combining Low-Level Discovery and Dependent items

Ok, and what if we combine these two concepts? LLD with dependent items? Wouldn’t that be the ultimate goal? Automatically creating new items without putting extra load on the monitored host? Let’s get this going!

To stick with the first example of LLD, we will discover filesystems, but now not with the vfs.fs.discovery key but with the newly introduced vfs.fs.get key. Once we force the agent to execute this key, we will see this reply:

[{"fsname":"/dev","fstype":"devtmpfs","bytes":{"total":1940963328,"free":1940963328,"used":0,"pfree":100.000000,"pused":0.000000},"inodes":{"total":473868,"free":473487,"used":381,"pfree":99.919598,"pused":0.080402}},{"fsname":"/dev/shm","fstype":"tmpfs","bytes":{"total":1958469632,"free":1958469632,"used":0,"pfree":100.000000,"pused":0.000000},"inodes":{"total":478142,"free":478141,"used":1,"pfree":99.999791,"pused":0.000209}},{"fsname":"/run","fstype":"tmpfs","bytes":{"total":1958469632,"free":1892040704,"used":66428928,"pfree":96.608121,"pused":3.391879},"inodes":{"total":478142,"free":477519,"used":623,"pfree":99.869704,"pused":0.130296}},{"fsname":"/sys/fs/cgroup","fstype":"tmpfs","bytes":{"total":1958469632,"free":1958469632,"used":0,"pfree":100.000000,"pused":0.000000},"inodes":{"total":478142,"free":478125,"used":17,"pfree":99.996445,"pused":0.003555}},{"fsname":"/","fstype":"xfs","bytes":{"total":95516360704,"free":55329644544,"used":40186716160,"pfree":57.926877,"pused":42.073123},"inodes":{"total":46661632,"free":46535047,"used":126585,"pfree":99.728717,"pused":0.271283}},{"fsname":"/boot","fstype":"ext4","bytes":{"total":1023303680,"free":705544192,"used":247296000,"pfree":74.046435,"pused":25.953565},"inodes":{"total":65536,"free":65497,"used":39,"pfree":99.940491,"pused":0.059509}},{"fsname":"/home","fstype":"xfs","bytes":{"total":5358223360,"free":5286903808,"used":71319552,"pfree":98.668970,"pused":1.331030},"inodes":{"total":2621440,"free":2621428,"used":12,"pfree":99.999542,"pused":0.000458}},{"fsname":"/run/user/0","fstype":"tmpfs","bytes":{"total":391692288,"free":391692288,"used":0,"pfree":100.000000,"pused":0.000000},"inodes":{"total":478142,"free":478137,"used":5,"pfree":99.998954,"pused":0.001046}}]

And if we format it to be more readable, it will look like this (truncated):

[
  {
    "fsname":"/",
    "fstype":"xfs",
    "bytes":{
      "total":95516360704,
      "free":55329644544,
      "used":40186716160,
      "pfree":57.926877,
      "pused":42.073123
    },
    "inodes":{
      "total":46661632,
      "free":46535047,
      "used":126585,
      "pfree":99.728717,
      "pused":0.271283
    }
  },
  {
    "fsname":"/home",
    "fstype":"xfs",
    "bytes":{
      "total":5358223360,
      "free":5286903808,
      "used":71319552,
      "pfree":98.668970,
      "pused":1.331030
    },
    "inodes":{
      "total":2621440,
      "free":2621428,
      "used":12,
      "pfree":99.999542,
      "pused":0.000458
    }
  }
]

Per filesystem, we get the original information, fsname and fstype, but also the statistics of these filesystems… bulk metrics! So, we create a normal item (which will serve as the master item) that gets all those metrics in a single query:

Once we’ve got this data in Zabbix, we feed it into the LLD rule, giving this LLD rule the ‘Dependent item’ type:

Of course, there are no ready-to-use LLD macros in this data, but since it is in JSON format, it shouldn’t be too hard to create the LLD macros with the ‘LLD macros’ option in the frontend and the relevant JSONPath expressions:

Note: Technically we do not need to create the {#FSTYPE} macro to get this working!
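
In this case the LLD macros map straight onto the JSON keys shown above:

{#FSNAME}  $.fsname
{#FSTYPE}  $.fstype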

Once this is done, we should be ready to create the item prototypes for this LLD rule. The data is there, macros are available, nothing is going to stop us now!

Let’s move on to item prototypes. But of course, we do not want to poll that remote host again per discovered filesystem. That means we will make this item prototype of the dependent item type as well, pointing it back to the master item we’ve created.

For the first item prototype, we want to obtain the total size per filesystem:

But, as I mentioned earlier: a dependent item without any preprocessing is identical to the master item and of course that would be wrong in this case. We just want to see the total bytes per filesystem and not all the collected statistics. In the configuration above we already know what to get out, so the Type of information and Units are filled already. What is not visible on that screenshot is the preprocessing rule that we need. Here the ‘JSONPath’ preprocessing step comes in handy since we receive JSON data. We would like to get out this part for our item (truncated):

[
  {
    "fsname":"/",
    "fstype":"xfs",
    "bytes":{
      "total":95516360704,
      "free":55329644544,
       "used":40186716160,
      "pfree":57.926877,
      "pused":42.073123

So, if we try to get this information out using JSONPath, it should look like $.bytes.total.first(), but this will match on any filesystem, so we need to configure it a bit more specifically, like: $[?(@.fsname=='/')].bytes.total.first()

As you can see, the JSONPath is a bit more complex here. We are forcing it to match on @.fsname=='/' and, from that entity, getting out the bytes.total. Now, to make it even more complex, we shouldn’t hardcode the filesystem in the JSONPath since we’re working with item prototypes. It should be the LLD macro {#FSNAME} instead!
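
So the JSONPath in the item prototype ends up looking like this (Zabbix resolves {#FSNAME} for every discovered filesystem when the items are created):

$[?(@.fsname=='{#FSNAME}')].bytes.total.first()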

Now we save this item prototype, grab a cup of coffee (or just force a config_cache_reload on the server) and just wait for the magic to happen.

We’ve now built this setup:


So the master item will get values (i.e. obtain the bulk data every minute) and push them into the LLD rule. From there, as per the item prototypes, items will be created, and those are populated from the master item as well, filtering out only the relevant metrics using preprocessing.

So far, so good, but we have one small problem to solve: we want to get metrics every minute or so, but since all those metrics will get pushed into the LLD rule as well, we might be adding unnecessary extra load due to the high frequency. Luckily, solving that problem is not too hard. Navigate to the discovery rule, go to the ‘Preprocessing’ tab and add the ‘Discard unchanged with heartbeat’ step with a parameter of 1h or an even larger interval!

This is insane! With just one poll/query to a host, we utilize the power of LLD and dependent items, getting all the metrics while adding only minimal extra load on that host.


Conclusion

That’s it. If you’ve set everything up correctly, you should now get quite a few filesystem metrics without adding any extra performance overhead on the host through unnecessary data requests.

Of course, if you need help optimizing your Zabbix environment, support contracts, consultancy, or training, we from Opensource ICT Solutions are always available to assist you in every possible way, worldwide, 24×7.

Thanks for reading this blog post, see you in the next one.

Getting your notifications via Signal

Post Syndicated from Brian van Baekel original https://blog.zabbix.com/getting-your-notifications-via-signal/13286/

Recently, WhatsApp pushed their new privacy policy, announcing that they would share more data with Facebook and causing an exodus to other platforms, with Signal and Telegram being among the more popular ones. Both are great alternatives, but I prefer Signal because it is open source, offers end-to-end encryption, and last but not least because of its business model (living on donations instead of selling your data).

Typically, Zabbix sends notifications to whatever medium you’ve chosen when a problem is detected. We all know the email messages, the various webhook integrations with Slack/MS Teams/Jira, etc., perhaps even some text message integrations and such. Now, if we’re migrating to Signal, we suddenly have access to the Signal API and can utilize it to receive Zabbix notifications. Nice!

There is only one drawback. You need a separate phone number to register against Signal. Don’t use your own phone number – unless you want to lose the ability to use Signal ;(

There are various ways to get a phone number for this purpose:

  • Use the phone number of your current SMS gateway
  • Use the company phone number (a lot of cloud PBX systems provide the option to receive the verification call)
  • Purchase a prepaid phone number.
  • Use a service like Twilio

You just need to receive one text message; the rest of the communication will go via the internet.

Time to get rid of WhatsApp and move to Signal! But… how do you use Signal to get your notifications?

Signal-cli

Although we could build everything from scratch, talking to the Signal API directly, there is a nice implementation available that lets us talk to Signal within a few minutes: signal-cli.

This GitHub page is very comprehensive when it comes to getting signal-cli installed, but of course it does not cover anything Zabbix-related.

Configuration tasks

For this guide, we’re using:

  • Centos 8
  • Zabbix 5.2

signal-cli installation

First, let’s install the signal-cli utility. In order to do so, we need to resolve its Java dependency by installing OpenJDK:

dnf -y install java-11-openjdk-devel.x86_64

After this installation, we should be good to continue with the installation of signal-cli. According to their installation guide, this should be sufficient:

export VERSION="0.7.3"
wget https://github.com/AsamK/signal-cli/releases/download/v"${VERSION}"/signal-cli-"${VERSION}".tar.gz
sudo tar xf signal-cli-"${VERSION}".tar.gz -C /opt
sudo ln -sf /opt/signal-cli-"${VERSION}"/bin/signal-cli /usr/local/bin/

At the time of writing, the most recent version is 0.7.3, and that’s what we’re installing here. If in the future a new version is released, of course you should install that!

If everything went as expected, we should be able to register ourselves with Signal.

signal-cli registration

Since we want Zabbix to execute these commands, we must make sure the registration is done as the correct user on the Zabbix server; otherwise, you will get the following error message:

Unregistered user error

(ERROR App – User +19293771253 is not registered.)

In order to prevent this error, let’s do the authentication against Signal as the zabbix user:

Important: The USERNAME (your phone number) must include the country calling code, i.e. the number must start with a “+” sign, and you must replace everything between < > in the following examples with your own values.

runuser -l zabbix -c 'signal-cli -u <NUMBER> register'

Now, check for incoming text messages on this phone number. Within seconds, you should receive a 6-digit code in the following format: xxx-xxx

Once you’ve received the text, it’s time to complete the registration:

runuser -l zabbix -c 'signal-cli -u <NUMBER> verify <CODE>'

Since we’re running these commands as a different user, we won’t see their output. Let’s just test!

Sending messages from the command line is straightforward:

runuser -l zabbix -c 'signal-cli -u <NUMBER> send -m <MESSAGE> <RECEIVER NUMBER>'

You will see the message id as output. Simply ignore it, since it’s not relevant at this point.

Within seconds:

It works! Great.

So now we’ve got this part covered, time to get the AlertScript set up, before heading to the frontend.

Zabbix AlertScript setup

Ok, so now that we’ve got the registration done, we need to make sure Zabbix can utilize it. In order to do so, we use a very old method. Although it would’ve made more sense to use the webhook option, that would have meant building the communication with Signal from scratch.

So AlertScripts it is. In your terminal/SSH session with the Zabbix server, open a new file with this command: vi /usr/lib/zabbix/alertscripts/signal.sh and insert the following contents:

#!/bin/bash
signal-cli -u '+19293771253' send -m "$1" "$2"

That’s right, just 2 lines. After saving the file, change the owner and set the permissions:

chown zabbix:zabbix /usr/lib/zabbix/alertscripts/signal.sh
chmod 700 /usr/lib/zabbix/alertscripts/signal.sh
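
Before heading to the frontend, you can verify that the script works when executed as the zabbix user (replace the receiver number with your own):

runuser -l zabbix -c '/usr/lib/zabbix/alertscripts/signal.sh "Test from Zabbix" <RECEIVER NUMBER>'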

Once that works, it’s time to move to our frontend.

Zabbix mediatype configuration

In the frontend, go to Administration -> Media types and create a new media type:

Signal Mediatype

Name: Signal
Type: Script
Script name: signal.sh
Script parameters:
    {ALERT.MESSAGE}
    {ALERT.SENDTO}

Don’t forget to configure some message templates as well (the second tab in the media type configuration). You can just use the defaults by clicking ‘Add’.

Zabbix media configuration

Next step. Navigate to Administration -> Users (or just open your own user profile) and create a new media:

new-media

Type: Signal
Sendto: <your number>
When active / severity as per needs

Important: The ‘Send to’ number (your phone number) must include the country calling code, i.e. the number must start with a “+” sign.

We’re almost there; just some configuration of the actions remains.

Zabbix action configuration

This step is only needed if you are sending notifications right now via a specific mediatype. If you configured the ‘send only to’ option to ‘- All -‘ there is nothing to change, and it will work straight away!

Otherwise, navigate to Configuration -> Actions and find the action you want to change, and in the Operations, Recovery operations and Update operations change the ‘send only to’ option to ‘Signal’

Save your action, and it’s time to test: generate a problem to confirm that the implementation actually works.

Wrap up

That’s it. By now you should have a working implementation where Zabbix sends notifications to Signal. The setup was extremely straightforward and easy to configure. Nevertheless, if you need help getting this going, we (Opensource ICT Solutions) offer consultancy services as well, and are more than happy to help you out!