Tag Archives: How-to

Monitoring Kubernetes with Zabbix

Post Syndicated from Michaela DeForest original https://blog.zabbix.com/monitoring-kubernetes-with-zabbix/25055/

There are many options available for monitoring Kubernetes and cloud-native applications. In this multi-part blog series, we’ll explore how to use Zabbix to monitor a Kubernetes cluster and understand the metrics generated within Zabbix. We’ll also learn how to exploit Prometheus endpoints exposed by applications to monitor application-specific metrics.

Want to see Kubernetes monitoring in action? Watch the step-by-step Zabbix Kubernetes monitoring configuration and deployment guide.

Why Choose Zabbix to Monitor Kubernetes?

Before choosing Zabbix as a Kubernetes monitoring tool, we asked ourselves, “why would we choose to use Zabbix rather than Prometheus, Grafana, and alertmanager?” After all, they have become the standard monitoring tools in the cloud ecosystem. We decided that our minimum criteria for Zabbix would be that it was just as effective as Prometheus for monitoring both Kubernetes and cloud-native applications.

Through our discovery process, we concluded that Zabbix meets (and exceeds) this minimum requirement. Zabbix provides similar metrics and triggers as Prometheus, alert manager, and Grafana for Kubernetes, as they both use the same backend tools to do this. However, Zabbix can do this in one product while still maintaining flexibility and allowing you to monitor pretty much anything you can write code to collect. Regarding application monitoring, Zabbix can transform Prometheus metrics fed to it by Prometheus exporters and endpoints. In addition, because Zabbix can make calls to any HTTP endpoint, it can monitor applications that do not have a dedicated Prometheus endpoint, unlike Prometheus.

The Zabbix Helm Chart

Zabbix monitors Kubernetes by collecting metrics exposed via the Kubernetes API and kube-state-metrics. The components necessary to monitor a cluster are installed within the cluster using this helm chart provided by Zabbix. The helm chart includes the Zabbix agent installed as a daemon set and is used to monitor local resources and applications on each node. A Zabbix proxy is also installed to collect monitoring data and transfer it to the external Zabbix server.

Only the Zabbix proxy needs access to the Zabbix server, while the agents can send data to the proxy installed in the same namespace as each agent. A cluster role allows Zabbix to access resources in the cluster via the Kubernetes API. While the cluster role could be modified to restrict privileges given to Zabbix, this will result in some items becoming unsupported. We recommend keeping this the same if you want to get the most out of Kubernetes monitoring with Zabbix.

The Zabbix helm chart installs the kube-state-metrics project as a dependency. You may already be familiar with this project under the Kubernetes organization, which generates Prometheus format metrics based on the current state of the Kubernetes resources. In addition, if you have experience using Prometheus to monitor a cluster, you may already have this installed. If that is the case, you can point to this deployment rather than installing another one.

In this tutorial, we will install kube-state-metrics via the Zabbix helm chart.

For more information on skipping this step, refer to the values file in the Zabbix Kubernetes helm chart.

Installing the Zabbix Helm Chart

Now that we’ve explained how the Zabbix helm chart works, let’s go ahead and install it. In this example, we will assume that you have a running Zabbix 6.0 (or higher) instance that is reachable from the cluster you wish to monitor. I am running a 6.0 instance in a different cluster than the one we want to monitor. The server is reachable via the DNS name mdeforest.zabbix.atsgroup.io with a non-standard port of 31103.

We will start by installing the latest Zabbix helm chart. I recommend visiting zabbix.com/integrations/kubernetes to get any sources that may be referred to in this tutorial. There you will find a link to the Zabbix helm chart and templates. For the most part, we will follow the steps outlined in the readme.

 

Using a terminal window, I am going to make sure the active cluster is set to the cluster that I want to monitor:

kubectl config use-context <cluster context name>

I’m then going to add the Zabbix chart repo to my local helm repository:

helm repo add zabbix-chart-6.0 https://cdn.zabbix.com/zabbix/integrations/kubernetes-helm/6.0/

If you’re running Zabbix 6.2 or newer, change the references to 6.0 in this command to 6.2.

Depending on your circumstances, you will need to set a few values for the installation. In most cases, you only need to set a few environment variables for the Zabbix agent and the proxy. The complete list of values and environment variables is available in the helm chart repo, alongside the agent and proxy images on Docker Hub.

In this case, I’m setting the passive server environment variable for the agent to allow any IP to connect. For the proxy, I am setting the server host accessible from the proxy alongside the non-standard port. I’ve also set here some variables related to cache size. These variables may depend on your cluster size, so you may need to play around with them to find the correct values.

Now that I have the values file ready, I’m ready to install the chart. So, we’ll use the following command. Of course, the chart path might vary depending on what version of the chart you’re using.

helm install -f </path/to/values/file> [-n <namespace>] zabbix zabbix-chart-6.0/zabbix-helm-chart

You can also optionally add a namespace. You must wait until everything is running, so I’ll check just that with the following:

watch kubectl get pods

Now that everything is installed, we’re ready to set up hosts in Zabbix that will be associated with the cluster. The last step before we have all the information we need is to obtain the token created via the service account installed with the helm chart. We’ll get this by running the next command, which is the name of the service account that was created:

kubectl get secret -o jsonpath={.data.token} zabbix service-account | base64 -d

This will get the secret created for the service account and grab just the token from that, which is passed to the base64 utility to decode it. Be sure to copy that value somewhere because you’ll need it for later.

You’ll also need the Kubernetes API endpoint. In most cases, you’ll use the proxy installed rather than the server directly or a proxy outside the cluster. If this is the case, you can use the service DNS for the API. We should be able to reach it by pointing to https://kubernetes.default.svc.cluster.local:443/api.

If this is not the case, you can use the output from the command:

kubectl cluster-info

Now, let’s head over to the Zabbix UI. All the templates we need are shipped in Zabbix 6. If for some reason, you can’t find them, they are available for download and import by visiting the integrations page that I pointed out earlier on the Zabbix site.

Adding the Proxy

We will add our proxy by heading to Administration -> Proxies:

  1. Click Create Proxy. Because this is an active proxy by default, we only need to specify the proxy name. If you didn’t make any changes to the helm chart, this should default to zabbix-proxy. If you’d like to name this differently, you can change the environment variable zbx_hostname for the proxy in the helm chart. We’re going to leave it as the default for now. You’re going to enter this name and then click “Add.” After a few minutes, you’ll start to see that it says that the proxy has been seen.
  2. Create a Host Group to put hosts related to Kubernetes. For this example, let’s create one, which we’ll call Kubernetes.
  3. Head to the host page under configuration and click Create Host. The first host will collect metrics related to monitoring Kubernetes nodes, and we’ll discover nodes and create new hosts using Zabbix low-level discovery.
  4. Give this host the name Kubernetes Nodes. We’ll also assign this host to the Kubernetes host group we created and attach the template Kubernetes nodes by HTTP.
  5. Change the line “Monitored by proxy” to the proxy created earlier, called zabbix-proxy.
  6. Click the Macros tab and select “Inherited and host macros.” You should be able to see all the macros that may be set to influence what is monitored in your cluster. In this case, we need to change the first two macros. The first, {KUBE.API.ENDPOINT.URL}, should be set to the Kubernetes API endpoint. In our case, we can set it to what I mentioned earlier: default.svc.cluster.local:443/api. Next, the token should be set to the previously retrieved value from the command line.
  7. lick Add. After a few minutes, you should start seeing data on the latest data page and new hosts on the host page representing each node.

Creating an Additional Host

Now let’s create another host that will represent the metrics available via the Kubernetes API and the kube-state-metrics endpoint.

  1. Click Create Host again, name this host Kubernetes Cluster State, and add it to the Kubernetes group again.
  2. Let’s also attach the Kubernetes Cluster State template by HTTP. Again, we’re going to choose the proxy that we created earlier.
  3. In the Macro section, change the kube.api.url to the same thing we used before, but this time leave off the /api at the end. Simply: default.svc.cluster.local:443. Be sure to set the token as we did before.
  4. Assuming nothing else was changed in the installation of the helm chart, we can now add that host.

After a few minutes, you should receive metrics related to the cluster state, including hosts representing the kubelet on each node.

What’s Next?

Now you’re all set to start monitoring your Kubernetes cluster in Zabbix! Give it a try, and let us know your thoughts in the comments.

In the next blog post, we’ll look at what you can do with your newly monitored cluster and how to get the most out of it.

If you’d like help with any of this, ATS has advanced monitoring, orchestration, and automation skills to make this process a snap. Set up a 15-minute with our team to go through any questions you have.

About the Author

Michaela DeForest is a Platform Engineer for The ATS Group.  She is a Zabbix Certified Specialist on Zabbix 6.0 with additional areas of expertise, including Terraform, Amazon Web Services (AWS), Ansible, and Kubernetes, to name a few.  As ATS’s resident authority in DevOps, Michaela is critical in delivering cutting-edge solutions that help businesses improve efficiency, reduce errors, and achieve a faster ROI.

About ATS Group: The ATS Group provides a fully inclusive set of technology services and tools designed to innovate and transform IT.  Their systems integration, business resiliency, cloud enablement, infrastructure intelligence, and managed services help businesses of all sizes “get IT done.” With over 20 years in business, ATS has become the trusted advisor to nearly 500 customers across multiple industries.  They have built their reputation around honesty, integrity, and technical expertise unrivaled by the competition.

Use MSK Connect for managed MirrorMaker 2 deployment with IAM authentication

Post Syndicated from Tanner Pratt original https://aws.amazon.com/blogs/big-data/use-msk-connect-for-managed-mirrormaker-2-deployment-with-iam-authentication/

In this post, we show how to use MSK Connect for MirrorMaker 2 deployment with AWS Identity and Access Management (IAM) authentication. We create an MSK Connect custom plugin and IAM role, and then replicate the data between two existing Amazon Managed Streaming for Apache Kafka (Amazon MSK) clusters. The goal is to have replication successfully running between two MSK clusters that are using IAM as an authentication mechanism. It’s important to note that although we’re using IAM authentication in this solution, this can be accomplished using no authentication for the MSK authentication mechanism.

Solution overview

This solution can help Amazon MSK users run MirrorMaker 2 on MSK Connect, which eases the administrative and operational burden because the service handles the underlying resources, enabling you to focus on the connectors and data to ensure correctness. The following diagram illustrates the solution architecture.

Apache Kafka is an open-source platform for streaming data. You can use it to build building various workloads like IoT connectivity, data analytic pipelines, or event-based architectures.

Kafka Connect is a component of Apache Kafka that provides a framework to stream data between systems like databases, object stores, and even other Kafka clusters, into and out of Kafka. Connectors are the executable applications that you can deploy on top of the Kafka Connect framework to stream data into or out of Kafka.

MirrorMaker is the cross-cluster data mirroring mechanism that Apache Kafka provides to replicate data between two clusters. You can deploy this mirroring process as a connector in the Kafka Connect framework to improve the scalability, monitoring, and availability of the mirroring application. Replication between two clusters is a common scenario when needing to improve data availability, migrate to a new cluster, aggregate data from edge clusters into a central cluster, copy data between Regions, and more. In KIP-382, MirrorMaker 2 (MM2) is documented with all the available configurations, design patterns, and deployment options available to users. It’s worthwhile to familiarize yourself with the configurations because there are many options that can impact your unique needs.

MSK Connect is a managed Kafka Connect service that allows you to deploy Kafka connectors into your environment with seamless integrations with AWS services like IAM, Amazon MSK, and Amazon CloudWatch.

In the following sections, we walk you through the steps to configure this solution:

  1. Create an IAM policy and role.
  2. Upload your data.
  3. Create a custom plugin.
  4. Create and deploy connectors.

Create an IAM policy and role for authentication

IAM helps users securely control access to AWS resources. In this step, we create an IAM policy and role that has two critical permissions:

A common mistake made when creating an IAM role and policy needed for common Kafka tasks (publishing to a topic, listing topics) is to assume that the AWS managed policy AmazonMSKFullAccess (arn:aws:iam::aws:policy/AmazonMSKFullAccess) will suffice for permissions.

The following is an example of a policy with both full Kafka and Amazon MSK access:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:*",
                "kafka:*",
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}

This policy supports the creation of the cluster within the AWS account infrastructure and grants access to the components that make up the cluster anatomy like Amazon Elastic Compute Cloud (Amazon EC2), Amazon Virtual Private Cloud (Amazon VPC), logs, and kafka:*. There is no managed policy for a Kafka administrator to have full access on the cluster itself.

After you create the KafkaAdminFullAccess policy, create a role and attach the policy to it. You need two entries on the role’s Trust relationships tab:

  • The first statement allows Kafka Connect to assume this role and connect to the cluster.
  • The second statement follows the pattern arn:aws:sts::(YOUR ACCOUNT NUMBER):assumed-role/(YOUR ROLE NAME)/(YOUR ACCOUNT NUMBER). Your account number should be the same account number where MSK Connect and the role are being created in. This role is the role you’re editing the trust entity on. In the following example code, I’m editing a role called MSKConnectExample in my account. This is so that when MSK Connect assumes the role, the assumed user can assume the role again to publish and consume records on the target cluster.

In the following example trust policy, provide your own account number and role name:

{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Effect": "Allow",
			"Principal": {
				"Service": "kafkaconnect.amazonaws.com"
			},
			"Action": "sts:AssumeRole"
		},
		{
			"Effect": "Allow",
			"Principal": {
				"AWS": "arn:aws:sts::123456789101:assumed-role/MSKConnectExampleRole/123456789101"
			},
			"Action": "sts:AssumeRole"
		}
	]
}

Now we’re ready to deploy MirrorMaker 2.

Upload data

MSK Connect custom plugins accept a file or folder with a .jar or .zip ending. For this step, create a dummy folder or file and compress it. Then upload the .zip object to your Amazon Simple Storage Service (Amazon S3) bucket:

mkdir mm2 
zip mm2.zip mm2 
aws s3 cp mm2.zip s3://mytestbucket/

Because Kafka and subsequently Kafka Connect have MirrorMaker libraries built in, you don’t need to add additional JAR files for this functionality. MSK Connect has a prerequisite that a custom plugin needs to be present at connector creation, so we have to create an empty one just for reference. It doesn’t matter what the contents of the file are or what the folder contains, as long as there is an object in Amazon S3 that is accessible to MSK Connect, so MSK Connect has access to MM2 classes.

Create a custom plugin

On the Amazon MSK console, follow the steps to create a custom plugin from the .zip file. Enter the object’s Amazon S3 URI and for this post, and name the plugin Mirror-Maker-2.

custom plugin console

Create and deploy connectors

You need to deploy three connectors for a successful mirroring operation:

  • MirrorSourceConnector
  • MirrorHeartbeatConnector
  • MirrorCheckpointConnector

Complete the following steps for each connector:

  1. On the Amazon MSK console, choose Create connector.
  2. For Connector name, enter the name of your first connector.
    connector properties name
  3. Select the target MSK cluster that the data is mirrored to as a destination.
  4. Choose IAM as the authentication mechanism.
    select cluster
  5. Pass the config into the connector.
    connector config

Connector config files are JSON-formatted config maps for the Kafka Connect framework to use in passing configurations to the executable JAR. When using the MSK Connect console, we must convert the config file from a JSON config file to single-lined key=value (with no spaces) file.

You need to change some values within the configs for deployment, namely bootstrap.server, sasl.jaas.config and tasks.max. Note the placeholders in the following code for all three configs.

The following code is for MirrorHeartBeatConnector:

connector.class=org.apache.kafka.connect.mirror.MirrorHeartbeatConnector
source.cluster.alias=source
target.cluster.alias=target
clusters=source,target
source.cluster.bootstrap.servers=(SOURCE BOOTSTRAP SERVERS)
target.cluster.security.protocol=SASL_SSL
target.cluster.producer.security.protocol=SASL_SSL
target.cluster.consumer.security.protocol=SASL_SSL
target.cluster.sasl.mechanism=AWS_MSK_IAM
target.cluster.producer.sasl.mechanism=AWS_MSK_IAM
target.cluster.consumer.sasl.mechanism=AWS_MSK_IAM
target.cluster.sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required awsRoleArn="arn:aws:iam::(Your Account Number):role/(Your IAM role):role/mck-role" awsDebugCreds=true;
target.cluster.producer.sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required awsRoleArn="arn:aws:iam::(Your Account Number):role/(Your IAM role)" awsDebugCreds=true;
target.cluster.consumer.sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required awsRoleArn="arn:aws:iam::(Your Account Number):role/(Your IAM role)" awsDebugCreds=true;
target.cluster.sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
target.cluster.producer.sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
target.cluster.consumer.sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
source.cluster.security.protocol=SASL_SSL
source.cluster.producer.security.protocol=SASL_SSL
source.cluster.consumer.security.protocol=SASL_SSL
source.cluster.sasl.mechanism=AWS_MSK_IAM
source.cluster.producer.sasl.mechanism=AWS_MSK_IAM
source.cluster.consumer.sasl.mechanism=AWS_MSK_IAM
source.cluster.sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required awsRoleArn="arn:aws:iam::(Your Account Number):role/(Your IAM role)" awsDebugCreds=true;
source.cluster.producer.sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required awsRoleArn="arn:aws:iam::(Your Account Number):role/(Your IAM role)" awsDebugCreds=true;
source.cluster.consumer.sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required awsRoleArn="arn:aws:iam::(Your Account Number):role/(Your IAM role)" awsDebugCreds=true;
source.cluster.sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
source.cluster.producer.sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
source.cluster.consumer.sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
topics=.*
topics.exclude=.*[-.]internal, .*.replica, __.*, .*-config, .*-status, .*-offset
groups.exclude=console-consumer-.*, connect-.*, __.*
refresh.groups.enabled=true
refresh.groups.interval.seconds=60
emit.checkpoints.enabled=true
consumer.auto.offset.reset=earliest
producer.linger.ms=500
producer.retry.backoff.ms=1000
producer.max.block.ms=10000
replication.factor=3
tasks.max=1
key.converter=org.apache.kafka.connect.converters.ByteArrayConverter
value.converter=org.apache.kafka.connect.converters.ByteArrayConverter

The following code is for MirrorCheckpointConnector:

connector.class=org.apache.kafka.connect.mirror.MirrorCheckpointConnector
source.cluster.alias=source
target.cluster.alias=target
clusters=source,target
source.cluster.bootstrap.servers=(Source Bootstrap Servers)
target.cluster.bootstrap.servers=(Target Bootstrap Servers)
target.cluster.security.protocol=SASL_SSL
target.cluster.producer.security.protocol=SASL_SSL
target.cluster.consumer.security.protocol=SASL_SSL
target.cluster.sasl.mechanism=AWS_MSK_IAM
target.cluster.producer.sasl.mechanism=AWS_MSK_IAM
target.cluster.consumer.sasl.mechanism=AWS_MSK_IAM
target.cluster.sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required awsRoleArn="arn:aws:iam::(Your Account Number):role/(Your IAM role)" awsDebugCreds=true;
target.cluster.producer.sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required awsRoleArn="arn:aws:iam::(Your Account Number):role/(Your IAM role)" awsDebugCreds=true;
target.cluster.consumer.sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required awsRoleArn="arn:aws:iam::(Your Account Number):role/(Your IAM role)" awsDebugCreds=true;
target.cluster.sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
target.cluster.producer.sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
target.cluster.consumer.sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
source.cluster.security.protocol=SASL_SSL
source.cluster.producer.security.protocol=SASL_SSL
source.cluster.consumer.security.protocol=SASL_SSL
source.cluster.sasl.mechanism=AWS_MSK_IAM
source.cluster.producer.sasl.mechanism=AWS_MSK_IAM
source.cluster.consumer.sasl.mechanism=AWS_MSK_IAM
source.cluster.sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required awsRoleArn="arn:aws:iam::(Your Account Number):role/(Your IAM role)" awsDebugCreds=true;
source.cluster.producer.sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required awsRoleArn="arn:aws:iam::(Your Account Number):role/(Your IAM role)" awsDebugCreds=true;
source.cluster.consumer.sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required awsRoleArn="arn:aws:iam::(Your Account Number):role/(Your IAM role)" awsDebugCreds=true;
source.cluster.sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
source.cluster.producer.sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
source.cluster.consumer.sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
topics=.*
topics.exclude=.*[-.]internal, .*.replica, __.*, .*-config, .*-status, .*-offset
groups.exclude=console-consumer-.*, connect-.*, __.*
refresh.groups.enabled=true
refresh.groups.interval.seconds=60
emit.checkpoints.enabled=true
consumer.auto.offset.reset=earliest
producer.linger.ms=500
producer.retry.backoff.ms=1000
producer.max.block.ms=10000
replication.factor=3
tasks.max=1
key.converter=org.apache.kafka.connect.converters.ByteArrayConverter
value.converter=org.apache.kafka.connect.converters.ByteArrayConverter
sync.group.offsets.interval.seconds=5

The following code is for MirrorSourceConnector:

connector.class=org.apache.kafka.connect.mirror.MirrorSourceConnector
# See note below about the recommendations
tasks.max=(NUMBER OF TASKS)
clusters=source,target
source.cluster.alias=source
target.cluster.alias=target
source.cluster.bootstrap.servers=(SOURCE BOOTSTRAP-SERVER)
source.cluster.producer.sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
source.cluster.producer.security.protocol=SASL_SSL
source.cluster.producer.sasl.mechanism=AWS_MSK_IAM
source.cluster.producer.sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required awsRoleArn="arn:aws:iam:: (Your Account Number):role/(Your IAM role)" awsDebugCreds=true;
source.cluster.consumer.sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
source.cluster.consumer.sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required awsRoleArn="arn:aws:iam:: (Your Account Number):role/(Your IAM role)" awsDebugCreds=true;
source.cluster.consumer.security.protocol=SASL_SSL
source.cluster.consumer.sasl.mechanism=AWS_MSK_IAM
source.cluster.sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required awsRoleArn="arn:aws:iam:: (Your Account Number):role/(Your IAM role)" awsDebugCreds=true;
source.cluster.sasl.mechanism=AWS_MSK_IAM
source.cluster.security.protocol=SASL_SSL
source.cluster.sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
target.cluster.bootstrap.servers=(TARGET BOOTSTRAP-SERVER)
target.cluster.security.protocol=SASL_SSL
target.cluster.sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required awsRoleArn="arn:aws:iam:: (Your Account Number):role/(Your IAM role)" awsDebugCreds=true;
target.cluster.producer.sasl.mechanism=AWS_MSK_IAM
target.cluster.producer.security.protocol=SASL_SSL
target.cluster.producer.sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required awsRoleArn="arn:aws:iam:: (Your Account Number):role/(Your IAM role)" awsDebugCreds=true;
target.cluster.producer.sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
target.cluster.consumer.security.protocol=SASL_SSL
target.cluster.consumer.sasl.mechanism=AWS_MSK_IAM
target.cluster.consumer.sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
target.cluster.consumer.sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required awsRoleArn="arn:aws:iam:: (Your Account Number):role/(Your IAM role)" awsDebugCreds=true;
target.cluster.sasl.mechanism=AWS_MSK_IAM
target.cluster.sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
refresh.groups.enabled=true
refresh.groups.interval.seconds=60
refresh.topics.interval.seconds=60
topics.exclude=.*[-.]internal,.*.replica,__.*,.*-config,.*-status,.*-offset
emit.checkpoints.enabled=true
topics=.*
value.converter=org.apache.kafka.connect.converters.ByteArrayConverter
key.converter=org.apache.kafka.connect.converters.ByteArrayConverter
producer.max.block.ms=10000
producer.linger.ms=500
producer.retry.backoff.ms=1000
sync.topic.configs.enabled=true
sync.topic.configs.interval.seconds=60
refresh.topics.enabled=true
groups.exclude=console-consumer-.*,connect-.*,__.*
consumer.auto.offset.reset=earliest
replication.factor=3

A general guideline for the number of tasks for a MirrorSourceConnector is one task per up to 10 partitions to be mirrored. For example, if a Kafka cluster has 15 topics with 12 partitions each for a total partition count of 180 partitions, we deploy at least 18 tasks for mirroring the workload.

Exceeding the recommended number of tasks for the source connector may lead to offsets that aren’t translated (negative consumer group offsets). For more information about this issue and its workarounds, refer to MM2 may not sync partition offsets correctly.

  1. For the heartbeat and checkpoint connectors, use provisioned scale with one worker, because there is only one task running for each of them.
  2. For the source connector, we set the maximum number of workers to the value decided for the tasks.max property.
    Note that we use the defaults of the auto scaling threshold settings for now.
    worker properties
  3. Although it’s possible to pass custom worker configurations, let’s leave the default option selected.
    worker config
  4. In the Access permissions section, we use the IAM role that we created earlier that has a trust relationship with kafkaconnect.amazonaws.com and kafka-cluster:* permissions. Warning signs display above and below the drop-down menu. These are to remind you that IAM roles and attached policies is a common reason why connectors fail. If you never get any log output upon connector creation, that is a good indicator of an improperly configured IAM role or policy permission problem.
    connect iam role
    On the bottom of this page is a warning box telling us not to use the aptly named AWSServiceRoleForKafkaConnect role. This is an AWS managed service role that MSK Connect needs to perform critical, behind-the-scenes functions upon connector creation. For more information, refer to Using Service-Linked Roles for MSK Connect.
  5. Choose Next.
    Depending on the authorization mechanism chosen when aligning the connector with a specific cluster (we chose IAM), the options in the Security section are preset and unchangeable. If no authentication was chosen and your cluster allows plaintext communication, that option is available under Encryption – in transit.
  6. Choose Next to move to the next page.
    access and encryption
  7. Choose your preferred logging destination for MSK Connect logs. For this post, I select Deliver to Amazon CloudWatch Logs and choose the log group ARN for my MSK Connect logs.
  8. Choose Next.
    logs properties
  9. Review your connector settings and choose Create connector.

A message appears indicating either a successful start to the creation process or immediate failure. You can now navigate to the Log groups page on the CloudWatch console and wait for the log stream to appear.

The CloudWatch logs indicate when connectors are successful or have failed faster than on the Amazon MSK console. You can see a log stream in your chosen log group get created within a few minutes after you create your connector. If your log stream never appears, this is an indicator that there was a misconfiguration in your connector config or IAM role and it won’t work.

cloudwatch

Verify that the connector launched successfully

In this section, we walk through two confirmation steps to determine a successful launch.

Check the log stream

Open the log stream that your connector is writing to. In the log, you can check if the connector has successfully launched and is publishing data to the cluster. In the following screenshot, we can confirm data is being published.

cloudwatch logs

Mirror data

The second step is to create a producer to send data to the source cluster. We use the console producer and consumer that Kafka ships with. You can follow Step 1 from the Apache Kafka quickstart.

  1. On a client machine that can access Amazon MSK, download Kafka from https://kafka.apache.org/downloads and extract it:
    tar -xzf kafka_2.13-3.1.0.tgz
    cd kafka_2.13-3.1.0

  2. Download the latest stable JAR for IAM authentication from the repository. As of this writing, it is 1.1.3:
    cd libs/
    wget https://github.com/aws/aws-msk-iam-auth/releases/download/v1.1.3/aws-msk-iam-auth-1.1.3-all.jar

  3. Next, we need to create our client.properties file that defines our connection properties for the clients. For instructions, refer to Configure clients for IAM access control. Copy the following example of the client.properties file:
    security.protocol=SASL_SSL
    sasl.mechanism=AWS_MSK_IAM
    sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
    sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler

    You can place this properties file anywhere on your machine. For ease of use and simple referencing, I place mine inside kafka_2.13-3.1.0/bin.
    After we create the client.properties file and place the JAR in the libs directory, we’re ready to create the topic for our replication test.

  4. From the bin folder, run the kafka-topics.sh script:
    ./kafka-topics.sh --bootstrap-server $bss --create --topic MirrorMakerTest --replication-factor 2 --partitions 1 --command-config client.properties

    The details of the command are as follows:
    –bootstrap-server – Your bootstrap server of the source cluster.
    –topic – The topic name you want to create.
    –create – The action for the script to perform.
    –replication-factor – The replication factor for the topic.
    –partitions – Total number of partitions to create for the topic.
    –command-config – Additional configurations needed for successful running. Here is where we pass in the client.properties file we created in the previous step.

  5. We can list all the topics to see that it was successfully created:
    ./kafka-topics.sh --bootstrap-server $bss --list --command-config client.properties

    When defining bootstrap servers, it’s recommended to use one broker from each Availability Zone. For example:

    export bss=broker1:9098,broker2:9098,broker3:9098

    Similar to the create topic command, the preceding step simply calls list to show all topics available on the cluster. We can run this same command on our target cluster to see if MirrorMaker has replicated the topic.
    With our topic created, let’s start the consumer. This consumer is consuming from the target cluster. When the topic is mirrored with the default replication policy, it will have a source. prefixed to it.

  6. For our topic, we consume from source.MirrorMakerTest as shown in the following code:
    ./kafka-console-consumer.sh --bootstrap-server $targetcluster --topic source.MirrorMakerTest --consumer.config client.properties

    The details of the code are as follows:
    –bootstrap-server – Your target MSK bootstrap servers
    –topic – The mirrored topic
    –consumer.config – Where we pass in our client.properties file again to instruct the client how to authenticate to the MSK cluster
    After this step is successful, it leaves a consumer running all the time on the console until we either close the client connection or close our terminal session. You won’t see any messages flowing yet because we haven’t started producing to the source topic on the source cluster.

  7. Open a new terminal window, leaving the consumer open, and start the producer:
    ./kafka-console-producer.sh --bootstrap-server $bss --topic MirrorMakerTest --producer.config client.properties

    The details of the code are as follows:
    –bootstrap-server – The source MSK bootstrap servers
    –topic – The topic we’re producing to
    –producer.config – The client.properties file indicating which IAM authentication properties to use

    After this is successful, the console returns >, which indicates that it’s ready to produce what we type. Let’s produce some messages, as shown in the following screenshot. After each message, press Enter to have the client produce to the topic.

    producer input

    Switching back to the consumer’s terminal window, you should see the same messages being replicated and now showing on your console’s output.

    consumer output

Clean up

We can close the client connections now by pressing Ctrl+C to close the connections or by simply closing the terminal windows.

We can delete the topics on both clusters by running the following code:

./kafka-topics.sh --bootstrap-server $bss --delete --topic MirrorMakerTest --command-config client.properties

Delete the source cluster topic first, then the target cluster topic.

Finally, we can delete the three connectors via the Amazon MSK console by selecting them from the list of connectors and choosing Delete.

Conclusion

In this post, we showed how to use MSK Connect for MM2 deployment with IAM authentication. We successfully deployed the Amazon MSK custom plugin, and created the MM2 connector along with the accompanying IAM role. Then we deployed the MM2 connector onto our MSK Connect instances and watched as data was replicated successfully between two MSK clusters.

Using MSK Connect to deploy MM2 eases the administrative and operational burden of Kafka Connect and MM2, because the service handles the underlying resources, enabling you to focus on the connectors and data. The solution removes the need to have a dedicated infrastructure of a Kafka Connect cluster hosted on Amazon services like Amazon Elastic Compute Cloud (Amazon EC2), AWS Fargate, or Amazon EKS. The solution also automatically scales the resources for you (if configured to do so), which eliminates the need for the administers to check if the resources are scaling to meet demand. Additionally, using the Amazon managed service MSK Connect allows for easier compliance and security adherence for Kafka teams.

If you have any feedback or questions, please leave a comment.


About the Authors

tannerTanner Pratt is a Practice Manager at Amazon Web Services. Tanner is leading a team of consultants focusing on Amazon streaming services like Managed Streaming for Apache Kafka, Kinesis Data Streams/Firehose and Kinesis Data Analytics.

edberezEd Berezitsky is a Senior Data Architect at Amazon Web Services.Ed helps customers design and implement solutions using streaming technologies, and specializes on Amazon MSK and Apache Kafka.

Backups to the rescue!

Post Syndicated from Nathan Liefting original https://blog.zabbix.com/backups-to-the-rescue/23442/

In this blog post, you will learn how to set up backups for your Zabbix environment. There’s a wide variety of different options when it comes to taking backups of our Zabbix environment, for us, it will just be a matter of choosing the right fit.

 

Introduction

Monitoring is an important part of our IT infrastructure and often times when our monitoring isn’t working for a certain period, we feel like we are blind as to what is going on with our different IT components. As such, taking backups of our Zabbix environment is an important part of running a production Zabbix environment, as we do want to be prepared for a possible issue that might corrupt or even lose our data. It’s always a possibility and as such we should be prepared.

For Zabbix, there are a few different methods on how to take backups and it all starts at the database level. Both the Zabbix frontend as well as the Zabbix server write their data into the Zabbix database as we can see in the illustration below:

This means that both our configuration as well as all of our collected values are present in the same Zabbix database and if we take a database backup, we back up (almost) everything we need. So, let’s start there and have a look at how we can make a database backup.

How to

MySQL backups

Let’s start with the most used variant of Zabbix databases: MySQL and it’s forks like MariaDB and Percona. All of them can easily be backed up using built-in functionality like the MySQL Dump command and we can then use other industry standards to get things going. First, we have to understand the tables in our database though. Most of the tables in your Zabbix environment contain configuration data and as such, they are all important to backup. There are a few tables that we need to consider, however, as they can contain Giga or even Terabytes of data. These are the History, Trends and Events tables:

It is possible to omit these tables from your backup and make smaller, more manageable backups. To make the backup we can then start using tools like MySQL Dump:

Once we have taken a backup, we can easily import that back into our environment using the MySQL Import command or simply using the cat command:

Do not forget, taking and importing large backups can take a long time. This completely depends on your MySQL database performance tuning settings as well as the underlying resources like CPU, Memory and Disk I/O. Also, make sure to check out the MySQL documentation:

MySQL Dump:  https://dev.mysql.com/doc/refman/8.0/en/mysqldump.html / https://mariadb.com/kb/en/making-backups-with-mysqldump/

MySQL Import: https://dev.mysql.com/doc/refman/8.0/en/mysqlimport.html / https://mariadb.com/kb/en/mysqlimport/

 

Alternatively, it’s also possible to create backups using tools like xtrabackup and mariadbbackup.

PostgreSQL backups

We can actually use the same kinds of methods for the PostgreSQL backups. Keep the required tables in mind and fire away with the built-in tools:

 

Then we can restore it by loading the file into postgres:

What about the configuration files?

Once we have a database backup, everything is backed up, right? Well, almost everything. With just a database backup we are quite safe, but (and this is oftentimes overlooked) there are a lot of configuration files and perhaps even custom scripts we need to take into account! There are three parts to this story – the Zabbix server, the Zabbix frontend, and also the Zabbix additional components. All of them have their own set of configuration files and locations that are used for storing custom scripts.

The Zabbix frontend location and configuration files can be different, depending on the environment, as we have a few choices to make. Are we running Apache or Nginx? On what Linux distribution? All of these have to be considered when making configuration backups. In general, the locations for the configuration would be:

/etc/nginx/
/etc/httpd/
/etc/apache2

There’s also a symlink to the Zabbix frontend configuration file located in /etc/zabbix/ but we will get to that one in a bit.

Then we have the Zabbix server itself, which keeps its configuration in /etc/zabbix/ and if we’re following best practices any script should be placed in /usr/lib/zabbix. So we need:

/etc/zabbix/
/usr/lib/zabbix

Let’s add them to the list and find a method to back up these files. Crontab is a built-in tool that we can use, but there are definitely other (perhaps better) solutions out there. Let’s add the following to cron:

I also added a find command here, which will serve as our roll-over or rotation toll. It will find files older than 180 days and delete them from /mnt/backup/config_files/. Make sure to pick a good (network) folder to store these files as it’s important to keep these safe. Feel free to change the number of days you’d like to store the files for.

What about the additional components like Zabbix proxy, Zabbix Java gateway and Zabbix web service (used for PDF reporting)?. Well, these have configuration files as well. Make sure to run a backup on the devices running these additional components. As for Zabbix proxies – they have the same file locations as Zabbix server:

For Zabbix Java gateway and Zabbix web service, we can omit the /usr/lib/zabbix/ folder.

Don’t forget the import/export files!

In general, database backups are slow to make, but also slow to import back unless we do not include the history/trends in the backup. But even then, restoring an entire database simply because someone made an error on a single template is a hassle. Zabbix ships with the built-in frontend export functionality, allowing us to export (and then import) entire parts of the configuration instantly! We can use these for a number of different parts of the configuration:

  • Hosts
  • Templates
  • Media types
  • Maps
  • images
  • Host groups (API ONLY)
  • Template groups (API ONLY)

All of these are available through the Zabbix API allowing us to choose whether we do a manual configuration backup from the frontend, as well as providing us with automation options using that API. You could even manage and update your Zabbix configuration from GIT entirely if you write the right scripts for this.

Frontend backups

To run an export from the frontend simply go to one of the supported sections like Configuration | Templates and select the export data format. When selecting multiple entities, keep in mind that they will all be exported to a single file.

We can then make our edits and import files from the frontend as well:

For Templates this will even result in a nice diff pop-up window, detailing all the changes, deletes and additions to the templates:

 

API backups

For the API things get a little more complicated as we need to select a mode of execution. Of course, it’s possible to do a curl command from the CLI or even use something like Postman:

Request body

The response will then look something like this:

But this feature really starts to shine once we combine it with our own automation scripts. Use it wisely!

High availability

So, what about high availability? Isn’t that some form of a backup?

Well yes and no. High availability is not an “IT backup” in the form of making sure we can recover something that is broken. But it is a backup in the way that if a Zabbix server instance fails, another one takes over for it. HA is somewhat out of scope for this blog post, but it’s still worth mentioning. There are several solutions to set up Zabbix as a full high availability cluster. For MySQL we can use a Primary/Primary setup, for the frontend we can use load balancing techniques like HAProxy and for the Zabbix server, we can use the built-in high availability method. Combine all of these together and you’ll definitely be able to serve your every (production ready!) need.

Conclusion

To conclude, there are many options to start taking backups of our Zabbix environment. It all starts at the database and these backups are definitely vital to keep things safe in case of disaster. When making the backups, do not forget about the configuration files and custom scripts as well as the frontend backup option. Combining all of these solutions will safeguard our environment, but if that isn’t enough – do not forget about industry standards like snapshots. Even further safeguarding our environment on multiple levels.

I hope you enjoyed reading this blog post. If you have any questions or need help configuring anything on your Zabbix setup feel free to contact me and the team at Opensource ICT Solutions. We build a ton of cool integrations like this and much more!

Nathan Liefting

https://oicts.com

A close up of a logo Description automatically generated

The post Backups to the rescue! appeared first on Zabbix Blog.

How to enable Private Access Tokens in iOS 16 and stop seeing CAPTCHAs

Post Syndicated from João Tomé original https://blog.cloudflare.com/how-to-enable-private-access-tokens-in-ios-16-and-stop-seeing-captchas/

How to enable Private Access Tokens in iOS 16 and stop seeing CAPTCHAs

How to enable Private Access Tokens in iOS 16 and stop seeing CAPTCHAs

You go to a website or service, but before access is granted, there’s a visual challenge that forces you to select bikes, buses or traffic lights in a set of images. That can be an exasperating experience. Now, if you have iOS 16 on your iPhone, those days could be over and are just a one-time toggle enabled away.

CAPTCHA = “Completely Automated Public Turing test to tell Computers and Humans Apart”

In 2021, we took direct steps to end the madness that wastes humanity about 500 years per day called CAPTCHAs, that have been making sure you’re human and not a bot. In August 2022, we announced Private Access Tokens. With that, we’re able to eliminate CAPTCHAs on iPhones, iPads and Macs (and more to come) with open privacy-preserving standards.

On September 12, iOS 16 became generally available (iPad 16 and macOS 13 should arrive in October) and on the settings of your device there’s a toggle that can enable the Private Access Token (PAT) technology that will eliminate the need for those CAPTCHAs, and automatically validate that you are a real human visiting a site. If you already have iOS 16, here’s what you should do to confirm that the toggle is “on” (usually it is):

Settings > Apple ID > Password & Security > Automatic Verification (should be enabled)

How to enable Private Access Tokens in iOS 16 and stop seeing CAPTCHAs

What will you get? A completely invisible, private way to validate yourself, and for a website, a way to automatically verify that real users are visiting the site without the horrible CAPTCHA user experience.

Visitors using operating systems that support these tokens, including the upcoming versions of iPad and macOS, can now prove they’re human without completing a CAPTCHA or giving up personal data.

Let’s recap from our August 2022 announcement blog post what this means for different users:

If you’re an Internet user:

  • We’re helping make your mobile web experience more pleasant and more private.
  • You won’t see a CAPTCHA on a supported iOS or Mac device (other devices coming soon!) accessing the Cloudflare network.

If you’re a web or application developer:

  • You’ll know your users are humans coming from an authentic device and signed application, verified by the device vendor directly.
  • And you’ll validate users without maintaining a cumbersome SDK.

If you’re a Cloudflare customer:

  • You don’t have to do anything! Cloudflare will automatically ask for and use Private Access Tokens when using Managed Challenge.
  • Your visitors won’t see a CAPTCHA.

It’s all about simplicity, without compromising on privacy. The work done over a year was a collaboration between Cloudflare and Apple, Google, and other industry leaders to extend the Privacy Pass protocol with support for a new cryptographic token.

These tokens simplify application security for developers and security teams, and obsolete legacy, third-party SDK-based approaches for determining if a human is using a device. They work for browsers, APIs called by browsers, and APIs called within apps. After Apple announced in August that PATs would be incorporated into iOS 16, iPad 16, and macOS 13, the process of ending CAPTCHAs got a big boost. And we expect additional vendors to announce support in the near future.

Cloudflare has already incorporated PATs into our Managed Challenge platform, so any customer using this feature will automatically take advantage of this new technology to improve the browsing experience for supported devices.

In our August in-depth blog post about PATs, you can learn more about how CAPTCHAs don’t work in mobile environments and PATs remove the need for them, and how when sites can’t challenge a visitor with a CAPTCHA, they collect private data.

Improved privacy

In that blog post, we also explain how Private Access Tokens vastly improve privacy by validating without fingerprinting. So, by partnering with third parties like device manufacturers, who already have the data that would help us validate a device, we are able to abstract portions of the validation process, and confirm data without actually collecting, touching, or storing that data ourselves. Rather than interrogating a device directly, we ask the device vendor to do it for us.

Most customers won’t have to do anything to utilize Private Access Tokens. Why? To take advantage of PATs, all you have to do is choose Managed Challenge rather than Legacy CAPTCHA as a response option in a Firewall rule. More than 65% of Cloudflare customers are already doing this.

Now, if you have iOS 16 on your iPhone, it’s your turn.

Implementing long running deployments with AWS CloudFormation Custom Resources using AWS Step Functions

Post Syndicated from DAMODAR SHENVI WAGLE original https://aws.amazon.com/blogs/devops/implementing-long-running-deployments-with-aws-cloudformation-custom-resources-using-aws-step-functions/

AWS CloudFormation custom resource provides mechanisms to provision AWS resources that don’t have built-in support from CloudFormation. It lets us write custom provisioning logic for resources that aren’t supported as resource types under CloudFormation. This post focusses on the use cases where CloudFormation custom resource is used to implement a long running task/job. With custom resources, you can manage these custom tasks (which are one-off in nature) as deployment stack resources.

The routine pattern used for implementing custom resources is via AWS Lambda function. However, when using the Lambda function as the custom resource provider, you must consider its trade-offs, such as its 15 minute timeout. Tasks involved in the provisioning of certain AWS resources can be long running and could span beyond the Lambda timeout. In these scenarios, you must look beyond the conventional Lambda function-based approach for custom resources.

In this post, I’ll demonstrate how to use AWS Step Functions to implement custom resources using AWS Cloud Development Kit (AWS CDK). Step Functions allow complex deployment tasks to be orchestrated as a step-by-step workflow. It also offers direct integration with any AWS service via AWS SDK integrations. By default the CloudFormation stack waits for 1 hour before timing out. The timeout can be increased to maximum 12 hours using wait conditions. In this post, you’ll also see how to use wait conditions with custom resource to run long running deployment tasks as part of a CloudFormation stack.

Prerequisites

Before proceeding any further, you must identify and designate an AWS account required for the solution to work. You must also create an AWS account profile in ~/.aws/credentials for the designated AWS account, if you don’t already have one. The profile must have sufficient permissions to run an AWS CDK stack. It should be your private profile and only be used during the course of this post. Therefore, it should be fine if you want to use admin privileges. Don’t share the profile details, especially if it has admin privileges. I recommend removing the profile when you’re finished with this walkthrough. For more information about creating an AWS account profile, see Configuring the AWS CLI.

Services and frameworks used in the post include CloudFormation, Step Functions, Lambda, DynamoDB, Amazon S3, and AWS CDK.

Solution overview

The following architecture diagram shows the application of Step Functions to implement custom resources.

Architecture diagram

Figure 1. Architecture diagram

  1. The user deploys a CloudFormation stack that includes a custom resource implementation.
  2. The CloudFormation custom resource triggers a Lambda function with the appropriate event which can be CREATE/UPDATE/DELETE.
  3. The custom resource Lambda function invokes Step Functions workflow and offloads the event handling responsibility. The CloudFormation event and context are wrapped inside the Step Function input at the time of invocation.
  4. The custom resource Lambda function returns SUCCESS back to CloudFormation stack indicating that the custom resource provisioning has begun. CloudFormation stack then goes into waiting mode where it waits for a SUCCESS or FAILURE signal to continue.
  5. In the interim, Step Functions workflow handles the custom resource event through one or more steps.
  6. Step Functions workflow prepares the response to be sent back to CloudFormation stack.
  7. Send Response Lambda function sends a success/failure response back to CloudFormation stack. This propels CloudFormation stack out of the waiting mode and into completion.

Solution deep dive

In this section I will get into the details of several key aspects of the solution

Custom Resource Definition

Following code snippet shows the custom resource definition which can be found here. Please note that we also define AWS::CloudFormation::WaitCondition and AWS::CloudFormation::WaitConditionHandle alongside the custom resource. AWS::CloudFormation::WaitConditionHandle resource sets up a pre-signed URL which is passed into the CallbackUrl property of the Custom Resource.

The final completion signal for the custom resource i.e. SUCCESS/FAILURE is received over this CallbackUrl. To learn more about wait conditions please refer to its user guide here. Note that, when updating the custom resource, you cannot use the existing WaitCondition-WaitConditionHandle resource pair. You need to create a new pair for tracking each update/delete operation on the custom resource.

/************************** Custom Resource Definition *****************************/
// When you intend to update CustomResource make sure that a new WaitCondition and 
// a new WaitConditionHandle resource is created to track CustomResource update.
// The strategy we are using here is to create a hash of Custom Resource properties.
// The resource names for WaitCondition and WaitConditionHandle carry this hash.
// Anytime there is an update to the custom resource properties, a new hash is generated,
// which automatically leads to new WaitCondition and WaitConditionHandle resources.
const resourceName: string = getNormalizedResourceName('DemoCustomResource');
const demoData = {
    pk: 'demo-sfn',
    sk: resourceName,
    ts: Date.now().toString()
};
const dataHash = hash(demoData);
const wcHandle = new CfnWaitConditionHandle(
    this, 
    'WCHandle'.concat(dataHash)
)
const customResource = new CustomResource(this, resourceName, {
    serviceToken: customResourceLambda.functionArn,
    properties: {
        DDBTable: String(demoTable.tableName),
        Data: JSON.stringify(demoData),
        CallbackUrl: wcHandle.ref
    }
});
        
// Note: AWS::CloudFormation::WaitCondition resource type does not support updates.
new CfnWaitCondition(
    this,
    'WC'.concat(dataHash),
    {
        count: 1,
        timeout: '300',
        handle: wcHandle.ref
    }
).node.addDependency(customResource)
/**************************************************************************************/

Custom Resource Lambda

Following code snippet shows how the custom resource lambda function passes the CloudFormation event as an input into the StepFunction at the time of invocation. CloudFormation event contains the CallbackUrl resource property I discussed in the previous section.

private async startExecution() {
    const input = {
        cfnEvent: this.event,
        cfnContext: this.context
    };
    const params: StartExecutionInput = {
        stateMachineArn: String(process.env.SFN_ARN),
        input: JSON.stringify(input)
    };
    let attempt = 0;
    let retry = false;
    do {
        try {
            const response = await this.sfnClient.startExecution(params).promise();
            console.debug('Response: ' + JSON.stringify(response));
            retry = false;

Custom Resource StepFunction

The StepFunction handles the CloudFormation event based on the event type. The CloudFormation event containing CallbackUrl is passed down the stages of StepFunction all the way to the final step. The last step of the StepFunction sends back the response over CallbackUrl via send-cfn-response lambda function as shown in the following code snippet.

/**
 * Send response back to cloudformation
 * @param event
 * @param context
 * @param response
 */
export async function sendResponse(event: any, context: any, response: any) {
    const responseBody = JSON.stringify({
        Status: response.Status,
        Reason: "Success",
        UniqueId: response.PhysicalResourceId,
        Data: JSON.stringify(response.Data)
    });
    console.debug("Response body:\n", responseBody);
    const parsedUrl = url.parse(event.ResourceProperties.CallbackUrl);
    const options = {
        hostname: parsedUrl.hostname,
        port: 443,
        path: parsedUrl.path,
        method: "PUT",
        headers: {
            "content-type": "",
            "content-length": responseBody.length
        }
    };
    await new Promise(() => {
        const request = https.request(options, function(response: any) {
	    console.debug("Status code: " + response.statusCode);
	    console.debug("Status message: " + response.statusMessage);
	    context.done();
    	})
	request.on("error", function(error) {
	    console.debug("send(..) failed executing https.request(..): " + error);
	    context.done();
	});
	request.write(responseBody);
	request.end();
    });
    return;
}

Demo

Clone the GitHub repo cfn-custom-resource-using-step-functions and navigate to the folder cfn-custom-resource-using-step-functions. Now, execute the script script-deploy.sh by passing the name of the AWS profile that you created in the prerequisites section above. This should deploy the solution. The commands are shown as follows for your reference. Note that if you don’t pass the AWS profile name ‘default’ the profile will be used for deployment.

git clone 
cd cfn-custom-resource-using-step-functions
./script-deploy.sh "<AWS- ACCOUNT-PROFILE-NAME>"

The deployed solution consists of 2 stacks as shown in the following screenshot

  1. cfn-custom-resource-common-lib: Deploys common components
    • DynamoDB table that custom resources write to during their lifecycle events
    • Lambda layer used across the rest of the stacks
  2. cfn-custom-resource-sfn: Deploys Step Functions backed custom resource implementation
CloudFormation stacks deployed

Figure 2. CloudFormation stacks deployed

For demo purposes, I implemented a custom resource that inserts data into the DynamoDB table. When you deploy the solution for the first time, like you just did in the previous step, it initiates a CREATE event resulting in the creation of a new custom resource using Step Functions. You should see a new record with unix epoch timestamp in the DynamoDB table, indicating that the resource was created as shown in the following screenshot. You can find the DynamoDB table name/arn from the SSM Parameter Store /CUSTOM_RESOURCE_PATTERNS/DYNAMODB/ARN

DynamoDB record indicating custom resource creation

Figure 3. DynamoDB record indicating custom resource creation

Now, execute the script script-deploy.sh again. This should initiate an UPDATE event, resulting in the update of custom resources. The code also automatically creates new WaitConditionHandle and WaitCondition resources required to wait for the update event to finish. Now you should see that the records in the DynamoDb table have been updated with new values for lastOperation and ts attributes as follows.

DynamoDB record indicating custom resource update

Figure 4. DynamoDB record indicating custom resource update

Cleaning up

To remove all of the stacks, run the script script-undeploy.sh as follows.

./script-undeploy.sh "<AWS- ACCOUNT-PROFILE-NAME>"

Conclusion

In this post I showed how to look beyond the conventional approach of building CloudFormation custom resources using a Lambda function. I discussed implementing custom resources using Step Functions and CloudFormation wait conditions. Try this solution in scenarios where you must execute a long running deployment task/job as part of your CloudFormation stack deployment.

 

 

About the author:

Damodar Shenvi

Damodar Shenvi Wagle is a Cloud Application Architect at AWS Professional Services. His areas of expertise include architecting serverless solutions, CI/CD and automation.

Docker Container Monitoring With Zabbix

Post Syndicated from Dmitry Lambert original https://blog.zabbix.com/docker-container-monitoring-with-zabbix/20175/

In this blog post, I will cover Docker container monitoring with Zabbix. We will use the official Docker by Zabbix agent 2 template to make things as simple as possible. The template download link and configuration steps can be found on the Zabbix Integrations page. If you require a visual guide, I invite you to check out my video covering this topic.

Importing the official Docker template

Importing the Docker by Zabbix agent 2 template

Since we will be using the official Docker by Zabbix agent 2 template, first, we need to make sure that the template is actually available in our Zabbix instance. The template is available for Zabbix versions 5.0, 5.4, and 6.0. If you cannot find this template under Configuration – Templates, chances are that you haven’t imported it into your environment after upgrading Zabbix to one of the aforementioned versions. Remember that Zabbix does not modify or import any templates during the upgrade process, so we will have to import the template manually. If that is so, simply download the template from the official Zabbix git page (or use the link in the introduction) and import it into your Zabbix instance by using the Import button in the Configuration – Templates section.

Installing and configuring Zabbix agent 2

Before we get started with configuring our host, we first have to install Zabbix agent 2 and configure it according to the template guidelines. Follow the steps in the download section of the Zabbix website and install the zabbix-agent2 package. Feel free to use any other agent deployment methods if you want to (like compiling the agent from the source files)

Installing Zabbix agent2 from packages takes just a few simple steps:

Install the Zabbix repository package:

rpm -Uvh https://repo.zabbix.com/zabbix/6.0/rhel/8/x86_64/zabbix-release-6.0-1.el8.noarch.rpm

Install the Zabbix agent 2 package:

dnf install zabbix-agent2

Configure the Server parameter by populating it with your Zabbix server/proxy address

vi /etc/zabbix/zabbix_agent2.conf
### Option: Server
# List of comma delimited IP addresses, optionally in CIDR notation, or DNS names of Zabbix servers and Zabbix proxies.
# Incoming connections will be accepted only from the hosts listed here.
# If IPv6 support is enabled then '127.0.0.1', '::127.0.0.1', '::ffff:127.0.0.1' are treated equally
# and '::/0' will allow any IPv4 or IPv6 address.
# '0.0.0.0/0' can be used to allow any IPv4 address.
# Example: Server=127.0.0.1,192.168.1.0/24,::1,2001:db8::/32,zabbix.example.com
#
# Mandatory: yes, if StartAgents is not explicitly set to 0
# Default:
# Server=

Server=192.168.50.49

Plugin specific Zabbix agent 2 configuration

Zabbix agent 2 provides plugin-specific configuration parameters. Mostly these are optional parameters related to a specific plugin. You can find the full list of plugin-specific configuration parameters in the Zabbix documentation. In the newer versions of Zabbix agent 2, the plugin-specific parameters are defined in separate plugin configuration files, located in /etc/zabbix/zabbix_agent2.d/plugins.d/, while in older versions, they are defined directly in the zabbix_agent2.conf file.

For the Zabbix agent 2 Docker plugin, we have to provide the Docker daemon unix-socket location. This can be done by specifying the following plugin parameter:

### Option: Plugins.Docker.Endpoint
# Docker API endpoint.
#
# Mandatory: no
# Default: unix:///var/run/docker.sock
# Plugins.Docker.Endpoint=unix:///var/run/docker.sock

The default socket location will be correct for your Docker environment – in that case, you can leave the configuration file as-is.

Once we have made the necessary changes in the Zabbix agent 2 configuration files, start and enable the agent:

systemctl enable zabbix-agent2 --now

Check if the Zabbix agent2 is running:

tail -f /var/log/zabbix/zabbix_agent2.log

Before we move on to Zabbix frontend, I would like to point your attention to the Docker socket file permission – the zabbix user needs to have access to the Docker socket file. The zabbix user should be added to the docker group to resolve the following error messages.

[Docker] cannot fetch data: Get http://1.28/info: dial unix /var/run/docker.sock: connect: permission denied
ZBX_NOTSUPPORTED: Cannot fetch data.

You can add the zabbix user to the Docker group by executing the following command:

usermod -aG docker zabbix

Configuring the docker host

Configuring the host representing our Docker environment

After importing the template, we have to create a host which will represent our Docker instance. Give the host a name and assign it to a Host group – I will assign it to the Linux servers host group. Assign the Docker by Zabbix agent 2 template to the host. Since the template uses Zabbix agent 2 to collect the metrics, we also have to add an agent interface on this host. The address of the interface should point to the machine running your Docker containers. Finish up the host configuration by clicking the Add button.

Docker by Zabbix agent 2 template

Regular docker template items

The template contains a set of regular items for the general Docker instance metrics, such as the number of available images, Docker architecture information, the total number of containers, and more.

Docker tempalte Low-level discovery rules

On top of that, the template also gathers container and image-specific information by using low-level discovery rules.

Once Zabbix discovers your containers and images, these low-level discovery rules will then be used to create items, triggers, and graphs from prototypes for each of your containers and images. This way, we can monitor container or image-specific metrics, such as container memory, network information, container status, and more.

Docker templates Low-level discovery item prototypes

Verifying the host and template configuration

To verify that the agent and the host are configured correctly, we can use Zabbix get command-line tool and try to poll our agent. If you haven’t installed Zabbix get, do so on your Zabbix server or Zabbix proxy host:

dnf install zabbix-get

Now we can use zabbix-get to verify that our agent can obtain the Docker-related metrics. Execute the following command:

zabbix_get -s docker-host -k docker.info

Use the -s parameter to specify your agent host’s host name or IP address. The -k parameter specifies the item key for which we wish to obtain the metrics by polling the agent with Zabbix get.

zabbix_get -s 192.168.50.141 -k docker.info

{"Id":"SJYT:SATE:7XZE:7GEC:XFUD:KZO5:NYFI:L7M5:4RGO:P2KX:QJFD:TAVY","Containers":2,"ContainersRunning":2,"ContainersPaused":0,"ContainersStopped":0,"Images":2,"Driver":"overlay2","MemoryLimit":true,"SwapLimit":true,"KernelMemory":true,"KernelMemoryTCP":true,"CpuCfsPeriod":true,"CpuCfsQuota":true,"CPUShares":true,"CPUSet":true,"PidsLimit":true,"IPv4Forwarding":true,"BridgeNfIptables":true,"BridgeNfIP6tables":true,"Debug":false,"NFd":39,"OomKillDisable":true,"NGoroutines":43,"LoggingDriver":"json-file","CgroupDriver":"cgroupfs","NEventsListener":0,"KernelVersion":"5.4.17-2136.300.7.el8uek.x86_64","OperatingSystem":"Oracle Linux Server 8.5","OSVersion":"8.5","OSType":"linux","Architecture":"x86_64","IndexServerAddress":"https://index.docker.io/v1/","NCPU":1,"MemTotal":1776848896,"DockerRootDir":"/var/lib/docker","Name":"localhost.localdomain","ExperimentalBuild":false,"ServerVersion":"20.10.14","ClusterStore":"","ClusterAdvertise":"","DefaultRuntime":"runc","LiveRestoreEnabled":false,"InitBinary":"docker-init","SecurityOptions":["name=seccomp,profile=default"],"Warnings":null}

In addition, we can also use the low-level discovery key – docker.containers.discovery[false] to check the result of the low-level discovery.

zabbix_get -s 192.168.50.141 -k docker.containers.discovery[false]

[{"{#ID}":"a1ad32f5ee680937806bba62a1aa37909a8a6663d8d3268db01edb1ac66a49e2","{#NAME}":"/apache-server"},{"{#ID}":"120d59f3c8b416aaeeba50378dee7ae1eb89cb7ffc6cc75afdfedb9bc8cae12e","{#NAME}":"/mysql-server"}]

We can see that Zabbix will discover and start monitoring two containers – apache-server and mysql-server. Any agent low-level discovery rule or item can be checked with Zabbix get.

Docker template in action

Discovered items on our Docker host

Now that we have configured our agent and host, applied the Docker template, and verified that everything is working, we should be able to see the discovered entities in the frontend.

Collected Docker container metrics

In addition, our metrics should have also started coming in. We can check the Latest data section and verify that they are indeed getting collected.

Macros inherited from the Docker template

Lastly, we have a few additional options for further modifying the template and the results of our low-level discovery. If you open the Macros section of your host and select Inherited and host macros, you will notice that there are 4 macros inherited from the Docker template. These macros are responsible for filtering in/out the discovered containers and images. Feel free to modify these values if you wish to filter in/out the discovery of these entities as per your requirements.

Notice that the container discovery item also has one parameter, which is defined as false on the template:

  • docker.containers.discovery[false] – Discover only running containers
  • docker.containers.discovery[true] – Discover all containers, no matter their state.

And that’s it! We successfully imported the template, installed and configured Zabbix agent 2, created a host, and applied the Docker template. Finally – our Zabbix instance is now monitoring our Docker environment! If you have any other questions or comments, feel free to leave a response in the comments section of this post.

 

The post Docker Container Monitoring With Zabbix appeared first on Zabbix Blog.

Webhooks in Zabbix

Post Syndicated from Andrey Biba original https://blog.zabbix.com/webhooks-in-zabbix/19935/

Zabbix is not only a flexible and versatile monitoring system but also a convenient tool for generating alerts and integrating with existing service desks. Among the various integration methods, webhooks have become the most popular. In this blog post, we will take a look at what are webhooks, how they can be used to integrate Zabbix with an external solution, and also take a look at some use case examples for webhook integrations.

What is a webhook?

Generally speaking, a webhook is a method of augmenting or altering the behavior of a web page or web application with custom callbacks. But to put it simply, a webhook is an automatic reaction to an event. If an event occurs (for example, a problem appears), then the webhook makes a call (via HTTP / HTTPS) to a third-party service to notify it about the event. Many existing solutions provide an API that allows you to interact with them via webhooks.

The webhook in Zabbix is implemented using JavaScript, so writing code does not require knowledge of a specific Zabbix syntax, and due to the prevalence of the JavaScript language, you can find many examples, tips, and guides on the Internet.

How does a webhook work?

Essentially, a webhook is code that makes a sequence of calls to achieve some result. In the case of Zabbix, a JavaScript code is executed that accesses the service API and transfers, updates, and retrieves data from there. For example, we need to open a ticket at the service desk and leave a comment on the ticket, which will contain information about the problem. For this we need:

  • Log in to the service and get a token
  • Make a request with the token to create a ticket
  • Create a comment on the newly created ticket using a token

In different services, the details may differ, but the general idea will be preserved from service to service.

How to use it?

Our integration team constantly communicates with the community and monitors the most popular services to develop official out-of-the-box integrations for them. At the moment, Zabbix provides a vast selection of out-of-the-box webhooks for the most popular services, and we review new ones and improve current ones every day.

In most cases, setting up a ready-made webhook comes down to 3-4 steps, which are described in the README file in the Zabbix repository. Usually, it is necessary to generate an API key in the service, set it in Zabbix, set the URL to the service endpoint URL, and specify a couple of parameters required for the webhook to work.

In addition to ready-made solutions, there is a Github community repository where custom templates and webhooks are laid out! If you are the author of a webhook or a template, please share it with the community by submitting it to this repository!

Example – Telegram webhook

The theory is good, but we are all interested in how it is implemented in practice. Let’s look at a Telegram webhook as an example. Now this messenger is very popular and it will be relevant to use it as an example.

First of all, let’s go to the Zabbix repository or navigate to the Zabbix website integrations section to read the setup instructions. In the repository, all templates and notification methods are located in the /templates folder, and for each of them, there is a README file with a detailed description.

From the Telegram side, we need to create a bot and get its token following the instructions and set it in the Token parameter.

After that, we create a user, set up a Telegram media type for this user, and in the “Send to” field we write the id of the user or group chat.

Voila! Your webhook is set up and ready to send notifications or event information!

As you may have noticed, the setup did not take much time and did not require deep knowledge. Naturally, for finer tuning, it is possible to edit the content of messages, the type of problems, intervals, and other parameters. But even without additional changes, notifications are already ready to go.

Is it difficult to write a webhook on your own?

Of course, creating a webhook requires certain skills.

First of all, knowledge of JavaScript is required. The language itself is not difficult and can be mastered relatively quickly. The Zabbix documentation site has a guideline for writing webhooks with recommendations and best practices.

Secondly, understanding how Zabbix works. This does not require an in-depth understanding of Zabbix and the ability to follow basic instructions will be enough. You can read more about setting up notification methods in the official documentation. It is important to properly configure the webhook itself, grant rights to users, and set up a notification action for the necessary triggers.

And thirdly, study the documentation of the service for which the webhook will be written. Although all APIs work on the same principle, they can differ greatly from each other in methods and request structure. It is also necessary to understand the service itself to understand how it works. It is difficult to write an integration if it is not clear how Zabbix should properly interact with the service being integrated.

Summarizing

Webhook is a modern and flexible way of integration that allows Zabbix to be a universal solution. Since the realities of our world imply a large number of different systems, and as a result – many people working together – webhooks are becoming an indispensable tool in notification automation. A properly written and configured webhook is an effective solution for flexible notifications.

In the next article, together we will learn the basic methods and requests that are needed to send alerts, receive updates and assign tags. For this purpose, we will completely inspect some webhook in close detail.

Questions

Q: We have a ready-made notification system built on scripts. Does it make sense to rewrite it to a webhook?

A: Certainly. Firstly, the webhook is executed natively in Zabbix, which will be much more productive than in an external script. Secondly, the webhook is much more flexible, more functional, and much easier to make changes to.

 

Q: We have a service for which we would like to write an integration, but we do not have qualified specialists who could do it. Is it possible to request such integration from Zabbix?

A: Yes, if you are a Zabbix partner, you can leave a request to create such integration.

 

The post Webhooks in Zabbix appeared first on Zabbix Blog.

Tags in Zabbix 6.0 LTS – Usage, subfilters and guidelines

Post Syndicated from Andrey Biba original https://blog.zabbix.com/tags-in-zabbix-6-0-lts-usage-subfilters-and-guidelines/19565/

Starting from Zabbix 5.4, item tags have completely replaced applications. This design decision has allowed us to implement many new usability improvements – from providing additional information and classification to the tagged entities, to defining action conditions and security permissions by referencing specific tags and their values. Let’s take a look at how tags are defined in the official Zabbix templates and some of the potential tag use cases when configuring actions and access permissions.

Tag usage in Zabbix 6.0

The outdated “applications” have been replaced by tags, which I wanted to talk about in more detail today.

The main difference between tags and applications is that tags are defined using a name and a value, which greatly expands their scope of usage. Now tags are used in items, triggers, hosts, services, user groups for permission configuration, actions, and more. I am sure that their scope will expand with each new release. 

Due to the structural difference between “applications” and tags, filtering tools had to be adapted. For example, in the “Latest data” section in Zabbix 6.0, sub-filters have been redesigned to support tags and provide granular filtering options. Grouping tags by name allowed to save space and made using sub-filters more intuitive.

To optimize the work with tags, we have developed several standards for different template elements. 

Template

Now each template contains the mandatory class and target tags. Using these tags will allow distribution templates by class, such as application, database, network, etc., and by the target.

Items

Mandatory component tag that describes whether the data element belongs to a particular system or type. If a metric belongs to several types at once, it is necessary to use several component tags to describe the relevant component assignment as best as possible. 

Custom tags are also allowed for low-level discovery data elements using LLD macros. 

Triggers

The scope tag is assigned to the trigger based on the issue type. The general idea is to organize triggers into 5 groups: availability, performance, notification, security and capacity

Hosts

For a host, the service tag is used, which defines a single service or multiple services running on this host. 

Example of tagging on a ClickHouse by HTTP template 

Let’s start with the tags of the template itself. It has class: database and target: clickhouse tags assigned to it. You shouldn’t assign too many tags on the template level, because each of these tags will be inherited by template elements, which can create unnecessary redundancy, and as a result, a “mess” of tags. 

Let’s take a look at a few metrics and triggers from this template.

The “ClickHouse: Check port availability” metric is assigned the component: health and component: network tags, as it contains information about the health of the service and the checks are performed over the network. Problems on this metric can be displayed to the group responsible for the network 

The “ClickHouse: Get information about dictionaries” metric has a tag component: dictionaries because it explicitly refers to dictionaries, and a tag component: raw, because it is a master metric, and dependent metrics get data from it.

The metrics from the low-level discovery “Replicas” rule contain the component: replication, database: {#DB}, and table: {#TABLE} tags. LLD metrics allow custom tags as they allow the use of low-level discovery macros for grouping flexibility. 

 

Trigger “ClickHouse: Version has changed (new version: {ITEM.VALUE}” with scope: notice tag implies a simple notification that does not contain critical information related to system unavailability and performance. At the same time, trigger “ClickHouse: Port {$CLICKHOUSE. PORT} is unavailable” means the system is unavailable and has the tag scope: availability. 

 

How to use tags?

As I wrote earlier, right now we are using tags for the majority of Zabbix components, so they become a functional and flexible tool for managing monitoring. One of the latest such implementations is Services – now they can also have tags assigned to them. 

Of course, one of the most obvious use cases is the logical grouping of some elements. This allows filtering triggers and metrics by given parameters.

The next use case is also of significant importance – it’s the extension of the rights management functionality. With the help of tags, it is possible to add a layer of granularity so a Zabbix user can view problems for a particular service. For example, we need to provide access to Nginx servers that are in the Webservers group. To do this, just add the read permissions for the Webservers group in the Permissions section of a User group and select the Webservers group in the Tag Filter section and add the service: nginx tag. You can find more information about user groups on our official Zabbix documentation page.

Using tags in permissions

Let’s look at the use of rights with a practical example. Suppose there are 3 user groups: 

  • Hardware team – a team of administrators that is responsible for hardware 
  • Network team – a team of administrators that is responsible for the network and network hardware
  • Software team – a team of administrators that is responsible for software 

For each group assign the following permissions: 

  • for the Hardware team, set the read permissions for the Hardware group
    • In the tag filters, set the tag and tag value to scope: availability in the tag filter because we want the team to see only availability problems.
  • for the Network team, set the read permissions for the Database, Hardware, Linux servers, Network groups
    • In the tag filters for the Database, Hardware and Linux servers set the tag and tag value to component: network in the tag filter, because for these groups it is necessary to see only problems related to the network.
    • in the tag filters for the Network host group, we have to set “All tags” since we’re interested in seeing all of the problems related to hosts in this host group.
  • for the Software team, set the read permissions for the Databases and Linux servers groups
    • In the tag filters set, the tag and tag value to class: software for each group to see events exclusively related to software. 

With this configuration, each user group will see only those problems that fall under the respective permissions and tag filters. 

Remember, that  a Super admin user will see all of the problems created in Zabbix

While users belonging to User roles of type Administrator or User will see a restricted set of problems based on their permissions:

  • Users from the Hardware team group will only see problems for hosts from the Hardware group and triggers with the tag scope: availability.

  • A user who is in the Network team will see all problems with the component: network tag and all triggers for the Network host group.

  • And users of the Software team only have access to problems with the class: software tag.

Using tags in actions

And of course the use of tags in actions. Pretty often required to set up quite complex conditions which may become confusing and hard to maintain. Tags, as a universal tool, add another entity that you can use when creating actions.

For example, if we want to send notifications about network availability problems with a severity greater than Warning to network administrators, we can specify the following conditions for our action:

  • value of tag class equals network
  • value of tag scope equals availability
  • Trigger severity is greater than or equal to Warning

Questions

Q: Are tags a full-featured replacement for “applications” or are there any downsides? 

A: Of course, they not only replace the functional “applications” but also extend the functionality of using tags in various aspects of Zabbix. 

 

Q: Is the current implementation of tags finalized or is there more to come?

A: No, we are working every day to improve the experience gained from using Zabbix, and tags in particular, so we are listening to the opinion of the community and adapting the functionality for the best result. If you have any ideas or comments, please use the official Zabbix forum and the Zabbix support portal to share them with us!

 

Q: Is there a document describing the best practice approach of using tags?

A: Yes, there is a guideline section on the documentation site that contains recommendations for best tags usage in Zabbix.

The post Tags in Zabbix 6.0 LTS – Usage, subfilters and guidelines appeared first on Zabbix Blog.

Creating computing quotas on AWS Outposts rack with EC2 Capacity Reservations sharing

Post Syndicated from Rachel Zheng original https://aws.amazon.com/blogs/compute/creating-computing-quotas-on-aws-outposts-rack-with-ec2-capacity-reservation-sharing/

This post is written by Yi-Kang Wang, Senior Hybrid Specialist SA.

AWS Outposts rack is a fully managed service that delivers the same AWS infrastructure, AWS services, APIs, and tools to virtually any on-premises datacenter or co-location space for a truly consistent hybrid experience. AWS Outposts rack is ideal for workloads that require low latency access to on-premises systems, local data processing, data residency, and migration of applications with local system interdependencies. In addition to these benefits, we have started to see many of you need to share Outposts rack resources across business units and projects within your organization. This blog post will discuss how you can share Outposts rack resources by creating computing quotas on Outposts with Amazon Elastic Compute Cloud (Amazon EC2) Capacity Reservations sharing.

In AWS Regions, you can set up and govern a multi-account AWS environment using AWS Organizations and AWS Control Tower. The natural boundaries of accounts provide some built-in security controls, and AWS provides additional governance tooling to help you achieve your goals of managing a secure and scalable cloud environment. And while Outposts can consistently use organizational structures for security purposes, Outposts introduces another layer to consider in designing that structure. When an Outpost is shared within an Organization, the utilization of the purchased capacity also needs to be managed and tracked within the organization. The account that owns the Outpost resource can use AWS Resource Access Manager (RAM) to create resource shares for member accounts within their organization. An Outposts administrator (admin) can share the ability to launch instances on the Outpost itself, access to the local gateways (LGW) route tables, and/or access to customer-owned IPs (CoIP). Once the Outpost capacity is shared, the admin needs a mechanism to control the usage and prevent over utilization by individual accounts. With the introduction of Capacity Reservations on Outposts, we can now set up a mechanism for computing quotas.

Concept of computing quotas on Outposts rack

In the AWS Regions, Capacity Reservations enable you to reserve compute capacity for your Amazon EC2 instances in a specific Availability Zone for any duration you need. On May 24, 2021, Capacity Reservations were enabled for Outposts rack. It supports not only EC2 but Outposts services running over EC2 instances such as Amazon Elastic Kubernetes (EKS), Amazon Elastic Container Service (ECS) and Amazon EMR. The computing power of above services could be covered in your resource planning as well. For example, you’d like to launch an EKS cluster with two self-managed worker nodes for high availability. You can reserve two instances with Capacity Reservations to secure computing power for the requirement.

Here I’ll describe a method for thinking about resource pools that an admin can use to manage resource allocation. I’ll use three resource pools, that I’ve named reservation pool, bulk and attic, to effectively and extensibly manage the Outpost capacity.

A reservation pool is a resource pool reserved for a specified member account. An admin creates a Capacity Reservation to match member account’s need, and shares the Capacity Reservation with the member account through AWS RAM.

A bulk pool is an unreserved resource pool that is used when member accounts run out of compute capacity such as EC2, EKS, or other services using EC2 as underlay. All compute capacity in the bulk pool can be requested to launch until it is exhausted. Compute capacity that is not under a reservation pool belongs to the bulk pool by default.

An attic is a resource pool created to hold the compute capacity that the admin wouldn’t like to share with member accounts. The compute capacity remains in control by admin, and can be released to the bulk pool when needed. Admin creates a Capacity Reservation for the attic and owns the Capacity Reservation.

The following diagram shows how the admin uses Capacity Reservations with AWS RAM to manage computing quotas for two member accounts on an Outpost equipped with twenty-four m5.xlarge. Here, I’m going to break the idea into several pieces to help you understand easily.

  1. There are three Capacity Reservations created by admin. CR #1 reserves eight m5.xlarge for the attic, CR #2 reserves four m5.xlarge instances for account A and CR #3 reserves six m5.xlarge instances for account B.
  2. The admin shares Capacity Reservation CR #2 and CR #3 with account A and B respectively.
  3. Since eighteen m5.xlarge instances are reserved, the remaining compute capacity in the bulk pool will be six m5.xlarge.
  4. Both Account A and B can continue to launch instances exceeding the amount in their Capacity Reservation, by utilizing the instances available to everyone in the bulk pool.

Concept of defining computing quotas

  1. Once the bulk pool is exhausted, account A and B won’t be able to launch extra instances from the bulk pool.
  2. The admin can release more compute capacity from the attic to refill the bulk pool, or directly share more capacity with CR#2 and CR#3. The following diagram demonstrates how it works.

Concept of refilling bulk pool

Based on this concept, we realize that compute capacity can be securely and efficiently allocated among multiple AWS accounts. Reservation pools allow every member account to have sufficient resources to meet consistent demand. Making the bulk pool empty indirectly sets the maximum quota of each member account. The attic plays as a provider that is able to release compute capacity into the bulk pool for temporary demand. Here are the major benefits of computing quotas.

  • Centralized compute capacity management
  • Reserving minimum compute capacity for consistent demand
  • Resizable bulk pool for temporary demand
  • Limiting maximum compute capacity to avoid resource congestion.

Configuration process

To take you through the process of configuring computing quotas in the AWS console, I have simplified the environment like the following architecture. There are four m5.4xlarge instances in total. An admin account holds two of the m5.4xlarge in the attic, and a member account gets the other two m5.4xlarge for the reservation pool, which results in no extra instance in the bulk pool for temporary demand.

Prerequisites

  • The admin and the member account are within the same AWS Organization.
  • The Outpost ID, LGW and CoIP have been shared with the member account.

Architecture for configuring computing quotas

  1. Creating a Capacity Reservation for the member account

Sign in to AWS console of the admin account and navigate to the AWS Outposts page. Select the Outpost ID you want to share with the member account, choose Actions, and then select Create Capacity Reservation. In this case, reserve two m5.4xlarge instances.

Create a capacity reservation

In the Reservation details, you can terminate the Capacity Reservation by manually enabling or selecting a specific time. The first option of Instance eligibility will automatically count the number of instances against the Capacity Reservation without specifying a reservation ID. To avoid misconfiguration from member accounts, I suggest you select Any instance with matching details in most use cases.

Reservation details

  1. Sharing the Capacity Reservation through AWS RAM

Go to the RAM page, choose Create resource share under Resource shares page. Search and select the Capacity Reservation you just created for the member account.

Specify resource sharing details

Choose a principal that is an AWS ID of the member account.

Choose principals that are allowed to access

  1. Creating a Capacity Reservation for attic

Create a Capacity Reservation like step 1 without sharing with anyone. This reservation will just be owned by the admin account. After that, check Capacity Reservations under the EC2 page, and the two Capacity Reservations there, both with availability of two m5.4xlarge instances.

3.	Creating a Capacity Reservation for attic

  1. Launching EC2 instances

Log in to the member account, select the Outpost ID the admin shared in step 2 then choose Actions and select Launch instance. Follow AWS Outposts User Guide to launch two m5.4xlarge on the Outpost. When the two instances are in Running state, you can see a Capacity Reservation ID on Details page. In this case, it’s cr-0381467c286b3d900.

Create EC2 instances

So far, the member account has run out of two m5.4xlarge instances that the admin reserved for. If you try to launch the third m5.4xlarge instance, the following failure message will show you there is not enough capacity.

Launch status

  1. Allocating more compute capacity in bulk pool

Go back to the admin console, select the Capacity Reservation ID of the attic on EC2 page and choose Edit. Modify the value of Quantity from 2 to 1 and choose Save, which means the admin is going to release one more m5.4xlarge instance from the attic to the bulk pool.

Instance details

  1. Launching more instances from bulk pool

Switch to the member account console, and repeat step 4 but only launch one more m5.4xlarge instance. With the resource release on step 5, the member account successfully gets the third instance. The compute capacity is coming from the bulk pool, so when you check the Details page of the third instance, the Capacity Reservation ID is blank.

6.	Launching more instances from bulk pool

Cleaning up

  1. Terminate the three EC2 instances in the member account.
  2. Unshare the Capacity Reservation in RAM and delete it in the admin account.
  3. Unshare the Outpost ID, LGW and CoIP in RAM to get the Outposts resources back to the admin.

Conclusion

In this blog post, the admin can dynamically adjust compute capacity allocation on Outposts rack for purpose-built member accounts with an AWS Organization. The bulk pool offers an option to fulfill flexibility of resource planning among member accounts if the maximum instance need per member account is unpredictable. By contrast, if resource forecast is feasible, the admin can revise both the reservation pool and the attic to set a hard limit per member account without using the bulk pool. In addition, I only showed you how to create a Capacity Reservation of m5.4xlarge for the member account, but in fact an admin can create multiple Capacity Reservations with various instance types or sizes for a member account to customize the reservation pool. Lastly, if you would like to securely share Amazon S3 on Outposts with your member accounts, check out Amazon S3 on Outposts now supports sharing across multiple accounts to get more details.

Deploying Zabbix in Amazon Web Services cloud platform

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/deploying-zabbix-in-amazon-web-services-cloud-platform/17283/

With the rapid evolution and proliferation of different cloud services, many organizations have decided to move parts of their infrastructures from on-prem to cloud. As an essential part of your infrastructure, Zabbix is no exception – you always have the option to either deploy Zabbix on-prem or select from one of the many supported cloud service providers to deploy your Zabbix Server or Zabbix Proxy on.

In this blog post, let’s look at how we can quickly deploy Zabbix Server and Zabbix Proxy nodes in Amazon Web Services cloud platform.

Deploying the Zabbix Server in AWS

Let’s begin with the Zabbix download page. Under the Zabbix Cloud Images section, select the AWS cloud vendor and then the Cloud Image you wish to deploy. Let’s start with Zabbix Server 5.0 with MySQL DB backend and Nginx Web server backend for our frontend.

Next, we will be redirected to the AWS marketplace, where we will have to subscribe to the Zabbix Server 5.0 image.

Once we have subscribed to the Zabbix Server image, we can continue with the deployment configuration.

Next, we must select our Region, Zabbix minor version (usually the latest available), and the Fulfillment option. Once that is done, we can finalize the launch configuration.

Select the preferred Launch option, EC2 Instance Type, VPC, and Subent settings on the Launch page.

Next – We have to select or create a security group.

We also have to select or generate EC 2 Key pair – make sure to save your private key in a safe location!

Note that creating a security group based on seller settings does not guarantee that the group will have an inbound SSH access rule! Make sure to double-check the security group and manually add the SSH inbound rule if it hasn’t yet been added. We will need to access this instance via SSH to obtain the initial frontend login credentials!

Once you click on the Launch button, the deployment process for your Zabbix application will be initiated.

Accessing the application

Let’s open up the Instances section and open our newly deployed Zabbix instance

We can access the Zabbix Frontend by opening the Public IPv4 address or Public IPv4 DNS of the Zabbix instance

Note that the Zabbix frontend password is still unknown to us. Recall how I mention that we will need to access the instance via SSH to obtain the frontend password. Let’s do so now.

Write down the login credentials and use them to log in to the Zabbix instance.

Accessing the database

In case we wish to access the Zabbix database backend, we can do so from the command line. Zabbix database can be accessed by using the root user. By default, it can be used without a password.

The MySQL root password is stored in /root/.my.cnf configuration file.

Modifying the Zabbix Frontend timezone

By default, the Zabbix frontend uses the “UTC” timezone. If you need to change it, edit php_value[date.timezone] PHP variable in /etc/php-fpm.d/zabbix.conf and restart php-fpm process:

systemctl restart php-fpm

Zabbix proxy

If you wish to deploy a Zabbix proxy instance in your AWS cloud, the deployment steps are very much the same. Most likely, you will still require SSH access if you wish to perform some configuration changes in the Zabbix proxy configuration file.

Note, that by default, the SQLite proxy database is stored in /tmp/zabbix_proxy.sqlite3

As always, don’t forget the point the proxy at your Zabbix server instance by modifying the Server parameter in the Zabbix proxy configuration file, located in /etc/zabbix/zabbix_proxy.conf

And that’s all! With just a few clicks, we are able to deploy a fully functional Zabbix instance or a small Zabbix proxy to distribute or scale our monitoring. Don’t forget that AWS is just one of the many cloud service providers you can use with Official Zabbix images. If you have any questions about the AWS deployment – you are very much encouraged to leave a comment under this blog post.

If you wish to learn more about the Zabbix Monitoring solution, check out the official documentation https://www.zabbix.com/documentation/current/manual/quickstart.

Advanced Zabbix API – 5 API use cases to improve your API workfows

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/advanced-zabbix-api-5-api-use-cases-to-improve-your-api-workfows/16801/

As your monitoring infrastructures evolve, you might hit a point when there’s no avoiding using the Zabbix API. The Zabbix API can be used to automate a particular part of your day-to-day workflow, troubleshoot your monitoring or to simply analyze or get statistics about a specific set of entities.

In this blog post, we will take a look at some of the more advanced API methods and specific method parameters and learn how they can be used to improve your API workflows.

1. Count entities with CountOutput

Let’s start with gathering some statistics. Let’s say you have to count the number of some matching entities – here we can use the CountOutput parameter. For a more advanced use case – what if we have to count the number of events for some time period? Let’s combine countOutput with time_from and time_till (in unixtime) and get the number of events created for the month of November. Let’s get all of the events for the month of November that have the Disaster severity:

{
"jsonrpc": "2.0",
"method": "event.get",
"params": {
"output": "extend",
"time_from": "1635717600",
"time_till": "1638223200",
"severities": "5",
"countOutput": "true"
},
"auth": "xxxxxx",
"id": 1
}

2. Use API to perform Configuration export/import

Next, let’s take a look at how we can use the configuration.export method to export one of our templates in yaml:

{
"jsonrpc": "2.0",
"method": "configuration.export",
"params": {
"options": {
"templates": [
"10001"
]
},
"format": "yaml"
},
"auth": "xxxxxx",
"id": 1
}

Now let’s copy and paste the result of the export and import the template into another environment. It’s extremely important to remember that for this method to work exactly as we intend to, we need to include the parameters that specify the behavior of particular entities contained in the configuration string, such as items/value maps/templates, etc. For example, if I exclude the templates parameter here, no templates will be imported.

{
"jsonrpc": "2.0",
"method": "configuration.import",
"params": {
"format": "yaml",
"rules": {
"valueMaps": {
"createMissing": true,
"updateExisting": true
},
"items": {
"createMissing": true,
"updateExisting": true,
"deleteMissing": true
},
"templates": {
"createMissing": true,
"updateExisting": true
},

"templateLinkage": {
"createMissing": true
}
},
"source": "zabbix_export:\n version: '5.4'\n date: '2021-11-13T09:31:29Z'\n groups:\n -\n uuid: 846977d1dfed4968bc5f8bdb363285bc\n name: 'Templates/Operating systems'\n templates:\n -\n uuid: e2307c94f1744af7a8f1f458a67af424\n template: 'Linux by Zabbix agent active'\n name: 'Linux by Zabbix agent active'\n 
...
},
"auth": "xxxxxx",
"id": 1
}

3. Expand trigger functions and macros with expand parameters

Using trigger.get to obtain information about a particular set of triggers is a relatively common practice. One particular caveat that we have to consider is that by default macros in trigger name, expression or descriptions are not expanded. To expand the available macros we need to use the expand parameters:

{
"jsonrpc": "2.0",
"method": "trigger.get",
"params": {
"triggerids": "18135",
"output": "extend",
"expandExpression":"1",
"selectFunctions": "extend"
},
"auth": "xxxxxx",
"id": 1
}

4. Obtaining additional LLD information for a discovered item

If we wish to display additional LLD information for a discovered entity, in this case – an item, we can use the selectDiscoveryRule and selectItemDiscovery parameters.
While selectDiscoveryRule will provide the ID of the LLD rule that created the item, selectItemDiscovery can point us at the parent item prototype id from which the item was created, last discovery time, item prototype key, and more.

The example below will return the item details and will also provide the LLD rule and Item prototype IDs, the time when the lost item will be deleted and the last time the item was discovered:

{
"jsonrpc": "2.0",
"method": "item.get",
"params": {
"itemids":"36717",
"selectDiscoveryRule":"1",
"selectItemDiscovery":["lastcheck","ts_delete","parent_itemid"]
}, "auth":"xxxxxx",
"id": 1
}

5. Searching through the matched entities with search parameters

Zabbix API provides a couple of standard parameters for performing a search. With search parameter, we can search string or text fields and try to find objects based on a single or multiple entries. searchByAny parameter is capable of extending the search – if you set this as true, we will search by ANY of the criteria in the search array, instead of trying to find an entity that matches ALL of them (default behavior).

The following API call will find items that match agent and Zabbix keys on a particular template:

{
"jsonrpc": "2.0",
"method": "item.get",
"params": {
"output": "extend",
"templateids": "10001",
"search": {
"key_": ["agent.","zabbix"]
},
"searchByAny":"true",
"sortfield": "name"
},
"auth": "xxxxxx",
"id": 1
}

Feel free to take the above examples, change them around so they fit your use case and you should be able to quite easily implement them in your environment. There are many other use cases that we might potentially cover down the line – if you have a specific API use case that you wish for us to cover, feel free to leave a comment under this post and we just might cover it in one of the upcoming blog posts!

Simplifying Zabbix API workflows with named Zabbix API tokens

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/simplifying-zabbix-api-workflows-with-named-zabbix-api-tokens/16653/

Zabbix API enables you to collect any and all information from your Zabbix instance by using a multitude of API methods. You can even utilize Zabbix API calls in your HTTP items. For example, this can be used to monitor the number of particular sets of metrics and visualize their growth over time. With named Zabbix API tokens, such use cases are a lot more simple to implement.

Before Zabbix 5.4 we had to perform the user.login API call to obtain the authentication token. Once the user session was closed, we had to relog, obtain the new authentication token and use it in the subsequent API calls.

With the pre-defined named Zabbix API tokens, you don’t have to constantly check if the authentication token needs to be updated. Starting from Zabbix 5.4 you can simply create a new named Zabbix API token with an expiration date and use it in your API calls.

Creating a new named Zabbix API token

The Zabbix API token creation process is extremely simple. All you have to do is navigate to Administration – General – API tokens and create a new API token. The named API tokens are created for a particular user and can have an optional expiration date and time – otherwise, the tokens are defined without an expiry date.

You can create a named API token in the API tokens section, under Administration – General

Once the Token has been created, make sure to store it somewhere safe, since you won’t be able to recover it afterward. If the token is lost – you will have to recreate it.

Make sure to store the auth token!

Don’t forget, that when defining a role for the particular API user, we can restrict which API methods this user has access to.

Simplifying API tasks with the named API token

There are many different use cases where you could implement Zabbix API calls to collect some additional information. For this blog post, I will create an HTTP item that uses item.get API call to monitor the number of unsupported items.

To achieve that, I will create an HTTP item on a host (This can be the default Zabbix server host or a host dedicated to collecting metrics via Zabbix API calls) and provide the API call in the request body. Since the named API token now provides a static authentication token until it expires, I can simply use it in my API call without the need to constantly keep it updated.

An HTTP agent item that uses a Zabbix API call in its request body

{
    "jsonrpc": "2.0",
    "method": "item.get",
    "params": {
			"countOutput":"1",
			 "filter": {
 "state": "1"
 }
    },
    "id": 2,
    "auth": "b72be8cf163438aacc5afa40a112155e307c3548ae63bd97b87ff4e98b1f7657"
}

HTTP item request body, which returns a count of unsupported items

I will also use regular expression preprocessing to obtain the numeric value from the API call result – otherwise, we won’t be able to graph our value or calculate trends for it.

Regular expression preprocessing step to obtain a numeric value from our Zabbix API call result

Utilizing Zabbix API scripts in Actions

In one of our previous blog posts, we covered resolving problems automatically with the event.acknowledge API method. The logic defined in the blog post was quite complex since we needed to keep an eye out for the authentication tokens and use a custom script to keep them up to date. With named Zabbix API tokens, this use case is a lot more simple.

All I have to do is create an Action operation script containing my API call and pass it to an action operation.

Action operation script that invokes Zabbix event.acknowledge API method

curl -sk -X POST -H "Content-Type: application/json" -d "
{
\"jsonrpc\": \"2.0\",
\"method\": \"event.acknowledge\",
\"params\": {
\"eventids\": \"{EVENT.ID}\",
\"action\": 1,
\"message\": \"Problem resolved.\"
},
\"auth\": \"<Place your authentication token here>",
\"id\": 2
}" <Place your Zabbix API endpoint URL here>

Problem remediation script example

Now my problems will get closed automatically after the time period which I have defined in my action.

Action operation which runs our event.acknowledge Zabbix API script

These are but a few examples that we can now achieve by using API tokens. A lot of information can be obtained and filtered in a unique way via Zabbix API, thus providing you with a granular analysis of your monitored environment. If you have recently upgraded to Zabbix 5.4 or plan to upgrade to Zabbix 6.0 LTS in the future, I would recommend implementing named Zabbix API tokens to simplify your day-to-day workflow and consider the possibilities that this new feature opens up for you.

If you have any questions or if you wish to share your particular use case for data collection or task automation with Zabbix API – feel free to share them in the comments section below!

Combining preprocessing with storing only trend data for high-frequency monitoring

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/combining-preprocessing-with-storing-only-trend-data-for-high-frequency-monitoring/16568/

There are many design choices to consider when we build our monitoring environment for high-frequency monitoring. How to minimize performance impact? What are the data retention policies with storage space in mind? What are the available out-of-the-box features to solve these potential problems?
In this blog post, we will discuss when you should use preprocessing and when it is better to use the “Do not keep history” option for your metrics, and what are the pros and cons for both of these approaches.

Throttling and other preprocessing steps

We’ve discussed throttling previously as the go-to approach for high-frequency monitoring. Indeed, with throttling, you can discard repeated values and do so with a heartbeat. This is extremely useful with metrics that come as discreet values – services states, network port statuses, and so on.
Example of throttling with and without heartbeat
In addition, since starting from Zabbix 4.2 all preprocessing is also performed by Zabbix proxies. This means we can discard the repeated values before they reach the Zabbix server. This can help us both with the performance (fewer metrics to insert in the Zabbix server DB) and reduce the DB size (Fewer metrics stored in the DB. This also helps with improving overall Zabbix performance)
There are a few caveats with this approach – since metrics get discarded before they reach the Zabbix server, the triggers will not react on these metrics (This is where having a heartbeat is useful) and, since trends are calculated by Zabbix server based on the received history data, there could be a lack of trend information for these metrics. Keep in mind that this applies not only to throttling preprocessing rules – any preprocessing can be done on the proxy and any preprocessing rules can be used to transform your data.

Understanding “Do not keep history” option

The behavior of “Do not keep history” which we can define when configuring an item is a bit different though. If we collect an item by a Proxy and configure the item with “Do not keep history”, the history won’t always get discarded! There are a couple of reasons for this.
  • First off, let’s not forget that some of our values can populate host inventory! If the particular item is configured to populate an inventory field – it will be forwarded to the Zabbix server, but it will not get stored in the history tables.
  • If the item does not populate an inventory field – the text data such as character, log and text will indeed get discarded before reaching the Zabbix server, but Numeric values – both float and integer, will get forwarded to the server. The reason for that is deriving trend information from the numeric values. Mind that the numeric data will still not get stored in the history tables, only trends will be available for these items.

Note: This behavior has been properly implemented starting from Zabbix 5.2. See ZBX-17548

Setting the “Do not keep history” option for an item

Using trend functions with high-frequency monitoring

With the specifics of “Do not keep history” in mind, we should now recall that starting from Zabbix 5.2 we have trend functions available at our disposal!
History functions such as trendavg, trendcount, trendmax, trendmin, trendsum allow us to perform different kinds of trend calculations – from counting the number of trend values to retrieving min/max/avg trend values for a time period.
This means, that if we require only the metric trend for specific time periods (hours, days, weeks, etc) we can use these trend functions together with “Do not keep history” option, thus discarding unnecessary data and improving our Zabbix server performance!
There are two approaches two using trend functions:
  • If you wish to collect and display the trend data, you need to create the item which will collect the metrics (say, a net.if.in Agent item for collecting incoming network traffic) and then create a separate calculated item that uses the trend function to calculate the avg/min/max value for the trend over a time period. The original item can then have “Do not keep history” option selected for it.

trendavg item for calculating hourly trends from the net.if.in[ifHCInOctets.5] item

 

  • If you wish to simply define triggers and react on long-term trends and are not required to collect the trend values, then we can skip the creation of the calculated item and simply use the trend function on the original item in the trigger.

This trigger fires if the hourly average trend value exceeds 100M.
Note: In this case only the original item is required.

By combining these approaches in our environment – using preprocessing when we wish to discard or transform the data and also implementing opting out of storing the history data, whenever this is appropriate, we can minimize the performance impact on our Zabbix instance. Add a layer of distributed Zabbix proxies on top of this and you can truly achieve a large, scalable Zabbix infrastructure optimized for high-frequency ingestion and processing of your data.

Keeping your Zabbix templates up to date

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/keeping-your-zabbix-templates-up-to-date/16412/

Have you recently updated your Zabbix environment but are still wondering – why haven’t the templates been updated? Where can I obtain the latest official Zabbix templates, and how should I update them? In this blog post, we will discuss why it is vital to keep your templates up to date and how we the template update process looks like.

Updating your templates

“Will updating Zabbix also update my templates?” is a question that I receive quite often. The answer to that question is – no changes are made to your templates whenever you update your Zabbix instance – be it a minor or a major update. The reasoning behind that is quite simple – we always recommend that you tune the out-of-the-box templates as per your particular requirements. That may consist of changing update intervals, disabling items/triggers, or even changing the existing trigger expressions or adding whole new entities to the template.

This is where the current behavior with template updates starts to make more sense. If Zabbix were to automatically update your templates, there could be a chance of overwriting your custom changes and could potentially disrupt the monitoring of your environment. That is something that we definitely wish to avoid.

The question still stands – Then how am I supposed to update my templates?

The answer – you can find the latest official Zabbix templates on our official git page – https://git.zabbix.com/

First, navigate to the Zabbix repository and open the Templates folder. Then, select the release branch that matches your Zabbix instance version. Here you can find all of our official templates and also the official media types under the media folder. All you have to do now is open the template up and download the raw template file.

Zabbix 5.4 release git templates/db folder

Once that is done, we can import the template into our Zabbix environment

Template template_db_oracle_agent2 import

Don’t forget to back up your existing templates, especially if you have made some custom changes to them! Ideally – add a prefix to their names, so the new and old templates can live side by side, and you can then manually copy over the changes from the latest official template to your custom template.

The benefits of keeping templates up to date

But what is the point of updating your templates – what do you get out of it? Well, that varies on the specific fixes or improvements we make to the particular template over time. Sometimes the updated template will provide improved trigger expressions or preprocessing logic. Other times the updated template will provide extra value to your monitoring with completely new items and triggers. In the case of Webhook media types – the updates usually contain fixes or improvements for some particular use cases, for example – fixing a compatibility issue for a specific OS.

You can always track these changes either in the release notes of a particular Zabbix version or by looking up a specific bug or a feature request in our bug tracker – https://support.zabbix.com

Some of the template changes in Zabbix 5.4 major update

Zabbix self-monitoring templates

Another key aspect of why it’s important to keep your templates up to date is so you can implement the changes made to the Zabbix self-monitoring templates. For example, if we compare Zabbix 5.0 to Zabbix 5.4, there are multiple new Zabbix processes and caches added to Zabbix 5.4, such as report writer/manager process, availability manager process, trend function caches, and other new components.

Zabbix server health template version 5.0 and 5.4 difference in the number of entities

So, if you update from Zabbix 5.0 to 5.4 (or Zabbix 6.0 if you’re sticking with LTS versions), you WILL NOT be monitoring these processes and caches if you don’t update your Zabbix server and Zabbix proxy templates to the current Zabbix versions. This means that you will be completely unaware of any potential performance issues related to these processes or caches.

Tracking template changes

With Zabbix 5.4 and later, you will notice some great improvements to the template import process. If you’re wondering what has changed when comparing an older template version with a newer one, you will now be able to see the changes during the import process. The added and removed elements will be highlighted in red or green accordingly.

Preview of the changes made during the template import process

How often should you update the templates? Ideally, you would follow the Zabbix update release notes and take note of any changes made to the templates that are of use in your environment. At the very least – definitely check for changes in the self-monitoring templates when moving to a newer major version of Zabbix. Otherwise, you risk losing track of potential issues in your Zabbix environment.

Now that you know the answer to the question “How can I update my Zabbix templates?” try and think back to when you last updated your Zabbix instance to a new version – did you also check the official templates for updates? If not, then don’t hesitate and visit https://git.zabbix.com/ to find the latest templates for your Zabbix version. Chances are that you will be pleasantly surprised with a set of new and updated templates for your monitoring endpoints and new webhook media types to help you integrate Zabbix with your existing systems.

Copy large datasets from Google Cloud Storage to Amazon S3 using Amazon EMR

Post Syndicated from Andrew Lee original https://aws.amazon.com/blogs/big-data/copy-large-datasets-from-google-cloud-storage-to-amazon-s3-using-amazon-emr/

Many organizations have data sitting in various data sources in a variety of formats. Even though data is a critical component of decision-making, for many organizations this data is spread across multiple public clouds. Organizations are looking for tools that make it easy and cost-effective to copy large datasets across cloud vendors. With Amazon EMR and the Hadoop file copy tools Apache DistCp and S3DistCp, we can migrate large datasets from Google Cloud Storage (GCS) to Amazon Simple Storage Service (Amazon S3).

Apache DistCp is an open-source tool for Hadoop clusters that you can use to perform data transfers and inter-cluster or intra-cluster file transfers. AWS provides an extension of that tool called S3DistCp, which is optimized to work with Amazon S3. Both these tools use Hadoop MapReduce to parallelize the copy of files and directories in a distributed manner. Data migration between GCS and Amazon S3 is possible by utilizing Hadoop’s native support for S3 object storage and using a Google-provided Hadoop connector for GCS. This post demonstrates how to configure an EMR cluster for DistCp and S3DistCP, goes over the settings and parameters for both tools, performs a copy of a test 9.4 TB dataset, and compares the performance of the copy.

Prerequisites

The following are the prerequisites for configuring the EMR cluster:

  1. Install the AWS Command Line Interface (AWS CLI) on your computer or server. For instructions, see Installing, updating, and uninstalling the AWS CLI.
  2. Create an Amazon Elastic Compute Cloud (Amazon EC2) key pair for SSH access to your EMR nodes. For instructions, see Create a key pair using Amazon EC2.
  3. Create an S3 bucket to store the configuration files, bootstrap shell script, and the GCS connector JAR file. Make sure that you create a bucket in the same Region as where you plan to launch your EMR cluster.
  4. Create a shell script (sh) to copy the GCS connector JAR file and the Google Cloud Platform (GCP) credentials to the EMR cluster’s local storage during the bootstrapping phase. Upload the shell script to your bucket location: s3://<S3 BUCKET>/copygcsjar.sh. The following is an example shell script:
#!/bin/bash
sudo aws s3 cp s3://<S3 BUCKET>/gcs-connector-hadoop3-latest.jar /tmp/gcs-connector-hadoop3-latest.jar
sudo aws s3 cp s3://<S3 BUCKET>/gcs.json /tmp/gcs.json
  1. Download the GCS connector JAR file for Hadoop 3.x (if using a different version, you need to find the JAR file for your version) to allow reading of files from GCS.
  2. Upload the file to s3://<S3 BUCKET>/gcs-connector-hadoop3-latest.jar.
  3. Create GCP credentials for a service account that has access to the source GCS bucket. The credentials should be named json and be in JSON format.
  4. Upload the key to s3://<S3 BUCKET>/gcs.json. The following is a sample key:
{
   "type":"service_account",
   "project_id":"project-id",
   "private_key_id":"key-id",
   "private_key":"-----BEGIN PRIVATE KEY-----\nprivate-key\n-----END PRIVATE KEY-----\n",
   "client_email":"service-account-email",
   "client_id":"client-id",
   "auth_uri":"https://accounts.google.com/o/oauth2/auth",
   "token_uri":"https://accounts.google.com/o/oauth2/token",
   "auth_provider_x509_cert_url":"https://www.googleapis.com/oauth2/v1/certs",
   "client_x509_cert_url":"https://www.googleapis.com/robot/v1/metadata/x509/service-account-email"
}
  1. Create a JSON file named gcsconfiguration.json to enable the GCS connector in Amazon EMR. Make sure the file is in the same directory as where you plan to run your AWS CLI commands. The following is an example configuration file:
[
   {
      "Classification":"core-site",
      "Properties":{
         "fs.AbstractFileSystem.gs.impl":"com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS",
         "google.cloud.auth.service.account.enable":"true",
         "google.cloud.auth.service.account.json.keyfile":"/tmp/gcs.json",
         "fs.gs.status.parallel.enable":"true"
      }
   },
   {
      "Classification":"hadoop-env",
      "Configurations":[
         {
            "Classification":"export",
            "Properties":{
               "HADOOP_USER_CLASSPATH_FIRST":"true",
               "HADOOP_CLASSPATH":"$HADOOP_CLASSPATH:/tmp/gcs-connector-hadoop3-latest.jar"
            }
         }
      ]
   },
   {
      "Classification":"mapred-site",
      "Properties":{
         "mapreduce.application.classpath":"/tmp/gcs-connector-hadoop3-latest.jar"
      }
   }
]

Launch and configure Amazon EMR

For our test dataset, we start with a basic cluster consisting of one primary node and four core nodes for a total of five c5n.xlarge instances. You should iterate on your copy workload by adding more core nodes and check on your copy job timings in order to determine the proper cluster sizing for your dataset.

  1. We use the AWS CLI to launch and configure our EMR cluster (see the following basic create-cluster command):
aws emr create-cluster \
--name "My First EMR Cluster" \
--release-label emr-6.3.0 \
--applications Name=Hadoop \
--ec2-attributes KeyName=myEMRKeyPairName \
--instance-type c5n.xlarge \
--instance-count 5 \
--use-default-roles

  1. Create a custom bootstrap action to be performed at cluster creation to copy the GCS connector JAR file and GCP credentials to the EMR cluster’s local storage. You can add the following parameter to the create-cluster command to configure your custom bootstrap action:
--bootstrap-actions Path="s3://<S3 BUCKET>/copygcsjar.sh"

Refer to Create bootstrap actions to install additional software for more details about this step.

  1. To override the default configurations for your cluster, you need to supply a configuration object. You can add the following parameter to the create-cluster command to specify the configuration object:
--configurations file://gcsconfiguration.json

Refer to Configure applications when you create a cluster for more details on how to supply this object when creating the cluster.

Putting it all together, the following code is an example of a command to launch and configure an EMR cluster that can perform migrations from GCS to Amazon S3:

aws emr create-cluster \
--name "My First EMR Cluster" \
--release-label emr-6.3.0 \
--applications Name=Hadoop \
--ec2-attributes KeyName=myEMRKeyPairName \
--instance-type c5n.xlarge \
--instance-count 5 \
--use-default-roles \
--bootstrap-actions Path="s3:///copygcsjar.sh" \
--configurations file://gcsconfiguration.json

Submit S3DistCp or DistCp as a step to an EMR cluster

You can run the S3DistCp or DistCp tool in several ways.

When the cluster is up and running, you can SSH to the primary node and run the command in a terminal window, as mentioned in this post.

You can also start the job as part of the cluster launch. After the job finishes, the cluster can either continue running or be stopped. You can do this by submitting a step directly via the AWS Management Console when creating a cluster. Provide the following details:

  • Step type – Custom JAR
  • NameS3DistCp Step
  • JAR locationcommand-runner.jar
  • Argumentss3-dist-cp --src=gs://<GCS BUCKET>/ --dest=s3://<S3 BUCKET>/
  • Action of failure – Continue

We can always submit a new step to the existing cluster. The syntax here is slightly different than in previous examples. We separate arguments by commas. In the case of a complex pattern, we shield the whole step option with single quotation marks:

aws emr add-steps \
--cluster-id j-ABC123456789Z \
--steps 'Name=LoadData,Jar=command-runner.jar,ActionOnFailure=CONTINUE,Type=CUSTOM_JAR,Args=s3-dist-cp,--src=gs://<GCS BUCKET>/, --dest=s3://<S3 BUCKET>/'

DistCp settings and parameters

In this section, we optimize the cluster copy throughput by adjusting the number of maps or reducers and other related settings.

Memory settings

We use the following memory settings:

-Dmapreduce.map.memory.mb=1536
-Dyarn.app.mapreduce.am.resource.mb=1536

Both parameters determine the size of the map containers that are used to parallelize the transfer. Setting this value in line with the cluster resources and the number of maps defined is key to ensuring efficient memory usage. You can calculate the number of launched containers by using the following formula:

Total number of launched containers = Total memory of cluster / Map container memory

Dynamic strategy settings

We use the following dynamic strategy settings:

-Ddistcp.dynamic.max.chunks.tolerable=4000
-Ddistcp.dynamic.split.ratio=3 -strategy dynamic

The dynamic strategy settings determine how DistCp splits up the copy task into dynamic chunk files. Each of these chunks is a subset of the source file listing. The map containers then draw from this pool of chunks. If a container finishes early, it can get another unit of work. This makes sure that containers finish the copy job faster and perform more work than slower containers. The two tunable settings are split ratio and max chunks tolerable. The split ratio determines how many chunks are created from the number of maps. The max chunks tolerable setting determines the maximum number of chunks to allow. The setting is determined by the ratio and the number of maps defined:

Number of chunks = Split ratio * Number of maps
Max chunks tolerable must be > Number of chunks

Map settings

We use the following map setting:

-m 640

This determines the number of map containers to launch.

List status settings

We use the following list status setting:

-numListstatusThreads 15

This determines the number of threads to perform the file listing of the source GCS bucket.

Sample command

The following is a sample command when running with 96 core or task nodes in the EMR cluster:

hadoop distcp
-Dmapreduce.map.memory.mb=1536 \
-Dyarn.app.mapreduce.am.resource.mb=1536 \
-Ddistcp.dynamic.max.chunks.tolerable=4000 \
-Ddistcp.dynamic.split.ratio=3 \
-strategy dynamic \
-update \
-m 640 \
-numListstatusThreads 15 \
gs://<GCS BUCKET>/ s3://<S3 BUCKET>/

S3DistCp settings and parameters

When running large copies from GCS using S3DistCP, make sure you have the parameter fs.gs.status.parallel.enable (also shown earlier in the sample Amazon EMR application configuration object) set in core-site.xml. This helps parallelize getFileStatus and listStatus methods to reduce latency associated with file listing. You can also adjust the number of reducers to maximize your cluster utilization. The following is a sample command when running with 24 core or task nodes in the EMR cluster:

s3-dist-cp -Dmapreduce.job.reduces=48 --src=gs://<GCS BUCKET>/--dest=s3://<S3 BUCKET>/

Testing and performance

To test the performance of DistCp with S3DistCp, we used a test dataset of 9.4 TB (157,000 files) stored in a multi-Region GCS bucket. Both the EMR cluster and S3 bucket were located in us-west-2. The number of core nodes that we used in our testing varied from 24–120.

The following are the results of the DistCp test:

  • Workload – 9.4 TB and 157,098 files
  • Instance types – 1x c5n.4xlarge (primary), c5n.xlarge (core)
Nodes Throughput Transfer Time Maps
24 1.5GB/s 100 mins 168
48 2.9GB/s 53 mins 336
96 4.4GB/s 35 mins 640
120 5.4GB/s 29 mins 840

The following are the results of the S3DistCp test:

  • Workload – 9.4 TB and 157,098 files
  • Instance types – 1x c5n.4xlarge (primary), c5n.xlarge (core)
Nodes Throughput Transfer Time Reducers
24 1.9GB/s 82 mins 48
48 3.4GB/s 45 mins 120
96 5.0GB/s 31 mins 240
120 5.8GB/s 27 mins 240

The results show that S3DistCP performed slightly better than DistCP for our test dataset. In terms of node count, we stopped at 120 nodes because we were satisfied with the performance of the copy. Increasing nodes might yield better performance if required for your dataset. You should iterate through your node counts to determine the proper number for your dataset.

Using Spot Instances for task nodes

Amazon EMR supports the capacity-optimized allocation strategy for EC2 Spot Instances for launching Spot Instances from the most available Spot Instance capacity pools by analyzing capacity metrics in real time. You can now specify up to 15 instance types in your EMR task instance fleet configuration. For more information, see Optimizing Amazon EMR for resilience and cost with capacity-optimized Spot Instances.

Clean up

Make sure to delete the cluster when the copy job is complete unless the copy job was a step at the cluster launch and the cluster was set up to stop automatically after the completion of the copy job.

Conclusion

In this post, we showed how you can copy large datasets from GCS to Amazon S3 using an EMR cluster and two Hadoop file copy tools: DistCp and S3DistCp.

We also compared the performance of DistCp with S3DistCp with a test dataset stored in a multi-Region GCS bucket. As a follow-up to this post, we will run the same test on Graviton instances to compare the performance/cost of the latest x86 based instances vs. Graviton 2 instances.

You should conduct your own tests to evaluate both tools and find the best one for your specific dataset. Try copying a dataset using this solution and let us know your experience by submitting a comment or starting a new thread on one of our forums.


About the Authors

Hammad Ausaf is a Sr Solutions Architect in the M&E space. He is a passionate builder and strives to provide the best solutions to AWS customers.

Andrew Lee is a Solutions Architect on the Snap Account, and is based in Los Angeles, CA.

Back to the future with Zabbix

Post Syndicated from TimSmit original https://blog.zabbix.com/back-to-the-future-with-zabbix/15701/

In this blog post, we would like to show you a new theme that might seem a little familiar to you. Let’s take a little nostalgic trip down memory lane and take a look at this special frontend theme for Zabbix 5.4 reminiscent of Zabbix 1.8.

How it looks

Although the old Zabbix 1.8 design understandably has an outdated look, the colors have a nice feel to them, especially the old shade of blue.

The Zabbix 5.4 design is up to date with current standards and has a modern look and feel to it. It is meant to be simplistic with a lot of white and barely any color except for the navigation bar and widgets like the problems and availability.

My theme combines the two. It is up to date with modern standards, and it has a trusted feel to it with the old color scheme. The old color scheme gives us a nostalgic look, which will hopefully bring some joy both to the veterans and newcomers to Zabbix!

How it works

The way I made this was fairly simple. I copied a css file from the styles folder, placed it in the same folder, and renamed it. I also put the image used to change the appearance into the same folder so it is easy to find.

blue-theme.css
dark-theme.css
hc-dark.css
hc-dark.css
oldisnew.css
table_head2.gif

 

To figure out which changes should be made, I had the Zabbix GUI open in a web browser with the element inspect function. in the css file I would find the same tag/name/class/etc as in the web browser and change it to look like the appearance of version 1.8.

.menu-main > li {
line-height: 16px; }
.menu-main > li.is-selected > a {
background: url(../styles/table_head2.gif) repeat-x top left;
border-left-color: #87d1ff;
color: #ffffff; }
.menu-main > li.is-expanded > a, .menu-main > li.is-expanded > a:focus {
border-left-color: transparent;
color: #ffffff; }
.menu-main > li:not(.is-expanded) .submenu {
max-height: 0 !important; }
.menu-main > li > a {
background:url(../styles/table_head2.gif) repeat-x top left;
color: #ffffff; }

For example here is where I changed the background of the dropdowns from the side menu into the image from Zabbix 1.8 which gives the menu an appealing new look.

For the theme to show up in the User settings it uses an add-on for the APP.php.

class APP extends ZBase {
   public static function getThemes() {
   return array_merge(parent::getThemes(),
   ['oldisnew' => _('Old is New')]);
}
}

The add-on is included in our Github Repository, you can copy and paste it into the APP.php file.

I hope you like the theme. It is available for download at the Opensource ICT Solutions Github. Just follow the link below:

https://github.com/OpensourceICTSolutions/zabbix-5-old-version-1_8-theme .

The Github contains an up-to-date Readme for installation, but, here is a short explanation on how to install it:

1. Navigate to the link above.
2. Download the three files: (oldisnew.css),(tablehead2.gif) and (APP_add_on.text).
3. On your Zabbix server CLI there should be a directory called /usr/share/zabbix/assets/styles/. Put the (oldsinew.css) and (tablehead2.gif) files here.
4. For (APP_add_on.txt) add the text to the APP.php located at the directory /usr/share/zabbix/include/classes/core/ inside of (class APP extends ZBase). (this allows you to actually see the theme inside of the dropdown.)
5. Now, at the Zabbix GUI navigate to Profile under User settings and change the theme.
6. Enjoy your new theme!

Monitoring MongoDB nodes and clusters with Zabbix

Post Syndicated from Dmitry Lambert original https://blog.zabbix.com/monitoring-mongodb-nodes-and-clusters-with-zabbix/16031/

Zabbix Agent 2 enables our users to monitor a whole set of new systems with minimal configuration required on the monitored systems. Forget about writing custom monitoring scripts, deploying additional packages, or configuring ODBC. A great use-case for Zabbix Agent 2  is monitoring one of the most popular NoSQL DB backends – MongoDB. Below, you can read a detailed description and step-by-step guide through the use case or refer to the video available here.

Zabbix MongoDB template

For this example, we will be using Zabbix version 5.4, but MongoDB monitoring by Zabbix Agent 2 is supported starting from version 5.0. If you have a fresh deployment of Zabbix version 5.0 or newer, you will be able to find the MongoDB template in your ‘Configuration‘ – ‘Templates‘ section.

MongoDB Node and Cluster templates

On the other hand, if you have an instance that you deployed before the release of Zabbix 5.0 and then upgraded to Zabbix 5.0 or newer, you will have to import the template manually from our git page. Let’s remember that Zabbix DOES NOT apply new templates or modify existing templates during an upgrade. Therefore, newly released templates have to be IMPORTED MANUALLY!

We can see that we have two MongoDB templates – ‘MongoDB cluster by Zabbix Agent 2’ and ‘MongoDB node by Zabbix agent 2’. Depending on your MongoDB setup – individual nodes or a cluster, apply the corresponding template. Note that the MongoDB cluster template can automatically create hosts for your config servers and shards and apply the MongoDB node template on these hosts.

Host prototypes for config servers and shards

Deploying Zabbix Agent 2 on your Host

Since the data collection is done by Zabbix Agent 2, first, let’s deploy Zabbix Agent 2 on our MongoDB node or cluster host. Let’s start with adding the Zabbix 5.4 repository and install the Zabbix Agent 2 via a  package.

Add the Zabbix 5.4 repository:

rpm -Uvh https://repo.zabbix.com/zabbix/5.4/rhel/8/x86_64/zabbix-release-5.4-1.el8.noarch.rpm

Install Zabbix Agent 2:

yum install zabbix-agent2

What if you already have the regular Zabbix Agent running on this machine? In this case, we have two options for how we can proceed. We can simply remove the regular Zabbix Agent and Deploy Zabbix Agent 2. In this case, make sure you make a backup of the Zabbix Agent configuration file and migrate all of the changes to the Zabbix Agent 2 configuration file.

The second option is running both of the Zabbix Agents in parallel. In this case, we need to make sure that both agents – Zabbix Agent and Zabbix Agent 2 are listening on their own specific ports because, by default, both agents are listening for connections on port 10050. This can be configured in the Zabbix Agent configuration file by changing the ‘ListenPort’ parameter.

Don’t forget to specify the ‘Server‘ parameter in the Zabbix Agent 2 configuration file. This parameter should contain your Zabbix Server address or DNS name. By defining it here, you will allow Zabbix Agent 2 to accept the metric poll requests from Zabbix Server.

After you have made the configuration changes in the Zabbix Agent 2 configuration file, don’t forget to restart Zabbix Agent 2 to apply the changes:

systemctl restart zabbix-agent2

Creating a MongoDB user for monitoring

Once the agent has been deployed and configured, you need to ensure that you have a MongoDB database user that we can use for monitoring purposes. Below you can find a brief example of how you can create a MongoDB user:

Access the MongoDB shell:

mongosh

Switch to the MongoDB admin database:

use admin

Create a user with ‘userAdminAnyDatabase‘ permissions:

db.createUser(
... {
..... user: "zabbix_mon",
..... pwd: "zabbix_mon",
..... roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
..... }
... )

The username for the newly created user is ‘zabbix_mon’. The password is also ‘zabbix_mon‘ – feel free to change these as per your security policy.

Creating and configuring a MongoDB host

Next, you need to open your Zabbix frontend and create a new host representing your MongoDB node. You can see that in our example, we called our node ‘MongoDB’ and assigned it to a ‘MongoDB Servers’ host group. You can use more detailed naming in a production environment and use your own host group assignment logic. But remember – a host needs to belong to AT LEAST a single host group! 

Since the metrics are collected by Zabbix Agent 2, you must also create an Agent interface on the host. Zabbix Server will connect to this interface and request the metrics from the Zabbix Agent 2. Define the IP address or DNS name of your MongoDB host, where you previously deployed Zabbix Agent 2. Mind the port – by default, we have port 10050 defined over here, but if you have modified the ‘ListenPort’ parameter in the Zabbix Agent 2 config and changed the value from the default one (10050) to something else, you also need to use the same port number here.

MongoDB host configuration example

Next, navigate to the ‘Templates’ tab and assign the corresponding template – either ‘MongoDB node by Zabbix agent 2’ or ‘MongoDB cluster by Zabbix Agent 2’. In our example, we will assign the MongoDB node template.

Before adding the host, you also need to provide authentication and connection parameters by editing the corresponding User Macros. These User Macros are used by the items that specify which metrics should we be collecting. Essentially, we are forwarding the connectivity and authentication information to Zabbix Agent 2, telling it to use these values when collecting the metrics from our MongoDB instance.

To do this, navigate to the ‘Macros’ tab in the host configuration screen. Then, select ‘Inherited and host macros’ to display macros inherited from the MongoDB template.

We can see a bunch of macros here – some of them are related to trigger thresholds and discovery filters, but what we’re interested in right now are the following macros:

  • {$MONGODB.PASSWORD} –  MongoDB username. For our example, we will set this to zabbix_mon
  • {$MONGODB.USER} – MongoDB password. For our example, we will set this to zabbix_mon
  • {$MONGODB.CONNSTRING} – MongoDB connection string. Specify the MongoDB address and port here to which the Zabbix Agent 2 should connect and perform the metric collection

Now we are ready to add the host. Once the host has been added, we might have to wait for a minute or so before Zabbix begins to monitor the host. This is because Zabbix Server doesn’t instantly pick up the configuration changes. By default, Zabbix Server updates the Configuration Cache once a minute.

Fine-tuning MongoDB monitoring

At this point, we should see a green ZBX Icon next to our MongoDB host.

Data collection on the MongoDB host has started – note the green ‘ZBX’ icon.

This means that the Zabbix Server has successfully connected to our Zabbix Agent 2, and the metric collection has begun. You can now navigate to the ‘Monitoring’ – ‘Latest data’ section, filter the view by your MongoDB host, and you should see all of the collected metrics here.

MongoDB metrics in ‘Monitoring’ – ‘Latest data’

The final task is to tune the MongoDB monitoring on your hosts, collecting only the required metrics. Navigate to ‘Configuration’ –Hosts’, find your MongoDB hosts, and go through the different entity types on the host – items, triggers, discovery rules. See an item that you don’t wish to collect metrics for? Feel free to disable it. Open up the discovery rules – change the update intervals on them or disable the unnecessary ones.

Note: Be careful not to disable master items. Many of the items and discovery rules here are of type ‘Dependent item’ which means, that they require a so-called ‘Master item’. Feel free to read more about dependent items here.

Remember the ‘Macros’ section in the host configuration? Let’s return to it. here we can see some macros which are used in our trigger thresholds, like:

  • {$MONGODB.REPL.LAG.MAX.WARN} – Maximum replication lag in seconds
  • {$MONGODB.CURSOR.OPEN.MAX.WARN} – Maximum number of open cursors

Feel free to change these as per your problem threshold requirements.

One last thing here – we can filter which elements get discovered by our discovery rules. This is once again defined by user macros like:

  • {$MONGODB.LLD.FILTER.DB.MATCHES} – Databases that should be discovered (By default, the value here is ‘.*’, which will match everything)
  • {$MONGODB.LLD.FILTER.DB.NOT_MATCHES} – Databases that should be excluded from the discovery

And that’s it! After some additional tuning has been applied, we are good to go – our MongoDB entities are being discovered, metrics are getting collected, and problem thresholds have been defined. And all of it has been done with the native Zabbix Agent 2 functionality and an out-of-the-box MongoDB template!

Zabbix frontend as a control panel for your devices

Post Syndicated from Aigars Kadiķis original https://blog.zabbix.com/zabbix-frontend-as-a-control-panel-for-your-devices/15545/

The ability to define and execute scripts on different Zabbix components in different scenarios can be extremely powerful. There are many different use cases where we can execute these scripts – to remediate an issue, forward our alerts to an external system, and much more. In this post, we will cover one of the lesser-known use cases – creating a control panel of sorts in which we can execute different scripts directly from our frontend.

 

Configuration cache

Let’s use two very popular Zabbix runtime commands for our use case –  ‘zabbix_server -R config_cache_reload’ and ‘zabbix_proxy -R config_cache_reload’. These commands can be used to force the Zabbix server and Zabbix proxy components to load the configuration changes on demand.

First, let’s discuss how these commands work:

It all starts with the configuration cache frequency, which is configured for the central Zabbix server. Have a look at the output:

grep CacheUpdateFrequency= /etc/zabbix/zabbix_server.conf

And on the Zabbix proxy side, there is a similar setting. Let’s take a look:

grep ConfigFrequency= /etc/zabbix/zabbix_proxy.conf

With a stock installation we have ‘CacheUpdateFrequency=60‘ for ‘zabbix-server‘ and we have ‘ConfigFrequency=3600‘ for ‘zabbix-proxy‘. This parameter represents how fast the Zabbix component will pick up the configuration changes that we have made in the GUI.

Apart from the frequency, we have also another variable which is: how long it actually takes to run one configuration sync cycle. To find the precise time value, we can use this command:

ps auxww | egrep -o "[s]ynced.*sec"

The output will produce a line like:

synced configuration in 14.295782 sec, idle 60 sec

This means that it takes approximately 14 seconds to load the configuration cache from the database. Then there is a break for the next 60 seconds. After that, the process repeats.

When the monitoring infrastructure gets big, we might need to start using larger values for ‘CacheUpdateFrequency‘ and ‘ConfigFrequency‘. By reducing the configuration reload frequency, we can offload our database. The best possible configuration performance-wise is to install ‘CacheUpdateFrequency=3600‘ in ‘zabbix_server.conf‘ and use ‘ConfigFrequency=3600‘ (it’s the default value) in ‘zabbix_proxy.conf‘.

Some repercussions arise with such a configuration. When we use values that are this large, there will be a delay of one hour until newly created entities are monitored or changes are applied to the existing entities.

Setting up the scripts

I would like to introduce a way we can force the configuration to be reloaded via GUI.
Some prerequisites must be configured:

1) Make sure the  ‘Zabbix server‘ host belongs to the “Zabbix servers” host group.

2) On the server where service ‘zabbix-server‘ runs, install a new sudoers rule:

cd /etc/sudoers.d
echo 'zabbix ALL=(ALL) NOPASSWD: /usr/sbin/zabbix_server -R config_cache_reload' | sudo tee zabbix_server_config_cache_reload
chmod 0440 zabbix_server_config_cache_reload

The sudoers file is required because out of the box the service ‘zabbix-server‘ runs with user ‘zabbix‘ which does not have access to interact with the local system.

3) We will also create Zabbix hosts representing our Zabbix proxies. These hosts must belong to the ‘Zabbix proxies’ host group.

Notice that in the screenshot the host ‘127.0.0.1′ is using ‘Monitored by proxy‘. This is extremely important since we do not care about the agent interface in the use case with proxies – the interface can contain an arbitrary address/DNS name. What we care about is the ‘Monitored by proxy’ field. Our command will be executed on the proxy that we select here.

4) On the server where service ‘zabbix-proxy‘ runs, install a new sudoers rule:

cd /etc/sudoers.d
echo 'zabbix ALL=(ALL) NOPASSWD: /usr/sbin/zabbix_proxy -R config_cache_reload' | sudo tee zabbix_proxy_config_cache_reload
chmod 0440 zabbix_proxy_config_cache_reload

5) Make the following changes in the ‘/etc/zabbix/zabbix_proxy.conf‘ proxy configuration file: ‘EnableRemoteCommands=1‘. Restart the ‘zabbix-proxy’ service afterwards.

6) Open ‘Administration’ => ‘Scripts’ and define the following commands:
For the ‘Zabbix servers’ host group:

sudo /usr/sbin/zabbix_server -R config_cache_reload	

Since this is a custom command that we will execute, the type of the script will be ‘Script’. The first script will be executed on the Zabbix server – we are forcing the central Zabbix server to reload its configuration cache. In this example, all users with at least ‘Read’ access to the Zabbix server host will be able to execute the script. You can limit this as per your internal Zabbix policies.

Below you can see how it should look:

For the ‘Zabbix proxies’ host group:

sudo /usr/sbin/zabbix_proxy -R config_cache_reload	

The only thing that we change for the proxy script is the ‘Command’ and ‘Execute on’ parameters, since now the command will be executed on the Zabbix proxy which is monitoring the target host:

Frontend as a control panel

I prefer to add an additional host group “Control panel” which contains the central Zabbix server and all Zabbix proxies.

Now when we need to reload our configuration cache, we can open ‘Monitoring’ => ‘Hosts‘ and filter out host group ‘Control panel’. Then click on the proxy host in question and select ‘config cache reload proxy’:

It takes 5 seconds to complete and then we will see the result of script execution. In this case – ‘command sent successfully’:

By the way, we can bookmark this page too 😉

With this approach, you can create ‘Control panel’ host groups and scripts for different types of tasks that you can execute directly from the Zabbix frontend! This allows us to use our Zabbix frontend not just for configuration and data overview, but also as a control panel of sorts for our hosts.
If you have any questions, comments, or wish to share your use cases for using scripts in the frontend – leave us a comment! Your use case could be the one to inspire many other Zabbix community members to give it a try.

Agentless Oracle database monitoring with ODBC

Post Syndicated from Aigars Kadiķis original https://blog.zabbix.com/agentless-oracle-database-monitoring-with-odbc/15589/

Did you know that Zabbix has an out-of-the-box template for collecting Oracle database metrics? With this template, we can collect data like database, tablespace, ASM, and many other metrics agentlessly, by using ODBC. This blog post will guide you on how to set up ODBC monitoring for Oracle 11.2, 12.1, 18.5, or 19.2 database servers. This post can serve as the perfect set of guidelines for deploying Oracle database monitoring in your environment.

Download Instant client and SQLPlus

The provided commands apply for the following operating systems: CentOS 8, Oracle Linux 8, or Rocky Linux.

First we have to download the following packages:
oracle-instantclient19.12-basic-19.12.0.0.0-1.x86_64.rpm
oracle-instantclient19.12-sqlplus-19.12.0.0.0-1.x86_64.rpm
oracle-instantclient19.12-odbc-19.12.0.0.0-1.x86_64.rpm

Here we are downloading

Oracle instant client – required, to establish connectivity to an Oracle database
SQLPlus  – A tool that we can use to test the connectivity to an Oracle database
Oracle ODBC package – contains the required ODBC drivers and configuration scripts to enable ODBC connectivity to an Oracle database

Upload the packages to the Zabbix server (or proxy, if you wish to monitor your Oracle DB on a proxy) and place it in:

/tmp/oracle-instantclient19.12-basic-19.12.0.0.0-1.x86_64.rpm
/tmp/oracle-instantclient19.12-sqlplus-19.12.0.0.0-1.x86_64.rpm
/tmp/oracle-instantclient19.12-odbc-19.12.0.0.0-1.x86_64.rpm

Solve OS dependencies

Install ‘libaio’ and ‘libnsl’ library:

dnf -y install libaio-devel libnsl

Otherwise, we will receive errors:

# rpm -ivh /tmp/oracle-instantclient19.12-basic-19.12.0.0.0-1.x86_64.rpm
error: Failed dependencies:
        libaio is needed by oracle-instantclient19.12-basic-19.12.0.0.0-1.x86_64
        libnsl.so.1()(64bit) is needed by oracle-instantclient19.12-basic-19.12.0.0.0-1.x86_64
# rpm -ivh /tmp/oracle-instantclient19.12-basic-19.12.0.0.0-1.x86_64.rpm
error: Failed dependencies:
        libnsl.so.1()(64bit) is needed by oracle-instantclient19.12-basic-19.12.0.0.0-1.x86_64

Check if Oracle components have been previously deployed on the system. The commands below should provide an empty output:

rpm -qa | grep oracle
ldconfig -p | grep oracle

Install Oracle Instant Client

rpm -ivh /tmp/oracle-instantclient19.12-basic-19.12.0.0.0-1.x86_64.rpm

Make sure that the package ‘oracle-instantclient19.12-basic-19.12.0.0.0-1.x86_64’ is installed:

rpm -qa | grep oracle

LD config

The official Oracle template page at git.zabbix.com talks about the method to configure Oracle ENV Usage for the service. For this version 19.12 of instant client, it is NOT REQUIRED to create a ‘/etc/sysconfig/zabbix-server’ file with content:

export ORACLE_HOME=/usr/lib/oracle/19.12/client64
export PATH=$PATH:$ORACLE_HOME/bin
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:/usr/lib64:/usr/lib:$ORACLE_HOME/bin
export TNS_ADMIN=$ORACLE_HOME/network/admin

While we did install the rpm package, the Oracle 19.12 client package did auto-configure LD path at the global level – it means every user on the system can use the Oracle instant client. We can see the LD path have been configured under:

cat /etc/ld.so.conf.d/oracle-instantclient.conf

This will print:

/usr/lib/oracle/19.12/client64/lib

To ensure that the required Oracle libraries are recognized by the OS, we can run:

ldconfig -p | grep oracle

It should print:

liboramysql19.so (libc6,x86-64) => /usr/lib/oracle/19.12/client64/lib/liboramysql19.so
libocijdbc19.so (libc6,x86-64) => /usr/lib/oracle/19.12/client64/lib/libocijdbc19.so
libociei.so (libc6,x86-64) => /usr/lib/oracle/19.12/client64/lib/libociei.so
libocci.so.19.1 (libc6,x86-64) => /usr/lib/oracle/19.12/client64/lib/libocci.so.19.1
libnnz19.so (libc6,x86-64) => /usr/lib/oracle/19.12/client64/lib/libnnz19.so
libmql1.so (libc6,x86-64) => /usr/lib/oracle/19.12/client64/lib/libmql1.so
libipc1.so (libc6,x86-64) => /usr/lib/oracle/19.12/client64/lib/libipc1.so
libclntshcore.so.19.1 (libc6,x86-64) => /usr/lib/oracle/19.12/client64/lib/libclntshcore.so.19.1
libclntshcore.so (libc6,x86-64) => /usr/lib/oracle/19.12/client64/lib/libclntshcore.so
libclntsh.so.19.1 (libc6,x86-64) => /usr/lib/oracle/19.12/client64/lib/libclntsh.so.19.1
libclntsh.so (libc6,x86-64) => /usr/lib/oracle/19.12/client64/lib/libclntsh.so

Note: If for some reason the ldconfig command shows links to other dynamic libraries – that’s when we might have to create a separate ENV file for Zabbix server/Proxy, which would link the Zabbix application to the correct dynamic libraries, as per the example at the start of this section.

Check if the Oracle service port is reachable

To save us some headache down the line, let’s first check the network connectivity to our Oracle database host. Let’s check if we can reach the default Oracle port at the network level. In this example, we will try to connect to the default Oracle database port, 1521. Depending on which port your Oracle database is listening for connections, adjust accordingly,. Make sure the output says ‘Connected to 10.1.10.15:1521’:

nc -zv 10.1.10.15 1521

Test connection with SQLPlus

We can simulate the connection to the Oracle database before moving on with the ODBC configuration. Make sure that the Oracle username and password used in the command are correct. For this task, we will first need to install the SQLPlus package.:

rpm -ivh /tmp/oracle-instantclient19.12-sqlplus-19.12.0.0.0-1.x86_64.rpm

To simulate the connection, we can use a one-liner command. In the example command I’m using the username ‘system’ together with the password ‘oracle’ to reach out to the Oracle database server ‘10.1.10.15’ via port ‘1521’ and connect to the service name ‘xe’:

sqlplus64 'system/[email protected](DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.1.10.15)(PORT=1521)))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=xe)))'

In the output we can see: we are using the 19.12 client to connect to 11.2 server:

SQL*Plus: Release 19.0.0.0.0 - Production on Mon Sep 6 13:47:36 2021
Version 19.12.0.0.0
Copyright (c) 1982, 2021, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production

Note: This gives us an extra hint regarding the Oracle instant client – newer versions of the client are backwards compatible with the older versions of the Oracle database server. Though this doesn’t apply to every version of Oracle client/server, please check the Oracle instant client documentation first.

ODBC connector

When it comes to configuring ODBC, let’s first install the ODBC driver manager

dnf -y install unixODBC

Now we can see that we have two new files –  ‘/etc/odbc.ini’ (possibly empty) and ‘/etc/odbcinst.ini’.

The file ‘/etc/odbcinst.ini’ describes driver relation. Currently, when we ‘grep’ the keyword ‘oracle’ there is no oracle relation installed, the output is empty when we run:

grep -i oracle /etc/odbcinst.ini

Our next step is to Install Oracle ODBC driver package:

rpm -ivh /tmp/oracle-instantclient19.12-odbc-19.12.0.0.0-1.x86_64.rpm

The ‘oracle-instantclient*-odbc’ package contains a script that will update the ‘/etc/odbcinst.ini’ configuration automatically:

cd /usr/lib/oracle/19.12/client64/bin
./odbc_update_ini.sh / /usr/lib/oracle/19.12/client64/lib

It will print:

 *** ODBCINI environment variable not set,defaulting it to HOME directory!

Now when we print the file on the screen:

cat /etc/odbcinst.ini

We will see that there is the Oracle 19 ODBC driver section added at the end of the file::

[Oracle 19 ODBC driver]
Description     = Oracle ODBC driver for Oracle 19
Driver          = /usr/lib/oracle/19.12/client64/lib/libsqora.so.19.1
Setup           =
FileUsage       =
CPTimeout       =
CPReuse         =

It’s important to check if there are no errors produced in the output when executing the ‘ldd’ command. This ensures that the dependencies are satisfied and accessible and there are no conflicts with the library versioning:

ldd /usr/lib/oracle/19.12/client64/lib/libsqora.so.19.1

It will print something similar like:

linux-vdso.so.1 (0x00007fff121b5000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fb18601c000)
libm.so.6 => /lib64/libm.so.6 (0x00007fb185c9a000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fb185a7a000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x00007fb185861000)
librt.so.1 => /lib64/librt.so.1 (0x00007fb185659000)
libaio.so.1 => /lib64/libaio.so.1 (0x00007fb185456000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x00007fb18523f000)
libclntsh.so.19.1 => /usr/lib/oracle/19.12/client64/lib/libclntsh.so.19.1 (0x00007fb1810e6000)
libclntshcore.so.19.1 => /usr/lib/oracle/19.12/client64/lib/libclntshcore.so.19.1 (0x00007fb180b42000)
libodbcinst.so.2 => /lib64/libodbcinst.so.2 (0x00007fb18092c000)
libc.so.6 => /lib64/libc.so.6 (0x00007fb180567000)
/lib64/ld-linux-x86-64.so.2 (0x00007fb1864da000)
libnnz19.so => /usr/lib/oracle/19.12/client64/lib/libnnz19.so (0x00007fb17fdba000)
libltdl.so.7 => /lib64/libltdl.so.7 (0x00007fb17fbb0000)

When we executed the ‘odbc_update_ini.sh’ script, a new DSN (data source name) file was made in ‘/root/.odbc.ini’. This is a sample configuration ODBC configuration file which describes what settings this version of ODBC driver supports.

Let’s move this configuration file from the user directories to a location accessible system-wide:

cat /root/.odbc.ini | sudo tee -a /etc/odbc.ini

And remove the file from the user directory completely:

rm /root/.odbc.ini

This way, every user in the system will use only this one ODBC configuration file.

We can now alter the existing configuration – /etc/odbc.ini. I’m highlighting things that have been changed from the defaults:

[Oracle11g]
AggregateSQLType = FLOAT
Application Attributes = T
Attributes = W
BatchAutocommitMode = IfAllSuccessful
BindAsFLOAT = F
CacheBufferSize = 20
CloseCursor = F
DisableDPM = F
DisableMTS = T
DisableRULEHint = T
Driver = Oracle 19 ODBC driver
DSN = Oracle11g
EXECSchemaOpt =
EXECSyntax = T
Failover = T
FailoverDelay = 10
FailoverRetryCount = 10
FetchBufferSize = 64000
ForceWCHAR = F
LobPrefetchSize = 8192
Lobs = T
Longs = T
MaxLargeData = 0
MaxTokenSize = 8192
MetadataIdDefault = F
QueryTimeout = T
ResultSets = T
ServerName = //10.1.10.15:1521/xe
SQLGetData extensions = F
SQLTranslateErrors = F
StatementCache = F
Translation DLL =
Translation Option = 0
UseOCIDescribeAny = F
UserID = system
Password = oracle

DNS – Data source name. Should match the section name in brackets, e.g.:[Oracle11g]
ServerName – Oracle server address
UserID – Oracle user name
Password – Oracle user password

To test the connection from the command line, let’s use the isql command-line tool which should simulate the ODBC connection akin to what the Zabbix is doing when gathering metrics:

isql -v Oracle11g

The isql command in this example picks up the ODBC settings (Username, Password, Server address) from the odbc.ini file. All we have to do is reference the particular DSN – Oracle11g

On the other hand, if we do not prefer to keep the password on the filesystem (/etc/odbc.ini), we can erase the lines ‘UserID’ and ‘Password’. Then we can test the ODBC connection with:

isql -v Oracle11g 'system' 'oracle'

In case of a successful connection it should print:

+---------------------------------------+
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |
+---------------------------------------+
SQL>

And that’s it for the ODBC configuration! Now we should be able to apply the Oracle by ODBC template in Zabbix

Don’t forget that we also need to provide the necessary Oracle credentials to start collecting Oracle database metrics:

The lessons learned in this blog post can be easily applied to ODBC monitoring and troubleshooting in general, not just Oracle. If you’re having any issues or wish to share your experience with ODBC or Oracle database monitoring – feel free to leave us a comment!

Maintaining Zabbix API token via JavaScript

Post Syndicated from Aigars Kadiķis original https://blog.zabbix.com/maintaining-zabbix-api-token-via-javascript/15561/

In this blog post, we will talk about maintaining and storing the Zabbix API session key in an automated fashion. The blog post builds upon the Close problem automatically via Zabbix API subject and can be used as extra configuration for this particular use-case. The blog post also shares a great example of synthetic monitoring by way of JavaScript preprocessing – how to emulate a scenario in an automated fashion and get alerted in case of any problems.

Prerequisites

First, let us create the Zabbix API user and user macros where we will store our username, password, Zabbix URL and the API session key.

1) Open “Administration” => “Users”. Create a new user ‘api’ with password ‘zabbix’. At the permissions tab set User Type “Zabbix Super Admin”.

2) Go to “Administration” => “General” => “Macros”. Configure base characteristics:

      {$Z_API_PHP} = http://demo.zabbix.demo/api_jsonrpc.php
     {$Z_API_USER} = api
 {$Z_API_PASSWORD} = zabbix
{$Z_API_SESSIONID} =

It’s OK to leave {$Z_API_SESSIONID} empty for now.

3) Let’s check if the Zabbix backend server can reach the Zabbix frontend server. Make sure that you are logged into the Zabbix backend server by looking up the zabbix_server process:

ps auxww | grep "[z]abbix_server.*conf"

Ensure that we can reach the Zabbix frontend by curling the Zabbix frontend server from the Zabbix backend server:

curl -s "http://demo.zabbix.demo/index.php" | grep Zabbix

4) Download template “Check and repair Zabbix API token” and import it in your instance.

5) Create a new host with an Agent interface and link the template. The IP address of the host does not matter. The template will use an agentless check to do the monitoring, it will use an “HTTP agent” item.

How it works

Our goal for today is to figure out a way to keep the Zabbix API authentication token up to date in a user macro. This way we can reuse the macro repeatedly for items, action operations and scripts that require for us to use the Zabbix API. We need to ensure that even if the token changes, the macro gets automatically updated with the new token value! Let’s try and understand each step of the underlying workflow required for us to achieve this goal.

The first component of our workflow is the “Validate session key raw” item. This is an HTTP agent item that performs a POST request with an arbitrary method – proxy.get in this case, but we could have used ANY other method. We simply want to check if an arbitrary Zabbix API call can be executed with the current {$Z_API_SESSIONID} macro value.

The second part of the workflow is the “Repair session key” dependent item. This item utilizes the JavaScript preprocessing step with custom JavaScript code to check the values obtained by the previous item and generate a new authentication token if that is necessary.

The third item – “Status”, is another dependent item that uses regular expression preprocessing steps to check for different error messages or status codes in the value of the “Validate session key raw” item. Most of the triggers defined in this template will react to the values obtained by this item.

 

Below you can see the full underlying workflow:

Code-wise, the magic is implemented with the following JavaScript code snippet:

if (value.match(/Session terminated/)) {

var req = new CurlHttpRequest();

// Zabbix API
var json_rpc='{$Z_API_PHP}'

// lib curl header
req.AddHeader('Content-Type: application/json');

// First request to get authentication token
var token =  JSON.parse(req.Post(json_rpc,
'{"jsonrpc":"2.0","method":"user.login","params":{"user":"{$Z_API_USER}","password":"{$Z_API_PASSWORD}"},"id":1,"auth":null}'
));

// If authentication was unsuccessful
if ( token.error )
{
// Login name or password is incorrect
return 32500;
}

else {
// Update the macro

// Get the global macro ID
// We cannot plot here a very native Zabbix macro because it will be automatically expanded
// Must use a workaround to distinguish a dollar sign from the actual macro name and merge with '+'
var id = JSON.parse(req.Post(json_rpc,
'{"jsonrpc":"2.0","method":"usermacro.get","params":{"output":["globalmacroid"],"globalmacro":true,"filter":{"macro":"{$'+'Z_API_SESSIONID'+'}"}},"auth":"'+token.result+'","id":1}'
)).result[0].globalmacroid;

// This line contains a keyword '+value+' which will grab exactly the previous value outside this JavaScript snippet
var overwrite = JSON.parse(req.Post(json_rpc,
'{"jsonrpc":"2.0","method":"usermacro.updateglobal","params":{"globalmacroid":"'+id+'","value":"'+token.result+'"},"auth":"'+token.result+'","id":1}'
));

// Return the id (an integer) of the macro, which was updated
return overwrite.result.globalmacroids[0];
}

} else {
return 0;
}

Throughout the JavaScript code, we are extracting a value from one step and using that as an input for the next step.

After the data has been collected, the values are analyzed by 4 triggers. 3 out of these 4 triggers prints a misconfiguration problem that requires a human investigation. We also have a “repairing Zabbix API session key in background” title, which is the main trigger that indicates a token has been expired and repair automatically.

And that’s it – now any integration that requires a Zabbix API authentication token can receive the current token value by referencing the user macro that we created in the article. We’ve ensured that the token is always going to stay up to date and in case of any issues, we will receive an alert from Zabbix!