
Blue/Green deployment with AWS Developer tools on Amazon EC2 using Amazon EFS to host application source code

Post Syndicated from Rakesh Singh original https://aws.amazon.com/blogs/devops/blue-green-deployment-with-aws-developer-tools-on-amazon-ec2-using-amazon-efs-to-host-application-source-code/

Many organizations building modern applications require a shared and persistent storage layer for hosting and deploying data-intensive enterprise applications, such as content management systems, media and entertainment workloads, and distributed applications like machine learning training. These applications demand a centralized file share that scales to petabytes without disrupting running applications and remains concurrently accessible from potentially thousands of Amazon EC2 instances.

Simultaneously, customers want to automate the end-to-end deployment workflow and leverage continuous methodologies utilizing AWS developer tools services for performing a blue/green deployment with zero downtime. A blue/green deployment is a deployment strategy wherein you create two separate, but identical environments. One environment (blue) is running the current application version, and one environment (green) is running the new application version. The blue/green deployment strategy increases application availability by generally isolating the two application environments and ensuring that spinning up a parallel green environment won’t affect the blue environment resources. This isolation reduces deployment risk by simplifying the rollback process if a deployment fails.

Amazon Elastic File System (Amazon EFS) provides a simple, scalable, and fully-managed elastic NFS file system for use with AWS Cloud services and on-premises resources. It scales on demand, thereby eliminating the need to provision and manage capacity in order to accommodate growth. Utilize Amazon EFS to create a shared directory that stores and serves code and content for numerous applications. Your application can treat a mounted Amazon EFS volume like local storage. This means you don’t have to deploy your application code every time the environment scales up to multiple instances to distribute load.

In this blog post, I will guide you through an automated process to deploy a sample web application on Amazon EC2 instances that use an Amazon EFS mount to host the application source code, and to perform a blue/green deployment with the AWS code suite services so that the application source code is deployed with no downtime.

How this solution works

This blog post includes a CloudFormation template to provision all of the resources needed for this solution. The CloudFormation stack deploys a Hello World application on Amazon Linux 2 EC2 instances running behind an Application Load Balancer and utilizes an Amazon EFS mount point to store the application content. The AWS CodePipeline project utilizes AWS CodeCommit for version control, AWS CodeBuild for installing dependencies and creating artifacts, and AWS CodeDeploy to conduct the deployment on EC2 instances running in an Amazon EC2 Auto Scaling group.

Figure 1 below illustrates our solution architecture.

Sample solution architecture

Figure 1: Sample solution architecture

The event flow in Figure 1 is as follows:

  1. A developer commits code changes from their local repo to the CodeCommit repository. The commit triggers CodePipeline execution.
  2. CodeBuild execution begins to compile source code, install dependencies, run custom commands, and create a deployment artifact as per the instructions in the build specification (buildspec) file.
  3. During the build phase, CodeBuild copies the source-code artifact to the Amazon EFS file system and maintains two different directories for the current (green) and new (blue) deployments.
  4. After the build step completes successfully, a CodeDeploy deployment kicks in to conduct a blue/green deployment to a new Auto Scaling group.
  5. During the deployment phase, CodeDeploy mounts the EFS file system on the new EC2 instances as per the CodeDeploy AppSpec file and conducts other deployment activities.
  6. After a successful deployment, a Lambda function is triggered to store a deployment environment parameter in Systems Manager Parameter Store. The parameter stores the current EFS mount name that the application utilizes.
  7. The AWS Lambda function updates the parameter value during every successful deployment with the current EFS location.

Prerequisites

For this walkthrough, the following are required:

Deploy the solution

Once you’ve assembled the prerequisites, download or clone the GitHub repo and store the files on your local machine. Utilize the commands below to clone the repo:

mkdir -p ~/blue-green-sample/
cd ~/blue-green-sample/
git clone https://github.com/aws-samples/blue-green-deployment-pipeline-for-efs

Once completed, utilize the following steps to deploy the solution in your AWS account:

  1. Create a private Amazon Simple Storage Service (Amazon S3) bucket by using this documentation
    AWS S3 console view when creating a bucket

    Figure 2: AWS S3 console view when creating a bucket

     

  2. Upload the cloned or downloaded GitHub repo files to the root of the S3 bucket. The S3 bucket object structure should look similar to Figure 3:
    AWS S3 bucket object structure after you upload the Github repo content

    Figure 3: AWS S3 bucket object structure

     

  3. Go to the S3 bucket and select the template name solution-stack-template.yml, and then copy the object URL.
  4. Open the CloudFormation console. Choose the appropriate AWS Region, and then choose Create Stack. Select With new resources.
  5. Select Amazon S3 URL as the template source, paste the object URL that you copied in Step 3, and then choose Next.
  6. On the Specify stack details page, enter a name for the stack and provide the following input parameter. Modify the default values for other parameters in order to customize the solution for your environment. You can leave everything as default for this walkthrough.
  • ArtifactBucket – The name of the S3 bucket that you created in the first step of the solution deployment. This is a mandatory parameter with no default value.
Defining the stack name and input parameters for the CloudFormation stack

Figure 4: Defining the stack name and input parameters for the CloudFormation stack

  7. Choose Next.
  8. On the Options page, keep the default values and then choose Next.
  9. On the Review page, confirm the details, acknowledge that CloudFormation might create IAM resources with custom names, and then choose Create Stack.
  10. Once the stack creation is marked as CREATE_COMPLETE, the following resources are created:
  • A virtual private cloud (VPC) configured with two public and two private subnets.
  • NAT Gateway, an EIP address, and an Internet Gateway.
  • Route tables for private and public subnets.
  • Auto Scaling Group with a single EC2 Instance.
  • Application Load Balancer and a Target Group.
  • Three security groups—one each for ALB, web servers, and EFS file system.
  • Amazon EFS file system with a mount target for each Availability Zone.
  • CodePipeline project with CodeCommit repository, CodeBuild, and CodeDeploy resources.
  • SSM parameter to store the environment current deployment status.
  • Lambda function to update the SSM parameter for every successful pipeline execution.
  • Required IAM Roles and policies.

      Note: It may take anywhere from 10-20 minutes to complete the stack creation.

Test the solution

Now that the solution stack is deployed, follow the steps below to test the solution:

  1. Validate CodePipeline execution status

After successfully creating the CloudFormation stack, a CodePipeline execution automatically triggers to deploy the default application code version from the CodeCommit repository.

  • In the AWS console, choose Services and then CloudFormation. Select your stack name. On the stack Outputs tab, look for the CodePipelineURL key and click on the URL.
  • Validate that all steps have successfully completed. For a successful CodePipeline execution, you should see something like Figure 5. Wait for the execution to complete in case it is still in progress.
CodePipeline console showing execution status of all stages

Figure 5: CodePipeline console showing execution status of all stages

 

  2. Validate the Website URL

After the pipeline execution completes, open the website URL in a browser to check that it's working.

  • On the stack Outputs tab, look for the WebsiteURL key and click on the URL.
  • For a successful deployment, it should open a default page similar to Figure 6.
Sample “Hello World” application (Green deployment)

Figure 6: Sample “Hello World” application (Green deployment)

 

  3. Validate the EFS share

After the website has deployed successfully, connect to the application server and validate the EFS mount point and the application source code directory.

  • Open the Amazon EC2 console, and then choose Instances in the left navigation pane.
  • Select the instance named bg-sample and choose Connect.
  • For Connection method, choose Session Manager, and then choose Connect.

After the connection is made, run the following bash commands to validate the EFS mount and the deployed content. Figure 7 shows a sample output from running the bash commands.

sudo df -h | grep efs
ls -la /efs/green
ls -la /var/www/
Sample output from the bash command (Green deployment)

Figure 7: Sample output from the bash command (Green deployment)

 

  4. Deploy a new revision of the application code

After verifying the application status and the deployed code on the EFS share, commit some changes to the CodeCommit repository in order to trigger a new deployment.

  • On the stack Outputs tab, look for the CodeCommitURL key and click on the corresponding URL.
  • Click on the html file.
  • Click on Edit.
  • Uncomment line 9 and comment out line 10, so that the lines look like the following after the changes:
background-color: #0188cc; 
#background-color: #90ee90;
  • Add Author name, Email address, and then choose Commit changes.

After you commit the code, CodePipeline triggers and executes the Source, Build, Deploy, and Lambda stages. Once the execution completes, open the website URL again and you should see a new page like Figure 8.

New Application version (Blue deployment)

Figure 8: New Application version (Blue deployment)

 

On the EFS side, the application directory on the new EC2 instance now points to /efs/blue as shown in Figure 9.

Sample output from the bash command (Blue deployment)

Figure 9: Sample output from the bash command (Blue deployment)

Solution review

Let's review the pipeline stage details and what happens during the blue/green deployment:

1) Build stage

For this sample application, the CodeBuild project is configured to mount the EFS file system and utilize the buildspec.yml file present in the source code root directory to run the build. Following is the sample build spec utilized in this solution:

version: 0.2
phases:
  install:
    runtime-versions:
      php: latest   
  build:
    commands:
      - current_deployment=$(aws ssm get-parameter --name $SSM_PARAMETER --query "Parameter.Value" --region $REGION --output text)
      - echo $current_deployment
      - echo $SSM_PARAMETER
      - echo $EFS_ID $REGION
      - if [[ "$current_deployment" == "null" ]]; then echo "this is the first GREEN deployment for this project" ; dir='/efs/green' ; fi
      - if [[ "$current_deployment" == "green" ]]; then dir='/efs/blue' ; else dir='/efs/green' ; fi
      - if [ ! -d $dir ]; then  mkdir $dir >/dev/null 2>&1 ; fi
      - echo $dir
      - rsync -ar $CODEBUILD_SRC_DIR/ $dir/
artifacts:
  files:
      - '**/*'

During the build job, the following activities occur:

  • Installs the latest PHP runtime version.
  • Reads the SSM parameter value in order to know the current deployment and decide which directory to utilize. The SSM parameter value flips between green and blue for every successful deployment.
  • Synchronizes the latest source code to the EFS mount point.
  • Creates artifacts to be utilized in subsequent stages.

Note: Utilize the default buildspec.yml as a reference and customize it further as per your requirement. See this link for more examples.

2) Deploy Stage

The solution utilizes the CodeDeploy blue/green deployment type for EC2/on-premises. The deployment environment is configured to provision a new EC2 Auto Scaling group for every new deployment in order to deploy the new application revision. CodeDeploy creates the new Auto Scaling group by copying the current one. See this link for more details on blue/green deployment configuration with CodeDeploy. During each deployment event, CodeDeploy utilizes the appspec.yml file to run the deployment steps as per the defined lifecycle hooks. Following is the sample AppSpec file utilized in this solution.

version: 0.0
os: linux
hooks:
  BeforeInstall:
    - location: scripts/install_dependencies
      timeout: 180
      runas: root
  AfterInstall:
    - location: scripts/app_deployment
      timeout: 180
      runas: root
  BeforeAllowTraffic:
     - location: scripts/check_app_status
       timeout: 180
       runas: root  

Note: The scripts mentioned in the AppSpec file are available in the scripts directory of the CodeCommit repository. Utilize these sample scripts as a reference and modify as per your requirement.

For this sample, the following steps are conducted during a deployment:

  • BeforeInstall:
    • Installs required packages on the EC2 instance.
    • Mounts the EFS file system.
    • Creates a symbolic link to point the Apache document root /var/www/html to the appropriate EFS mount point. This ensures that the new application version deploys to a different EFS directory without affecting the currently running application.
  • AfterInstall:
    • Stops the Apache web server.
    • Fetches the current EFS directory name from Systems Manager.
    • Runs some cleanup commands.
    • Restarts the Apache web server.
  • BeforeAllowTraffic:
    • Checks whether the application is running correctly (a sketch of this check follows the list).
    • Exits the deployment with an error if the app returns a non-200 HTTP status code.
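The BeforeAllowTraffic logic, for instance, boils down to requesting the local site and failing the lifecycle event on anything but an HTTP 200 response. The repository ships this as a shell script; the following is only an illustrative Python sketch of the same idea, with the local URL and timeout being assumptions:

#!/usr/bin/env python3
# Illustrative BeforeAllowTraffic check: fail the deployment if the app is not healthy.
import sys
import urllib.error
import urllib.request

HEALTH_URL = "http://localhost/index.html"  # assumed URL served locally by Apache

try:
    with urllib.request.urlopen(HEALTH_URL, timeout=10) as response:
        status = response.getcode()
except urllib.error.URLError as err:
    print("Health check failed: {}".format(err), file=sys.stderr)
    sys.exit(1)  # a non-zero exit code fails the CodeDeploy lifecycle event

if status != 200:
    print("Unexpected HTTP status {}".format(status), file=sys.stderr)
    sys.exit(1)

print("Application returned HTTP 200, allowing traffic")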

3) Lambda Stage

After completing the deploy stage, CodePipeline triggers a Lambda function in order to update the SSM parameter value with the updated EFS directory name. This parameter value alternates between “blue” and “green” to help CodePipeline identify the right EFS file system path during the next deployment.
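A minimal sketch of such a function is shown below. It assumes the parameter name arrives as an environment variable and that the function is invoked with a standard CodePipeline job event; the actual function created by the CloudFormation template may differ in its details.

import os
import boto3

ssm = boto3.client("ssm")
codepipeline = boto3.client("codepipeline")

PARAMETER_NAME = os.environ["SSM_PARAMETER"]  # assumed environment variable


def lambda_handler(event, context):
    # Flip the deployment parameter between 'green' and 'blue' and report back to CodePipeline.
    job_id = event["CodePipeline.job"]["id"]
    try:
        current = ssm.get_parameter(Name=PARAMETER_NAME)["Parameter"]["Value"]
        new_value = "blue" if current == "green" else "green"
        ssm.put_parameter(Name=PARAMETER_NAME, Value=new_value, Type="String", Overwrite=True)
        codepipeline.put_job_success_result(jobId=job_id)
    except Exception as err:
        codepipeline.put_job_failure_result(
            jobId=job_id,
            failureDetails={"type": "JobFailed", "message": str(err)},
        )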

CodeDeploy Blue/Green deployment

Let’s review the sequence of events flow during the CodeDeploy deployment:

  1. CodeDeploy creates a new Auto Scaling group by copying the original one.
  2. Provisions a replacement EC2 instance in the new Auto Scaling Group.
  3. Conducts the deployment on the new instance as per the instructions in the appspec.yml file.
  4. Sets up health checks and redirects traffic to the new instance.
  5. Terminates the original instance along with the Auto Scaling Group.
  6. After completing the deployment, it should appear as shown in Figure 10.
AWS CodeDeploy console view of a Blue/Green CodeDeploy deployment on EC2

Figure 10: AWS console view of a Blue/Green CodeDeploy deployment on EC2

Troubleshooting

To troubleshoot any service-related issues, see the following links:

More information

Now that you have tested the solution, here are some additional points worth noting:

  • The sample template and code utilized in this blog can work in any AWS region and are mainly intended for demonstration purposes. Utilize the sample as a reference and modify it further as per your requirement.
  • This solution works within a single account, Region, and VPC combination.
  • For this sample, we have utilized AWS CodeCommit as version control, but you can also utilize any other source supported by AWS CodePipeline, like Bitbucket, GitHub, or GitHub Enterprise Server.

Clean up

Follow these steps to delete the components and avoid incurring any future charges:

  1. Open the AWS CloudFormation console.
  2. On the Stacks page in the CloudFormation console, select the stack that you created for this blog post. The stack must be currently running.
  3. In the stack details pane, choose Delete.
  4. Select Delete stack when prompted.
  5. Empty and delete the S3 bucket created during deployment step 1.

Conclusion

In this blog post, you learned how to set up a complete CI/CD pipeline for conducting a blue/green deployment on EC2 instances utilizing an Amazon EFS file share as the mount point to host application source code. The EFS share is the central location hosting your application content, and it helps reduce your overall deployment time by eliminating the need to deploy a new revision to every EC2 instance's local storage. It also helps to preserve any dynamically generated content when the life of an EC2 instance ends.

Author bio

Rakesh Singh

Rakesh is a Senior Technical Account Manager at Amazon. He loves automation and enjoys working directly with customers to solve complex technical issues and provide architectural guidance. Outside of work, he enjoys playing soccer, singing karaoke, and watching thriller movies.

Recreate Gradius’ rock-spewing volcanoes | Wireframe #52

Post Syndicated from Ryan Lambie original https://www.raspberrypi.org/blog/recreate-gradius-volcanoes-wireframe-52/

Code an homage to Konami’s classic shoot-’em-up, Gradius. Mark Vanstone has the code in the new edition of Wireframe magazine, available now.

Released by Konami in 1985, Gradius – also known as Nemesis outside Japan – brought a new breed of power-up system to arcades. One of the keys to its success was the way the player could customise their Vic Viper fighter craft by gathering capsules, which could then be ‘spent’ on weapons, speed-ups, and shields from a bar at the bottom of the screen.

Gradius screenshot
The Gradius volcanoes spew rocks at the player just before the end-of-level boss ship arrives.

Flying rocks

A seminal side-scrolling shooter, Gradius was particularly striking thanks to the variety of its levels: a wide range of hazards were thrown at the player, including waves of aliens, natural phenomena, and boss ships with engine cores that had to be destroyed in order to progress. One of the first stage’s biggest obstacles was a pair of volcanoes that spewed deadly rocks into the air: the rocks could be shot for extra points or just avoided to get through to the next section. In this month’s Source Code, we’re going to have a look at how to recreate the volcano-style flying rock obstacle from the game.

Our sample uses Pygame Zero and the randint function from the random module to provide the variations of trajectory that we need our rocks to have. We’ll need an actor created for our spaceship and a list to hold our rock Actors. We can also make a bullet Actor so we can make the ship fire lasers and shoot the rocks. We build up the scene in layers in our draw() function with a star-speckled background, then our rocks, followed by the foreground of volcanoes, and finally the spaceship and bullets.
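The full listing ships with the magazine, but a minimal Pygame Zero sketch of that setup might look like the following; the image names 'background', 'volcanoes', 'ship', 'rock', and 'bullet' are assumed asset names, and the sizes are arbitrary.

# Run with: pgzrun gradius.py (requires Pygame Zero and the image assets named below)
from random import randint

WIDTH, HEIGHT = 800, 600

ship = Actor("ship", center=(100, 300))
bullet = Actor("bullet", center=(-100, -100))   # parked off-screen until fired
rocks = []                                      # rock Actors, reused once they leave the screen


def draw():
    screen.blit("background", (0, 0))            # star-speckled backdrop
    for rock in rocks:
        rock.draw()
    screen.blit("volcanoes", (0, HEIGHT - 200))  # foreground volcanoes
    ship.draw()
    bullet.draw()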

Dodge and shoot the rocks in our homage to the classic Gradius.

Get the ship moving

In the update() function, we need to handle moving the ship around with the cursor keys. We can use a limit() function to make sure it doesn’t go off the screen, and the SPACE bar to trigger the bullet to be fired. After that, we need to update our rocks. At the start of the game our list of rocks will be empty, so we’ll get a random number generated, and if the number is 1, we make a new rock and add it to the list. If we have more than 100 rocks in our list, some of them will have moved off the screen, so we may as well reuse them instead of making more new rocks. During each update cycle, we’ll need to run through our list of rocks and update their position. When we make a rock, we give it a speed and direction, then when it’s updated, we move the rock upwards by its speed and then reduce the speed by 0.2. This will make it fly into the air, slow down, and then fall to the ground. 
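Continuing the sketch above, the update logic described here might look roughly like this; the key bindings, speeds, and spawn rates are all assumptions rather than the values used in the published listing.

# Sketch of the update loop: movement, firing, rock spawning, and the 0.2 speed decay
def update():
    if keyboard.left:  ship.x -= 4
    if keyboard.right: ship.x += 4
    if keyboard.up:    ship.y -= 4
    if keyboard.down:  ship.y += 4
    ship.x = max(20, min(WIDTH - 20, ship.x))    # crude limit() to keep the ship on screen
    ship.y = max(20, min(HEIGHT - 20, ship.y))
    if keyboard.space and bullet.x < 0:          # fire only when the bullet is parked off-screen
        bullet.pos = (ship.x + 40, ship.y)
    if bullet.x > -50:
        bullet.x += 10                           # bullet flies to the right
    if bullet.x > WIDTH:
        bullet.pos = (-100, -100)                # park it again for reuse
    if randint(1, 30) == 1:                      # occasionally spawn (or reuse) a rock
        fallen = [r for r in rocks if r.y > HEIGHT + 50]
        if fallen:
            rock = fallen[0]                     # reuse a rock that has fallen off-screen
        else:
            rock = Actor("rock")
            rocks.append(rock)
        rock.pos = (randint(250, 650), HEIGHT - 150)
        rock.speed = randint(5, 10)              # initial upward speed
        rock.direction = randint(-3, 3)          # horizontal drift
    for rock in rocks:
        rock.y -= rock.speed
        rock.x += rock.direction
        rock.speed -= 0.2                        # rises, slows, then falls back down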

Collision detection

From this code, we can make rocks appear just behind both of the volcanoes, and they’ll fly in a random direction upwards at a random speed. We can increase or decrease the number of rocks flying about by changing the random numbers that spawn them. We should be able to fly in and out of the rocks, but we could add some collision detection to check whether the rocks hit the ship – we may also want to destroy the ship if it’s hit by a rock. In our sample, we have an alternative, ‘shielded’ state to indicate that a collision has occurred. We can also check for collisions with the bullets: if a collision’s detected, we can make the rock and the bullet disappear by moving them off-screen, at which point they’re ready to be reused.
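A sketch of those checks, using the Actors' built-in rect collision, is shown below; the 'ship_shielded' image name is an assumption for the alternative shielded state. Calling handle_collisions() at the end of update() keeps the collision logic separate from the movement code.

# Sketch of the collision handling: bullets destroy rocks, rocks put the ship into a shielded state
def handle_collisions():
    for rock in rocks:
        if bullet.colliderect(rock):
            rock.y = HEIGHT + 100            # move the rock off-screen so it can be reused
            bullet.pos = (-100, -100)        # park the bullet for the next shot
        if ship.colliderect(rock):
            ship.image = "ship_shielded"     # assumed alternative image for the shielded state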

That’s about it for this month’s sample, but there are many more elements from the original game that you could add yourself: extra weapons, more enemies, or even an area boss.

Here’s Mark’s volcanic code. To get it working on your system, you’ll need to install Pygame Zero. And to download the full code and assets, head here.

Get your copy of Wireframe issue 52

You can read more features like this one in Wireframe issue 52, available directly from Raspberry Pi Press — we deliver worldwide.

Wireframe issue 52's cover

And if you’d like a handy digital version of the magazine, you can also download issue 52 for free in PDF format.


Triggers, calculated and aggregated items in Zabbix 5.4

Post Syndicated from Aigars Kadiķis original https://blog.zabbix.com/triggers-calculated-and-aggregated-items-in-zabbix-5-4/14880/

In earlier Zabbix versions, we had three categories of things we could manage inside the instance — triggers, calculated items, and aggregated items. Each of them had its own syntax, so we could not be sure what syntax to use in a certain case. That’s why we introduced the unified syntax for every category inside the monitoring tool that will ease up the documentation task and the configuration process. To appreciate the innovation, we need to recap these three categories in Zabbix.

Contents

I. What’s different in Zabbix 5.4? (1:52)
II. Examples (9:20)
III. Aggregated checks (14:09)
IV. Questions & Answers (19:08)

A trigger is a piece of logic that evaluates a formula. If the formula is true, it generates an event, a so-called problem. At the same time, calculated items also do calculations, but at the end they store a value inside the database. This value will be either an integer or a floating-point number. So, both of these do calculations, but before Zabbix 5.4 the syntax for each was different.

What’s different in Zabbix 5.4?

Each item has a tag:tagvalue

It all starts with the item, as that is how we collect the metrics. Previously, an item was always assigned to an application: whenever we designed an item, it could be a memory-type of thing, a network, CPU, etc. Now, we have tags instead.

TAG:TAGVALUE

As you can see in this example, the tag value is ‘Application’. This really lets us have a flawless upgrade operation. The ‘Application’ field is completely gone, and now we have only tags on this item level.

Period and time shift is one argument now

Previously, whenever we were dealing with trigger functions, most of the time the first argument was the interval to be used as an input. We had two arguments: we could analyze metrics for the specified period with <period> and go back in time with <time shift> after a comma: <period>,<time shift>.

We have decided to use one argument — <period:time shift>.

If you want to analyze data for yesterday, or last week, or last year, you can now use one argument:

  • <1h> — during the last one hour,
  • <1h:now-1d> — during the same hour one day ago,
  • <1d:now/d> — yesterday during the day.

STR(), REGEX(), IREGEX() => FIND()

Another big change — the functions searching for the data containing a string will now be converted into one function — find().

So, the string function str(), which was less CPU-intensive, could search for a plain string. If we needed a more complex pattern, we used the regular expression function regexp() or, for case-insensitive matching, iregexp(). Now, we can use a single function covering all these needs, including case-insensitive search: find().

It takes more arguments, which specify the way to search: a plain string comparison when that is enough, which consumes less CPU power, or a regular expression when we really need one.
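For example, with the new expression syntax described in the next section, a trigger that previously used regexp() to look for the word 'error' in a log item could be written roughly as find(/host/log[/var/log/app.log],10m,"regexp","error")=1, where the host, item key, period, and pattern are only placeholders.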

New syntax

So, in the earlier Zabbix versions, a trigger expression had curly braces at the beginning and at the end, the host and key in the middle, and the trigger function with its arguments after a dot: {host:key.max(15m)}.

The unified syntax always starts with the function. This lets us use recursion: as some functions support multiple arguments, we can put a function inside another function, eliminating the previous limitations: max(/host/key,15m).

This syntax is very similar to a Linux file system path: it starts with a forward slash, then comes the host name, then the item key, with the interval (including any time shift) after the first comma.

Now, the same syntax is used for triggers and calculated items. In addition, previously, we did have the aggregated items. Now, if you want to do some aggregation, you’ll have to use calculated items inside the interface.

Examples

Now, there are many things you can accomplish, which were not possible before.

Absolute time periods

We might need to know what has happened today, for instance, whether the workstation is on at 10 a.m. We can do it by summing up the agent.ping items. So, if the agent.ping is zero, then it is still offline and no one has turned the workstation on. Otherwise, the agent.ping will be 1.

sum(/host/key, 1d:now/d+1d) = 0 and time() > 100000

  • We might need to find out the maximum workstation uptime yesterday. If the uptime is bigger than 10 hours, a person might have been working at the computer for too long and might need a reminder that there’s life outside.

trendmax(/host/key, 1d:now/d)>10h

  • When analyzing user sessions, we might search for the maximum peak last week. To do that, here we compare the last week’s peak with yesterday’s peak and find out that it has doubled. So, it’s a problem and we can get an event.

trendmax(/host/key, 1d:now/d) > trendmax(/host/key, 1d:now/w) / 2

  • We can also analyze activity, for instance, the stock module memory utilization per Windows machine for the previous month. So, we are looking at the maximum memory used during the previous month per server, not the workstation, and then compare the used memory with the memory available.

trendmax(/host/key1, 1M:now/M) < last(/host/key2) / 2

Here, the server is not utilizing half of the available memory, so we will get notified.

The beauty of this trigger is that the stock template already contains all those 12 items — a collection per-used memory, a collection per-total memory. All we need to do is go to the Trigger section, install this trigger, and as soon as one single metric comes in, it will generate the event about the last month as all the data is stored inside the database.

Compare string between two hosts

Now, we can compare text strings. We can use a different type of input, for instance, different servers (say, node 1, node 2), and compare the agent versions. If the version is different or the same, you can decide what the problem is. It will fire up an event, and we can indicate what the difference between those versions is in the event title.

last(/node1/agent.version) <> last(/node2/agent.version)

Aggregated checks

Aggregation on current host (foreach)

You might need to calculate something and store the data in the database.

In this example, if we have a website, we will have a template containing a web scenario, which consists of multiple steps, such as checking individual pages. It will provide the response time per page on the Latest data page.

In the Templates, we can select this calculated item, and, for instance, aggregate all those built-in response time metrics.

Average web page response time: avg(last_foreach(//web.test.time[check performance,*,resp]))

The '*' wildcard tells Zabbix to aggregate all the items reflecting the response time, and the double forward slash means aggregation for the current host. 'foreach' will now appear everywhere we do aggregation. It works much like the foreach construct in Windows PowerShell, iterating through different types of elements: either hosts or items.

We might use a different type of aggregation using the same approach. For instance, when monitoring disk space, we capture the space used by a Linux or Windows machine that can have different drives or different mounts. Using one calculated item, we can aggregate the total used space across all the mounts or all the drives on the server. This might be useful, for instance, to estimate how much disk space you need if it's time to purchase an SSD drive.

Total used space on all drives: sum(last_foreach(//vfs.fs.size[*,used]))

Aggregation by host group

Another type of functionality — aggregation by the host group.

You might have a group, for instance, ‘MySQL servers’ with the official solution looking for the queries per second. It will aggregate all the queries per second in one single item. If there is a pool of MySQL servers, then you will end up seeing the total number of these queries and afterward linking a trigger on this item.

MySQL queries per second: sum(last_foreach(/*/mysql.queries.rate?[group="MySQL servers"]))

Aggregation by tag:value

You can do the aggregation not only by host group but also by utilizing the item Tags (tag name and tag value). It does not need to be a tag on the item level. You can mark your host element with the role on the host level. So, you create this item inside the template, and it will search for all these items, which belong to a specific host with this tag and tag value, and do the aggregation. You will end up with an item, which has used space metrics (domain controllers in this example).

Used space on domain controllers: sum(last_foreach(/*/vfs.fs.size[*,used]?[tag="Role:Domain Controller"]))

Questions & Answers

Question. After the upgrade, what will happen with the trigger item syntax? Do we have to rebuild everything?

Answer. It gets upgraded automatically. However, as per the results of my testing in the test environment, some items have to be checked. I would highly suggest extracting the configuration and testing the upgrade process beforehand to learn how those items will affect the system (just to be on the safe side).

Question. Are there any changes or improvements in the predictive trigger syntax?

Answer. You should definitely take a look at the roadmap for the exact information.

Question. When we’re doing an aggregation, is it always going to be on a single host or can we specify a host group or a tag?

Answer. Surely. Aggregation by host group is possible as it was before. Still, it’s more flexible now. We need to specify a host, a wildcard for the host item, and then we can add additional value by using the host group or by specifying what kind of tag the item should have. Then your parameters will be respected. We can combine all those things, that is, filter by host group and by the tag name and tag value.

In addition, when we’re doing the upgrade, the prototype items and triggers for the low-level discovery rules are also going to be automatically switched over.

 

Auto-healing Kafka connector tasks with Zabbix

Post Syndicated from Ronald Schouw original https://blog.zabbix.com/auto-healing-kafka-connector-tasks-with-zabbix/14269/

In this post, we will talk about the low-level discovery of Kafka connectors and tasks. When a Kafka task fails, a trigger is fired, which starts a remote command to restart the failed Kafka task. Of course, with the necessary logging around it.

You can find the template and scripts on the Zabbix share. But first, let's talk a little bit about Kafka producers and consumers. Let's say you have a couple of connectors set up, pulling data from Postgres with Debezium and streaming it into Elasticsearch. The Postgres source is a bit flaky and goes offline periodically. If you view the status of the Postgres source, the producer, you notice that a task has failed. Kafka does not restart the failed task out of the box. We don't wait for the customer to complain; instead, we let Zabbix actively monitor the tasks. A failed connector task is easy to restart using the REST API, but manually restarting and watching a task is annoying. We used to do that at our business. Now Zabbix comes into play and restarts the failed Kafka task automatically. And we do sleep well.

About Kafka

Apache Kafka is a community distributed event streaming platform capable of handling trillions of events a day. Initially conceived as a messaging queue, Kafka is based on an abstraction of a distributed commit log. Since being created and open-sourced by LinkedIn in 2011, Kafka has quickly evolved from messaging queue to a full-fledged event streaming platform.

First, let’s do a curl and check the failed connector task.

curl -s "http://localhost:8083/connectors?expand=status" |
jq '."connector_sink-test" | .status.tasks'
[{
"id": 0,
"state": "RUNNING",
"worker_id": "connect1.test.com:8083"
},
{
"id": 1,
"state": "FAILED",
"worker_id": "connect2.test.com:8083"
}]

So this is where the fun starts: we have a connector task with ID "1" that has failed. At the end of the blog, Zabbix restarts the task automatically, but first, let's look at an example. The following curl POST restarts task 1 of the connector_sink-test connector (the failed task running on worker connect2.test.com):

curl -X POST http://localhost:8083/connectors/connector_sink-test/tasks/1/restart
Low-level discovery

The zabbix_kafka_connector template works out of the box. To implement the use cases provided in this blog, you will need the scripts bundled together with the template. Kafka connectors can have multiple tasks. First, we determine the connectors and later the state of the connectors and tasks. Let's run the following script, api_connectors.sh. I suggest you execute the script via a cron job every 5 minutes, depending on how often you want the curl jobs to run.

api_connectors.sh

curl http://localhost:8083/connectors?expand=status | jq > check_connectors
curl http://localhost:8083/connectors | jq .[] > get_connectors

It creates two files, check_connectors and get_connectors. Needless to say, we use curl with authentication in the production environment.

The next shell script, get_connector_data.sh, uses the check_connectors and get_connectors files as input. It defines the connector {#CONNECTOR} and the connector tasks {#CONNECTOR_ID} with the corresponding ID used by low-level discovery. Down the line it might be more efficient to rewrite it as a Python script (a rough sketch follows the sample output below). jq is our useful friend here. The script is used by a user parameter later on.

get_connector_data.sh

#!/bin/sh
CONNECTOR=$(cat get_connectors)
CONNECTOR_IDS=$(cat get_connectors | tr -d '"')
FIRST="1"
# create zabbix lld discovery connectors
echo "{"
echo " \"data\":["
for i in $CONNECTOR
do
  if [ "$FIRST" -eq 0 ]
  then
    printf ",\n"
  fi
  FIRST="0"
  printf " {\"{#CONNECTOR}\": $i}"
done
# create zabbix lld discovery task connectors
for i in $CONNECTOR_IDS
do
  IDS=$(cat check_connectors | jq --arg i ${i} -r '."'${i}'" | .status.tasks[].id')
  for z in $IDS
  do
    if [ "$FIRST" -eq 0 ]
    then
      printf ",\n"
    fi
    FIRST="0"
    printf " {\"{#CONNECTOR_ID}\": \"${i}-${z}\"}"
  done
done
#
printf "\n ] \n}"

Part of the script output will look like this, depending, of course, on how many connectors and tasks there are in your Kafka environment.

{
"data":[
{"{#CONNECTOR}": "source_invoices-prod"},
{"{#CONNECTOR}": "employee_sink-prod"},
{"{#CONNECTOR_ID}": "ource_invoices-prod-0"},
{"{#CONNECTOR_ID}": "source_invoices-prod-1"},
{"{#CONNECTOR_ID}": "employee_sink-prod-0"},
{"{#CONNECTOR_ID}": "employee_sink-prod-1"},
{"{#CONNECTOR_ID}": "employee_sink-prod-2"},
{"{#CONNECTOR_ID}": "employee_sink-prod-3"}
]
}
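As mentioned above, the same discovery JSON could also be produced by a Python script. The following is only a rough sketch, assuming the same check_connectors and get_connectors files created by api_connectors.sh:

#!/usr/bin/env python3
# Illustrative Python equivalent of get_connector_data.sh: build the Zabbix LLD JSON.
import json

with open("check_connectors") as f:
    status = json.load(f)                         # output of /connectors?expand=status
with open("get_connectors") as f:
    names = [line.strip().strip('"') for line in f if line.strip()]

data = [{"{#CONNECTOR}": name} for name in names]
for name in names:
    for task in status.get(name, {}).get("status", {}).get("tasks", []):
        data.append({"{#CONNECTOR_ID}": "{}-{}".format(name, task["id"])})

print(json.dumps({"data": data}, indent=1))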
Template.

We will define a template with the LLD rule in it and later attach the template to a host. Create a template Configuration > Templates > Create template.  Give it a name according to your choice: Template_kafka_connector or some other name, depending on your template naming policies.

Discovery rule

Next, we create a discovery rule. The Keep lost resources period is an arbitrary value here – once again, depending on your policies regarding LLD entities.
In this case, we will discard lost resources immediately – Keep lost resources period (0). This can be a bit more database friendly in cases where Kafka creates hundreds of connectors. The update interval is the same as the cronjob interval.

Configuration > Templates > your created template > discovery > create discovery rule

The key is used by the user parameter defined further down in the blog.

Item prototype.

We will create two item prototypes, one for the connector and one for the task of the connector with the corresponding ID of the task. The ID is important because we want to restart the correct task later.

Name: State of {#CONNECTOR} connector
Key: state[{#CONNECTOR}]

Configuration > Templates > your created template > item prototypes > create item prototype

Trigger prototypes

Four trigger prototypes have been created, in two sets with different severities. The highest severity only fires after six hours and is intended for the operations center. Most times, Zabbix will restart the failed task within 5 or 10 minutes, so it is not necessary to burden the operations center with this. I will explain the most important trigger. This trigger will soon be used in an action to start the remote command. The URL macro {TRIGGER.URL} is used, which determines the ID of the task that should be restarted. There are probably other solutions, but this one works well and is stable.

Configuration > Templates > your created template > item prototypes > create trigger prototype


The other trigger examples are provided below.

Name: Kafka Connector task {#CONNECTOR_ID} on {HOST.NAME} is not RUNNING
Expression: {C_Template kafka Connector:task[{#CONNECTOR_ID}].str(RUNNING,6h)}=0 and {C_Template kafka Connector:task[{#CONNECTOR_ID}].str(FAILED)}=1
Severity Warning
Name: Kafka Connector {#CONNECTOR} on {HOST.NAME} is FAILED
Expression: {C_Template kafka Connector:state[{#CONNECTOR}].str(FAILED)}=1
Severity: Not classified
Name: Kafka Connector {#CONNECTOR} on {HOST.NAME} is not RUNNING
Expression: {C_Template Kafka Connector:state[{#CONNECTOR}].str(RUNNING,6h)}=0 and {C_Template Kafka Connector:state[{#CONNECTOR}].str(FAILED)}=1
Severity: warning
Userparameter

Three User Parameters are required—one for the low-level discovery and two for the items.

UserParameter=connector.discovery,sh /etc/zabbix/get_connector_data.sh
UserParameter=state[*],/etc/zabbix/check_connector.sh $1
UserParameter=task[*],/etc/zabbix/check_task_connector.sh $1

The check_connector.sh script gets the state of the connector.

#!/bin/sh
CONNECTOR="$1"
cat /etc/zabbix/check_connectors | jq --arg CONNECTOR "${CONNECTOR}" -r '."'${CONNECTOR}'" | .status.connector.state'

The check_task_connector.sh script does a check on the connector task. A disadvantage of this construction is that a connector can have a maximum of 10 tasks; at ID 10 or higher, the check fails. But it's unusual in Kafka to deploy a connector with that many tasks.

#!/bin/sh
value=$1
CONNECTOR=$(echo ${value::-2})
IDS=$(echo ${value:(-1)})
cat /etc/zabbix/check_connectors | jq --arg CONNECTOR "${CONNECTOR}" --arg IDS "${IDS}" -r '."'${CONNECTOR}'" | .status.tasks[] | select(.id=='$IDS').state'
Zabbix-agent

When all scripts are in the right place, we make a small adjustment to the Zabbix agent config. The LogRemoteCommands option is not necessary, but it is useful for debugging. Restart the Zabbix agent afterward. Add the Kafka template to a host, and we can proceed.

EnableRemoteCommands=1
LogRemoteCommands=1
Action auto-healing

Let’s define some actions that can heal our connector tasks by automatically restarting a Kafka task with an action. Create a new action –  you can choose any conditions that can be applied to your trigger.

Configuration > actions > event source – triggers > create action.

Create an operation. This can be a bit tricky. In my case, I restart the tasks every five minutes for the first half-hour. If unsuccessful, the Kafka admins will receive an email. After that, the tasks are restarted every hour for three days. In practice, this has never happened, but such a situation can occur over the weekend, for example. After three days, the operation stops and sends a final email. Usually, the task starts the first time – if not, then the second attempt is sufficient in 99% of the cases.

Restart script.

You will probably have to adapt the script to your own environment. We have built in some extra logging, which is certainly useful during the initial setup.

#!/bin/sh
LOG=/var/log/zabbix/restarted-connector.log
value=$(echo $1 | awk -F "/" '{print $(NF)}')
echo $value
CONNECTOR=$(echo ${value::-2})
IDS=$(echo ${value:(-1)})
curl -v -X POST http://localhost:8083/connectors/"${CONNECTOR}"/tasks/"${IDS}"/restart 2>&1 | tee -a $LOG
echo "Connector $CONNECTOR ID $IDS has been restarted at $(date)" >> $LOG

The {TRIGGER.URL} macro is used here; it is not intended to be used this way out of the box by Zabbix, but it gets the job done for this use case. The awk command strips everything up to the last slash of the URL, so only the connector-and-task identifier is passed to the restart call.

If you have any other suggestions on how to improve the scripts or the templates – you are very much welcome to leave a comment with your idea!

Credits.

I was inspired by Robin Moffatt at Confluent and, not least, by my colleague Werner Dijkerman at fullstaq.

Correlation between devices across client site

Post Syndicated from Aigars Kadiķis original https://blog.zabbix.com/correlation-between-devices-across-client-site/14657/

In this blog post, we will talk about aggregating different kinds of devices that are disconnected from the general network and finding out how many devices of each kind are "down" right now. This can be useful in an Internet Service Provider type of situation.

A property

It all starts with a property.

A property can be a building, a block. Most likely it has a firewall and a core switch at the top of everything.

A building can have floors. Floors can own a switch. An edge switch.

Each floor can have rooms or departments. It may be enough to put a router there to feed all the devices around it.

Vision

When something goes offline, we want to see "what is the damage?" If a major component goes down, that should be the priority to concentrate on.

In general, we aim for a much more descriptive message, like:

=> 2 edge switches down

=> A core switch is down

=> 15 routers down

=> Firewall is down

For the best experience, we aim to have only one message.

Prerequisites

To inform us how many devices are down, we need to make sure:

1) Each client host must belong to a host group. The name of the host group describes the location of the property, for example "Riga/Block7":

2) Each host object owns a macro {$PROPERTY_HOST_GROUP}. This can be delivered through the template. The macro value must be the same as the name of the host group: “Riga/Block7”

3) There is one virtual host in the client pool. This host will do the aggregations, determine what kind of devices are down and how many of them.

4) At least one passive check must work for devices. SNMP polling must be in place.

How does it work?

Monitoring software is executing passive checks:

As a result, it will generate red/green icons:

A “Zabbix internal” type can read the status of the icon:

zabbix[host,agent,"available"]
zabbix[host,snmp,"available"]

if the icon is red, a number 0 will be reported

if the icon is green, a number 2 will be reported

2 items in the template

There are 2 items in the template per category. First, the "availability" item fetches the status of the icon, and then a dependent item transforms this information into another number:

// Router:
if (value == 0) {return 1} else {return 0}

// Switch:
if (value == 0) {return 100} else {return 0}

// Core switch:
if (value == 0) {return 100000} else {return 0}

// Firewall:
if (value == 0) {return 1000000} else {return 0}

Aggregation

Each device type will generate numbers like 1, 100, 100000, or 1000000. For example, if 1 firewall, 1 core switch, 3 edge switches, and 12 routers are down, the aggregated sum is 1000000 + 100000 + 300 + 12 = 1100312, and each digit group can be decoded back into a per-category count.

We can have 2 options:

1) Link a trigger directly on the calculated number. If the integer number is bigger, it’s more critical to the client.

2) We can also operate with dependent items. Here is one method to transform the calculated item back into dependent items:

// Routers down:
if (value == 0) {return 1} else {return 0}

// Switches down:
if (value > 99) {return value.replace(/..$/,"") % 1000} else {return 0}
// % 1000 is because the client can have 999 switches

// Core switches down:
if (value > 99999) {return value.replace(/.....$/,"") % 10} else {return 0}
// % 10 is because the client can have 9 core switches

// Firewalls down:
if (value > 999999) {return value.replace(/......$/,"") % 10} else {return 0}
// % 10 is because the client can have 9 firewalls

Detect flapping

It would be quite useful to detect when some devices are changing between up and down too frequently. There is no elegant solution here, but we can have a workaround. We can clone all 4 items:

and, per each item, add a second preprocessing step “Discard unchanged”:

We will end up with 4 additional items:

One last step is to create 4 more items that count how many metrics each "changes" item received. Here is a sample of one:

The macro value of {$FLAP} can be '1d'.

Known issues with the solution to detect flapping

If the 'zabbix-server' service is restarted, it will generate "+1 flap" for each device type.

If a device in one category changes state to "up" and another device in the same category changes state to "down" within the same minute, this will not be detected 🙁

How far can we go?

How many classifications can we use? The calculated item is limited to a 64-bit integer, which is '18446744073709551615'; there are 20 digits in this number. Because it starts with a '1', we can safely use only 19 digits.

Proof of concept template


There are 6 templates included in one XML file:

To use this solution:

1) Import XML file.

2) Clone “Property” template.

3) Open the cloned "Property" template and set the correct value of the macro {$PROPERTY_HOST_GROUP}. The value must be the same as the name of the host group that all client devices are in.

4) In the same host group that all client devices are in, create a dummy host, apply the "aggregate status" template, and assign the "Property" template to this host.

5) Assign the "binary" templates (the ones that contain a '1' in the name) to the devices (switches, core switches, firewalls) that the client owns.

Alright, that is it for today. Bye!

Building an ARM64 Rust development environment using AWS Graviton2 and AWS CDK

Post Syndicated from Alistair McLean original https://aws.amazon.com/blogs/devops/building-an-arm64-rust-development-environment-using-aws-graviton2-and-aws-cdk/

2020 was the year that ARM chips made the headlines by moving from largely mobile form factors into the cloud thanks to AWS Graviton2, allowing you to have up to 40% better price performance over comparable current generation x86 Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Relational Database Service (Amazon RDS) instances.

We speak to customers daily about Graviton2. One recurring question we hear is “Graviton2 is great, but how can my team develop for ARM natively without the complexity of cross-compilation or having to buy custom hardware on premises?” This post seeks to answer that question by setting up the Visual Studio Code-based Code Server IDE, running on a Graviton2 EC2 instance that enables native development in a cost-effective and secure manner accessed via your browser.

The Rust programming language has gained a huge amount of popularity recently. This post aims to show that you can use this environment for Rust development as well as hundreds of other supported languages. AWS has committed to supporting the Rust community and using the language to deliver fast and robust services to customers at scale, and we want to enable our customers to do the same.

We also include instructions for building and installing the rust-analyzer and CodeLLDB debugger plugins to add additional language features.

Solution overview

The following diagram illustrates our solution architecture.

Architecture of the solution showing components and their linkages

The solution consists of an EC2 Graviton2 instance located in a private VPC subnet routed through an AWS Global Accelerator accelerator to provide routing optimization and keep packet loss, jitter, and latency lower by up to 60%. An internal facing Application Load Balancer containing the AWS Certificate Manager certificate decrypts and forwards traffic to this instance.

Code Server queries AWS Secrets Manager to initially set the login password on startup and allow for continued password-based authentication and easy password rotation. The EC2 instance has access to the internet through a NAT gateway and has no public IP address or key pair associated, and is accessible only through AWS Systems Manager Session Manager.

Prerequisites

For this walkthrough, the following are prerequisites:

AWS CDK stack

In order to deploy our architecture, I use the AWS CDK. As a developer, it’s more intuitive to me to define my infrastructure using a language and tooling with which I am familiar. I can also do things like environment variable injection and scripting as part of the stack creation to add stack parameters and customization points.

The AWS CDK application comprises five stacks. Each stack defines a separate part of the architecture (a simplified sketch of the networking stack follows the list):

  • Networking – Defines a VPC across two Availability Zones with the CIDR range of your choice. The routing and public/private subnet creation is done for us as part of the default configuration.
  • Certificate – This is the reason for the domain prerequisite. It’s a best practice to encrypt web applications using TLS, and for that we need a certificate and therefore a domain. This stack creates a certificate for the subdomain you specify as part of the stack creation and DNS validation in Route 53.
  • Amazon EC2 configuration – This defines both our AMI and the instance type and configuration. In this case, we’re using Amazon Linux 2 ARM64 edition. Here we also set the instance-managed roles that allow Session Manager connectivity and Secrets Manager access.
  • ALB configuration – Here we define the internal load balancer and specify the listener, certificate, and target configuration. I have injected the Amazon EC2 configuration as part of the class constructor so that I can reference it directly as a target.
  • Global accelerator configuration – Finally, the accelerator is defined here with two ports open, the ALB we defined in the ALB stack as a target, and most importantly adds in a CNAME DNS entry pointing to the DNS name of the accelerator.
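For illustration, the networking stack can be expressed in a few lines of CDK Python. The class and property names below are a simplified sketch (CDK v1 style, matching the Python tooling used later in the walkthrough), not the exact code from the sample repository:

import os
from aws_cdk import core as cdk
from aws_cdk import aws_ec2 as ec2


class NetworkingStack(cdk.Stack):
    # Simplified sketch: a VPC across two Availability Zones with public and private subnets
    def __init__(self, scope: cdk.Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        self.vpc = ec2.Vpc(
            self,
            "CodeServerVpc",
            cidr=os.environ.get("VPC_CIDR", "10.0.0.0/16"),
            max_azs=2,          # two Availability Zones
            nat_gateways=1,     # private subnets reach the internet through a NAT gateway
        )


app = cdk.App()
NetworkingStack(app, "networking")
app.synth()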

Walkthrough overview

This walkthrough uses the AWS CDK command line tools to deploy the stack. Session Manager is enabled to allow access to the EC2 instance and configure the Code Server application and associated plugins.

The walkthrough specifically covers the following steps:

  1. Deploy the AWS CDK stacks via CloudShell to build out the application infrastructure and associated IAM roles.
  2. Launch Code Server via the official Docker container with the commands to get and set the password stored in Secrets Manager.
  3. Log in and build the rust-analyzer and CodeLLDB plugins from a terminal to allow for debugging within a “Hello World” application.

Start CloudShell and install the appropriate tooling

In this section, I use dummy values for the domain, the VPC CIDR, AWS Region, and the secret password. You need to submit real values as appropriate.

Sign in to CloudShell and enter the following commands:

sudo yum groupinstall -y "Development Tools"
sudo npm install aws-cdk -g
git clone https://github.com/aws-samples/cdk-graviton2-alb-aga-route53.git
cd cdk-graviton2-alb-aga-route53
python3 -m venv .
source bin/activate
python -m pip install -r requirements.txt
export VPC_CIDR="10.0.0.1/16" #Substitute your CIDR here.
export CDK_DEPLOY_ACCOUNT=`aws sts get-caller-identity | jq -r '.Account'`
export CDK_DEPLOY_REGION=$AWS_REGION
export R53_DOMAIN="code-server.example.com" #Substitute your domain here.
cdk bootstrap aws://$CDK_DEPLOY_ACCOUNT/$CDK_DEPLOY_REGION
cdk deploy --all

The deploy step takes around 10-15 mins to run and prompts a couple of times to add resources like security groups and IAM roles.

Log in to the new instance using Session Manager

Install the latest version of the Session Manager plugin for the AWS CLI:

cd ~
curl "https://s3.amazonaws.com/session-manager-downloads/plugin/latest/linux_64bit/session-manager-plugin.rpm" -o "session-manager-plugin.rpm"
sudo yum install -y session-manager-plugin.rpm

Now start a session, log in to the newly created EC2 instance, and switch to ec2-user:

aws ssm start-session --target i-1234xyz7890abc #Substitute the instance id we just created here
#Once session is active:
sudo su - ec2-user

Add the password as a secret and start the container

Enter the following code to add the password as a secret in Secrets Manager and start the container:

aws secretsmanager create-secret --name CodeServerProd --secret-string Password123abc # Substitute the appropriate password here.
sudo docker run -d --name=code-server -e PUID=1000 -e PGID=1000 -e PASSWORD=`aws secretsmanager get-secret-value --secret-id CodeServerProd | jq -r '.SecretString'` -p 8080:8080 -v /home/ec2-user/.config:/config --restart unless-stopped codercom/code-server

Access and configure the web application for Rust development

So far, we have accomplished the following:

  • Created the infrastructure in the diagram via AWS CDK deployment
  • Configured the EC2 instance to run Docker and added this to the systemctl startup scripts
  • Created a secret in Secrets Manager to use as the application login password
  • Instantiated a Docker container running Code Server

Next, we access the running container via the web interface and install the required development tools.

Log in to the Code Server web application

To log in to the Code Server web application, complete the following steps:

  1. Browse to https://code-server.example.com, where example.com is the name of the domain you supplied in the AWS CDK step.
  2. Log in using the password you created in Secrets Manager.
  3. Create a new terminal by choosing the hamburger icon and, under Terminal, choosing New Terminal.
  4. Issue the following commands into the terminal to install the Rust programming language:
sudo apt update && sudo apt upgrade -y
sudo apt install -y build-essential npm clang lldb
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env

Install the rust-analyzer plugin

Open the extensions panel and enter Rust Analyzer in the search bar. Then install the plugin.

Install the debugger

Go back to the extensions panel in the Code Server application and enter CodeLLDB into the search bar. Then install this extension.

Create a sample application and open it in the Code Server window

To create and use our sample application, complete the following steps:

  • In the existing Code Server terminal, enter the following:
mkdir -p ~/src/
cd ~/src
cargo new helloworld --bin
  • Open the newly created folder in Code Server verifying that the helloworld directory was successfully created.

Open File or Folder dialog in Code Server

  • Rust-analyzer runs when you open up src/main.rs and indexes the file.
  • You can run the program by choosing Run in the editor.

Main Code Server editor window showing helloworld Rust program code.

  • Similarly, to launch the debugger, choose Debug in the editor.

Code Server Debugger view

Troubleshooting

If the CloudShell session times out, you need to reset your environment variables in order to re-deploy, modify, and delete the stack deployment.

Clean up

This stack incurs an estimated monthly cost of $143.00.

To delete the stack, log in to CloudShell and enter the following commands:

cd cdk-graviton2-alb-aga-route53
source bin/activate

# Re-set the environment variables again if required
export VPC_CIDR="10.0.0.0/16" #Substitute your CIDR here.
export CDK_DEPLOY_ACCOUNT=`aws sts get-caller-identity | jq -r '.Account'`
export CDK_DEPLOY_REGION=$AWS_REGION
export R53_DOMAIN="code-server.example.com" #Substitute your domain here.
cdk destroy --all

This destroys all the resources created in the first step. You can verify this by browsing to the AWS CloudFormation console and noting the deletion of all the stacks.

Conclusion

AWS is a place where builders can reinvent the future. The future of development means supporting different chipsets depending on different business requirements. This post is designed to enable development targeting the ARM64 microarchitecture by utilizing AWS Graviton2. Happy building!

Author bio

Author portrait

Alistair is a Principal Solutions Architect at AWS focused on EdTech customers. Originally from the west coast of Scotland, Alistair now lives in Fairfield, Connecticut, with his wife and two daughters and enjoys spending time with his family, skiing, golfing, cycling, and using his pellet smoker.

Zabbix proxy performance tuning and troubleshooting

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/zabbix-proxy-performance-tuning-and-troubleshooting/14013/

Most Zabbix users use proxies, and those running medium to large instances might have encountered performance issues at some point. From this post and the video, you will learn about the most common troubleshooting steps for detecting and resolving proxy issues (sometimes you might be unaware of an ongoing issue), as well as about basic performance tuning to prevent such issues in the future.

Contents

I. Zabbix proxy (1:36)
II. Proxy performance issues (5:35)
III. Selecting and tuning the DB backend (13:27)
IV. General performance tuning (16:59)
V. Proxy network connectivity troubleshooting (20:43)

Zabbix proxy

Zabbix proxy is most of the time deployed to monitor distributed IT infrastructures, for instance, at a remote location, to prevent data loss in case of network outages: the proxy collects the data locally, and it is then pushed to or pulled from the Zabbix server.

Zabbix proxy supports active and passive modes, so we can push the data to the Zabbix server or have the Zabbix server pull the data from the proxy. Even if we don’t have any remote locations and have a single data center, it is still a good practice to delegate most of your data collection to a proxy running next to your server, especially in medium-sized and large instances. This allows for offloading our data collection and preprocessing performance overhead from the server to the proxy.

Active vs. passive

Whether an active or a passive mode is better for your company at the end of the day will depend on your security policies. We can use passive mode with the server pulling the data from the proxy or active mode with the proxy establishing the connection to the Zabbix server and pushing the data.

  • Active mode is the default configuration parameter as it is a bit simpler to configure — almost all of the configuration can be done on the proxy side. Then, we need to add the proxy on the frontend and we’re good to go.
### Option: ProxyMode
#   Proxy operating mode.
#   0 - proxy in the active mode
#   1 - proxy in the passive mode
#
# Mandatory: no
# Default:
# ProxyMode=0
  • In the case of a passive proxy, we have to make some changes in the Zabbix server configuration file, which would involve a restart of the Zabbix server and, as a consequence, downtime.

Finally, it is all going to boil down to our networking team and the network and security policies, for instance, allowing for passive or active mode only. If both modes are supported, then the active mode is a bit more elegant.

Proxy versions

Another common question is about the proxy version to install and the database backend to use.

  • The main point here is that the major proxy version should match the major version of the Zabbix server, while minor versions can differ. For instance, Proxy 5.0.4 can be used with Server 5.0.3 and Web 5.0.9 (in this example, the first and the second numbers should match). Otherwise, the proxy won’t be able to send the data to the server, and you will see error messages in your log files about a version mismatch and the data format not fitting the server requirements.
  • Proxies support SQLite, MySQL, PostgreSQL, and Oracle backends. To install the proxy, we need to select the proper package for SQLite3, MySQL, or PostgreSQL, or compile the proxy with Oracle database backend support.

— SQLite proxy package:

# yum install zabbix-proxy-sqlite3

— MySQL proxy package:

# yum install zabbix-proxy-mysql

— PostgreSQL proxy package:

# yum install zabbix-proxy-pgsql

For instance, if we do # yum install zabbix-proxy-sqlite3 or copy and paste the instructions from the Zabbix website for SQLite, we will later wonder why it is not working for MySQL as there are some unique dependencies for each of these packages.

NOTE. Don’t forget to select the proper package in relation to the proxy DB backend.

Proxy performance issues

After we have installed everything and covered the basics of what needs to be done and how to set things up, we can start tuning our proxy and try to detect any potential performance issues.

Detecting proxy performance issues

How can we find out what the root cause of performance issues is or if we are having them at all?

  1. First, we need to make sure that we are actually monitoring our proxy. So, we need to:
  • Create a host in Zabbix,
  • Assign this host to be monitored by the proxy. If the host is monitored by the server, it will report the wrong metrics — the Zabbix server metrics, not the Zabbix proxy metrics.

So, we need to create a host and configure it to be monitored by the proxy itself. Then we can use an out-of-the-box proxy monitoring template — Template App Zabbix Proxy.

Template App Zabbix Proxy

NOTE. Template App Zabbix Proxy gets updated on the git.zabbix page when we add new components to Zabbix (new internal processes, new gathering processes, and so on) to support these new components.

If you are running an older version of Zabbix, for example, all the way back from version 2.0, make sure that you download the newer template from our git page so that you don’t stay in the dark about newer internal component performance.

Once we have applied the template, we will see performance graphs with information about gathering processes, internal processes, cache usage, and proxy performance, and both the queue and the new values received per second. So, we can actually react to predefined triggers provided by the template, if there is an issue.

Performance graphs

  2. Then, we need to have a look at the administration queue. A large or growing proxy-specific queue can be a sign of performance issues or a misconfiguration. We might have failed to allow our agents to communicate with our proxies or we might have some network issue on our proxy preventing us from collecting data from the proxy.

An issue on the proxy

In this case:

  • Check the proxy status, graphs, and log files. In the example above, the proxy has been down for over a year, so it should be decommissioned and removed from the Zabbix environment.
  • Check the agent logs for issues related to connecting to the proxy. For instance, the proxy might be trying to pull the data but have no rights to do so due to no permissions in the agent configuration file.

Lack of server resources

In some cases, we might simply try to monitor way too much on a really small server, for instance, an older version of a Raspberry Pi device. So, we should use tools such as sar or top to identify resource bottlenecks on the proxy server coming, for example, from the storage performance.

sar -wdp 3 5 > disk.perf.txt

sar is a part of a sysstat package, and this command can provide us with information about our storage performance, serialization, wait times, queues, input/output operations per second, and so on. sar can tell us when something might be overloaded especially if we have longer wait times.

NOTE. Don’t get confused by high %util, which is relevant on hard drives, but on an SSD or a RAID setup the utilization is normally very high. While hard drives can handle only one operation at a time, SSD disks or RAID setups support parallel operations. This can cause SSD or RAID util% to skyrocket, which might not necessarily be a sign of an issue.

Proxy queue

Another useful, though a bit hackish, indicator of the proxy performance is the proxy queue on the proxy database — the count of the metrics pending but not yet sent to the server.

  • We can observe this in real time by querying the proxy DB.
  • A constantly growing number means that we cannot catch up with our backlog — the network is down or there are some performance issues on the server or the proxy, so more data is getting backlogged than sent.
  • The list of unsent metrics is stored in proxy_history table.
  • The last sent metric is marked in the IDs table.
select count(*) from proxy_history where id>(select nextid from ids where table_name='proxy_history');

This value will keep growing if the proxy is unable to send the data at all or cannot keep up due to performance issues. If the network between the proxy and the server is down, this is to be expected. However, if everything is working but the count still keeps growing, we need to investigate for any spamming items, thousands of log lines coming in per second, or other performance issues with our storage and/or our database. There might also be performance problems on the server side, for instance, the server being unable to ingest all of this data in time after a restart or a long downtime. Such a problem should get resolved over time on its own. Otherwise, if there are no significant factors regarding the performance or any recent changes, we need to investigate deeper.

If this value is steadily decreasing, the proxy is actually catching up with the backlog and the incoming data, and is sending data to the server faster than it is collecting new metrics. So, this backlog will get resolved over time.
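As a hedged example (the zabbix_proxy database name and the zabbix DB user are assumptions based on a default MySQL-backed proxy installation), the backlog count can be checked from the shell and re-run periodically to see whether it grows or shrinks:

# Count the metrics collected but not yet sent to the Zabbix server
mysql -u zabbix -p zabbix_proxy -N -e \
  "SELECT COUNT(*) FROM proxy_history WHERE id > (SELECT nextid FROM ids WHERE table_name='proxy_history');"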

Configuration frequency

Don’t forget about the configuration frequency. Any configuration changes will be applied on the proxy after ConfigFrequency interval. By default, these changes get applied once an hour, so ConfigFrequency is 3600.

### Option: ConfigFrequency
#   How often proxy retrieves configuration data from Zabbix Server in seconds.
#   For a proxy in the passive mode this parameter will be ignored.
#
# Mandatory: no
# Range: 1-3600*24*7
# Default:
# ConfigFrequency=3600

On active proxies, we can force configuration cache reload by executing config_cache_reload for Zabbix proxy.

#zabbix_proxy -R config_cache_reload
#zabbix_proxy [1972]: command sent successfully

This is another good reason to use active proxies to pick up all of the configuration changes from the server. However, on passive proxies, the only thing we can do is a proxy restart to force a reload of the configuration changes, which is not a good idea. Otherwise, we have to wait for an hour or some other configuration interval until the changes are picked up by the proxy.

Selecting and tuning the DB backend

The next important step is a selection of the database.

SQLite

A common question, which has no clear answer, is when to use SQLite and when to switch to a more robust DB backend.

  • SQLite is perfect for small instances, as it runs well even on embedded hardware. So, if I were to run a proxy on a Raspberry Pi or an older desktop machine, I might use SQLite. Even embedded hardware aside, on smaller instances with fewer than 1,000 new values per second, the SQLite backend should feel quite comfortable, though a lot will depend on the underlying hardware.
  • So, in most cases, when proxies collect fewer than 1,000 NVPS, the SQLite proxy DB backend is sufficient. With SQLite, you don’t need to additionally configure the database.
  • With SQLite, there’s no need to have additional database configuration, preparation, or tuning. In the proxy configuration file, we just point at the location of the SQLite file.
  • A single file is created at the proxy startup, which can be deleted if data cleanup is necessary.
### Option: DBName
#   Database name.
#   For SQLite3 path to database file must be provided. DBUser and DBPassword are ignored.
#   Warning: do not attempt to use the same database Zabbix server is using.
#
# Mandatory: yes
# Default:
# DBName=
DBName=/tmp/zabbix_proxy

All in all, the SQLite backend is comparatively easy to manage. However, it comes with a set of negatives. If we need something more robust that we can tune and tweak, then SQLite won’t do. Essentially, if we reach over 1,000 new values per second, I would consider deploying something more robust: MySQL, PostgreSQL, or Oracle.

Other proxy DB backends

  • Any of the supported DB backends can be used for a proxy. In addition, the Zabbix server and Zabbix proxy can use different DB backends. The DB configuration parameters are very similar in Zabbix server and Zabbix proxy configuration files, so users should feel right at home with configuring the proxy DB backend.
  • DB and DB user should be created beforehand with the proper collation and permissions.
shell> mysql -uroot -p<password>
mysql> create database zabbix_proxy character set utf8 collate utf8_bin; 
mysql> create user 'zabbix'@'localhost' identified by '<password>'; 
mysql> grant all privileges on zabbix_proxy.* to 'zabbix'@'localhost'; 
mysql> quit;
  • DB schema import is also a prerequisite. The command for proxy schema import is very similar to the server import.
zcat /usr/share/doc/zabbix-proxy-mysql*/schema.sql.gz | mysql -uzabbix -p zabbix_proxy

DB Tuning

  • Make sure to use the DB backend you are most familiar with.
  • The same tuning rules apply to the Zabbix proxy DB as to the Zabbix server DB.
  • Default configuration parameters of the backend will depend on the version of the backend used. For instance, different MySQL versions will have different default parameters, so we need to have a look at MySQL documentation, the default parameters, and the way to tune them.
  • For PostgreSQL, it is possible to use the online tuner — PGTune. Though it is not an ideal instrument, it is a good starting point not to leave the proxy hanging without any tuning as we might encounter issues sooner rather than later. With tuning, the database will be more robust and will last longer before we will have to add any resources and rescale the database config.

PGTune

General performance tuning

Proxy configuration tuning

Database aside, how can we tune the proxy itself?

Proxy configuration is similar to the configuration of the Zabbix server: we still need to take into account and tune our gathering processes, internal processes such as preprocessors, and our cache sizes. So, we need to have a look at our gathering graphs, internal process graphs, and cache graphs to see how busy the processes are and how full the caches are, and adjust accordingly. This is a lot easier to do on the proxy than on the server, since a proxy restart is usually quicker, less critical, and less impactful than server downtime.

In addition, these will differ on each of the proxy servers depending on the proxy size and types of items. For instance, if on proxy A we are capturing SNMP traps, we need to enable the SNMP trapper process and configure our trap handler — Perl, snmptrapd, etc.  If we are doing a lot of ICMP pings for another proxy, we’ll require many ICMP pingers. A really large proxy will need to have its History Syncers increased. So, each proxy will be different, and there is no one-fit-all configuration.

  • Since proxies are distributed and scaled out, they usually handle fewer values, so we will have fewer History Syncers on proxies in comparison with the Zabbix server. In the vast majority of cases, the default number of History Syncers is more than sufficient, though sometimes we might need to change the History Syncer count on the proxy.
### Option: StartDBSyncers
#             Number of pre-forked instances of DB Syncers.
#
# Mandatory: no
# Range: 1-100
# Default:
# StartDBSyncers=4

There are always exceptions to the rule. For instance, we might want to have a single large-scale and robust proxy collecting the data from some very critical or very large location with many data points – such an infrastructure layout will still be supported.

  • If DB syncers underperform on a seemingly small instance, chances are it is due to a lack of hardware resources or, for SQLite, DB backend limitations.

We need to monitor the resource usage via sar, or top, or any other tool to make sure that hardware resources aren’t overloaded.

Proxy data buffers

We also have the option to store the data on our proxies while the Zabbix server is unreachable, or even after the data has already been sent to the server, as we may want to keep the data in the proxy database and use it with other third-party tools or integrations.

On our proxies, we have a local buffer and an offline buffer, which determine for how long we can store the data. The size of Local and Offline buffers will affect the size and the performance of your database. The larger the time window for which we store the data, the larger the database is. So, the fewer resources we utilize, the better the performance is, the easier it is to scale up, etc.

  • Local buffer
### Option: ProxyLocalBuffer
#   Proxy will keep data locally for N hours, even if the data have already been synced with the server.
#
# Mandatory: no
# Range: 0-720
# Default:
# ProxyLocalBuffer=0
  • Offline buffer
### Option: ProxyOfflineBuffer
#   Proxy will keep data for N hours in case if no connectivity with Zabbix Server.
#   Older data will be lost.
#
# Mandatory: no
# Range: 1-720
# Default:
# ProxyOfflineBuffer=1

Proxy network connectivity troubleshooting

Detecting network issues

Sometimes we have network issues between proxies and the server: either the server cannot talk to proxies or proxies cannot talk to the server.

  • A good first step would be to test telnet connectivity to/from a proxy.
time telnet 192.168.1.101 10051
  • Another great method is to time your pings to see how long pinging takes or how long it takes to establish a telnet connection. This could point you towards network latency issues: slow networks, network outages, and so on.
  • Log file can help you figure out proxy connectivity issues.
125209:20210214:073505.803 cannot send proxy data to server at "192.168.1.101": ZBX_TCP_WRITE() timed out
  • Load balancers, Traffic inspectors, and other IDS/Firewall tools can hinder proxy traffic. Sometimes it can take hours troubleshooting an issue to find out that it boils down to a load balancer, a traffic inspector, or IDS/firewall tool.

Troubleshooting network issues

  • A great way to troubleshoot this would be to deploy a test proxy with a different firewall/load balancing configuration. From time to time, network connectivity drops seemingly for no reason. We can bring up another proxy with no load balancers or no traffic inspectors, and ideally, in the same network as the problematic proxies. We need to find out if the new proxy is experiencing the same problems, or if the issue is resolved after we remove the load balancers, IDS/firewall tools. If the problem gets resolved, then this might be a case of misconfigured firewall/IDS.
  • Another great approach of detecting networking issues due to transport problems, for instance, IDS/Firewalls cutting up our packets, is to perform a tcpdump on proxy and server to correlate network traffic with error messages in the log.

tcpdump on the proxy:

tcpdump -ni any host <Zabbix server IP> -w /tmp/proxytoserver

tcpdump on the server:

tcpdump -ni any host <Zabbix proxy IP> -w /tmp/servertoproxy

— Correlating retransmissions with errors in logs could signify a network issue.

Many retransmissions may be a sign of network issues. If there are only a few of them, and opening Wireshark reveals just a couple of retransmits, that might not be the root cause. However, if the majority of the packet capture consists of duplicate packets, retransmits, or acknowledgments for data packets that were never received, that can be a sign of an ongoing network issue.

Ideally, we could take a look at this packet capture and correlate it with our proxy log file to figure out if these error messages in our proxy logfile or server logfile (depending on the type of communication — active or passive) correlate with packet capture issues. If so, we can be quite sure that the networking issue is at fault and then we need to figure out what is causing it — IDS, load balancers, a shoddy network, or anything else.

Integrate GitHub monorepo with AWS CodePipeline to run project-specific CI/CD pipelines

Post Syndicated from Vivek Kumar original https://aws.amazon.com/blogs/devops/integrate-github-monorepo-with-aws-codepipeline-to-run-project-specific-ci-cd-pipelines/

AWS CodePipeline is a continuous delivery service that enables you to model, visualize, and automate the steps required to release your software. With CodePipeline, you model the full release process for building your code, deploying to pre-production environments, testing your application, and releasing it to production. CodePipeline then builds, tests, and deploys your application according to the defined workflow either in manual mode or automatically every time a code change occurs. A lot of organizations use GitHub as their source code repository. Some organizations choose to embed multiple applications or services in a single GitHub repository separated by folders. This method of organizing your source code in a repository is called a monorepo.

This post demonstrates how to customize GitHub events that invoke a monorepo service-specific pipeline by reading the GitHub event payload using AWS Lambda.

 

Solution overview

With the default setup in CodePipeline, a release pipeline is invoked whenever a change in the source code repository is detected. When using GitHub as the source for a pipeline, CodePipeline uses a webhook to detect changes in a remote branch and starts the pipeline. When using a monorepo style project with GitHub, it doesn’t matter which folder in the repository you change the code, CodePipeline gets an event at the repository level. If you have a continuous integration and continuous deployment (CI/CD) pipeline for each of the applications and services in a repository, all pipelines detect the change in any of the folders every time. The following diagram illustrates this scenario.

 

GitHub monorepo folder structure

 

This post demonstrates how to customize GitHub events that invoke a monorepo service-specific pipeline by reading the GitHub event payload using Lambda. This solution has the following benefits:

  • Add customizations to start pipelines based on external factors – You can use custom code to evaluate whether a pipeline should be triggered. This allows for further customization beyond polling a source repository or relying on a push event. For example, you can create custom logic to automatically reschedule deployments on holidays to the next available workday.
  • Have multiple pipelines with a single source – You can trigger selected pipelines when multiple pipelines are listening to a single GitHub repository. This lets you group small and highly related but independently shipped artifacts such as small microservices without creating thousands of GitHub repos.
  • Avoid reacting to unimportant files – You can avoid triggering a pipeline when changing files that don’t affect the application functionality (such as documentation, readme, PDF, and .gitignore files).

In this post, we’re not debating the advantages or disadvantages of a monorepo versus a single repo, or when to create monorepos or single repos for each application or project.

 

Sample architecture

This post focuses on controlling running pipelines in CodePipeline. CodePipeline can have multiple stages like test, approval, and deploy. Our sample architecture considers a simple pipeline with two stages: source and build.

 

Github monorepo - CodePipeline Sample Architecture

This solution is made up of following parts:

  • An Amazon API Gateway endpoint (3) is backed by a Lambda function (5) to receive and authenticate GitHub webhook push events (2)
  • The same function evaluates incoming GitHub push events and starts the pipeline on a match
  • An Amazon Simple Storage Service (Amazon S3) bucket (4) stores the CodePipeline-specific configuration files
  • The pipeline contains a build stage with AWS CodeBuild

 

Normally, after you create a CI/CD pipeline, it automatically triggers a pipeline to release the latest version of your source code. From then on, every time you make a change in your source code, the pipeline is triggered. You can also manually run the last revision through a pipeline by choosing Release change on the CodePipeline console. This architecture uses the manual mode to run the pipeline. GitHub push events and branch changes are evaluated by the Lambda function to avoid commits that change unimportant files from starting the pipeline.

 

Creating an API Gateway endpoint

We need a single API Gateway endpoint backed by a Lambda function with the responsibility of authenticating and validating incoming requests from GitHub. You can authenticate requests using HMAC security or GitHub Apps. API Gateway only needs one POST method to consume GitHub push events, as shown in the following screenshot.

 

Creating an API Gateway endpoint
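The Lambda function behind this endpoint should first verify that the request really came from GitHub. The following is a minimal sketch of HMAC verification of the X-Hub-Signature-256 header; the WEBHOOK_SECRET environment variable and the helper name are assumptions for illustration, not part of the original solution:

import hashlib
import hmac
import os

def is_valid_signature(headers, body):
    # GitHub sends the signature as "sha256=<hex digest of the raw body>"
    signature = headers.get("X-Hub-Signature-256", "")
    secret = os.environ["WEBHOOK_SECRET"].encode()
    expected = "sha256=" + hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid leaking timing information
    return hmac.compare_digest(expected, signature)

GitHub computes this signature over the raw request body with the shared webhook secret, so the comparison must use the exact body that API Gateway passes to the function.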

 

Creating the Lambda function

This Lambda function is responsible for authenticating and evaluating the GitHub events. As part of the evaluation process, the function can parse through the GitHub events payload, determine which files are changed, added, or deleted, and perform the appropriate action:

  • Start a single pipeline, depending on which folder is changed in GitHub
  • Start multiple pipelines
  • Ignore the changes if non-relevant files are changed

You can store the project configuration details in Amazon S3. Lambda can read this configuration to decide what needs to be done when a particular folder is matched from a GitHub event. The following code is an example configuration:

{
    "GitHubRepo": "SampleRepo",
    "GitHubBranch": "main",
    "ChangeMatchExpressions": "ProjectA/.*",
    "IgnoreFiles": "*.pdf;*.md",
    "CodePipelineName": "ProjectA - CodePipeline"
}

For more complex use cases, you can store the configuration file in Amazon DynamoDB.

The following is the sample Lambda function code in Python 3.7 using Boto3:

import json

import boto3

# Reuse the CodePipeline client across warm Lambda invocations
cpclient = None

def codepipeline_client():
    global cpclient
    if not cpclient:
        cpclient = boto3.client('codepipeline')
    return cpclient

def start_code_pipeline(pipelineName):
    client = codepipeline_client()
    client.start_pipeline_execution(name=pipelineName)
    return True

def lambda_handler(event, context):
    # Full paths of the files modified in the first commit of the push event
    modifiedFiles = event["commits"][0]["modified"]

    folderName = ""
    for filePath in modifiedFiles:
        # Extract the top-level folder name from the full path
        folderName = filePath[:filePath.find("/")]
        break

    # The CodePipeline name is <foldername>-job.
    # The mapping could also be read from the configuration file in S3.
    if len(folderName) > 0:
        start_code_pipeline(folderName + '-job')

    return {
        'statusCode': 200,
        'body': json.dumps('Modified project in repo: ' + folderName)
    }

Creating a GitHub webhook

GitHub provides webhooks to allow external services to be notified on certain events. For this use case, we create a webhook for a push event. This generates a POST request to the URL (API Gateway URL) specified for any files committed and pushed to the repository. The following screenshot shows our webhook configuration.

Creating a GitHub webhook2

Conclusion

In our sample architecture, two pipelines monitor the same GitHub source code repository. A Lambda function decides which pipeline to run based on the GitHub events. The same function can have logic to ignore unimportant files, for example any readme or PDF files.

Using API Gateway, Lambda, and Amazon S3 in combination serves as a general example to introduce custom logic to invoke pipelines. You can expand this solution for increasingly complex processing logic.

 

About the Author

Vivek Kumar

Vivek is a Solutions Architect at AWS based out of New York. He works with customers providing technical assistance and architectural guidance on various AWS services. He brings more than 23 years of experience in software engineering and architecture roles for various large-scale enterprises.

Gaurav-Sharma

Gaurav is a Solutions Architect at AWS. He works with digital native business customers providing architectural guidance on AWS services.

Nitin-Aggarwal

Nitin is a Solutions Architect at AWS. He works with digital native business customers providing architectural guidance on AWS services.


Out-of-the-box database monitoring

Post Syndicated from Renats Valiahmetovs original https://blog.zabbix.com/out-of-the-box-database-monitoring/13957/

From this post and the video, you’ll learn about the possibilities of database monitoring using out-of-the-box Zabbix functionality without having to install additional tools, additional applications, or additional software that might not be allowed by your company.

Contents

I. Classic ODBC monitoring (0:22)

II. Synthetic MySQL monitoring (11:13)
III. DB monitoring with Zabbix Agent 2 (13:48)

IV. LLD for DB monitoring (17:03)

V. Questions & Answers (21:09)

Classic ODBC monitoring

What is ODBC?

ODBC stands for open database connectivity. There are a couple of ODBC drivers available for different database management systems (DBMS):

    • Oracle,
    • PostgreSQL,
    • MySQL,
    • Microsoft SQL Server,
    • Sybase ASE,
    • SAP HANA,
    • DB2.

All of these databases have different ODBCs specifically tailored for them. They offer slightly different functionality. So, even if you have set up the database monitoring for one database it might not necessarily work just as good for the other, as the functionality used to monitor one database might not exist for the other. In addition, as different technologies have different capabilities, most ODBC drivers do not implement all functionality defined in the ODBC standard.

What to monitor?

When we are planning to use ODBC for monitoring, what kind of data we can expect to receive? The answer ultimately depends on your own preferences, needs, or your proficiency in a specific database. You can monitor any possible database performance metrics and incidents using Zabbix templates.

Generally, monitoring of the following areas is of interest:

    • database performance
    • engine availability
    • configuration changes that you need to be aware of

To make the process easier, we provide ready-to-use templates, which can be applied to a host where your database is deployed. You can browse a full list of available metrics in these templates’ descriptions. So, you don’t have to perform configuration completely from scratch, which is good news.

How does it work?

Without diving too deep into the transport layer and all of the technical details, the ODBC driver accesses the database over the network using the database API. So, there is no direct connection between Zabbix and the database. Zabbix only creates a query passed to the ODBC manager for processing, which then moves the request over to the ODBC driver that connects to the database management system and then executes the query. Here, Zabbix does not limit the query execution timeout, and the timeout parameter is used as the ODBC login timeout.

Chain of processes

ODBC configuration is based on two files:

  • odbc.ini — holds the definitions of data sources (DSNs), so that we know which database we are going to connect to.
  • odbcinst.ini — holds a list of installed ODBC database drivers, which are used for the actual communication.

Where to start?

What do we need to do in order to start using this ODBC monitoring approach?

  1. First, we will need to install the ODBC driver relevant to the database we are going to monitor. A simple yum command will suffice if we’re working with CentOS.
# yum -y install unixODBC unixODBC-devel
  2. Then we need to specify the package (driver) we want to install and modify the ODBC configuration files.
  • odbc.ini:
# cat /etc/odbc.ini
[MySQL]
Description=NewDatabase
Driver=MariaDB
Server=localhost
User=root
Password=VerySecurePassword
Port=3306
Database=DatabaseName
  • odbcinst.ini:
# cat /etc/odbcinst.ini
[MySQL]
Description=ODBC for MySQL
Driver=/usr/lib/libmyodbc5.so
Setup=/usr/lib/libodbcmyS.so
Driver64=/usr/lib64/libmyodbc8a.so
Setup64=/usr/lib64/libmyodbc8a.so
FileUsage=1

Then we need to populate them with the necessary information. In this case, the DSN (data source name) is used to call a specific connection. We need to get this part right; otherwise, for instance in case of a typo, the connection will not work.

  3. After we have installed the ODBC driver and configured the configuration files, we don’t really need to go ahead into Zabbix to create a new item and see if it works. We can test the ODBC configuration using isql to connect or at least attempt to connect to a particular database using the specified configuration.

Using isql to test ODBC configuration

If we receive output saying that we have been connected, then the communication is correct. You can also execute a query, for instance, to select some information from the database. If you get the result, then you have the necessary permissions to access that data, and the connection (that is, the ODBC driver) is working fine. Then you can proceed to the frontend.
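For reference, a minimal test using the MySQL DSN defined in odbc.ini above could look like this:

# "MySQL" is the DSN section name from /etc/odbc.ini
isql -v MySQL
# at the SQL> prompt, run a simple query to verify permissions:
# select version();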

  4. In the frontend, we will need to create an item of the ‘Database monitor’ type on a particular host or a template and specify one of the two keys available for ODBC monitoring: db.odbc.select or db.odbc.get.

Creating ‘Database monitor’ item

The difference between these item keys is pretty simple — select will return only one value and get will return values in bulk. So, get is more efficient and allows for reducing the load on the database if we are working with a lot of data. Within the key parameters, we need to specify the same DSN that we have defined in our odbc.ini file.

We need to make sure that the first parameter is unique so that this particular item key is unique and does not duplicate anything else, and the second parameter is the DSN.
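For example (the unique description and query below are assumptions; the DSN matches the odbc.ini example above), an item could be configured as:

Type: Database monitor
Key:  db.odbc.select[mysql.version,MySQL]
SQL query: select version();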

  5. Then we specify the query, which is a part of the item configuration.
  6. We test the item using the test form in the Zabbix frontend. If the test form returns a value or does not return an error message, then everything is fine and we can proceed with this item or create more items.

Testing the item

ODBC templates

  1. There are a couple of built-in templates. If the metrics obtained through these templates are sufficient, we obviously don’t need to create these items from scratch or configure them. We can simply assign the templates we need to the host, on which we are monitoring the database. All we need to do is to tweak a little, if necessary, modify the macro related to the DSN, and then start monitoring.

Assigning a template

NOTE. The easiest way to get the templates is to upgrade to the latest Zabbix with our official templates already built in. If you don’t have the needed templates for any reason, you can download them from Zabbix official repository or Zabbix integrations. If you still need a specific template, you can definitely check out the community-created templates.

  2. Finally, we can execute discovery rules:

and check the Latest data:

Synthetic MySQL monitoring

The synthetic MySQL monitoring approach uses the capabilities of the Zabbix agent. Though this is not something the Zabbix agent does out of the box, we don’t need to install anything or perform any difficult manipulations to make it work, as it is part of Zabbix functionality.

As you might already know, the Zabbix Agent functionality can be extended using custom UserParameters and then used for database monitoring.

  1. So, we can create new UserParameters, which invoke native MySQL administration client commands providing output, which can then be used to calculate performance metrics.
UserParameter=mysql.ping[*], mysqladmin -h"$1" -P"$2" ping
UserParameter=mysql.get_status_variables[*], mysql -h"$1" -P"$2" -sNX -e "show global status"
UserParameter=mysql.version[*], mysqladmin -s -h"$1" -P"$2" version
UserParameter=mysql.db.discovery[*], mysql -h"$1" -P"$2" -sN -e "show databases"
  2. It is a good practice to test the commands themselves to make sure that they work, and to test the UserParameter keys, for instance using the zabbix_get utility (see the example after this list).
  3. Then you might want to use our official MySQL monitoring template by creating an additional file .my.cnf under /var/lib/zabbix (default location) as follows:
[client] 
user='zbx_monitor' 
password='<password>'
  4. Then we need to provide credentials for the user to confirm that the user has the necessary permissions to access the database.
  5. If everything is working, assign the MySQL by Zabbix agent template.
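A quick way to verify the new keys, as mentioned in step 2, is to query the agent directly with zabbix_get; the agent address, port, and key parameters below are assumptions for illustration:

zabbix_get -s 127.0.0.1 -p 10050 -k 'mysql.ping[127.0.0.1,3306]'
zabbix_get -s 127.0.0.1 -p 10050 -k 'mysql.version[127.0.0.1,3306]'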

In this case, we are not actually logging in to the database. We execute commands from the terminal by using Zabbix Agent and extending the functionality beyond the built-in functions.

DB monitoring with Zabbix Agent 2

Why Zabbix Agent 2?

What are the benefits of Zabbix Agent 2 in relation to database monitoring?

  • Zabbix Agent 2 is the improved version of our original Zabbix Agent, which is now written in Go.
  • Zabbix Agent 2 is more efficient and supports some new functions that Zabbix Agent 1 does not, for instance, custom intervals with active checks as Zabbix Agent 2 is using the Scheduler plugin and is capable of keeping track of time when certain checks need to be executed;
  • Older configuration is also supported. So, if we switch from Zabbix Agent 1 to Zabbix Agent 2, we do not need to rewrite the whole configuration file in order for Zabbix Agent 2 to work.
  • Zabbix Agent 2 is installed simply with one-line command just like Zabbix Agent 1, we need just to specify a different package.
# yum -y install zabbix-agent2
  • Zabbix Agent 2 is based on plugins, so you do not need to install ODBC drivers or anything extra, as Zabbix Agent 2 has out-of-the-box database-specific plugins to monitor your database, including MySQL, Oracle, and PostgreSQL.
  • Plugins are also written in Go.
  • We have created Zabbix Agent 2-specific templates, which we can assign to the host. So, if you decide to use Zabbix Agent 2, you need to perform even fewer manipulations in order to get your database monitored by Zabbix.

Built-in Zabbix Agent 2 templates

Configuration

The configuration is very simple. We need to decide whether we specify the necessary parameters within the item keys or, if we prefer named sessions, we edit the configuration file of Zabbix Agent 2 to define those and use the session name as the first parameter of the key.

  1. So, we specify the key according to the documentation page. In the first case, we can specify essentially the location of our database and provide the credentials.

In the second case, we simply need to provide the session name defined in the Zabbix Agent 2 configuration file in order to connect to the database using the built-in plugins.

Plugins.Mysql.Sessions.Prod.Uri=tcp://192.168.1.1:3306
Plugins.Mysql.Sessions.Prod.User=<UserForProd>
Plugins.Mysql.Sessions.Prod.Password=<PasswordForProd>
  2. After we have created these items or applied a template, we can definitely test them out and see whether they are working fine.
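With the named session above, the session name is simply used as the first key parameter, for instance (keys from the MySQL plugin; Prod is the session name defined in the configuration):

mysql.ping[Prod]
mysql.version[Prod]
mysql.db.discovery[Prod]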

NOTE. Check available MySQL-related item keys documentation page.

LLD for DB monitoring

Why LLD?

Finally, you can definitely use low-level discovery for database monitoring. LLD is a very efficient and powerful tool within Zabbix. You can definitely use either built-in discovery keys, which utilize Zabbix Agent, or other sources such as custom scripts to pass the payload to your low-level discovery rule.

LLD:

    • Automatically creates items, triggers, and graphs from different entities on a host.
    • Parses data received in Zabbix-specific JSON format.
    • Different sources for LLD can be used, such as:
      • Built-in discovery keys,
      • Dependent on a built-in item key,
      • Dependent on a custom script/custom UserParameter.

Here we have a script providing our JSON-formatted payload, which is sent by the Data sender Zabbix utility to the Master trapper item within our Zabbix instance, while our LLD rule depends on this particular Master trapper item.

So, we just populate this trapper item with the JSON payload, LLD rule creates new entities based on the prototypes, and then the items created by those prototypes are collecting the data from that master trapper item each time a new payload comes in.
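As a hedged example (the server address, host name, and trapper item key below are placeholders), pushing such a payload with the sender utility could look like this:

zabbix_sender -z 192.168.1.10 -s "DB host" -k db.discovery.trapper \
  -o '[{"{#DATABASE}":"mysql"},{"{#DATABASE}":"zabbix"}]'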

How to configure custom LLD?

In general, to create LLD from scratch:

  1. First, you will need to decide on the actual payload delivery method (Zabbix Agent, script, Zabbix sender, or UserParameter).
  2. Make sure that your payload is in JSON that is structurally sound so that Zabbix can accept and parse it.
[{"{#DATABASE}":"information_schema"},{"{#DATABASE}":"mysql"},{"{#DATABASE}":"performance_schema"},{"{#DATABASE}":"sys"},{"{#DATABASE}":"zabbix"}]
  3. Create LLD rule with type according to delivery method.
  4. Test the rule (if available for passive checks) to see JSON you receive.
  5. Create filters or overrides, if necessary.
  6. Create prototypes, based on which your entities will be created.

If we don’t want to create LLD rules from scratch, we can definitely modify the built-in templates without wasting time creating custom LLD rules:

    • Modify/create new entities;
    • Clone the templates;
    • Refer to templated discovery rule configuration.

Modifying LLD rules of official templates

Questions & Answers

Question. Can we monitor the database using active checks or passive checks?

Answer. As I have mentioned, everything depends on your preferences and, ultimately, on the way you want to pass this output to Zabbix Server. If we’re talking about active checks, you can utilize Zabbix sender, for instance. So, it will be a trapper item on the Zabbix Server side waiting for data. In case of passive checks, we can use Zabbix Agent. So, we can use both types of checks for database monitoring.

Question. Can we establish a secure connection between the ODBC gateway and the database, which is somewhere on a distant machine?

Answer. Yes, this can be done though it does require a little bit of finesse. It is an extensive topic, and the security of the connection is highly dependent on the driver, which should support a secure connection. Some older databases might not have this functionality.

Question. Are ODBC checks influencing the performance of the master server?

Answer. It depends on what kind of data you are collecting. If you have a lot of items utilizing the db.odbc.select item key, which retrieves just one value from the database, this might impact your database performance. You might not notice this impact if your hardware is powerful enough. However, it is advisable to use the db.odbc.get key in order to collect this information in bulk. Otherwise, you might be locking up some entries within your database, which could potentially lead to problems.

Question. So, we provide two solutions, one of them using agentless ODBC checks. In addition, we have the agent tool. Will you briefly describe the advantages of ODBC and Agent checks?

Answer. If we’re talking about the ODBC database monitoring method, the most obvious difference is that you don’t need to install an agent. From the data collection perspective, there is not much difference. Everything depends on your specific needs.

 

MySQL performance tuning 101 for Zabbix

Post Syndicated from Vittorio Cioe original https://blog.zabbix.com/mysql-performance-tuning-101-for-zabbix/13899/

In this post and the video, you will learn about a proper approach to getting the most out of Zabbix and optimizing the underlying MySQL Database configuration to improve performance while working with a database-intensive application such as Zabbix.

Contents

I. Zabbix and MySQL (1:12)
II. Optimizing MySQL for Zabbix (2:09)

III. Conclusion (15:43)

Zabbix and MySQL

Zabbix and MySQL love each other. Half of the Zabbix installations are running on MySQL. However, Zabbix is quite a write-intensive application, so we need to optimize the database configuration and usage to work smoothly with Zabbix that reads the database and writes to the database a lot.

Optimizing MySQL for Zabbix

Balancing the load on several disks

So, how can we optimize MySQL configuration to work with Zabbix? First of all, it is very important to balance the load on several hard drives by using:

    • datadir to specify the default location, that is to dedicate the hard drives to the data directory;
    • innodb_data_file_path to define the size and attributes of InnoDB system tablespace data files;
    • innodb_undo_directory to specify the path to the InnoDB undo tablespaces;
    • innodb_log_group_home_dir to specify the path to the InnoDB redo log files;
    • log-bin to enable binary logging and set path/file name prefix (dual functionality); and
    • tmpdir (Random, SSD, tmpfs).

The key here is to split the load as much as possible across different hard drives in order to avoid different operations fighting for resources.

Viewing your MySQL configuration

Now, we can jump straight to MySQL configuration. It is important to start from your current configuration and check who changed it and when.

SELECT t1.*, VARIABLE_VALUE FROM performance_schema.variables_info t1 JOIN
performance_schema.global_variables t2 ON t2.VARIABLE_NAME=t1.VARIABLE_NAME WHERE
t1.VARIABLE_SOURCE not like "COMPILED"

This query can help you to understand who has changed the configuration. However, it is also important to keep track of when the configuration changes.

Viewing MySQL configuration

MySQL key variables to optimize in your configuration

InnoDB buffer pool

The king of all of the variables to be optimized is the InnoDB buffer pool, the main parameter determining the amount of memory for storing DB pages: an area in main memory where InnoDB caches table and index data as it is accessed.

  • The default value is too low; for production, use 50-75% of the available memory on a dedicated database server.
  • Since MySQL 5.7, innodb_buffer_pool_size can be changed dynamically.
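For example, since the variable is dynamic, the buffer pool can be resized at runtime without a restart (the 8 GB value below is just an illustration):

SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;  -- 8 GB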

Judging from experience, 50 percent of available memory will be enough for the majority of databases with a lot of connections or activities, as many other indicators are used, which occupy memory. So, 50 percent is a good though conservative parameter.

To check InnoDB Buffer Pool usage (in %) and whether you need to allocate more memory for the InnoDB Buffer Pool, you can use the following query, which shows the current usage as a percentage (though there are many queries to monitor the InnoDB Buffer Pool).

SELECT CONCAT(FORMAT(DataPages*100.0/TotalPages,2),
' %') BufferPoolDataPercentage
FROM (SELECT variable_value DataPages FROM information_schema.global_status
WHERE variable_name = 'Innodb_buffer_pool_pages_data') A,
(SELECT variable_value TotalPages FROM information_schema.global_status
WHERE variable_name = 'Innodb_buffer_pool_pages_total') B;

Binary logs

Binary logs contain events that describe changes, provide data changes sent to replicas, and are used for data recovery operations.

If you work with replication, you might know that binary logs require special attention apart from having them on a separate disk. You should size the binary logs properly, set the proper expiration time (1 month by default), and the maximum size, for instance, of 1 GB so that you will be able to write 1 GB of data per day.

We can have about 30 log files in the binary logs. However, you should check the activities of your system to consider increasing this number, as well as the expiration of the binary logs, if you need to keep more data for operations, such as finding time recovery, for instance.

How to control binary logs:

    • log_bin, max_binlog_size, binlog_expire_logs_seconds, etc.
    • PURGE BINARY LOGS TO|BEFORE to delete all the binary log files listed in the log index file prior to the specified log file name or date.
    • In addition, consider using GTID for replication to keep track of transactions.
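A minimal my.cnf sketch tying these settings together (the path and retention values are assumptions for illustration, not recommendations; binlog_expire_logs_seconds is the MySQL 8.0 name, older versions use expire_logs_days):

log_bin = /var/lib/mysql-binlog/binlog
max_binlog_size = 1G
binlog_expire_logs_seconds = 2592000   # 30 days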

InnoDB redo logs

This is yet another beast, which we want to keep control of — the redo and undo logs, which get written prior to flushing the data to the disk.

    • innodb_log_file_size

– The size of redo logs will impact the writing speed over the time to recover.
– The default value is too low, so consider using at least 512 MB for production.
– Total redo log capacity is determined by innodb_log_files_in_group (default value 2). For write-intensive systems, consider increasing innodb_log_files_in_group and keeping them on in a separate disk.

NOTE. Here, the related parameters are innodb_log_file_size and innodb_log_files_in_group.
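For instance, in my.cnf (illustrative starting values for a write-intensive setup, not a one-size-fits-all recommendation):

innodb_log_file_size = 512M
innodb_log_files_in_group = 2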

Trading performance over consistency (ACID)

Associated with the redo and undo log discussion is the question of trading performance for consistency: when should InnoDB flush/sync committed transactions?

innodb_flush_log_at_trx_commit defines how often InnoDB flushes the logs to the disk. This variable can have different values:

    • 0 — transactions are written to redo logs once per second;
    • 1 — (default value) fully ACID-compliant with redo logs written and flushed to disk at transaction commit;
    • 2 — transactions are written to redo logs at commit, and redo logs are flushed once per second.

If the system is write-intensive, you might consider setting this value to 2 to keep redo logs at every commit with the data written to disk once per second. This is a very good compromise between data integrity and performance successfully used in a number of write-intensive setups. This is a relief for the disk subsystem allowing you to gain that extra performance.
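For example, on a running server the setting can be relaxed like this (assuming you accept losing up to roughly one second of transactions on a crash):

SET GLOBAL innodb_flush_log_at_trx_commit = 2;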

NOTE. I recommend using the default (1) setting unless you are bulk-loading data (set the variable to 2 during the load), or you are experiencing an unforeseen peak in workload that is hitting your disk system and need to survive until you can solve the problem. In the latest MySQL 8.0 you can also disable redo logging completely.

table_open_cache and max_connections

Opening the cache discussions, we will start from the max_connections parameter, which sets the maximum number of connections that we want to accept on the MySQL server, and the table_open_cache parameter, which sets the value of the cache of open tables we want to keep. Both parameters affect the maximum number of files the server keeps open:

    • table_open_cache value — 2,000 (default), which means that by default you can keep 2,000 tables open per connection.
    • max_connections value — 151 (default).

If you increase both values too much, you may easily run out of memory. So, the total number of open tables in MySQL is:

N of opened tables = N of connections x N (max number of tables per join)

NOTE. This number is related to the joins operated by your database per connection.

So, having an insight into what Zabbix does and which queries it executes can help you fine-tune this parameter. In addition, you can go by the rule of thumb of checking whether the table_open_cache is full. To do that, you can check the global status variable ‘Opened_tables‘ to understand what is going on (see the example below).
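For example, the cache behaviour can be judged by comparing these standard MySQL status counters:

SHOW GLOBAL STATUS LIKE 'Opened_tables';        -- tables opened since startup
SHOW GLOBAL STATUS LIKE 'Open_tables';          -- tables currently held in the cache
SHOW GLOBAL VARIABLES LIKE 'table_open_cache';

If Opened_tables keeps growing quickly while Open_tables sits at the table_open_cache limit, the cache is likely too small.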

In addition, if you are going to increase table_open_cache and the maximum number of connections, you should also check open_files_limit in MySQL and ulimit — the maximum number of open files in the operating system — as new connections are kept as open files in Linux. So, this is a parameter to fine-tune as well.

Open buffers per client connection

There are other buffers that depend on the number of connections (max_connections), such as:

    • read_buffer_size,
    • read_rnd_buffer_size,
    • join_buffer_size,
    • sort_buffer_size,
    • binlog_cache_size (if binary logging is enabled),
    • net_buffer_length.

Depending on how often you get connections to the Zabbix database, you might want to increase these parameters. It is recommended to monitor your database to see how these buffers are being filled up.

You also need to reserve some extra memory for these buffers if you have many connections. That is why it is recommended to reserve 50 percent of available memory for InnoDB buffer pool, so that you can use these spare 25 percent for extra buffers.

However, there might be another solution.

Enabling Automatic Configuration for a Dedicated MySQL Server

In MySQL 8.0, innodb_dedicated_server automatically configures the following variables:

    • innodb_buffer_pool_size,
    • innodb_log_file_size,
    • innodb_log_files_in_group, and
    • innodb_flush_method.

I would enable this variable, as it configures innodb_flush_method, which depends on the file system.

NOTE. Enabling innodb_dedicated_server is not recommended if the MySQL instance shares system resources with other applications, as this variable enabled implicitly means that we are running only MySQL on the machine.

Conclusion

Now, you are ready to fine-tune your configuration step by step, starting from innodb_buffer_pool, max_connections, and table_open_cache, and see if your performance improves. Eventually, you can do further analysis and go further to really fine-tune your system up to your needs.

In general, 3-5 core parameters would be enough for operating with Zabbix in the vast majority of cases. If you tune those parameters keeping in mind dealing with a write-intensive application, you can achieve good results, especially if you separate the resources at a hardware level or at a VM level.

Performance tuning dos and don’ts

  • For a high-level performance tuning 101, think carefully and consider the whole stack together with the application.
  • In addition, think methodically:
    1. define what you are trying to solve, starting from the core of variables, which you want to fine-tune;
    2. argue why the proposed change will work;
    3. create an action plan; and
    4. verify the change worked.
  • To make things work:

— don’t micromanage;
— do not optimize too much;
— do not optimize everything; and, most importantly,
— do not take best practices as gospel truth, but try to adjust any practices to your particular environment.

 

Low-Level Discovery with Dependent items

Post Syndicated from Brian van Baekel original https://blog.zabbix.com/low-level-discovery-with-dependent-items/13634/

The low-level discovery was introduced in Zabbix 2.0 and still belongs to one of the all-time favorites. Before LLD was available, adding items was all manual work. For example adding new disks, new interfaces, network ports on switches and everything else was all manual labor. And then LLD came around and suddenly we were able to ‘discover’ entities, and based on those discovered entities we can add new items, triggers, and such automatically.

Contents

  • Low-Level Discovery setup
  • Dependent items
  • Combing Low-Level Discovery and Dependent items
  • Conclusion

For a video guide, check out the Zabbix YouTube here: Zabbix: Low Level Discovery with Dependent items – YouTube

Low-Level Discovery setup

Let’s go over the idea of Low-Level Discovery first.

For the sake of clarity, we will stick with the default Zabbix agent item. Of course, as we will discover it’s only the format that matters for Zabbix to consider a response as LLD information. Let’s use built-in agent key: vfs.fs.discovery. Once we force the Zabbix agent to execute this item, it will reply with something like this:

[{"{#FSNAME}":"/sys","{#FSTYPE}":"sysfs"},{"{#FSNAME}":"/proc","{#FSTYPE}":"proc"},{"{#FSNAME}":"/dev","{#FSTYPE}":"devtmpfs"},{"{#FSNAME}":"/sys/kernel/security","{#FSTYPE}":"securityfs"},{"{#FSNAME}":"/dev/shm","{#FSTYPE}":"tmpfs"},{"{#FSNAME}":"/dev/pts","{#FSTYPE}":"devpts"},{"{#FSNAME}":"/run","{#FSTYPE}":"tmpfs"},{"{#FSNAME}":"/sys/fs/cgroup","{#FSTYPE}":"tmpfs"},{"{#FSNAME}":"/sys/fs/cgroup/systemd","{#FSTYPE}":"cgroup"},{"{#FSNAME}":"/sys/fs/pstore","{#FSTYPE}":"pstore"},{"{#FSNAME}":"/sys/firmware/efi/efivars","{#FSTYPE}":"efivarfs"},{"{#FSNAME}":"/sys/fs/bpf","{#FSTYPE}":"bpf"},{"{#FSNAME}":"/sys/fs/cgroup/net_cls,net_prio","{#FSTYPE}":"cgroup"},{"{#FSNAME}":"/sys/fs/cgroup/devices","{#FSTYPE}":"cgroup"},{"{#FSNAME}":"/sys/fs/cgroup/hugetlb","{#FSTYPE}":"cgroup"},{"{#FSNAME}":"/sys/fs/cgroup/memory","{#FSTYPE}":"cgroup"},{"{#FSNAME}":"/sys/fs/cgroup/rdma","{#FSTYPE}":"cgroup"},{"{#FSNAME}":"/sys/fs/cgroup/freezer","{#FSTYPE}":"cgroup"},{"{#FSNAME}":"/sys/fs/cgroup/cpu,cpuacct","{#FSTYPE}":"cgroup"},{"{#FSNAME}":"/sys/fs/cgroup/cpuset","{#FSTYPE}":"cgroup"},{"{#FSNAME}":"/sys/fs/cgroup/perf_event","{#FSTYPE}":"cgroup"},{"{#FSNAME}":"/sys/fs/cgroup/blkio","{#FSTYPE}":"cgroup"},{"{#FSNAME}":"/sys/fs/cgroup/pids","{#FSTYPE}":"cgroup"},{"{#FSNAME}":"/sys/kernel/tracing","{#FSTYPE}":"tracefs"},{"{#FSNAME}":"/sys/kernel/config","{#FSTYPE}":"configfs"},{"{#FSNAME}":"/","{#FSTYPE}":"xfs"},{"{#FSNAME}":"/sys/fs/selinux","{#FSTYPE}":"selinuxfs"},{"{#FSNAME}":"/proc/sys/fs/binfmt_misc","{#FSTYPE}":"autofs"},{"{#FSNAME}":"/dev/hugepages","{#FSTYPE}":"hugetlbfs"},{"{#FSNAME}":"/dev/mqueue","{#FSTYPE}":"mqueue"},{"{#FSNAME}":"/sys/kernel/debug","{#FSTYPE}":"debugfs"},{"{#FSNAME}":"/sys/fs/fuse/connections","{#FSTYPE}":"fusectl"},{"{#FSNAME}":"/boot","{#FSTYPE}":"ext4"},{"{#FSNAME}":"/boot/efi","{#FSTYPE}":"vfat"},{"{#FSNAME}":"/home","{#FSTYPE}":"xfs"},{"{#FSNAME}":"/run/user/0","{#FSTYPE}":"tmpfs"}]

When we put this in a more readable format (truncated) it will look like this:

[
{
"{#FSNAME}":"/sys",
"{#FSTYPE}":"sysfs"
},
{
"{#FSNAME}":"/proc",
"{#FSTYPE}":"proc"
},
{
"{#FSNAME}":"/dev",
"{#FSTYPE}":"devtmpfs"
},
{
"{#FSNAME}":"/sys/kernel/config",
"{#FSTYPE}":"configfs"
},
{
"{#FSNAME}":"/",
"{#FSTYPE}":"xfs"
},
{
"{#FSNAME}":"/boot",
"{#FSTYPE}":"ext4"
},
{
"{#FSNAME}":"/home",
"{#FSTYPE}":"xfs"
}
]

In this format it suddenly becomes clear: we have the {#FSNAME} macro containing the name of a filesystem, combined with its type, captured in {#FSTYPE}.

Perfect! We feed this information into Zabbix, and LLD magic will happen.
Based on the Item prototypes, new items per {#FSNAME} will be added, and monitoring will start on those items.

Looking at the Item prototypes, they look a lot like normal items:
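For example, an item prototype that polls the host directly per discovered filesystem might use a built-in agent key with the LLD macro as its first parameter, along these lines:

vfs.fs.size[{#FSNAME},used]
vfs.fs.size[{#FSNAME},pfree]

During discovery, {#FSNAME} is replaced with the actual filesystem name, so one item per filesystem is created.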

So, we have one discovery rule that is responsible for providing the LLD information, and item prototypes from which the ‘normal’ items are created to query the filesystem statistics. As you can imagine, with just 5 filesystems and 1 metric per filesystem, queried once per minute, that’s no problem. But what if we have 50 filesystems, 7 metrics per filesystem, and they get queried every 10 seconds? That’s a lot of queries against the host! Not only does that add load to the Zabbix server, but obviously also to the monitored host. It works, but is it ideal? It certainly isn’t!

So we’ve basically just set up this:

Dependent items

But then Zabbix introduced dependent items. Let’s take a quick look at dependent items and what they are.

We have one master item that gathers all information (in bulk) and propagates that information to all the dependent items. On those dependent items we just do the cherry-picking and filtering of the relevant metrics. Let’s put this to work and see how that goes.

So we create an item, in this case of the HTTP agent type, which will collect the following information regarding the Apache server status in a single request:

ServerVersion: Apache/2.4.37 (centos)
ServerMPM: event
Server Built: Nov  4 2020 03:20:37
CurrentTime: Monday, 08-Mar-2021 14:35:20 CET
RestartTime: Monday, 08-Mar-2021 11:04:09 CET
ParentServerConfigGeneration: 1
ParentServerMPMGeneration: 0
ServerUptimeSeconds: 12671
ServerUptime: 3 hours 31 minutes 11 seconds
Load1: 0.01
Load5: 0.03
Load15: 0.00
Total Accesses: 1182
Total kBytes: 10829
Total Duration: 95552
CPUUser: 5.01
CPUSystem: 7.34
CPUChildrenUser: 0
CPUChildrenSystem: 0
CPULoad: .0974667
Uptime: 12671
ReqPerSec: .0932839
BytesPerSec: 875.14
BytesPerReq: 9381.47
DurationPerReq: 80.8393
BusyWorkers: 1
IdleWorkers: 99
Processes: 4
Stopping: 0
BusyWorkers: 1
IdleWorkers: 99
ConnsTotal: 4
ConnsAsyncWriting: 0
ConnsAsyncKeepAlive: 0
ConnsAsyncClosing: 0
Scoreboard: _________________________________________________________________________________________W__________............................................................................................................................................................................................................................................................................................................

 

Now we create some dependent items that depend on that first item (which we will call the master item). Every time the master item receives information, the complete reply is pushed to the dependent items without any alteration of that data. So the master and dependent items are identical when no preprocessing is applied. That’s why on the dependent items we apply preprocessing to filter out the relevant information, for example, the BusyWorkers value:
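As an illustration (the exact step shown in the screenshot may differ), a ‘Regular expression’ preprocessing step on such a dependent item could look roughly like this:

Pattern: BusyWorkers: ([0-9]+)
Output: \1

This keeps only the number of busy workers from the bulk reply shown above.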

Perfect. So querying a host once, getting all the metrics in bulk, and then parsing it in Zabbix using preprocessing. Say bye to excessive load on the monitored host… (and due to preprocessing processes within Zabbix, no problem on the Zabbix server side).

Combining Low-Level Discovery and Dependent items

OK, and what if we combine these two concepts: LLD with dependent items? Wouldn’t that be the ultimate goal, automatically creating new items without putting extra load on the monitored host? Let’s get this going!

To stick with the first LLD example, we will again discover filesystems, but this time not with the vfs.fs.discovery key, but with the newly introduced vfs.fs.get key. Once we force the agent to execute this key, we will see this reply:

[{"fsname":"/dev","fstype":"devtmpfs","bytes":{"total":1940963328,"free":1940963328,"used":0,"pfree":100.000000,"pused":0.000000},"inodes":{"total":473868,"free":473487,"used":381,"pfree":99.919598,"pused":0.080402}},{"fsname":"/dev/shm","fstype":"tmpfs","bytes":{"total":1958469632,"free":1958469632,"used":0,"pfree":100.000000,"pused":0.000000},"inodes":{"total":478142,"free":478141,"used":1,"pfree":99.999791,"pused":0.000209}},{"fsname":"/run","fstype":"tmpfs","bytes":{"total":1958469632,"free":1892040704,"used":66428928,"pfree":96.608121,"pused":3.391879},"inodes":{"total":478142,"free":477519,"used":623,"pfree":99.869704,"pused":0.130296}},{"fsname":"/sys/fs/cgroup","fstype":"tmpfs","bytes":{"total":1958469632,"free":1958469632,"used":0,"pfree":100.000000,"pused":0.000000},"inodes":{"total":478142,"free":478125,"used":17,"pfree":99.996445,"pused":0.003555}},{"fsname":"/","fstype":"xfs","bytes":{"total":95516360704,"free":55329644544,"used":40186716160,"pfree":57.926877,"pused":42.073123},"inodes":{"total":46661632,"free":46535047,"used":126585,"pfree":99.728717,"pused":0.271283}},{"fsname":"/boot","fstype":"ext4","bytes":{"total":1023303680,"free":705544192,"used":247296000,"pfree":74.046435,"pused":25.953565},"inodes":{"total":65536,"free":65497,"used":39,"pfree":99.940491,"pused":0.059509}},{"fsname":"/home","fstype":"xfs","bytes":{"total":5358223360,"free":5286903808,"used":71319552,"pfree":98.668970,"pused":1.331030},"inodes":{"total":2621440,"free":2621428,"used":12,"pfree":99.999542,"pused":0.000458}},{"fsname":"/run/user/0","fstype":"tmpfs","bytes":{"total":391692288,"free":391692288,"used":0,"pfree":100.000000,"pused":0.000000},"inodes":{"total":478142,"free":478137,"used":5,"pfree":99.998954,"pused":0.001046}}]

And if we format it to be more readable, it will look like this (truncated):

[
  {
    "fsname":"/",
    "fstype":"xfs",
    "bytes":{
      "total":95516360704,
      "free":55329644544,
      "used":40186716160,
      "pfree":57.926877,
      "pused":42.073123
    },
    "inodes":{
      "total":46661632,
      "free":46535047,
      "used":126585,
      "pfree":99.728717,
      "pused":0.271283
    }
  },
  {
    "fsname":"/home",
    "fstype":"xfs",
    "bytes":{
      "total":5358223360,
      "free":5286903808,
      "used":71319552,
      "pfree":98.668970,
      "pused":1.331030
    },
    "inodes":{
      "total":2621440,
      "free":2621428,
      "used":12,
      "pfree":99.999542,
      "pused":0.000458
    }
  }
]

Per filesystem, we get the original information, FSNAME and FSTYPE, but also the statistics of these filesystems… bulk metrics! So, we create a normal item (which will serve as the master item) getting all those metrics out in a single query:

Once we’ve got this data in Zabbix, we feed it into the LLD rule, giving this LLD rule the dependent LLD type:

Of course there are no ready-to-use LLD macros in this data, but since it is in JSON format, it shouldn’t be too hard to create the LLD macros with the ‘LLD macros’ option in the frontend and the relevant JSONPath expressions:

Note: Technically we do not need to create the {#FSTYPE} macro to get this working!
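For this vfs.fs.get data, the macro definitions could, for example, look like this (each JSONPath is evaluated against one discovered entity):

{#FSNAME} = $.fsname
{#FSTYPE} = $.fstype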

Once this is done, we should be ready to create the item prototypes for this LLD rule. The data is there, macros are available, nothing is going to stop us now!

Let’s move on to item prototypes. But of course, we do not want to poll that remote host again per discovered filesystem. That means we will make this item prototype of the dependent item type as well, pointing it back to the master item we’ve created.

For the first item prototype, we want to obtain the total size per filesystem:

But, as I mentioned earlier: a dependent item without any preprocessing is identical to the master item and of course that would be wrong in this case. We just want to see the total bytes per filesystem and not all the collected statistics. In the configuration above we already know what to get out, so the Type of information and Units are filled already. What is not visible on that screenshot is the preprocessing rule that we need. Here the ‘JSONPath’ preprocessing step comes in handy since we receive JSON data. We would like to get out this part for our item (truncated):

[
  {
    "fsname":"/",
    "fstype":"xfs",
    "bytes":{
      "total":95516360704,
      "free":55329644544,
       "used":40186716160,
      "pfree":57.926877,
      "pused":42.073123

So, if we try to get this information out using JSONPath, it should look like $.bytes.total.first(), but this will match on any filesystem, so we need to configure it a bit more specifically, like: $[?(@.fsname=='/')].bytes.total.first()

As you can see, the JSONPath is a bit more complex here. We are forcing it to match on @.fsname=='/' and, from that entity, we get out the bytes.total value. Now, to make it even more interesting, we shouldn’t hard-code the filesystem in the JSONPath since we’re working with item prototypes. It should be the LLD macro {#FSNAME} instead!
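So the JSONPath in the item prototype’s preprocessing step ends up looking like this:

$[?(@.fsname=='{#FSNAME}')].bytes.total.first()

During discovery, {#FSNAME} is substituted per filesystem, giving each created item its own filter.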

Now we save this item prototype, grab a cup of coffee (or just force a config_cache_reload on the server) and just wait for the magic to happen.

We’ve now built this setup:

 

So the master item will get values (i.e. obtain bulk data every minute) and push it into the LLD rule. From there, as per item prototypes, items will be created and those are populated from the master item as well, filtering out only the relevant metrics using Preprocessing.

So far, so good, but we have one small problem to solve: we want to get metrics every minute or so, but since all those metrics will also get pushed into the LLD rule, we might be adding unnecessary extra load due to the high frequency. Luckily, solving that problem is not too hard. Navigate to the discovery rule, go to the ‘Preprocessing’ tab, and add a ‘Discard unchanged with heartbeat’ step with a parameter of 1h or an even larger interval!

This is insane! With just one poll/query to a host, we utilize the power of LLD and dependent items, getting all metrics while adding only minimal extra load on that host.

 

Conclusion

That’s it. If you’ve set up everything correctly, you should now get quite a few filesystem metrics without putting any extra performance overhead on the host through unnecessary data requests.

Of course, if you need help optimizing your Zabbix environment, support contracts, consultancy, or training, we from Opensource ICT Solutions are always available to assist you in every possible way, worldwide, 24×7.

Thanks for reading this blog post, see you in the next one.

Finalizing the installation of Zabbix Agent with Ansible

Post Syndicated from Werner Dijkerman original https://blog.zabbix.com/finalizing-the-installation-of-zabbix-agent-with-ansible/13321/

In the previous blog posts, we created a Zabbix Server with a new user, a media type, and an action. In the 2nd blog post, we continued with creating and configuring a Zabbix Proxy. In the last part of this series of blog posts, we will install the Zabbix Agent on all of the 3 nodes we have running.

This blog post is the 3rd part of a 3 part series of blog posts where Werner Dijkerman gives us an example of how to set up your Zabbix infrastructure by using Ansible.
You can find part 1 of the blog post by clicking here.

To summarize, so far we have a Zabbix Server and a Zabbix Proxy. The Zabbix Server has a MySQL instance running on a separate node, while the MySQL instance for the Zabbix Proxy runs on the same node as the proxy. But we are missing one component right now, and that is what we will install with the help of this blog post: the Zabbix Agent, on all 3 nodes.

A git repository containing the code used in these blog posts is available on https://github.com/dj-wasabi/blog-installing-zabbix-with-ansible. Before we run Ansible, we need to make sure we have opened a shell on the “bastion” node. We can do that with the following command:

$ vagrant ssh bastion

Once we have opened the shell, go to the “/ansible” directory where we have all of our Ansible files present.

$ cd /ansible

In the previous blog post, we executed the “zabbix-proxy.yml” playbook. Now we are going to use the “zabbix-agent.yml” playbook. The playbook will install the Zabbix Agent on all nodes (“node-1”, “node-2” and “node-3”). Next up, on both “node-2” and “node-3” (the hosts running MySQL), we will add a user parameters file specifically for MySQL. With this user parameters file, we are able to monitor the MySQL instances.

$ ansible-playbook -i hosts zabbix-agent.yml

This playbook will run for a few minutes installing the Zabbix Agent on the nodes. It will install the zabbix-agent package and add the configuration file, but it will also make a connection to the Zabbix Server API. We will automatically create a host with the correct IP information and the correct templates! When the Ansible playbook has finished running, the hosts can immediately be found in the Frontend. And better yet, it is automatically correctly configured, so the hosts will be monitored immediately!

We have several configurations spread over multiple files to make this work. We first start with the “all” file.

The file “/ansible/group_vars/all” contains the properties that apply to all hosts. Here we have configured the majority of the essential properties that override the default properties of the Ansible roles. Each role has some default configuration that works out of the box, but in our case we need to override these, and we will discuss some of those properties next.

zabbix_url

This is the URL on which the Zabbix Frontend is available and thus also the API. This property is for example used when we create the hosts via the API as part of the Proxy and Agent installation.

zabbix_proxy

The Zabbix Agents will be monitored by the Zabbix Proxy unless the Agent runs on the Zabbix Server or the host running the database for the Zabbix Server. Like with the previous blog post, we will also use some Ansible notation to get the IP address of the host running the Zabbix Proxy to configure the Zabbix Agent.

zabbix_proxy: node-3
zabbix_agent_server: "{{ hostvars[zabbix_proxy]['ansible_host'] }}"
zabbix_agent_serveractive: "{{ hostvars[zabbix_proxy]['ansible_host'] }}"

With the above configuration, we configure both the Server and ServerActive in the Zabbix Agent’s configuration file to use the IP address of the host running the Zabbix Proxy. If you look at the files “/ansible/group_vars/zabbix_database” and “/ansible/group_vars/zabbix_server/generic” you would see that these contain the following:

zabbix_agent_server: "{{ hostvars['node-1']['ansible_host'] }}"
zabbix_agent_serveractive: "{{ hostvars['node-1']['ansible_host'] }}"

The Zabbix Agent on the Zabbix Server and on its database host uses the IP address of the Zabbix Server as the value for both the “Server” and “ServerActive” configuration settings of the Zabbix Agent.
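Assuming “node-1” has the ansible_host value 10.10.1.11, the rendered Zabbix Agent configuration on those two hosts would roughly contain:

Server=10.10.1.11
ServerActive=10.10.1.11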

zabbix_api_user & zabbix_api_pass

These are the default in the roles, but I have added them here so it is clear that they exist. When you change the Admin user password, don’t forget to change them here as well.

zabbix_api_create_hosts & zabbix_api_create_hostgroups 

Because we automatically want to create the Zabbix Frontend hosts via the API, we need to set both these properties to true. Firstly, we create the host groups that can be found with the property named “zabbix_host_groups”. After that, as part of the Zabbix Agent installation, the hosts will be created via the API because of the property zabbix_api_create_hosts.

Now we need to know what kind of information we want these hosts created with. Let’s go through some of them.

zabbix_agent_interfaces

This property contains a list of all interfaces that are used to monitor the host. This is relatively simple in our case, as the hosts only have 1 interface available. You can find some more information about what to use when you have other interfaces like IPMI or SNMP here: https://github.com/ansible-collections/community.zabbix/blob/main/docs/ZABBIX_AGENT_ROLE.md#other-interfaces. We use the interface with the value from the property “ansible_host” and port 10050.
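As a sketch (the exact structure is described in the role documentation linked above), such an interface definition might look roughly like this:

zabbix_agent_interfaces:
  - type: 1
    main: 1
    useip: 1
    ip: "{{ ansible_host }}"
    dns: ""
    port: "10050"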

zabbix_host_groups

This property was also discussed before – we automatically assign our new host to these host groups. Again, we have a basic setup, and thus this is a simple property.

zabbix_link_templates

We provide a list of all Zabbix Templates we will want to assign to the hosts with this property. This property seems a bit complicated, but no worries – let’s dive in!

zabbix_link_templates:
  - "{{ zabbix_link_templates_append if zabbix_link_templates_append is defined else [] }}"
  - "{{ zabbix_link_templates_default }}"

With the first line, we add the value of the property “zabbix_link_templates_append”, but we only do that if that property exists. If Ansible cannot find that property, then we basically add an empty list. So where can we find this property? We can check the files in the other directories in the group_vars directory. If we check, for example, “/ansible/group_vars/database/generic”, we will find the property:

zabbix_link_templates_append:
  - 'MySQL by Zabbix agent'

So on all nodes that are part of the database group, we add the value to the property “zabbix_link_templates”. All of the database servers will get this template attached to the host. If we would check the file “/ansible/group_vars/zabbix_server/generic”, then we will find the following:

zabbix_link_templates_append:
  - 'Zabbix Server'

As you probably understand now, when we create the Zabbix Server host, we will add the “Zabbix Server” template to the host, because this file is only used for the hosts that are part of the zabbix_server group.

With this setup, we can configure specific templates for the specific groups, but there is also at least 1 template that we always want to add. We don’t want to add the template to each file as that is a lot of configuration, so we use a new property for this named “zabbix_link_templates_default”. In our case, we only have Linux hosts, so we always want to add the templates:

zabbix_link_templates_default:
  - "Linux by Zabbix agent active"

On the Zabbix Server, we assign both the “Zabbix Server” template and the “Linux by Zabbix agent active” template to the host.

But what if we have Macros?

zabbix_macros

As part of some extra tasks in this playbook execution, we also need to provide a macro for some hosts. This macro is needed to make the Zabbix Template we assign to the hosts work. For the hosts running a MySQL database, we need to add a macro, which can be found with the property zabbix_macros_append in the file “/ansible/group_vars/database/generic”.

zabbix_macros_append:
  - macro_key: "MYSQL.HOST"
    macro_value: "{{ ansible_host }}"

We will create 1 macro with the key name “MYSQL.HOST” and assign a value that will be equal to the contents of the property ansible_host (For the “node-2” host, the host running the database for the Zabbix Server), which is “10.10.1.12”.

User parameters

The “problem” with assigning the MySQL template is that it also requires some UserParameter entries set. The Zabbix Agent role can deploy files containing UserParameters to the given hosts. In “/ansible/group_vars/database/generic” we can find the following properties:

zabbix_agent_userparameters_templates_src: "{{ inventory_dir }}/files/zabbix/mysql"
zabbix_agent_userparameters:
  - name: template_db_mysql.conf

The first property, “zabbix_agent_userparameters_templates_src”, lets Ansible know where to find the files. The “{{ inventory_dir }}” will be translated to “/ansible”. Here you will find a directory named “files” (next to the group_vars directory), and drilling further down the directories, you will find the file “template_db_mysql.conf”.

With the second property, “zabbix_agent_userparameters”, we let Ansible know which file we want to deploy to the host. In this case, it is the only file in that directory, named “template_db_mysql.conf”.

When the Zabbix agent role is fully executed, we have everything set to monitor all the hosts automatically. Open the dashboard, and you will see something like the following:

It provides an overview, and on the right side, you will notice we have a total of 3 nodes of which 3 are available. Maybe you will see a “Problem” like in the screenshot above, but it will go away.

If we go to “Configuration” and “Hosts,” we will see that we have the 3 nodes, and they have the status “Enabled” and the “ZBX” icon is green, so we have a proper connection.

We should verify that we have some data, so go to “Monitoring” and click on “Latest data.” We select in the Host form field the “Zabbix database,” and we select “MySQL” as Application and click on “Apply.” If everything is right, it should provide us with some information and values, just like the following screenshot. If not, please wait a few minutes and try again.

Summary

This is the end of a 3-part blog post series on creating a fully working Zabbix environment with a Zabbix Server, Proxy, and Agent. With these 3 blog posts, you were able to see how you can install and configure a complete Zabbix environment with Ansible. Keep in mind that the code shown was for demo purposes and is not something you can use immediately for a production environment. We also used only some of the available functionality of the Ansible collection for Zabbix; there are many more possibilities, like creating a maintenance period or a discovery rule. Not everything is possible, though. If you miss a task or functionality of a role that Ansible should do or configure, please create an issue on GitHub so we can make it happen.

Don’t forget to execute the following command:

$ vagrant destroy -f

With this, we clean up our environment and delete our 4 nodes, thus finishing with the task at hand!

Swing into action with an homage to Pitfall! | Wireframe #48

Post Syndicated from Ryan Lambie original https://www.raspberrypi.org/blog/swing-into-action-with-an-homage-to-pitfall-wireframe-48/

Grab onto ropes and swing across chasms in our Python rendition of an Atari 2600 classic. Mark Vanstone has the code

Whether it was because of the design brilliance of the game itself or because Raiders of the Lost Ark had just hit the box office, Pitfall Harry became a popular character on the Atari 2600 in 1982.

His hazardous attempts to collect treasure struck a chord with eighties gamers, and saw Pitfall!, released by Activision, sell over four million copies. A sequel, Pitfall II: The Lost Caverns quickly followed the next year, and the game was ported to several other systems, even making its way to smartphones and tablets in the 21st century.

Pitfall

Designed by David Crane, Pitfall! was released for the Atari 2600 and published by Activision in 1982

The game itself is a quest to find 32 items of treasure within a 20-minute time limit. There are a variety of hazards for Pitfall Harry to navigate around and over, including rolling logs, animals, and holes in the ground. Some of these holes can be jumped over, but some are too wide and have a convenient rope swinging from a tree to aid our explorer in getting to the other side of the screen. Harry must jump towards the rope as it moves towards him and then hang on as it swings him over the pit, releasing his grip at the other end to land safely back on firm ground.

For this code sample, we’ll concentrate on the rope swinging (and catching) mechanic. Using Pygame Zero, we can get our basic display set up quickly. In this case, we can split the background into three layers: the background, including the back of the pathway and the tree trunks, the treetops, and the front of the pathway. With these layers we can have a rope swinging with its pivot point behind the leaves of the trees, and, if Harry gets a jump wrong, it will look like he falls down the hole in the ground. The order in which we draw these to the screen is background, rope, tree-tops, Harry, and finally the front of the pathway.

Now, let’s get our rope swinging. We can create an Actor and anchor it to the centre and top of its bounding box. If we rotate it by changing the angle property of the Actor, then it will rotate at the top of the Actor rather than the mid-point. We can make the rope swing between -45 degrees and 45 degrees by increments of 1, but if we do this, we get a rather robotic sort of movement. To fix this, we add an ‘easing’ value which we can calculate using a square root to make the rope slow down as it reaches the extremes of the swing.

Our homage to the classic Pitfall! Atari game. Can you add some rolling logs and other hazards?

Our Harry character will need to be able to run backwards and forwards, so we’ll need a few frames of animation. There are several ways of coding this, but for now, we can take the x coordinate and work out which frame to display as the x value changes. If we have four frames of running animation, then we would use the %4 operator and value on the x coordinate to give us animation frames of 0, 1, 2, and 3. We use these frames for running to the right, and if he’s running to the left, we just mirror the images. We can check to see if Harry is on the ground or over the pit, and if he needs to be falling downward, we add to his y coordinate. If he’s jumping (by pressing the SPACE bar), we reduce his y coordinate.

We now need to check if Harry has reached the rope, so after a collision, we check to see if he’s connected with it, and if he has, we mark him as attached and then move him with the end of the rope until the player presses the SPACE bar and he can jump off at the other side. If he’s swung far enough, he should land safely and not fall down the pit. If he falls, then the player can have another go by pressing the SPACE bar to reset Harry back to the start.

That should get Pitfall Harry over one particular obstacle, but the original game had several other challenges to tackle – we’ll leave you to add those for yourselves.

Pitfall Python code

Here’s Mark’s code for a Pitfall!-style platformer. To get it working on your system, you’ll need to  install Pygame Zero.  And to download the full code and assets, head here.

Get your copy of Wireframe issue 48

You can read more features like this one in Wireframe issue 48, available directly from Raspberry Pi Press — we deliver worldwide.
And if you’d like a handy digital version of the magazine, you can also download issue 48 for free in PDF format.

The post Swing into action with an homage to Pitfall! | Wireframe #48 appeared first on Raspberry Pi.

Installing and configuring the Zabbix Proxy

Post Syndicated from Werner Dijkerman original https://blog.zabbix.com/installing-and-configuring-the-zabbix-proxy/13319/

In the previous blog post, we created a Zabbix Server setup, created several users, a media type, and an action. But today, we will install the Zabbix Proxy on a 3rd node. This Zabbix Proxy will have its database running on the same host, so the “node-3” host has both MySQL and the Zabbix Proxy running.

This blog post is the 2nd part of a 3 part series of blog posts where Werner Dijkerman gives us an example of how to set up your Zabbix infrastructure by using Ansible.
You can find part 1 of the blog post by clicking Here

A git repository containing the code of these blog posts is available, which can be found on https://github.com/dj-wasabi/blog-installing-zabbix-with-ansible. Before we run Ansible, we have opened a shell on the “bastion” node. We do that with the following command:

$ vagrant ssh bastion

Once we have opened the shell, go to the “/ansible” directory where we have all of our Ansible files present.

$ cd /ansible

With the previous blog post, we executed the “zabbix-server.yml” playbook. Now we use the “zabbix-proxy.yml” playbook. The playbook will deploy a MySQL database on “node-3” and also install the Zabbix Proxy on the same host.

$ ansible-playbook -i hosts zabbix-proxy.yml

This playbook will run for a few minutes creating all services on the node. While it is running, we will explain some of the configuration options we have set.

The configuration we will talk about can be found in the “/ansible/group_vars/zabbix_proxy” directory. This directory is only used when we deploy the Zabbix Proxy and contains 2 files: one called “secret” and one called “generic”. It doesn’t really matter what names the files have in this directory. I used a file called “secret” to let you know that this file contains secrets and should be encrypted with a tool like ansible-vault. As this is out of scope for this blog, I simply left the file in plain text. So how do we know that this directory is used for the Zabbix Proxy node?

In the previous blog post, we mentioned that with the “-i” argument, we provide the location of the inventory file. This inventory file contains the hostnames and the groups that Ansible is using. If we open the inventory file “hosts”, we can see a group called “zabbix_proxy”. So Ansible uses the information in the “/ansible/group_vars/zabbix_proxy” directory as input for variables. But how does the “/ansible/zabbix-proxy.yml” file know which hosts or groups to use? At the beginning of this file, you will notice the following:

- hosts: zabbix_proxy
  become: true
  collections:
    - community.zabbix

Here you will see that the “hosts” key contains the value “zabbix_proxy”. All tasks and roles that we have configured in this play will be applied to all of the hosts that are part of the zabbix_proxy group. In our case, we have only 1 host in the group. If you had, for example, 4 different datacenters and wanted a Zabbix Proxy running in each of them, executing this playbook would run on those 4 hosts, and at the end of the run you would have 4 Zabbix Proxy servers running.

Within the “/ansible/group_vars/zabbix_proxy/generic” file, we have several options configured. Let’s discuss the following options:

* zabbix_server_host
* zabbix_proxy_name
* zabbix_api_create_proxy
* zabbix_proxy_configfrequency

zabbix_server_host

The first one, the “zabbix_server_host” property, tells us where the Zabbix Proxy can find the Zabbix Server. This will allow the Zabbix Proxy and the Zabbix Server to communicate with each other. Normally you would have to configure the firewall (iptables or firewalld) as well to allow the traffic, but in this case, there is no need for that: everything inside the environment we have created with Vagrant has full access. When you are going to deploy a production-like environment, don’t forget to configure the firewall. (Currently, this firewall configuration is not yet available as part of the Ansible Zabbix Collection for either the Zabbix Server or the Zabbix Proxy, so for now you should create a playbook to configure the local firewall to allow/deny traffic.)

As you will notice, we didn’t configure the property with a value like an IP address or FQDN. We use some Ansible notation to do that for us, so we only have the Zabbix Server information in one place instead of multiple places. In this case, Ansible will get the information by reading the inventory file and looking for a host entry with the name “node-1” (Which is the hostname that is running the Zabbix Server), and we use the value found by the property named “ansible_host” (Which has a value “10.10.1.11”).

zabbix_proxy_name

This is the name of the Zabbix Proxy host, which will be shown in the Zabbix frontend. We will see this later in this blog when we create a new host to be monitored. When you create a new host, you can configure whether that new host should be monitored by a proxy, and if so, you will see this name.

zabbix_api_create_proxy

When we deploy the Zabbix Proxy role, we not only install the Zabbix Proxy package and its configuration file and start the service; we also perform an API call to the Zabbix Server to create a Zabbix Proxy entry. With this API call, we can configure hosts to be monitored via this new Zabbix Proxy.

zabbix_proxy_configfrequency

The last one is just for demonstration purposes. With a default installation/configuration of the Zabbix Proxy, it has a value of 3600. This means that the Zabbix Server sends the configuration to the Zabbix Proxy every 3600 seconds. Because we are running a small demo here in this Vagrant setup, we have set this to 60 seconds.
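Putting these together, the relevant part of the “/ansible/group_vars/zabbix_proxy/generic” file might look roughly like this (values as discussed above):

zabbix_server_host: "{{ hostvars['node-1']['ansible_host'] }}"
zabbix_proxy_name: node-3
zabbix_api_create_proxy: true
zabbix_proxy_configfrequency: 60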

Now the deployment of our Zabbix Proxy will be ready.

When we open the Zabbix Web interface again, we go to “Administration” and click on “Proxies”. Here we see the following:

We see an overview of all proxies available, and in our case, we only have 1. We have “node-3” configured, which has the “Active” mode. When you want to configure a “Passive” mode proxy, you’ll have to update the “/ansible/group_vars/zabbix_proxy” file and add the following entry somewhere in the file: “zabbix_proxy_status: passive”. Once you have updated and saved the file, you’ll have to rerun the “ansible-playbook -i hosts zabbix-proxy.yml” command. If you then recheck the page, you will notice that it now has the “Passive” mode.

So let’s go to “Configuration” – “Hosts”. At the moment, you will only see 1 host, which is the “Zabbix server,” like in the following picture.

Let’s open the host creation page to demonstrate that you can now set the host to be monitored by a proxy. The actual creation of a host is something that we will do automatically when we deploy the Zabbix Agent with Ansible and not something we should do manually. 😉 As you will notice, you are able to click on the dropdown menu with the option “Monitored by proxy” and see the “node-3” appear. That is very good!

Summary

We have installed and configured both a Zabbix Server and a Zabbix Proxy, and we are all set now. For the Zabbix Proxy, we have installed both the MySQL database and the Zabbix Proxy on the same node, whereas we installed them on separate nodes for the Zabbix Server. In the following blog post, we will install the Zabbix Agent on all nodes.

Installing the Zabbix Server with Ansible

Post Syndicated from Werner Dijkerman original https://blog.zabbix.com/installing-the-zabbix-server-with-ansible/13317/

Today we are focusing more on the automation of installation and software configuration instead of using the manual approach. Installing and configuring software the manual way takes a lot more time, you can easily make more errors by forgetting steps or making typos, and it will probably be a bit boring when you need to do this for a large number of servers.

In this blog post, I will demonstrate how to install and configure a Zabbix environment with Ansible. Ansible has the potential to simplify many of your day-to-day tasks. As an alternative to Ansible, you may also opt to use Puppet, Chef, or SaltStack to install and configure your Zabbix environment.

Ansible does not have any specific infrastructure requirements for it to do its job. We just need to make sure that the user exists on the target host, preferably configured with SSH keys. With tools like Puppet or Chef, you need to have a server running somewhere, and you will need to deploy an agent on your nodes. You can learn more about Ansible here:  https://docs.ansible.com/ansible/latest/index.html.

This post is the first in a series of three articles. We will set up a (MySQL) database running on one node (“node-2”) and a Zabbix Server incl. frontend running on a separate node (“node-1”). Once we have built this, we will configure an action and a media type, and we will create some users. In the following image you will see the environment we will create.

The environment we will create.

In the 2nd blog post, we will set up a Zabbix Proxy and a MySQL database on a new but the same node (“node-3”). In the 3rd blog post, we will install the Zabbix Agent on all of the 3 nodes we were using so far and configure some user parameters. Where the Zabbix Agent on “node-3” is using the Zabbix Proxy, the Zabbix Agent on the nodes “node-1” and “node-2” will be monitored by the Zabbix Server.

Preparations

A git repository containing the code used in these blog posts is available, which can be found on https://github.com/dj-wasabi/blog-installing-zabbix-with-ansible. Before we can do anything, we have to install Vagrant (https://www.vagrantup.com/downloads.html) and Virtualbox (https://www.virtualbox.org/wiki/Downloads). Once you have done that, please clone the earlier mentioned git repository somewhere on your host. For this demo, we will not run the Zabbix Frontend with TLS certificates.

We have to update the local hosts file with the following line to make sure that we can access the Zabbix Frontend:

10.10.1.11 zabbix.example.com

In the “ROOT” directory of the git repository you cloned a few moments ago, you will also find the Vagrantfile. This Vagrantfile contains the configuration of the virtual machines for our setup. We will create 4 virtual machines running Ubuntu 20.04, each with 1 CPU and 1 GB of RAM, which you can see in the first “config” block. In the 2nd config block, we configure our “bastion” host, which we will discuss later. This node will get the IP 10.10.1.3, and we also mount the ansible directory in this virtual machine at the location “/ansible”. For installing and configuring this node, we use the playbook bastion.yml. With this playbook, we install some packages like Python, git, and Ansible inside the bastion virtual machine.

The 3rd config block is part of a loop that will create and configure 3 virtual machines. Each virtual machine is also an Ubuntu node and has its own IP (10.10.1.11 for the first node, 10.10.1.12 for the second, and 10.10.1.13 for the 3rd), and just like the “bastion” node, they each have 1 CPU and 1 GB of RAM.

You will have to execute the following command:

$ vagrant up

With this command, we start our virtual machines. This might take a while, as it will download a VirtualBox image containing Ubuntu. The “vagrant up” command will start the “bastion” node and all other nodes that are part of this demo. Once that is done, we need to access a shell on the “bastion” node:

$ vagrant ssh bastion

This “bastion” node is the node from which we will execute Ansible, but we will not be installing any Zabbix components on it. We have now opened a shell in the virtual machine we just created; you can compare it with creating an “ssh” connection. We have to go to the following directory before we can download the dependencies:

$ cd /ansible

As mentioned before, we have to download the Ansible dependencies. The installation depends on several Ansible Roles and an Ansible Collection. With the Ansible Roles and the Ansible Collection, we can install MySQL, Apache, and the Zabbix components. We have to execute the following command to download the dependencies:

$ ansible-galaxy install -r requirements.yml
Starting galaxy role install process
- downloading role 'mysql', owned by geerlingguy
- downloading role from https://github.com/geerlingguy/ansible-role-mysql/archive/3.3.0.tar.gz
- extracting geerlingguy.mysql to /home/vagrant/.ansible/roles/geerlingguy.mysql
- geerlingguy.mysql (3.3.0) was installed successfully
- downloading role 'apache', owned by geerlingguy
- downloading role from https://github.com/geerlingguy/ansible-role-apache/archive/3.1.4.tar.gz
- extracting geerlingguy.apache to /home/vagrant/.ansible/roles/geerlingguy.apache
- geerlingguy.apache (3.1.4) was installed successfully
- extracting wdijkerman.php to /home/vagrant/.ansible/roles/wdijkerman.php
- wdijkerman.php was installed successfully
Starting galaxy collection install process
Process install dependency map
Starting collection install process
Installing 'community.zabbix:1.2.0' to '/home/vagrant/.ansible/collections/ansible_collections/community/zabbix'
Created collection for community.zabbix at /home/vagrant/.ansible/collections/ansible_collections/community/zabbix
community.zabbix (1.2.0) was installed successfully

Your output may vary because versions might have been updated since this blog post was written. We have now downloaded the dependencies and are ready to install the rest of our environment. But why do we need to download a role for MySQL, Apache, and PHP? A role contains all the necessary tasks and files to configure a specific service. So in the case of the MySQL Ansible role, it will install the MySQL server and all other packages that MySQL requires on the host, make sure the mysqld service is created and running, and also create the databases, create and configure MySQL users, and configure the root password. Using a role helps us install our environment without having to figure out how to install and configure a MySQL server manually.

So what about the collection, the Ansible community Zabbix collection? Ansible introduced this concept with Ansible 2.10; a collection is basically a bundle of plugins, modules, and roles for a specific service. In our case, the Zabbix collection contains the roles for installing the Zabbix Server, Proxy, Agent, Java gateway, and frontend. It also contains a plugin to use a Zabbix environment as our inventory, and modules for creating resources in Zabbix. All of these modules work with the Zabbix API to configure resources like actions, triggers, groups, templates, proxies, etc. Basically, everything we want to create and use can be done with a role or a collection.

Installing Zabbix Server

Now we can execute the following command, which will install the MySQL database on “node-2” and the Zabbix Server on “node-1”:

$ ansible-playbook -i hosts zabbix-server.yml

This might take a while, a minute or ten, depending on the performance of your host. We execute the “ansible-playbook” command, and with “-i” we provide the location of the inventory file. Here you see the contents of the inventory file:

[zabbix_server]
node-1 ansible_host=10.10.1.11

[zabbix_database]
node-2 ansible_host=10.10.1.12

[zabbix_proxy]
node-3 ansible_host=10.10.1.13

[database:children]
zabbix_database
zabbix_proxy

This inventory file contains basically all of our nodes and the groups the hosts belong to. We can see in that file that there is a group called “zabbix_server” (the value between the [] square brackets is the name of the group), which contains the “node-1” host. Because we have a group called “zabbix_server”, we also have a directory containing some files. These hold all the properties (or variables) that will be used for all hosts (in our case, only “node-1”) in the “zabbix_server” group.

Web Interface

Now you can open your favorite browser and open “zabbix.example.com”, and you will see the Zabbix login screen. Please enter the default credentials:

Username: Admin
Password: zabbix

On the Dashboard, you will probably notice that it complains that it can not connect to the Zabbix Agent running on the Zabbix Server, which is fine as we haven’t  installed it yet. We will do this in a later blog post.

Dashboard overview

When we go to “Administration” and click on “Media types,” we will see a media type called “A: Ops email.” That is the one we have created. We can open the “/ansible/zabbix-server.yml” file and go to line 33, where we have configured the creation of the Mediatype. In this case, we have configured multiple templates for sending emails via the “mail.example.com” SMTP server.

Now that we have seen the media type, we will look at the action we just created. This action makes use of the media type we just saw. The action can be found in the “/ansible/zabbix-server.yml” file on line 69. When you go to “Configuration” and “Actions,” you will see our created action “A: Send alerts to Admin”. We wouldn’t want to run it like this in production, but for demonstration purposes, we have configured it to be triggered when the severity is Information or higher.

And lastly, we are going to see that we have also created new internal users. Navigate to “Administration” – “Users,” and you will see that we have created a user called “wdijkerman”, which can be found in the “/ansible/zabbix-server.yml” file on line 95. This user will be part of a group created earlier called “ops”. The user type is Zabbix super admin, and we have configured the email media type to be used 24×7.

We have defined a default password for this user – “password”. If you change the password in the Zabbix Frontend UI, executing the playbook will not change the password back to “password”, so don’t worry about it. But if you were to remove, let’s say, the “ops” group, then when you execute the playbook again, the group will be re-added to the user.

Summary

As you see, it is effortless to create and configure a Zabbix environment with Ansible. We didn’t have to do anything manually, and all installations and configurations were applied automatically when we executed the ansible-playbook command. You can find more information on either the Ansible page https://docs.ansible.com/ansible/latest/collections/community/zabbix/ or on the Github page https://github.com/ansible-collections/community.zabbix.

In the next post, we will install and configure the Zabbix Proxy.

Managing complexity in Zabbix installations with Splunk

Post Syndicated from Christian Anton original https://blog.zabbix.com/managing-complexity-in-zabbix-installations-with-splunk/13053/

A big data analytics engine can be used to optimize large and complex Zabbix installations: keeping track of the amount and kind of problems over time, top alert producers, and much more. You can employ Splunk to optimize and analyze vital Zabbix runtime parameters, such as ‘unsupported items,’ repeatedly happening host availability issues, misconfigured agents, and Zabbix Queue entries.

Contents

I. Complexity (1:15)
II. Zabbix entity inventory (8:28)
III. Use cases (15:16)
IV. Conclusion (20:09)
V. Questions & Answers (21:41)

secadm GmbH is a service provider located in the south of Germany. The company with a strong background in monitoring and automation, network infrastructure, and security software development supports customers of all sizes to manage their IT infrastructures. secadm GmbH is a Zabbix partner and also a Zabbix training partner.

Complexity

Operating a Zabbix deployment of a specific size comes with some challenges:

  • A huge number of hosts, templates, items, host groups, macros, and configuration elements inside your Zabbix instance.
  • LLD rules/unsupported items — items that are unable to fetch information because of, for example, a wrong password or a wrong path in an external check. It is often hard to get a hold of how many of those you have and in which of the various error states they are. Therefore, it’s also difficult to fix them.
  • Host availability/network issues — errors that you see only in the logs — things going up and down, losing their connectivity, but getting back in time before issuing an alert.
  • Queue entries. In larger Zabbix installations, you might have ten thousand or even more items in this service queue. Zabbix actually tells you that some items do not receive their data in time. Zabbix shows that something is really wrong, though it doesn’t give a hint about what is wrong.
  • Zabbix as a monitoring tool is there to actually generate problems and alerts out of these problems. Many problems often cause ‘alert fatigue’ when people start ignoring monitoring results because of too many alerts.

Therefore, we receive a lot of questions from our customers, such as:

  • Where do all these problems come from?
  • What are the hosts generating most of the problems, at what times, and generated by what templates?
  • Did the latest change/upgrade have any negative impact on our monitoring?
  • Can you get rid of unsupported items?
  • How many hosts have specific problems (for instance, caused by a known bug in an old version of an agent that behaves strangely with a specific version of the Zabbix server), and what would be the effect if we fixed those problems?
  • Where do all these queue entries come from?

Zabbix is a transparent and predictable monitoring tool that offers great ways to organize the monitored elements with templates and macros. Zabbix also offers excellent visualization capabilities. However, Zabbix is not an analytical utility offering a flexible query language to gather the required information in the required format, having on-demand statistical functions, and allowing you to enrich and correlate data with the data from arbitrary sources. So, extra tools will have to do the extra work.

secadm GmbH, being a partner of both Zabbix and Splunk, has concluded that it is an obvious choice to use Splunk for such extra work. Splunk offers many possibilities to onboard data into the platform far beyond the simple indexing of log data: looking up data in the Key-Value store, implementing scripts and programs inside the Splunk platform to fetch data in real time and on demand from other systems without having to store and index any information, as well as performing custom search commands.

Zabbix entity inventory

The most important Zabbix data used for analysis is the inventory of all elements inside Zabbix that do not change often, such as:

  • Hosts,
  • Items,
  • Proxies,
  • Templates,
  • Triggers,
  • Discovery rules (LLD),
  • Item Prototypes, and
  • Trigger Prototypes.

As this data does not change constantly, we fetch it into Splunk with a scheduled search and a custom search command directly from the API endpoints in Zabbix. Then we can store this information inside the Splunk KV Store, which is in fact MongoDB, allowing us to perform searches in milliseconds without having to index any data and to get the results quickly.

Zabbix entity inventory

So, you can get statistics on status and state to drill down on the unsupported items for a list of all of the items. You can further identify the correlation for the hostnames instead of host IDs, which are not human-readable. The hostnames are available at the KV Store, which stores the hosts with their metadata. You can also identify how many unsupported items there are on each host.

You can also get information on the hostnames, hosts, item names, item types, and errors. You can categorize the problems as SNMP problems, shell problems such as wrong paths, and see how often certain problems happen and what hosts are assigned to what templates and host groups, and so on. This information may also be aggregated or correlated with information from UCMDB.

More data

More fun than having data within a data analytics platform is having even more data.

  • Indexing the Zabbix Server / Proxy logs, categorizing events to identify availability issues, item problems, preprocessing problems, housekeeper statistics, etc.

  • A module to fetch information from Zabbix (item, host, trigger) in real-time.

  • Gathering metrics (History / Trends data) directly from Zabbix in real-time without the need to store these metrics in any place other than the Zabbix database. We can still use the data for graphing, correlations, calculations, etc.

  • Onboarding the Zabbix problems into Splunk by using the new custom Media types — Webhook.

Custom Media type

  • Correlation of the alert logs, which are new and available through the API since Zabbix 5.0.
  • Working on the queue items to solve these questions.

Use cases

Zabbix queue

The Zabbix queue may be a real headache, as a larger Zabbix installation can have 20,000–50,000 items that have been waiting for 5 or 10 minutes or even longer.

In this dashboard, the same view as in Zabbix is displayed: items in the queue are categorized by how long they are overdue, by item type, proxy, etc. Splunk offers here what Zabbix lacks: the history, so that you can see the spikes when things have changed dramatically. For instance, when bigger network changes happen, the network slows down and the queues grow dramatically. You can see whether these queues have gone back down or remained up. This information is complicated to analyze in Zabbix itself.

You can also drill down to see the items correlated with their actual status and the host’s status inside Zabbix. So, you can clearly see, for instance, that an item is on the host that is down or in the queue as it’s not supported and doesn’t get any data.

Here, there is also an Ignore list. So you can get statistics for the remaining items and group them, for instance, by Item type. You can go further and analyze and fix the problems.

Zabbix problems analytics

Zabbix problems dashboard

In this dashboard, Zabbix problems are displayed by system categories. For instance, we can see that over the last 24 hours, Windows caused most of the problems.

Here, we can also drill down to see, for instance, if there are many similar problems. You can go further to identify a single issue that has caused many alerts or problems. You can see that one host is creating almost all of the problems. So, if you switch this one host off, you would have fewer problems.

Zabbix data for management visibility

We can use Zabbix data for greater management visibility, such as:

  • Correlation of data to generate meaningful dashboards:

— Zabbix (metrics, status, problems, etc.),
— application logs,
— other data sources,
— inventory (CMDB, …)

  • Business-level visualization.

Conclusion

Our integration is open-source software and is distributed for free. We are currently in the process of integrating Splunk with Zabbix.

If you are interested in Splunk, you can send a request to [email protected]  or look for Christian Anton on LinkedIn or Instagram.

Questions & Answers

Question. If we use this kind of integration, are there any performance issues caused by Splunk or some misconfiguration?

Answer. We have been using Splunk for installations with several tens of thousands of monitored hosts and from hundreds of thousands up to millions of items and have not seen any performance implications.

Question. How does this connector work under the hood? Does it use the API or direct queries to the database?

Answer. We rely on the API. Besides, we can fetch the data directly from the database.

 

Setting up Zabbix Agent 2 for PostgreSQL monitoring and revealing how it works

Post Syndicated from Daria Vilkova original https://blog.zabbix.com/setting-up-zabbix-agent-2-for-postgresql-monitoring-and-revealing-how-it-works/13208/

This article will recall the most important points about the PostgreSQL monitoring plugin for Zabbix Agent 2. Here you’ll find an explanation of how the plugin works under the hood, illustrated with a simple example. You will also get familiar with a new mechanism of custom queries that lets you collect metrics from separate SQL files on your machine.

Contents

I. Zabbix Agent plugin (2:40)

    1. Implementation (3:10)
    2. Basic features (4:24)
    3. How to get a simple metric? (11:07)

II. Custom metrics (14:05)
III. Conclusion (17:58)
IV. Questions & Answers (19:20)

 

Zabbix Agent plugin

As a rule, Zabbix Agent is installed on the machine that is to be monitored. It gathers data, which is later collected by the Zabbix Server, and the user has full access to it via the web interface.

Implementation

  • The plugin uses github.com/jackc/pgx, a PostgreSQL driver and toolkit for Go, to connect to Postgres. The plugin also supports the database/sql interface, which is a universal interface in Golang for SQL-like databases; in the upcoming version, connections to the databases are made via this database/sql interface.
  • The handler is the basic unit of the plugin: all queries are executed in separate handlers, and the results are then sent to the Zabbix Server. We have made an effort to create an efficient connection to the database and to optimize its operations.
  • Some metrics are generated in JSON and grouped as dependency items and discovery rules.

Basic features

  • Zabbix Agent 2 allows for keeping a permanent connection to the PostgreSQL database. In earlier versions, to connect to PostgreSQL, we had to make psql calls affecting the server load.
  • Zabbix Agent 2 provides for flexible polling intervals, which can be customized in templates.
  • The plugin is compatible with PostgreSQL 10+ and with Zabbix Server and Zabbix Agent version 4.4+.
  • In the latest plugin release, a new feature is introduced to allow for monitoring several PostgreSQL instances by one Agent using sessions.

Plugin connection parameters

There are three levels of the plugin connection parameters:

  • Global (global for all Zabbix Agent plugins).
  • Macros.
  • Sessions.

Macros and Sessions parameters are used to define a connection to the database.

Macros level

Macros should be familiar to all users of the first Zabbix Agent. In the template, we can define macros for the user, database, etc.

Filling in the template

Then we need to fill in these macros as parameters in the Key definition.

Key definition as a parameter

Here, the sequence is important — URI, USER, and PASSWORD. The first two parameters are mandatory. If no password is given, an empty string is used as the password. If there is no database name, the default database name is used — ‘postgres’.

NOTE. There may be parameters No. 5, 6, 7, etc., which can be used as parameters for dynamic queries in the handler.

This way of connecting to the database is considered the default. In the official template for PostgreSQL monitoring on the Zabbix website, macros and keys are already specified, so the setup can be done in no time.
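For illustration, a macros-level item key for the plugin might look like the sketch below (the exact macro and key names are assumptions — check the official template for the ones it actually defines):

pgsql.dbstat["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DBNAME}"]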

Sessions level

Each session has its own connection parameters. So, by creating multiple sessions, we can create multiple connections to several databases.

Sessions are defined in the Zabbix Agent configuration file — zabbix_agent2.conf.

Defining four parameters for session ‘Test’

  • To define the session ‘Test’, in the configuration file, you need to go to:
# Plugins.Postgres.Sessions.
  • Then, you fill in the name of the session:
# Plugins.Postgres.Sessions.Test.Uri=tcp://localhost:5432
  • Then, you do the same for the other three parameters and define macros for the session in the template:

Defining connection parameters and the name of the session in {$PG.SESSION}.

  • You need to fill in the session Name as the only parameter for the Key:

Now the agent will automatically pick up the connection parameters for this session name from the configuration file and start running.
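As an illustration, the four session parameters in zabbix_agent2.conf and the corresponding key could look like this (the parameter names after Uri and the credentials are assumptions made for the sketch):

Plugins.Postgres.Sessions.Test.Uri=tcp://localhost:5432
Plugins.Postgres.Sessions.Test.User=zbx_monitor
Plugins.Postgres.Sessions.Test.Password=<password>
Plugins.Postgres.Sessions.Test.Database=zabbix

pgsql.dbstat[Test]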

Metrics monitored by the plugin

In the upcoming release, the plugin will be able to gather more than 98 metrics covering almost all the important parameters in the database, including:

  • number of connections,
  • database size,
  • info about archive files,
  • number of ‘bloating’ tables,
  • replication status,
  • background writer processes activity, etc.

Some of these metrics are not very informative without the operating system parameters. However, Zabbix Agent 2 can already gather all these metrics using the operating system plugins, so together we have all the needed templates to get a full picture of the database health.

 

How to get a simple metric?

1. Create a handler (file) to get a new metric, for instance, the uptime metric: — zabbix/src/go/plugins/postgres/handler_uptime.go.

NOTE. The handler definitions for the current and the upcoming version are available in the article on the PostgreSQL monitoring plugin.

2. Import package to work with Postgres and specify the unique key for the new metric:

package postgres
const (
keyPostgresUptime = "pgsql.uptime"
)

3. Define the handler function with the following query:

func uptimeHandler(ctx context.Context, conn PostgresClient, _ string,
_ map[string]string, _ ...string) (interface{}, error) {
var uptime float64
query := `SELECT date_part('epoch', now() - pg_postmaster_start_time());`

4. Define the variable, which will hold the result.

NOTE. The matching between the Golang variables and the Postgres variables can be found on the pgx documentation page.

5. Define the query for the new metric:

row, err := conn.QueryRow(ctx, query)
if err != nil {
...
}
err = row.Scan(&uptime)
if err != nil {
...
}
return uptime, nil

Here, we:

  • perform the query,
  • check if there are any errors,
  • scan the results for the Golang variable,
  • scan for errors again, and
  • finally, return the results.

6. Register the key of your new metric in metrics.go:

var metrics = metric.MetricSet{
...,
keyPostgresUptime: metric.New("Returns uptime.",
[]*metric.Param{paramURI, paramUsername, paramPassword, paramDatabase}, false),
}

In the metrics variable, all the metrics in the plugin are defined. Here, we need to add the description of the new metric.

Now we need to recompile the agent and start it — it will have the new metric on board.
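Putting the steps together, a minimal handler_uptime.go could look like the sketch below. It simply follows the snippets above; the exact PostgresClient interface and registration details may differ between Zabbix versions.

package postgres

import "context"

const keyPostgresUptime = "pgsql.uptime"

// uptimeHandler runs the uptime query and returns the value
// that will be reported for the pgsql.uptime item.
func uptimeHandler(ctx context.Context, conn PostgresClient, _ string,
	_ map[string]string, _ ...string) (interface{}, error) {
	var uptime float64
	query := `SELECT date_part('epoch', now() - pg_postmaster_start_time());`

	// Run the query through the plugin's connection wrapper.
	row, err := conn.QueryRow(ctx, query)
	if err != nil {
		return nil, err
	}
	// Scan the result into the Golang variable and return it.
	if err := row.Scan(&uptime); err != nil {
		return nil, err
	}
	return uptime, nil
}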

Custom metrics

In the upcoming version, the agent will be able to execute queries in separate sql files located on your local machine and return the result to the Zabbix Server alongside the default metrics. To create the sql file with the query:

  • in zabbix_agent2.conf, specify the path to the directory with the SQL files in the Plugins.Postgres.CustomQueriesPath parameter;
  • in the template, provide the name of the SQL file as the 5th parameter of the new key — pgsql.query.custom — and specify additional parameters for this query if needed.

Custom metric example

1. Let’s consider a simple table containing three rows.

  • # CREATE table example (phrase text, year int );
  • # SELECT * FROM example;

2. I have created two files retrieving data from this table:

  • $ touch custom2.sql
    — $ echo "SELECT * FROM example;" > custom2.sql
  • $ touch custom1.sql
    — $ echo "SELECT phrase FROM example WHERE year=$1;" > custom1.sql

In the first file, no parameters are required, while a ‘WHERE’ clause is specified in the second file, so we’ll need one additional parameter.

3. I have added the path to the sql files in zabbix_agent2.conf:

Plugins.Postgres.CustomQueriesPath=/path/to/file

4. In the template, I need to create the key — pgsql.query.custom. Here, the first four parameters are connection parameters, and the name of the file containing the query is defined as the fifth parameter (in this case, custom2).

Then, it is necessary to do the same for the second file. However, the second query requires an additional parameter. Such query arguments are specified starting from parameter 6; here, for the custom1 file, the ‘2021’ parameter will be passed to the query, as in the sketch below.
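For illustration, the two keys could look like this (the macro names are assumptions; the file name goes into parameter 5 and query arguments start from parameter 6):

pgsql.query.custom["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DBNAME}","custom2"]
pgsql.query.custom["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DBNAME}","custom1","2021"]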

After these two keys are created, Zabbix Agent will automatically find them, execute them, and soon the results will appear in the Latest data.

The result for each query appears in text format

As the first entry is from 2020 and the second one from 2021, the ‘2021’ parameter used for the second key filters its result accordingly.

Conclusion

The new version of the plugin with custom metrics will hopefully become available with the next Zabbix Server release.

Questions & Answers

Question. What is the point of specifying the database name in that key? Are any metrics stored there? Should we create a separate database for Zabbix?

Answer. You can use the default ‘postgres’ database, but it is recommended to create a separate one, as it is more secure to collect monitoring metrics from a separate database.

Question. Does the Zabbix user both in the OS and in the database need any special permissions to get this going? 

Answer. Two permissions should be defined. These permissions are specified in the instructions for the PostgreSQL monitoring plugin for Zabbix.

Question. Will Zabbix work independently of the pg_stat_statements module? 

Answer. It gathers some data from the pg_stat_statements module. Without this module installed, we will not be able to get some crucial metrics, though the plugin itself will keep running.

Question. Can the plugin work in the passive mode or in the active mode only?

Answer. The plugin works similarly to the Zabbix Agent — it pushes the data.

Question. Does this Postgres plugin work automatically against the Zabbix backend if we use Postgres as Zabbix backend?

Answer. If you use Agent 2 with this plugin, it will work out of the box, though you’ll have to apply the templates, create items, etc. Otherwise, you’ll have to update it.

Question. What is the advantage of using the plugin over Zabbix user parameters, which are custom scripts that the agent can execute?

Answer. If you use user parameters, connections to Postgres are established through psql calls. This can create additional server load. The plugin establishes a permanent connection entailing fewer overheads.

Supercharge Zabbix with powerful insights

Post Syndicated from alexk original https://blog.zabbix.com/supercharge-zabbix-with-powerful-insights/12841/

A new set of trigger functions for long-term analysis of trend data will allow Zabbix to analyze historical data and generate alerts on detected anomalies.

Contents

I. Types of monitoring (0:39)

II. Zabbix 5.2 new functions (5:34)

III. In a nutshell (13:28)
IV. Questions & Answers (14:17)

Types of monitoring

Let’s start with a philosophical observation. In many cases, configuring monitoring entities is a pretty straightforward exercise. For instance, we know that computers should have some free disk space, as applications won’t work otherwise; that a CPU should not run at 100°C; that a user-facing application should respond in less than a couple of seconds, otherwise users will notice and complain. To be alerted when any of these expectations fail, we need to use triggers. A trigger can be as simple as {Host:cpu.temp.avg(5m)} > 100.

However, in some situations, it is difficult to tell right from wrong. Some cases can’t be evaluated without a proper context. For instance, is it OK if RAM is 70% full? The answer is our favorite ‘it depends’. If RAM was just 20% full a week ago, chances are high that some application is leaking memory and usage will continue growing. But if your RAM usage has stayed at 70% for three years in a row, there is an even better chance that it stays so for another three years.

Another it-depends example is web traffic monitoring. Intuitively, we know that it’s perfectly normal to have uneven traffic distribution across days of week or months. But every website has its usage patterns, so even when we figure out what is normal and what’s not for one specific website, it’s difficult to scale this knowledge to other websites.

Web traffic monitoring

So, in the grand scheme of things, it all boils down to finding a good baseline for parameters we want to monitor. And baselines are usually defined by previous knowledge.

So, in such cases, instead of figuring out a fixed threshold (some fixed value or percentage), we need to figure out data points in the past that we want to compare to our current data points.

  • Compare values to known thresholds.
{Host:cpu.temp.avg(5m)} > 100
  • Baseline — compare to unknown thresholds.

Finding the right points in the past (or rather, finding a good interval to look back to) is still something that the user must supply manually, even though we are also working on automating this in the future. But Zabbix 5.2 gives you some tools to make comparisons to baseline way easier.

Web traffic monitoring example

Let’s consider a history of website visits for an imaginary commerce site — shop.example.com.

Commercial site web traffic monitoring

The numbers are different at any given point in time, yet all these are normal in a certain context. Overall, we see a growing trend in 2020 as compared to 2019. But there are seasonal traffic spikes. The biggest ones are around Christmas.

Site administrators like to be informed of any traffic anomalies (such as fraud traffic, for example), but hate false positives caused by seasonal spikes.

If we want to detect anomalies here, we can get an average for some period and compare it to an average for the same period a year before.

If we know that our organic year-to-year growth is not likely to exceed, for instance, 15%, then it’s seemingly easy to do this in virtually any version of Zabbix: we take the average traffic over 30 days and check whether it exceeds the same period a year ago by more than 15%.
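In pre-5.2 syntax, such a trigger could be sketched like this (the host and item key are purely illustrative):

{shop.example.com:visits.avg(30d)} > 1.15 * {shop.example.com:visits.avg(30d,365d)}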

However, there are a few problems with this trigger expression.

1. First, we look 1 year back in history. But if we look into Zabbix 5.0 documentation about triggers, we see this:

This means that we need to keep a full and detailed history for at least 1 year (13 months, in this specific case). It is a passable solution if we ingest the traffic data daily. But what if we do it every minute? What if we do it every minute for a thousand websites?

2. In Zabbix, we specify time as 30d and 365d. As you may know, in Zabbix this is just a fancy way to specify 2,592,000 and 31,536,000 seconds. Zabbix 5.0 doesn’t have time suffixes for a month and a year simply because those cannot be translated into a fixed number of seconds. And even though 30d is very close to 28d and 31d, it’s still not the same.

3. The result of the avg() function, with or without the second time-shift parameter, always depends on the specific time of the calculation, because Zabbix calculates time shifts by subtracting the interval from the current time. This makes it impossible to calculate aggregates between, for instance, the first and the last day of a week, a month, or a year.

Zabbix 5.2 new functions

That is why we introduce new trigger functions, which address all the specified issues. We also added a few other trigger features, which improve event presentation. These functions are similar to their non-trend counterparts but are optimized for baseline monitoring use cases.

trendavg(period, period_shift)
trendcount(period, period_shift)
trenddelta(period, period_shift)
trendmax(period, period_shift)
trendmin(period, period_shift)
trendsum(period, period_shift)
  • The new functions use trends tables instead of history (do not forget to set proper trend storage period):

  • period and period_shift parameters use the Gregorian calendar instead of the number of seconds.

h (hour), d (day), w (week), M (month), and y (year).

  • These functions are easy on system resources because they do calculations only when a period ends.

In addition to the new trigger functions, we also added the ability to set a customized event name.

The customized event name lets you fine-tune how the event looks in the Zabbix UI (in screens like Problems and the problem widget) and include trigger expression calculation results.

This field is optional; you can continue using the trigger Name field instead.

There is also a new macro {? … }. It can be used for expressions inside the event name.

Triggers

Let’s reconfigure our trigger in the Zabbix 5.2 style.

Zabbix 5.2-style triggers

Let’s see what the arguments of the trendavg() function are: 1M and now/M.

  • The first argument means that we use a calendar month as the aggregation period. So, depending on the month trendavg() is doing calculations for, it will pick the first and the last date of that month. The same goes for the other possible interval suffixes — h for hour, d for day, w for week, and y for year.
  • The second parameter, as in regular aggregate functions, means a time shift. But to distinguish between the old and new types of shifts, we call them period shifts. The period shift denotes the last point in the timeline for our aggregation.

For instance, for October 13, 2020, trendavg(1M,now/M) will calculate the value for the period from September 1, 2020, to September 30, 2020, and trendavg(1M,now/M-1y) will calculate the value for the period from September 1, 2019, to September 30, 2019.
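With the same illustrative host and item key as before, the 5.2-style trigger could be sketched as:

{shop.example.com:visits.trendavg(1M,now/M)} > 1.15 * {shop.example.com:visits.trendavg(1M,now/M-1y)}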

Event name field

In Zabbix 5.2, you can continue using the Name field with the content copied to the Event name field. But if you specify the Event name, it will be used for all corresponding events instead.

The Event name supports the new macro {?…}, so you can put another trigger expression inside this macro to show some related calculations. We call it the expression macro. For instance, the Event name will be displayed on the Problem screen as follows:

Formatting functions

This trigger generates problems like this:

It’s already very useful, but this percentage would look better if we rounded it. It also wouldn’t hurt to show which month we compare our traffic against. To do that, we have added two formatting functions:

  • fmtnum(digits)

— applicable to ITEM.VALUE, ITEM.LASTVALUE, and expression macros.
fmtnum(2) gives 14.85 instead of 14.8512345.

  • fmttime(format, time_shift)

— applicable to {TIME}.
— uses strftime format codes.
— formats time, for instance, {TIME}.fmttime(“%B,%Y”) gives October,2020.

Let’s see how we can improve our Event name with new formatting functions:
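As a rough sketch (the item key and the exact brace placement are assumptions — check the Zabbix 5.2 documentation for the precise expression macro syntax), the Event name could be along these lines:

Web traffic for {{TIME}.fmttime(%B,%Y)} is {{?100*{shop.example.com:visits.trendavg(1M,now/M)}/{shop.example.com:visits.trendavg(1M,now/M-1y)}-100}.fmtnum(2)}% above the same month last year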

It looks somewhat scary on the trigger configuration screen, but Zabbix will reward us by generating events like this:

But the new functions are not limited to a single use case of comparing some data from a recent period to some past period.

Cloud budget monitoring example

Let’s consider another real-world example. Imagine that your IT department runs some very important services in the Cloud. And, of course, your finance department sends a monthly budget you don’t want to overrun. You receive cloud usage records from one or more cloud providers and ingest this data periodically into monitoring.

You could set up a trigger with trendsum() over one month to check whether you exceeded your fixed budget in previous months. But you want to know about a budget overrun ASAP: if you exceed your monthly budget in the middle of a month, your quick reaction might save the company money.

In the chart, we see the even distribution of cloud usage costs up to the last dates of September. Then the usage starts going up. When should you start worrying?

Again, the new trend functions come to the rescue.

The solution is to use the period_shift parameter, just not in the past, but rather in the future. For instance, if today’s date is October 22, this expression will calculate the sum() from October 1 to October 31.

  • trendsum(1M,now/M+1M)

There is one problem, though. To save precious computing resources, Zabbix evaluates these functions in triggers only when the period is over. However, these functions are also available in calculated items, and there we can use arbitrary calculation intervals.

So, the solution is to set up a calculated item, use trendsum() in the formula, and specify some reasonable update interval (for instance, one hour or one day).

Here, on the right-hand side of the chart, we see the current period, which is not over yet. Let’s take a look at the item definition.

This is the formula to calculate the current calendar period. Then, we can add a simple trigger referencing this calculated item:

Formula to calculate the current calendar period
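A hedged sketch of this setup (the item keys, host name, and budget value are assumptions): a calculated item, say cloud.costs.current, with an update interval of one hour could use the formula

trendsum("cloud.usage.cost",1M,now/M+1M)

and the trigger referencing it could be as simple as

{billing.host:cloud.costs.current.last()} > 10000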

You can also use the new expression macro in this trigger. You don’t need to have trend functions anywhere in the formula for this.

 

Once the trigger fires, you will see the following problem on the Problem screen — a nice and clean message containing all the information we need.

Use cases

There are many more possible applications for the new functions besides the examples above. Generally, these trend functions can be applied not only to IT metrics but also to many other real-world KPIs, for example:

— Business performance (to calculate annual revenue, profitability, etc.).
— Sales and marketing (for instance, monthly average, customer acquisition costs, sales target rate).
— Warehousing (such as weekly shipments, return rates, etc.).
— Human resources (for instance, annual training costs, overtime hours, etc.).
— Customer support (such as average response time or the number of issues per month).

We expect these functions to pave the way for Zabbix to new territories, which have been previously occupied by CRMs and other business analytics systems.

In a nutshell

  • Zabbix trend functions — a new way to analyze history without storing historical data.
  • Zabbix trend functions support calendar hours, days, weeks, months, and years.
  • New trigger field Event name – lets us display events with context.
  • New formatting functions let us present numbers and dates in a flexible manner.
  • Long-term data analysis just got easier and better with the new Zabbix 5.2.

Questions & Answers

Question. What’s the maximum time period for these new trigger functions? For how long can we analyze the data?

Answer. The maximum time period is not limited by any hardcoded values. The only limit to keep in mind is the size of your trend data history; there are no limitations in the code whatsoever that would restrict this use. You should also keep in mind that the longer the period, the bigger the database load — that’s also a factor to consider.

Question. Is this trend data that we’re analyzing also going to be stored in the value cache or some other place?

Answer. It’s not stored in the value cache at the moment. These trigger functions recalculate their values only after the period is over, so the value cache would not be of much use. But if some demanding applications require it, we’ll add this in later versions.

Scaling Zabbix with containers

Post Syndicated from Robert Silva original https://blog.zabbix.com/scaling-zabbix-with-containers/13155/

In this post, a new approach to running Zabbix in high availability is explained, along with the challenges of implementing it with containers, Docker Swarm, GitLab, and CI/CD.

Contents

I. Zabbix project requirements (0:33)
II. New approach (3:06)

III. Compose file and Deploy (8:08)
IV. Notes (16:32)
V. Gitlab CI/CD (20:34)
VI. Benefits of the architecture (24:57)
VII. Questions & Answers (25:53)

Zabbix project requirements

Using Docker for the first time was a challenge. The Zabbix environment needed to meet the following requirements:

  • to monitor more than 3,000 NVPS;
  • to be fault-tolerant;
  • to be resilient;
  • to scale the environment horizontally.

There are five ways to install Zabbix — using packages, compiling, Docker, cloud, or appliance.

We used virtual machines or physical servers to install Zabbix directly on the operating system. In this scenario, it is necessary to install and update the operating system to improve performance, then install Zabbix and configure backups of the configuration files and the database.

However, with such an installation, when a service such as the Zabbix Server or the Zabbix frontend is down, the usual solution is human intervention — restarting the service or the server, creating a new instance, or restoring a backup.

Still, we shouldn’t need to assign a specialist to manually solve such issues. The services must be able to restore themselves.

To create a more intelligent environment, we can use some standard solutions — Corosync and Pacemaker. However, there are better solutions for High Availability.

New approach

Zabbix can be deployed using advanced technologies, such as:

  • Docker,
  • Docker Swarm,
  • Reverse Proxy,
  • GIT,
  • CI/CD.

Initially, the instance was divided into various components.

Initial architecture

HAProxy

HAProxy is responsible for receiving incoming connections and directing them to the nodes of the Docker Swarm cluster. So, with each attempt to access the Zabbix frontend, the request is sent to HAProxy, which detects which node has the service listening and redirects the request to it.

Accessing the frontend.domain

We send the request to the HAProxy address, and HAProxy checks which nodes are available. If a node is unavailable, HAProxy no longer sends requests to that node.

HAProxy configuration file (haproxy.cfg)

When you configure load balancing using HAProxy, two types of nodes need to be defined: frontend and backend. Here, the traefik service is used as an example.

HAProxy listens for connections on the frontend definition. In the frontend, we configure the port that receives connections and associate the backend with it.

frontend traefik
mode http
bind 0.0.0.0:80
option forwardfor
monitor-uri /health
default_backend backend_traefik

HAProxy forwards requests to the backend nodes. In the backend, we define which servers are running the traefik service, the check mode, and the port to listen on.

backend backend_traefik
mode http
cookie Zabbix prefix
server DOCKERHOST1 10.250.6.52:8080 cookie DOCKERHOST1 check
server DOCKERHOST2 10.250.6.53:8080 cookie DOCKERHOST2 check
server DOCKERHOST3 10.250.6.54:8080 cookie DOCKERHOST3 check
stats admin if TRUE
option tcp-check

We also can define where the Zabbix Server can run. Here, we have only one Zabbix Server container running.

frontend zabbix_server
mode tcp
bind 0.0.0.0:10051
default_backend backend_zabbix_server
backend backend_zabbix_server
mode tcp
server DOCKERHOST1 10.250.6.52:10051 check
server DOCKERHOST2 10.250.6.53:10051 check
server DOCKERHOST3 10.250.6.54:10051 check
stats admin if TRUE
option tcp-check

NFS Server

NFS Server is responsible for storing the files mapped into the containers.

NFS Server

After installing the packages, you need to run the following commands to configure the NFS Server and NFS Client:

NFS Server

mkdir /data/data-docker
vim /etc/exports
/data/data-docker/ *(rw,sync,no_root_squash,no_subtree_check)

NFS Client

vim /etc/fstab
<NFS_SERVER_IP>:/data/data-docker /mnt/data-docker nfs defaults 0 0

Hosts Docker and Docker Swarm

Hosts Docker and Docker Swarm are responsible for running and orchestrating the containers.

A Swarm cluster consists of one or more nodes, which can be of two types:

  • Managers, responsible for managing the cluster; they can also run workloads.
  • Workers, responsible for running the services or workloads.

Reverse Proxy

Reverse Proxy, another essential component of this architecture, is responsible for receiving HTTP and HTTPS connections, identifying the destinations, and redirecting requests to the responsible containers.

Reverse Proxy can be implemented using NGINX or Traefik.

In this example, we have three containers running traefik. After receiving a connection from HAProxy, traefik searches for the destination container and sends the packet to it.

Compose file and Deploy

The Compose file — ./docker-compose.yml — is a YAML file defining services, networks, and volumes. In this file, we determine which Zabbix Server image is used, which network the container connects to, the service names, and other necessary service settings.

Reverse Proxy

Here is the example of configuring Reverse Proxy using traefik.

traefik:
image: traefik:v2.2.8
deploy:
placement:
constraints:
- node.role == manager
replicas: 1
restart_policy:
condition: on-failure
labels:
# Dashboard traefik
- "traefik.enable=true"
- "traefik.http.services.justAdummyService.loadbalancer.server.port=1337"
- "traefik.http.routers.traefik.tls=true"
- "traefik.http.routers.traefik.rule=Host(`zabbix-traefik.mydomain`)"
- "[email protected]"

where:

traefik: — the name of the service (in the first line).
image: — here, we can define which image we can use.
deploy: — rules for creating the deploy.
constraints: — a place of deployment.
replicas: — how many replicas we can create for this service.
restart_policy: — which policy to use if the service has a problem.
labels: — defining labels for traefik, including the rules for calling the service.

Then we can define how to configure authentication for the dashboard and how to redirect all HTTP connections to HTTPS.

# Auth Dashboard
- "traefik.http.routers.traefik.middlewares=traefik-auth"
- "traefik.http.middlewares.traefik-auth.basicauth.users=admin:"

# Redirect all HTTP to HTTPS permanently
- "traefik.http.routers.http_catchall.rule=HostRegexp(`{any:.+}`)"
- "traefik.http.routers.http_catchall.entrypoints=web"
- "traefik.http.routers.http_catchall.middlewares=https_redirect"
- "traefik.http.middlewares.https_redirect.redirectscheme.scheme=https"
- "traefik.http.middlewares.https_redirect.redirectscheme.permanent=true"

Finally, we define the command to be executed after the container is started.

command:
- "--api=true"
- "--log.level=INFO"
- "--providers.docker.endpoint=unix:///var/run/docker.sock"
- "--providers.docker.swarmMode=true"
- "--providers.docker.exposedbydefault=false"
- "--providers.file.directory=/etc/traefik/dynamic"
- "--entrypoints.web.address=:80"
- "--entrypoints.websecure.address=:443"

Zabbix Server

Zabbix Server configuration can be defined in this environment — the name of the Zabbix Server, image, OS, etc.

zabbix-server:
image: zabbix/zabbix-server-mysql:centos-5.0-latest
env_file:
- ./envs/zabbix-server/common.env
networks:
- "monitoring-network"
volumes:
- /mnt/data-docker/zabbix-server/externalscripts:/usr/lib/zabbix/externalscripts:ro
- /mnt/data-docker/zabbix-server/alertscripts:/usr/lib/zabbix/alertscripts:ro
ports:
- "10051:10051"
deploy:
<<: *template-deploy
labels:
- "traefik.enable=false"

In this case, we use the environment file (common.env), where we can define, for instance, the database address, database username, the number of pollers to start, the path for external and alert scripts, and other options.

In this example, we use two volumes — for external scripts and for alert scripts that must be stored in the NFS Server.

For this Zabbix Server service, traefik is not enabled.

Frontend

For the frontend, we have another option, for instance, using the Zabbix image.

zabbix-frontend:
image: zabbix/zabbix-web-nginx-mysql:alpine-5.0.1
env_file:
- ./envs/zabbix-frontend/common.env
networks:
- "monitoring-network"
deploy:
<<: *template-deploy
replicas: 5
labels:
- "traefik.enable=true"
- "traefik.http.routers.zabbix-frontend.tls=true"
- "traefik.http.routers.zabbix-frontend.rule=Host(`frontend.domain`)"
- "traefik.http.routers.zabbix-frontend.entrypoints=web"
- "traefik.http.routers.zabbix-frontend.entrypoints=websecure"
- "traefik.http.services.zabbix-frontend.loadbalancer.server.port=8080"

Here, 5 replicas mean that we can start 5 Zabbix frontends. This can be used for more extensive environments, which also means that we have 5 containers and 5 connections.

Here, to access the frontend, we can use the ‘frontend.domain‘ name. If we use a different name, access to the frontend will not be available.

The load balancer server port defines which port the container listens on — the port exposed by the official Zabbix frontend image.

Deploy

Up to now, deployment has been done manually. You needed to connect to one of the nodes with the Docker Swarm manager role, enter the NFS directory, and deploy the stack:

# docker stack deploy -c docker-compose.yaml zabbix

where -c defines the compose file’s name and ‘zabbix‘ — the name of the stack.

Notes

Docker Image

Typically, the official Docker images from Zabbix are used. However, for the Zabbix Server and Zabbix Proxy this is not enough: in production environments additional pieces are needed — scripts, ODBC drivers to monitor databases, and so on. You should learn to work with Docker and to create custom images.

Networks

When creating environments using Docker, you should be careful. The Docker environment has some internal networks, which can conflict with the physical network. So, it is necessary to change the default address ranges of the Docker overlay and Docker bridge networks, as in the sketch below.
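A minimal sketch of pre-creating the overlay network with a non-conflicting address range before deploying the stack (the subnet and network name are examples — pick values that fit your infrastructure):

docker network create --driver overlay --attachable --subnet 10.123.0.0/16 monitoring-network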

Custom image

Example of customizing the Zabbix image to install ODBC drivers.

ARG ZABBIX_BASE=centos 
ARG ZABBIX_VERSION=5.0.3 
FROM zabbix/zabbix-proxy-sqlite3:${ZABBIX_BASE}-${ZABBIX_VERSION}
ENV ORACLE_HOME=/usr/lib/oracle/12.2/client64
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/oracle/12.2/client64/lib
ENV PATH=$PATH:/usr/lib/oracle/12.2/client64/lib

Then we install ODBC drivers. This script allows for using ODBC drivers for Oracle, MySQL, etc.

# Install ODBC 
COPY ./drivers-oracle-12.2.0.1.0 /root/ 
COPY odbc.sh /root 
RUN chmod +x /root/odbc.sh && \ 
/root/odbc.sh

Then we install Python packages.

# Install Python3 
COPY requirements.txt /requirements.txt
WORKDIR /
RUN yum install -y epel-release && \ 
yum search python3 && \ 
yum install -y python36 python36-pip && \ 
python3 -m pip install -r requirements.txt
# Install SNMP 
RUN yum install -y net-snmp-utils net-snmp wget vim telnet traceroute

With this image, we can monitor databases, network devices, HTTP connections, etc.

To complete the image customization, we need to:

  1. build the image,
  2. push to the registry,
  3. deploy the services.

This process is performed manually and should be automated.

Gitlab CI/CD

With CI/CD, you don’t need to run the process manually to create the image and deploy the services.

1. Create a repository for each component.

  • Zabbix Server
  • Frontend
  • Zabbix Proxy

2. Enable pipelines.
3. Create .gitlab-ci.yml.

Creating .gitlab-ci.yml file
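A minimal .gitlab-ci.yml sketch for one of the components could look like this (the stage names are illustrative, and it assumes the runner can reach the Docker daemon, the GitLab registry via $CI_REGISTRY_IMAGE, and a Swarm manager node):

stages:
  - build
  - deploy

build_image:
  stage: build
  script:
    # Build the custom image and push it to the GitLab registry
    - docker build -t $CI_REGISTRY_IMAGE:latest .
    - docker push $CI_REGISTRY_IMAGE:latest

deploy_stack:
  stage: deploy
  script:
    # Update the running stack with the new image
    - docker stack deploy --with-registry-auth -c docker-compose.yaml zabbix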

Benefits of the architecture

  • If any Zabbix component stops, Docker Swarm will automatically start a new service/container.
  • We don’t need to connect to the terminal to start the environment.
  • Simple deployment.
  • Simple administration.

Questions & Answers

Question. Can such a Docker approach be used in extremely large environments?

Answer. Docker Swarm is already used to monitor extremely large environments with over 90,000 and over 50 proxies.

Question. Do you think it’s possible to set up a similar environment with Kubernetes?

Answer. I think it is possible, though scaling Zabbix with Kubernetes is more complex than with Docker Swarm. 

Save 2 clicks, test data preprocessing

Post Syndicated from Aigars Kadiķis original https://blog.zabbix.com/save-2-clicks-test-data-preprocessing/13249/

This topic is related to template development from scratch, bulk data input, and a lot of dependent items, each having different preprocessing steps.

If these keywords resonate with you, keep reading.

The story starts back in the day when a “Test now” button was introduced in the item preprocessing section. With it, we can simulate the entire preprocessing stack — a very cool feature to have.

Nevertheless, we tend to copy the data input over and over again:

While this is fine for small projects with simple preprocessing steps that match what we already know, it is not so convenient when we have the ambition to solve something harder — to figure out data preprocessing rules that suit our needs.

For a template development process, the solution is to skip data input and inject a static value in the very first preprocessing step. Let me introduce the concept.

JavaScript preprocessing step 1:

return 'this is input text';

JavaScript preprocessing step 2:

return value.replace("text","data");

Now we have static input, no need to spend time to “click” the input data.

Sometimes the input is not just one line but multiple lines, with tabs, spaces, double quotes, single quotes, and special characters. To respect all these things, we must get our hands dirty with the base64 format.

To prepare the input data as a base64 string on Windows systems, it can easily be done with Notepad++. Just select all the text and choose “Plugin commands” => “Base64 Encode” (the functionality is not there in the lite version of Notepad++):

After that, we need to copy all content to clipboard:

Create the first JavaScript preprocessing step with the content from the clipboard. Here is the same example:

return 'PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTE2Ij8+DQo8am9ibG9nPg0KICA8am9iX2xvZ192ZXJzaW9uIHZlcnNpb249IjIuMCIvPg0KICA8aGVhZGVyPg0KICAgIDxzdGFydF90aW1lPkpvYiBzdGFydGVkOiBNb25kYXksIEF1Z3VzdCAxMCwgMjAyMCBhdCAxOjAwOjA1IFBNPC9zdGFydF90aW1lPg0KICA8L2hlYWRlcj4NCiAgPGZvb3Rlcj4NCiAgICA8ZW5kX3RpbWU+Sm9iIGVuZGVkOiBNb25kYXksIEF1Z3VzdCAxMCwgMjAyMCBhdCAzOjE3OjUwIFBNPC9lbmRfdGltZT4NCiAgICA8T3BlcmF0aW9uRXJyb3JzIFR5cGU9ImpvYmZ0cl9qb2Jjb21wbF9zaHV0ZG93biI+Sm9iIGNvbXBsZXRpb24gc3RhdHVzOiBDYW5jZWxlZCBieSBzZXJ2aWNlIHNodXRkb3duPC9PcGVyYXRpb25FcnJvcnM+DQogICAgPGNvbXBsZXRlU3RhdHVzPjE8L2NvbXBsZXRlU3RhdHVzPg0KICAgIDxhYm9ydFVzZXJOYW1lPlRoZSBqb2Igd2FzIGNhbmNlbGVkIGJlY2F1c2UgdGhlIHJlc3BvbnNlIHRvIGEgbWVkaWEgcmVxdWVzdCBhbGVydCB3YXMgQ2FuY2VsLCBvciBiZWNhdXNlIHRoZSBhbGVydCB3YXMgY29uZmlndXJlZCB0byBhdXRvbWF0aWNhbGx5IHJlc3BvbmQgd2l0aCBDYW5jZWwsIG9yIGJlY2F1c2UgdGhlIEJhY2t1cCBFeGVjIEpvYiBFbmdpbmUgc2VydmljZSB3YXMgc3RvcHBlZC48L2Fib3J0VXNlck5hbWU+DQogIDwvZm9vdGVyPg0KPC9qb2Jsb2c+DQo=';

The next step must decode the data. Copy the code below 1:1 and configure it as the second preprocessing step:

var k = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="
function d(e) {
    var t, n, o, r, a = "",
        i = "",
        c = "",
        l = 0;
    for (/[^A-Za-z0-9+/=]/g.exec(e) && alert("1"), e = e.replace(/[^A-Za-z0-9+/=]/g, ""); t = k.indexOf(e.charAt(l++)) << 2 | (o = k.indexOf(e.charAt(l++))) >> 4, n = (15 & o) << 4 | (r = k.indexOf(e.charAt(l++))) >> 2, i = (3 & r) << 6 | (c = k.indexOf(e.charAt(l++))), a += String.fromCharCode(t), 64 != r && (a += String.fromCharCode(n)), 64 != c && (a += String.fromCharCode(i)), t = n = i = "", o = r = c = "", l < e.length;);
    return unescape(a)
}
return d(value);

This is how it looks like:

Go to the testing section and ensure the data in Zabbix is the same as it was in Notepad++:

Data has been successfully decoded — multiple lines, quite the original stuff. The tabs are not visible to the naked eye, but they are there, I promise!

Now we can “play” with the next preprocessing steps and try out different things:

When one preprocessing chain has been figured out, just clone the item and start developing the next one. Sure, if we succeed with the ambition, we will need to spend 5 minutes going through all the items, removing the first 2 steps, and linking each item to the master key 😉

Ok. That is it for today. Bye.

By the way, on a Linux system, to get the base64 string we only need:

  1. A command whose output interests us
  2. To pipe it to ‘base64 -w0’
systemctl list-unit-files --type=service | base64 -w0