Post Syndicated from Aigars Kadiķis original https://blog.zabbix.com/what-takes-disk-space/13349/
In today’s class, let’s talk about where the disk space goes: which items and host objects consume the most disk space.
The post will cover things like:
Biggest tables in a database
Biggest data coming to the instance right now
Biggest data inside one partition of the DB table
Print hosts and items which consume the most disk space
Biggest tables
In general, the leading tables are:
history, history_uint
history_str, history_text, history_log
events
‘history_uint’ stores integers, while ‘history’ stores decimal (floating-point) numbers.
‘history_str’, ‘history_text’, and ‘history_log’ store textual data.
The ‘events’ table holds problem events, internal events, agent auto-registration events, and discovery events.
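As a quick side sketch (not from the original post), you can check how many items feed each of these tables by grouping the ‘items’ table by its value_type column; in the Zabbix schema, value_type 0 maps to history, 1 to history_str, 2 to history_log, 3 to history_uint, and 4 to history_text:

```sql
-- Count items per value type; each value_type feeds one history table
SELECT value_type, COUNT(*) AS items
FROM items
GROUP BY value_type
ORDER BY items DESC;
```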
Have a look for yourself at which tables take the most space in your database. On MySQL:
SELECT table_name,
       table_rows,
       data_length,
       index_length,
       ROUND(((data_length + index_length) / 1024 / 1024 / 1024), 2) "Size in GB"
FROM information_schema.tables
WHERE table_schema = "zabbix"
ORDER BY ROUND(((data_length + index_length) / 1024 / 1024 / 1024), 2) DESC
LIMIT 8;
On PostgreSQL:
SELECT *,
       pg_size_pretty(total_bytes) AS total,
       pg_size_pretty(index_bytes) AS index,
       pg_size_pretty(toast_bytes) AS toast,
       pg_size_pretty(table_bytes) AS table
FROM (
  SELECT *, total_bytes - index_bytes - COALESCE(toast_bytes, 0) AS table_bytes
  FROM (
    SELECT c.oid,
           nspname AS table_schema,
           relname AS table_name,
           c.reltuples AS row_estimate,
           pg_total_relation_size(c.oid) AS total_bytes,
           pg_indexes_size(c.oid) AS index_bytes,
           pg_total_relation_size(reltoastrelid) AS toast_bytes
    FROM pg_class c
    LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE relkind = 'r'
  ) a
) a;
Detect big data coming to the instance right now
Analyze ‘history_log’ table for the last 30 minutes:
SELECT hosts.host,
       items.itemid,
       items.key_,
       COUNT(history_log.itemid) AS 'count',
       AVG(LENGTH(history_log.value)) AS 'avg size',
       (COUNT(history_log.itemid) * AVG(LENGTH(history_log.value))) AS 'Count x AVG'
FROM history_log
JOIN items ON (items.itemid=history_log.itemid)
JOIN hosts ON (hosts.hostid=items.hostid)
WHERE clock > UNIX_TIMESTAMP(NOW() - INTERVAL 30 MINUTE)
GROUP BY hosts.host, history_log.itemid
ORDER BY 6 DESC
LIMIT 1\G
With PostgreSQL:
SELECT hosts.host,
       history_log.itemid,
       items.key_,
       COUNT(history_log.itemid) AS "count",
       AVG(LENGTH(history_log.value))::NUMERIC(10,2) AS "avg size",
       (COUNT(history_log.itemid) * AVG(LENGTH(history_log.value)))::NUMERIC(10,2) AS "Count x AVG"
FROM history_log
JOIN items ON (items.itemid=history_log.itemid)
JOIN hosts ON (hosts.hostid=items.hostid)
WHERE clock > EXTRACT(epoch FROM NOW() - INTERVAL '30 MINUTE')
GROUP BY hosts.host, history_log.itemid, items.key_
ORDER BY 6 DESC
LIMIT 5 \gx
Re-run the same query but replace ‘history_log’ (in all places) with ‘history_text’ or ‘history_str’.
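If you keep the query in a file, a sed one-liner can do the substitution for you. A minimal sketch (the query string here is shortened for illustration; in practice you would run `sed 's/history_log/history_text/g' query.sql` on the saved file):

```shell
# Substitute the table name everywhere in the query text
query='SELECT COUNT(history_log.itemid) FROM history_log JOIN items ON (items.itemid=history_log.itemid)'
printf '%s\n' "$query" | sed 's/history_log/history_text/g'
```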
Which hosts consume the most space
This is a very heavy query. We will go back one day and analyze 6 minutes of that data:
SELECT ho.hostid,
       ho.name,
       COUNT(*) AS records,
       (COUNT(*) * (SELECT AVG_ROW_LENGTH FROM information_schema.tables
                    WHERE TABLE_NAME = 'history_text' AND TABLE_SCHEMA = 'zabbix')/1024/1024) AS 'Total size average (Mb)',
       SUM(LENGTH(history_text.value))/1024/1024 +
       SUM(LENGTH(history_text.clock))/1024/1024 +
       SUM(LENGTH(history_text.ns))/1024/1024 +
       SUM(LENGTH(history_text.itemid))/1024/1024 AS 'history_text Column Size (Mb)'
FROM history_text
LEFT OUTER JOIN items i ON history_text.itemid = i.itemid
LEFT OUTER JOIN hosts ho ON i.hostid = ho.hostid
WHERE ho.status IN (0,1)
  AND clock > UNIX_TIMESTAMP(now() - INTERVAL 1 DAY - INTERVAL 6 MINUTE)
  AND clock < UNIX_TIMESTAMP(now() - INTERVAL 1 DAY)
GROUP BY ho.hostid
ORDER BY 4 DESC
LIMIT 5\G
If the “6-minute query” completes in a reasonable time, try “INTERVAL 60 MINUTE”.
If “INTERVAL 60 MINUTE” also performs well, try “INTERVAL 600 MINUTE”.
Analyze at the partition level (MySQL)
On MySQL, if database table partitioning is enabled, we can list the biggest partitions on the filesystem (run this inside the MySQL data directory for the Zabbix schema, e.g. /var/lib/mysql/zabbix):
ls -lh history_log#*
It will print:
-rw-r-----. 1 mysql mysql  44M Jan 24 20:23 history_log#p#p2021_02w.ibd
-rw-r-----. 1 mysql mysql  24M Jan 24 21:20 history_log#p#p2021_03w.ibd
-rw-r-----. 1 mysql mysql 128K Jan 11 00:59 history_log#p#p2021_04w.ibd
From the previous output, we can take the partition name ‘p2021_02w’ and use it in a query:
SELECT ho.hostid,
       ho.name,
       COUNT(*) AS records,
       (COUNT(*) * (SELECT AVG_ROW_LENGTH FROM information_schema.tables
                    WHERE TABLE_NAME = 'history_log' AND TABLE_SCHEMA = 'zabbix')/1024/1024) AS 'Total size average (Mb)',
       SUM(LENGTH(history_log.value))/1024/1024 +
       SUM(LENGTH(history_log.clock))/1024/1024 +
       SUM(LENGTH(history_log.ns))/1024/1024 +
       SUM(LENGTH(history_log.itemid))/1024/1024 AS 'history_log Column Size (Mb)'
FROM history_log PARTITION (p2021_02w)
LEFT OUTER JOIN items i ON history_log.itemid = i.itemid
LEFT OUTER JOIN hosts ho ON i.hostid = ho.hostid
WHERE ho.status IN (0,1)
GROUP BY ho.hostid
ORDER BY 4 DESC
LIMIT 10;
You can reproduce a similar scenario for the other text-based tables by listing:
ls -lh history_text#*
ls -lh history_str#*
Free up disk space (MySQL)
Deleting a host in the GUI will not free up disk space on MySQL; it only leaves empty rows in the tables, where new data can be inserted. To really free up disk space, we can rebuild the partitions. First, list all possible partition names:
SHOW CREATE TABLE history\G
To rebuild partition:
ALTER TABLE history REBUILD PARTITION p202101160000;
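To rebuild every partition of a table in one go, you could generate one REBUILD statement per partition name. A hedged sketch: the names are inlined here for illustration; in practice you would pipe in the real names, e.g. with `mysql -N -e "SELECT partition_name FROM information_schema.partitions WHERE table_schema='zabbix' AND table_name='history' AND partition_name IS NOT NULL"`:

```shell
# Generate an ALTER TABLE ... REBUILD PARTITION statement per partition name
printf '%s\n' p2021_02w p2021_03w |
while read -r p; do
  echo "ALTER TABLE history REBUILD PARTITION $p;"
done
```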
Free up disk space (PostgreSQL)
On PostgreSQL, the autovacuum process is responsible for vacuuming tables. To check whether a vacuum has run recently, run:
SELECT schemaname, relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_all_tables
WHERE n_dead_tup > 0
ORDER BY n_dead_tup DESC;
In the output, look at ‘n_dead_tup’: the number of dead tuples, i.e. rows that have been deleted or updated but whose space has not yet been reclaimed.
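You can also trigger a vacuum on a specific table by hand (a sketch; ‘history_log’ is just an example table). Note that a plain VACUUM only makes dead-tuple space reusable inside PostgreSQL; VACUUM FULL rewrites the table and returns space to the operating system, but takes an exclusive lock on the table while it runs:

```sql
-- Makes dead-tuple space reusable (does not shrink the file on disk)
VACUUM VERBOSE ANALYZE history_log;

-- Rewrites the table and returns space to the OS; locks the table
VACUUM FULL history_log;
```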
If the last autovacuum has not occurred within the last 10 days, that is a bad sign, and we need to tune the configuration. We can increase vacuum priority with settings like:
vacuum_cost_page_miss = 10
vacuum_cost_page_dirty = 20
autovacuum_vacuum_threshold = 50
autovacuum_vacuum_scale_factor = 0.01
autovacuum_vacuum_cost_delay = 20ms
autovacuum_vacuum_cost_limit = 3000
autovacuum_max_workers = 6
Alright. That is it for today.