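Monitoring metrics
- System metrics (memory, CPU, disk)
- File monitoring
- Network monitoring
- Hardware monitoring (disk temperature, power supply faults, CPU temperature), implemented through IPMI
- Business monitoring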
Custom monitoring workflow:
- Enable custom monitoring in zabbix_agentd.conf:
  - UnsafeUserParameters=1
  - UserParameter=key,command
- Write the script
- Configure the item and the trigger in the web interface
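To make the UserParameter=key,command format concrete, here is a minimal sketch of a helper that lists the keys defined in an agent configuration file. The function name list_user_parameters is hypothetical and not part of Zabbix. It assumes each definition starts at the beginning of a line and that the key ends at the first comma.

```shell
#!/bin/bash
# Hypothetical helper: list the item keys defined by UserParameter lines.
# Lines that do not start with "UserParameter=" (comments, other options) are skipped.
list_user_parameters() {
    local conf="$1"
    # Everything between "UserParameter=" and the first comma is the key.
    sed -n 's/^UserParameter=\([^,]*\),.*/\1/p' "$conf"
}

# Example (path taken from this walkthrough):
#   list_user_parameters /usr/local/etc/zabbix_agentd.conf
```

A listing like this is a quick way to confirm that a newly added key is really present in the file before restarting the agent.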
Environment
Role | IP address | Hostname | Applications to install | OS version |
---|---|---|---|---|
Server | 192.168.23.140 | zabbix | lamp zabbix_server zabbix_agent | CentOS 8 |
Client | 192.168.23.141 | yc1 | zabbix_agent | CentOS 8 |
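Write the script and enable custom monitoring
//Create the scripts directory
[root@yc1 ~]# mkdir /scripts
//Write the monitoring script
[root@yc1 ~]# vim /scripts/check_process.sh
#!/bin/bash
# Count processes whose command line contains $1, excluding grep and this script itself.
count=$(ps -ef | grep -Ev "grep|$0" | grep -c "$1")
# Print 1 when the process is not running, 0 when it is.
if [ "$count" -eq 0 ]; then
    echo "1"
else
    echo "0"
fi
//Make the script executable
[root@yc1 ~]# chmod +x /scripts/check_process.sh
//Install httpd
[root@yc1 ~]# yum -y install httpd
//Test with httpd started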
[root@yc1 ~]# systemctl start httpd
[root@yc1 ~]# ss -antl
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 0.0.0.0:10050 0.0.0.0:*
LISTEN 0 128 [::]:22 [::]:*
LISTEN 0 128 *:80 *:*
[root@yc1 ~]# /scripts/check_process.sh httpd
0
//Test with httpd stopped
[root@yc1 ~]# systemctl stop httpd
[root@yc1 ~]# ss -antl
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 0.0.0.0:10050 0.0.0.0:*
LISTEN 0 128 [::]:22 [::]:*
[root@yc1 ~]# /scripts/check_process.sh httpd
1
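The check above matches any line of ps -ef output that contains the given string, so an unrelated command such as an editor opened on httpd.conf would also count as a hit. The sketch below is one possible alternative built on pgrep -x, which matches the process name exactly. It is not the script used in this walkthrough, and the function name check_process is only illustrative. The output convention is unchanged: 0 when running, 1 when not.

```shell
#!/bin/bash
# Sketch of a pgrep-based variant (an assumption, not the original script).
# Prints 0 if at least one process with exactly the given name exists, else 1.
check_process() {
    local name="$1"
    if pgrep -x -- "$name" >/dev/null 2>&1; then
        echo "0"
    else
        echo "1"
    fi
}

# Run the check only when a process name is supplied on the command line.
if [ "$#" -ge 1 ]; then
    check_process "$1"
fi
```

Because pgrep -x compares against the process name rather than the full command line, the wrapper does not need to filter out grep or its own invocation.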
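//Enable custom monitoring and add the metric
[root@yc1 ~]# vim /usr/local/etc/zabbix_agentd.conf
······
# Append the following at the end of the file
# Allow all characters in arguments passed to user parameters
UnsafeUserParameters=1
# Custom item: the key check_apache runs the process check for httpd
UserParameter=check_apache,/scripts/check_process.sh httpd
//Restart the Zabbix agent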
[root@yc1 ~]# pkill zabbix
[root@yc1 ~]# zabbix_agentd
[root@yc1 ~]# ss -antl
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 0.0.0.0:10050 0.0.0.0:*
LISTEN 0 128 [::]:22 [::]:*
//From the server, check that the client's metric can be retrieved
[root@zabbix ~]# zabbix_get -s 192.168.23.141 -k check_apache
1
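Web configuration
Add an item
Configuration --- Hosts --- Items of the client host --- create a new item
Add a trigger
Configuration --- Hosts --- Triggers of the client host --- create a new trigger
Verification
httpd is not running on the monitored host, so the item returns 1 and the alert is sent immediately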
Custom log monitoring
//Download log.py
[root@yc1 ~]# ls /scripts/
check_process.sh log.py
[root@yc1 ~]# chmod +x /scripts/log.py
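What log.py does: checks whether a log file contains a given keyword
- The first argument is the log file name (required; a relative or absolute path)
- The second argument is the path of the seek position file (optional; defaults to /tmp/logseek; a relative or absolute path)
- The third argument is the search keyword (defaults to Error)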
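The source of log.py is not shown here, so the following is only a shell sketch of the same idea: remember the byte offset reached on the previous run and search only the bytes appended since then. The function name check_log, the handling of a shrunken file, and the use of tail, head and grep are all assumptions for illustration. The defaults mirror the three arguments described above.

```shell
#!/bin/bash
# Hypothetical shell sketch of a seek-position log check; it is NOT log.py.
# Usage: check_log LOGFILE [SEEKFILE] [KEYWORD]
# Prints 1 if KEYWORD appears in the bytes appended since the last run, else 0.
check_log() {
    local logfile="$1"
    local seekfile="${2:-/tmp/logseek}"
    local keyword="${3:-Error}"
    local offset=0 size result=0

    # Read the previously saved byte offset, if there is one.
    if [ -s "$seekfile" ]; then
        offset=$(cat "$seekfile")
    fi

    # Current size in bytes; the arithmetic expansion strips any padding from wc.
    size=$(( $(wc -c < "$logfile") ))

    # If the log shrank (for example after rotation), start again from the top.
    if [ "$offset" -gt "$size" ]; then
        offset=0
    fi

    # tail -c +N starts at byte N (1-based); head limits the read to the
    # bytes that existed when the size was measured.
    if tail -c "+$((offset + 1))" "$logfile" \
        | head -c "$((size - offset))" \
        | grep -q -- "$keyword"; then
        result=1
    fi

    # Remember how far we have read for the next run.
    echo "$size" > "$seekfile"
    echo "$result"
}
```

Keeping a separate seek file per monitored keyword, as the test runs below do with /tmp/myseek, means each check tracks its own position in the log.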
//Install python36
[root@yc1 ~]# yum -y install python36
//Test the script
# Monitor /etc/httpd/logs/error.log with the default seek position file /tmp/logseek and the keyword Error
[root@yc1 ~]# /scripts/log.py /etc/httpd/logs/error.log
0
[root@yc1 ~]# cat /tmp/logseek
0
[root@yc1 ~]# echo 'Error' >> /etc/httpd/logs/error.log
[root@yc1 ~]# /scripts/log.py /etc/httpd/logs/error.log
1
[root@yc1 ~]# cat /tmp/logseek
6
# Monitor /etc/httpd/logs/error.log with the seek position file /tmp/myseek and the keyword Failed
[root@yc1 ~]# /scripts/log.py /etc/httpd/logs/error.log /tmp/myseek Failed
0
[root@yc1 ~]# cat /tmp/myseek
6
[root@yc1 ~]# echo 'Failed' >> /etc/httpd/logs/error.log
[root@yc1 ~]# /scripts/log.py /etc/httpd/logs/error.log /tmp/myseek Failed
1
[root@yc1 ~]# cat /tmp/myseek
13
//Add the metric
[root@yc1 ~]# vim /usr/local/etc/zabbix_agentd.conf
······
UnsafeUserParameters=1
UserParameter=check_apache,/scripts/check_process.sh httpd
# Append the following at the end of the file
UserParameter=check_logs[*],/scripts/log.py $1 $2 $3
//Restart the Zabbix agent
[root@yc1 ~]# pkill zabbix
[root@yc1 ~]# zabbix_agentd
//Let other users (including the agent's user) traverse the httpd log directory
[root@yc1 ~]# chmod o+x /var/log/httpd
//From the server, check that the client's metric can be retrieved
[root@zabbix ~]# zabbix_get -s 192.168.23.141 -k 'check_logs["/etc/httpd/logs/error.log","/tmp/seek","Error"]'
0
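Web configuration
Add an item
Configuration --- Hosts --- Items of the client host --- create a new item
Add a trigger
Configuration --- Hosts --- Triggers of the client host --- create a new trigger
Trigger verification
[root@yc1 ~]# echo 'Error' >> /etc/httpd/logs/error.log
Email verification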
Custom monitoring of MySQL master-slave replication status
Environment
Role | IP address | Hostname | Applications to install | OS version |
---|---|---|---|---|
master | 192.168.23.142 | master | mariadb | CentOS 8 |
slave | 192.168.23.141 | slave | mariadb | CentOS 8 |
Preparation
//On the MySQL master
[root@master ~]# yum -y install mariadb*
[root@master ~]# systemctl enable --now mariadb
[root@master ~]# systemctl disable --now firewalld
[root@master ~]# sed -i "s/SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config
[root@master ~]# setenforce 0
//On the MySQL slave
[root@slave ~]# yum -y install mariadb*
[root@slave ~]# systemctl enable --now mariadb
[root@slave ~]# systemctl disable --now firewalld
[root@slave ~]# sed -i "s/SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config
[root@slave ~]# setenforce 0
Configure MySQL master-slave replication
//Master configuration
[root@master ~]# mysql -uroot
MariaDB [(none)]> grant replication slave on *.* to 'repl'@'192.168.23.141' identified by '123456';
Query OK, 0 rows affected (0.000 sec)
MariaDB [(none)]> flush privileges;
Query OK, 0 rows affected (0.000 sec)
MariaDB [(none)]> exit
Bye
[root@master ~]# vim /etc/my.cnf
#Append the following at the end of the file
[mysqld]
log-bin=mysql-bin
server-id=1
[root@master ~]# systemctl restart mariadb
[root@master ~]# mysql -uroot
MariaDB [(none)]> show master status;
+------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000001 | 328 | | |
+------------------+----------+--------------+------------------+
1 row in set (0.000 sec)
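The File and Position values shown here are the ones the slave needs for master_log_file and master_log_pos. As a small illustration of that mapping, the sketch below converts one tab-separated line of show master status output into those two clauses. The helper name binlog_coordinates is hypothetical. It assumes the mysql client is run with -N (skip column names) and piped, which makes it print tab-separated rows.

```shell
#!/bin/bash
# Hypothetical helper: read one tab-separated "show master status" row on stdin
# and print the matching master_log_file / master_log_pos clauses.
binlog_coordinates() {
    # \047 is the octal escape for a single quote inside an awk string.
    awk -F'\t' 'NR == 1 {
        printf "master_log_file=\047%s\047, master_log_pos=%s\n", $1, $2
    }'
}

# Intended use on the master (assumes passwordless root access, as in this setup):
#   mysql -uroot -N -e 'show master status' | binlog_coordinates
```

Copying the coordinates mechanically like this avoids transcription mistakes when typing the change master statement on the slave.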
//Slave configuration
[root@slave ~]# vim /etc/my.cnf
[mysqld]
server-id=20
relay-log=myrelay
[root@slave ~]# systemctl restart mariadb
[root@slave ~]# mysql -uroot
MariaDB [(none)]> change master to
-> master_host='192.168.23.142',
-> master_user='repl',
-> master_password='123456',
-> master_log_file='mysql-bin.000001',
-> master_log_pos=328;
Query OK, 0 rows affected (0.003 sec)
MariaDB [(none)]> start slave;
Query OK, 0 rows affected (0.001 sec)
//Check the slave status
MariaDB [(none)]> show slave status \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.23.142
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 328
Relay_Log_File: myrelay.000003
Relay_Log_Pos: 555
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
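Write the script
[root@slave ~]# vim /scripts/check_mysql_repl.sh
#!/bin/bash
# Count how many of Slave_IO_Running and Slave_SQL_Running report Yes.
count=$(mysql -uroot -e 'show slave status\G' | grep 'Running:' | awk '{print $2}' | grep -c 'Yes')
# Print 1 unless both replication threads are running, 0 when both are.
if [ "$count" -ne 2 ]; then
    echo '1'
else
    echo '0'
fi
//Make the script executable
[root@slave ~]# chmod +x /scripts/check_mysql_repl.sh
//Test the script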
[root@slave ~]# /scripts/check_mysql_repl.sh
0
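The same check can be written as a filter that reads the show slave status\G output on standard input, which makes it easy to try against saved output without a live database. This is a sketch and not the script used above. The function name repl_state is hypothetical, and it matches the two field names exactly instead of grepping for 'Running:'.

```shell
#!/bin/bash
# Sketch: classify replication health from "show slave status\G" output on stdin.
# Prints 0 when both Slave_IO_Running and Slave_SQL_Running are "Yes", else 1.
repl_state() {
    awk -F': *' '
        # Match only the two thread-status fields, ignoring leading padding.
        $1 ~ /^[[:space:]]*Slave_(IO|SQL)_Running$/ && $2 == "Yes" { ok++ }
        END { if (ok == 2) print 0; else print 1 }
    '
}

# Intended use on the slave (assumes passwordless root access, as in this setup):
#   mysql -uroot -e 'show slave status\G' | repl_state
```

If the mysql command fails or returns nothing, the filter sees no matching lines and prints 1, so a broken connection is reported as a replication problem.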
Add the metric
[root@slave ~]# vim /usr/local/etc/zabbix_agentd.conf
······
UnsafeUserParameters=1
UserParameter=check_apache,/scripts/check_process.sh httpd
UserParameter=check_logs[*],/scripts/log.py $1 $2 $3
# Append the following at the end of the file
UserParameter=check_mysql_repl,/scripts/check_mysql_repl.sh
//Restart the Zabbix agent
[root@slave ~]# pkill zabbix
[root@slave ~]# zabbix_agentd
//From the server, check that the client's metric can be retrieved
[root@zabbix ~]# zabbix_get -s 192.168.23.141 -k check_mysql_repl
0
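Web configuration
Add an item
Configuration --- Hosts --- Items of the client host --- create a new item
Add a trigger
Configuration --- Hosts --- Triggers of the client host --- create a new trigger
Note: set the severity of this trigger to High
Trigger verification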
[root@slave ~]# mysql -uroot
MariaDB [(none)]> stop slave;
Query OK, 0 rows affected (0.001 sec)
MariaDB [(none)]> show slave status \G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 192.168.23.142
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 328
Relay_Log_File: myrelay.000003
Relay_Log_Pos: 555
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Email verification