Chapter 5 Zabbix Monitoring Architecture
zabbix-agent (data collection) -> zabbix-server (data analysis | alerting) -> database (data storage) <- zabbix web (data display)
data:image/s3,"s3://crabby-images/730f7/730f775a7ecd0f78962630e96405fb263f3f72a5" alt=""
Chapter 6 Quickly Monitoring a Host with Zabbix
1. Install zabbix-agent
[root@web01 ~]# rpm -ivh https://mirror.tuna.tsinghua.edu.cn/zabbix/zabbix/4.0/rhel/7/x86_64/zabbix-agent-4.0.11-1.el7.x86_64.rpm
2. Configure zabbix-agent
[root@web01 ~]# grep "^[a-zA-Z]" /etc/zabbix/zabbix_agentd.conf
PidFile=/var/run/zabbix/zabbix_agentd.pid
LogFile=/var/log/zabbix/zabbix_agentd.log
LogFileSize=0
Server=10.0.1.61
ServerActive=127.0.0.1
Hostname=Zabbix server
Include=/etc/zabbix/zabbix_agentd.d/*.conf
3. Start zabbix-agent and verify
[root@web01 ~]# systemctl start zabbix-agent.service
[root@web01 ~]# systemctl enable zabbix-agent.service
[root@web01 ~]# netstat -lntup|grep 10050
tcp 0 0 0.0.0.0:10050 0.0.0.0:* LISTEN 10351/zabbix_agentd
tcp6 0 0 :::10050 :::* LISTEN 10351/zabbix_agentd
4. Add the host in the Zabbix web interface
data:image/s3,"s3://crabby-images/c2ad4/c2ad47bf9a3e56c3b862fd40d43e771276493fba" alt=""
data:image/s3,"s3://crabby-images/9a383/9a383798c29acdda6f2bafd6dc52e9597d81d253" alt=""
data:image/s3,"s3://crabby-images/d7b04/d7b047d0a24abcbee130455eb35f47fe258771e5" alt=""
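Chapter 7 A First Try at Custom Host Monitoring
1. Monitoring requirement
Monitor the set of 11 TCP connection states
2. Command-line implementation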
[root@web01 ~]# netstat -ant|grep -c TIME_WAIT
55
[root@web01 ~]# netstat -ant|grep -c LISTEN
12
3. Write the Zabbix monitoring configuration file (parameterized form)
[root@web01 ~]# cat /etc/zabbix/zabbix_agentd.d/tcp_status.conf
UserParameter=tcp_state[*],netstat -ant|grep -c $1
[root@web01 ~]# systemctl restart zabbix-agent.service
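As a rough illustration of what the parameterized key does, the Python sketch below mimics `grep -c $1` on captured `netstat -ant` output: it counts the lines that contain the requested state. The function name `count_state` and the sample text are made up for illustration and are not part of Zabbix.

```python
# Illustrative only: mimics `netstat -ant | grep -c <STATE>` on captured text.
# `count_state` and SAMPLE are hypothetical names, not part of Zabbix.

SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:10050           0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 10.0.1.7:80             10.0.1.1:51234          TIME_WAIT
tcp        0      0 10.0.1.7:22             10.0.1.1:50000          ESTABLISHED
"""


def count_state(netstat_output: str, state: str) -> int:
    """Count lines containing `state`, the way `grep -c` counts matching lines."""
    return sum(1 for line in netstat_output.splitlines() if state in line)


if __name__ == "__main__":
    for state in ("LISTEN", "TIME_WAIT", "ESTABLISHED"):
        print(state, count_state(SAMPLE, state))
```

Because the match is a case-sensitive substring test, the header line containing "established" in lower case is not counted.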
4. Test from the server side
[root@m01 ~]# rpm -ivh https://mirrors.tuna.tsinghua.edu.cn/zabbix/zabbix/4.0/rhel/7/x86_64/zabbix-release-4.0-1.el7.noarch.rpm
[root@m01 ~]# yum install zabbix-get.x86_64 -y
[root@m01 ~]# zabbix_get -s 10.0.1.7 -k tcp_state[TIME_WAIT]
51
[root@m01 ~]# zabbix_get -s 10.0.1.7 -k tcp_state[LISTEN]
12
5. Add the item in the web interface
data:image/s3,"s3://crabby-images/ae339/ae339d0c6b0d7bec444cd7d4f7b15d3becf6cbe7" alt=""
data:image/s3,"s3://crabby-images/6b5de/6b5dee49746b9493289231644adf342c570c63a7" alt=""
6. Clone the item
Because TCP has several states, multiple items are needed. Cloning an existing item is a quick way to create them.
data:image/s3,"s3://crabby-images/efd40/efd4079c666918bf74e5259a1329420a80566056" alt=""
data:image/s3,"s3://crabby-images/aebd6/aebd6697badfba5515a3a4f0b736e140d6d31775" alt=""
data:image/s3,"s3://crabby-images/b7f64/b7f64a2544b77f73140a75eeb674a6f7e578aeab" alt=""
Add the remaining states the same way.
7. Create a graph
data:image/s3,"s3://crabby-images/b2375/b2375f4e1051ed64a9c6b2670ea3c26762010ea3" alt=""
8. View the graph
data:image/s3,"s3://crabby-images/98d4a/98d4a623ad2d4c3f3d58e09964f89c8d2c6ae241" alt=""
9. Configure a trigger
data:image/s3,"s3://crabby-images/9f45f/9f45f1f7ff7debdccc8fb100c96f5edfd3779514" alt=""
data:image/s3,"s3://crabby-images/76be9/76be902e2d33cd4c0bd06ea684d6ef74171aaf18" alt=""
data:image/s3,"s3://crabby-images/ea7f0/ea7f06ace8989ea10a099dd4f6ec7781708ef58e" alt=""
data:image/s3,"s3://crabby-images/3bd62/3bd62b64ad41f0386c0b5fe2710954427912042f" alt=""
Chapter 8 Email Alerts
1. Define the sender
data:image/s3,"s3://crabby-images/5656a/5656a53453a6b7d01ad19778e933da12317fe58b" alt=""
data:image/s3,"s3://crabby-images/ae7fa/ae7fa424acd7bb5a62a8084e650da1d7b3de7397" alt=""
2. Define the recipient
data:image/s3,"s3://crabby-images/cb0d6/cb0d6e380fb6ea5cb069328685bab37843584f92" alt=""
data:image/s3,"s3://crabby-images/fb888/fb888179ba70fa8189568120505f78f693ffab93" alt=""
data:image/s3,"s3://crabby-images/3f10b/3f10bcd4eba4c9776c7aee6bec51ce12c5817a82" alt=""
data:image/s3,"s3://crabby-images/c51c4/c51c4b3a53894cd4417717c549d9cc3864d1d241" alt=""
3. Customize the alert content
Macros available for customizing alert content:
https://www.zabbix.com/documentation/4.0/zh/manual/appendix/macros/supported_by_location
Reference blog post:
https://www.cnblogs.com/bixiaoyu/p/7302541.html
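Problem notification
The alert email subject can use the default text, or the following custom alert content:
Name: Action-Email
Default subject: Problem {TRIGGER.STATUS}, server {HOSTNAME1}: {TRIGGER.NAME} has failed!
Alert host: {HOSTNAME1}
Alert time: {EVENT.DATE} {EVENT.TIME}
Alert severity: {TRIGGER.SEVERITY}
Alert message: {TRIGGER.NAME}
Alert item: {TRIGGER.KEY1}
Problem details: {ITEM.NAME}:{ITEM.VALUE}
Current status: {TRIGGER.STATUS}:{ITEM.VALUE1}
Event ID: {EVENT.ID}
Recovery notification
Recovery subject: Recovered {TRIGGER.STATUS}, server {HOSTNAME1}: {TRIGGER.NAME} has recovered!
Recovery message:
Alert host: {HOSTNAME1}
Alert time: {EVENT.DATE} {EVENT.TIME}
Alert severity: {TRIGGER.SEVERITY}
Alert message: {TRIGGER.NAME}
Alert item: {TRIGGER.KEY1}
Problem details: {ITEM.NAME}:{ITEM.VALUE}
Current status: {TRIGGER.STATUS}:{ITEM.VALUE1}
Event ID: {EVENT.ID}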
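Zabbix expands the `{MACRO}` placeholders itself when an action fires. To make the templates above easier to reason about, here is a minimal Python sketch of that kind of substitution. It is not how Zabbix implements it; `expand_macros` and the sample values are assumptions chosen only for illustration.

```python
import re

# Hypothetical sample values; in Zabbix the server fills these in when the action fires.
SAMPLE_VALUES = {
    "HOSTNAME1": "web01",
    "TRIGGER.STATUS": "PROBLEM",
    "TRIGGER.NAME": "Too many TIME_WAIT connections",
    "EVENT.ID": "51",
}

# A macro is an upper-case name (letters, digits, dots, underscores) inside braces.
MACRO_PATTERN = re.compile(r"\{([A-Z0-9_.]+)\}")


def expand_macros(template: str, values: dict) -> str:
    """Replace each {MACRO} with its value; unknown macros are left untouched."""
    def substitute(match):
        return values.get(match.group(1), match.group(0))
    return MACRO_PATTERN.sub(substitute, template)


if __name__ == "__main__":
    subject = "Problem {TRIGGER.STATUS}, server {HOSTNAME1}: {TRIGGER.NAME}"
    print(expand_macros(subject, SAMPLE_VALUES))
    print(expand_macros("Event ID: {EVENT.ID}, item: {ITEM.NAME}", SAMPLE_VALUES))
```

Leaving unknown macros untouched makes a misspelled macro name easy to spot in a test message.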
Chapter 9 WeChat Alerts
1. Check the alert script directory set in the server configuration file
[root@m01 ~]# grep "^AlertScriptsPath" /etc/zabbix/zabbix_server.conf
AlertScriptsPath=/usr/lib/zabbix/alertscripts
2. Place weixin.py in the Zabbix alert scripts directory
[root@m01 /usr/lib/zabbix/alertscripts]# ll
total 4
-rwxr-xr-x 1 root root 1344 Aug  7 21:58 weixin.py
3. Configure the sender
data:image/s3,"s3://crabby-images/33ba8/33ba8cf18b4cfa369fd2fa40f3332f558474e684" alt=""
data:image/s3,"s3://crabby-images/b36a2/b36a223b2d8db4e1e88f00f204cdfc7e2aa971b6" alt=""
4. Configure the recipient
data:image/s3,"s3://crabby-images/47f02/47f029466fb28b975ec2bcc36437b918e45ca817" alt=""
5. Log in to the Enterprise WeChat account and add an account
https://work.weixin.qq.com/wework_admin/loginpage_wx
1. After logging in, create a new application under the enterprise account
data:image/s3,"s3://crabby-images/61caf/61caf5ca70e6a4f48e4e286a3af2606cf7495397" alt=""
2. Upload a logo and fill in the application name, description, and so on
data:image/s3,"s3://crabby-images/993df/993df8a4089dbade9f789c05f2298a34462d1555" alt=""
3. View the enabled application
The application's AgentId and Secret are generated at this point. Both are needed in later steps.
data:image/s3,"s3://crabby-images/c4503/c45034eb37b7719fed9329511c0fc4cde38c1785" alt=""
4. Test the API call
http://work.weixin.qq.com/api/devtools/devtool.php
data:image/s3,"s3://crabby-images/a4abd/a4abdcd4c3eb2835782e34d27d4c2fae3315099f" alt=""
Here corpid is the company ID.
data:image/s3,"s3://crabby-images/25ed0/25ed01b18fbcd521b2bf637f737eeb32c2f4ae56" alt=""
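Corpsecret is the Secret generated when the application was created. Confirm it is correct, fill it in, and continue to the next step.
If everything is correct, a 200 status code is returned.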
data:image/s3,"s3://crabby-images/af183/af183d93e4b090b0d97eaaee79dea3ef654522b6" alt=""
6. Add members
data:image/s3,"s3://crabby-images/5bfb6/5bfb695841866dc37a97b279ec2d6e7ee956b40d" alt=""
7. Follow the official account
data:image/s3,"s3://crabby-images/02ef5/02ef5dc72c89d2d88dcfd41d7401f7ec41416b9e" alt=""
8. Look up your own account
data:image/s3,"s3://crabby-images/1cf14/1cf142461000d483638fad27d31d744a2b03ed6e" alt=""
9. Update the settings in the script
[root@m01 /usr/lib/zabbix/alertscripts]# cat weixin.py
..............
corpid='your Enterprise WeChat corpid'
appsecret='your application Secret'
agentid=your application AgentId
..............
10. Send a test message
[root@m01 /usr/lib/zabbix/alertscripts]# python weixin.py your_account 'Test subject' 'WeChat test message'
11. Check the message in WeChat
data:image/s3,"s3://crabby-images/7a8aa/7a8aa83061679e20ed340f21011173bf6af2acca" alt=""
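12. Send to a whole WeChat group
Sending to a single user now works, but with many users that becomes tedious. The message can instead be sent to a whole group. The script already contains the setting for this, commented out by default.
Modify the script as follows: comment out the user setting and enable the group setting.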
#!/usr/bin/env python
import requests
import sys
import os
import json
import logging
logging.basicConfig(level = logging.DEBUG, format = '%(asctime)s, %(filename)s, %(levelname)s, %(message)s',
datefmt = '%a, %d %b %Y %H:%M:%S',
filename = os.path.join('/tmp','weixin.log'),
filemode = 'a')
corpid='wwd26fdfb9940e7efa'
appsecret='Btg89FnZfMu0k7l6b4iagmAR5Z9TCgKknYbx-SMQvmg'
agentid=1000005
token_url='https://qyapi.weixin.qq.com/cgi-bin/gettoken?corpid=' + corpid + '&corpsecret=' + appsecret
req=requests.get(token_url)
accesstoken=req.json()['access_token']
msgsend_url='https://qyapi.weixin.qq.com/cgi-bin/message/send?access_token=' + accesstoken
#touser=sys.argv[1]
toparty=sys.argv[1]
subject=sys.argv[2]
message=sys.argv[2] + "\n" + sys.argv[3]
params={
#"touser": touser,
"toparty": toparty,
"msgtype": "text",
"agentid": agentid,
"text": {
"content": message
},
"safe":0
}
req=requests.post(msgsend_url, data=json.dumps(params))
logging.info('sendto:' + toparty + ';;subject:' + subject + ';;message:' + message)
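The only difference between sending to one user and sending to a group is which recipient field is filled in. The sketch below isolates that choice in a helper so it can be checked without any network call. The name `build_payload` and its `to_group` flag are hypothetical; the field names are the ones used in the script above.

```python
import json


def build_payload(recipient, subject, body, agentid, to_group=False):
    """Build the message body in the shape used by the script above.

    to_group=True fills "toparty" (a group id); otherwise "touser" is filled.
    """
    payload = {
        "msgtype": "text",
        "agentid": agentid,
        "text": {"content": subject + "\n" + body},
        "safe": 0,
    }
    payload["toparty" if to_group else "touser"] = recipient
    return payload


if __name__ == "__main__":
    # 1000005 is the agentid shown in the script above; "2" is a made-up group id.
    example = build_payload("2", "Test subject", "Test message", 1000005, to_group=True)
    print(json.dumps(example))
```

Keeping the payload construction separate from the HTTP request also makes it possible to log or inspect the exact body before it is posted.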
13. Joke script: message everyone except one randomly chosen user
#!/bin/bash
num=$(echo $(($RANDOM%28+1)))
name=$(sed -n "${num}p" name.txt)
ok_boy=$(grep -v "${name}" name.txt)
for ok in ${ok_boy}
do
python weixin.py ${ok} "$1" "$2"
done
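The shell script picks one line of name.txt at random (the line count is hard-coded as 28) and sends the message to everyone else. A possible Python rendering of the same selection logic is sketched below. `pick_recipients` and the roster are invented for illustration, and the list length is used instead of a fixed number.

```python
import random


def pick_recipients(names, rng=None):
    """Return (skipped_name, everyone_else) with one name chosen at random."""
    rng = rng or random.Random()
    skipped = rng.choice(names)
    # Exact comparison; the shell version's `grep -v` would also drop
    # any line that merely contains the skipped name.
    return skipped, [name for name in names if name != skipped]


if __name__ == "__main__":
    roster = ["alice", "bob", "carol"]  # made-up names standing in for name.txt
    skipped, recipients = pick_recipients(roster, random.Random(0))
    print("skipped:", skipped)
    print("send to:", recipients)
```

Passing a seeded `random.Random` makes the choice repeatable, which is convenient when testing.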
Chapter 10 Custom Templates
1. Monitor the 11 TCP states
Write the Zabbix configuration file
[root@web01 /etc/zabbix/zabbix_agentd.d]# cat zbx_tcp.conf
UserParameter=ESTABLISHED,netstat -ant|grep -c 'ESTABLISHED'
UserParameter=SYN_SENT,netstat -ant|grep -c 'SYN_SENT'
UserParameter=SYN_RECV,netstat -ant|grep -c 'SYN_RECV'
UserParameter=FIN_WAIT1,netstat -ant|grep -c 'FIN_WAIT1'
UserParameter=FIN_WAIT2,netstat -ant|grep -c 'FIN_WAIT2'
UserParameter=TIME_WAIT,netstat -ant|grep -c 'TIME_WAIT'
UserParameter=CLOSE,netstat -ant|grep -cw 'CLOSE'
UserParameter=CLOSE_WAIT,netstat -ant|grep -c 'CLOSE_WAIT'
UserParameter=LAST_ACK,netstat -ant|grep -c 'LAST_ACK'
UserParameter=LISTEN,netstat -ant|grep -c 'LISTEN'
UserParameter=CLOSING,netstat -ant|grep -c 'CLOSING'
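Each of the eleven keys above runs netstat separately. As a sketch of an alternative, all states could be tallied in one pass by reading the state column of each line. Matching that column exactly also keeps `CLOSE` from being confused with `CLOSE_WAIT` or `CLOSING`, which a plain substring match cannot do. The function `tally_states` and the sample data are assumptions for illustration.

```python
from collections import Counter

TCP_STATES = (
    "ESTABLISHED", "SYN_SENT", "SYN_RECV", "FIN_WAIT1", "FIN_WAIT2", "TIME_WAIT",
    "CLOSE", "CLOSE_WAIT", "LAST_ACK", "LISTEN", "CLOSING",
)


def tally_states(netstat_output: str) -> dict:
    """Count connections per TCP state using the last column of each tcp line."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("tcp") and fields[-1] in TCP_STATES:
            counts[fields[-1]] += 1
    # Report every state, including those with zero connections.
    return {state: counts[state] for state in TCP_STATES}


if __name__ == "__main__":
    sample = """\
tcp        0      0 0.0.0.0:10050   0.0.0.0:*        LISTEN
tcp        0      0 10.0.1.7:80     10.0.1.1:51234   CLOSE_WAIT
tcp        0      0 10.0.1.7:80     10.0.1.1:51235   CLOSING
tcp6       0      0 :::10050        :::*             LISTEN
"""
    for state, count in tally_states(sample).items():
        print(state, count)
```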
2. Restart zabbix-agent
[root@web01 ~]# systemctl restart zabbix-agent.service
3. Test the items
Test with the zabbix_get command
[root@m01 ~]# yum install zabbix-get.x86_64 -y
[root@m01 ~]# zabbix_get -s 10.0.1.7 -k ESTABLISHED
2
[root@m01 ~]# zabbix_get -s 10.0.1.7 -k LISTEN
12
4. Import the template file
data:image/s3,"s3://crabby-images/c3610/c3610838f6725a9a03ee7db4f4b55676a789cc97" alt=""
data:image/s3,"s3://crabby-images/256b5/256b545614c851094136c60f71318d3880d38c69" alt=""
data:image/s3,"s3://crabby-images/e8b25/e8b25565264c89b6585bce1ab2e10ea092d5f77b" alt=""
5. Link the template to the host
data:image/s3,"s3://crabby-images/7a4df/7a4dfc4dae25d7db9edfabcf3cee7ea1c0dce076" alt=""
data:image/s3,"s3://crabby-images/07714/077144d2aec1b0f8f30f0a1b92f39f12af34483b" alt=""
6. View the latest data
data:image/s3,"s3://crabby-images/af308/af30885f6c3cb4adf5653eb6ce67ad7e0ebc440a" alt=""
7. View the graph
data:image/s3,"s3://crabby-images/b1536/b15361644d99940c61bc0d5060171fe008dfcec9" alt=""
Chapter x Monitoring nginx Status with a Custom Template
1. Enable the status page and test access
[root@web01 ~]# cat /etc/nginx/conf.d/status.conf
server {
listen 80;
server_name localhost;
location /nginx_status {
stub_status on;
access_log off;
}
}
[root@web01 ~]# curl 127.0.0.1/nginx_status/
Active connections: 1
server accepts handled requests
6 6 6
Reading: 0 Writing: 1 Waiting: 0
2. Prepare the nginx status monitoring script
[root@web01 /etc/zabbix/zabbix_agentd.d]# cat nginx_monitor.sh
#!/bin/bash
NGINX_COMMAND=$1
CACHEFILE="/tmp/nginx_status.txt"
CMD="/usr/bin/curl http://127.0.0.1/nginx_status/"
if [ ! -f $CACHEFILE ];then
$CMD >$CACHEFILE 2>/dev/null
fi
# Check and run the script
TIMEFLM=`stat -c %Y $CACHEFILE`
TIMENOW=`date +%s`
if [ `expr $TIMENOW - $TIMEFLM` -gt 60 ]; then
rm -f $CACHEFILE
fi
if [ ! -f $CACHEFILE ];then
$CMD >$CACHEFILE 2>/dev/null
fi
nginx_active(){
grep 'Active' $CACHEFILE| awk '{print $NF}'
exit 0;
}
nginx_reading(){
grep 'Reading' $CACHEFILE| awk '{print $2}'
exit 0;
}
nginx_writing(){
grep 'Writing' $CACHEFILE | awk '{print $4}'
exit 0;
}
nginx_waiting(){
grep 'Waiting' $CACHEFILE| awk '{print $6}'
exit 0;
}
nginx_accepts(){
awk NR==3 $CACHEFILE| awk '{print $1}'
exit 0;
}
nginx_handled(){
awk NR==3 $CACHEFILE| awk '{print $2}'
exit 0;
}
nginx_requests(){
awk NR==3 $CACHEFILE| awk '{print $3}'
exit 0;
}
case $NGINX_COMMAND in
active)
nginx_active;
;;
reading)
nginx_reading;
;;
writing)
nginx_writing;
;;
waiting)
nginx_waiting;
;;
accepts)
nginx_accepts;
;;
handled)
nginx_handled;
;;
requests)
nginx_requests;
;;
*)
echo 'Invalid credentials';
exit 2;
esac
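The script above pulls one field at a time out of the cached status page with grep and awk. To make the field positions explicit, here is a small Python sketch that parses the same output in one go. `parse_stub_status` is a hypothetical helper; the sample text is the curl output shown in step 1.

```python
# Sample taken from the curl output shown in step 1.
SAMPLE = """Active connections: 1
server accepts handled requests
 6 6 6
Reading: 0 Writing: 1 Waiting: 0
"""


def parse_stub_status(text: str) -> dict:
    """Parse nginx stub_status output into the seven values the script exposes."""
    lines = [line.strip() for line in text.strip().splitlines()]
    active = int(lines[0].split(":")[1])
    # Third line holds the three counters: accepts, handled, requests.
    accepts, handled, requests = (int(value) for value in lines[2].split())
    # Fourth line looks like "Reading: 0 Writing: 1 Waiting: 0".
    tokens = lines[3].split()
    return {
        "active": active,
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "reading": int(tokens[1]),
        "writing": int(tokens[3]),
        "waiting": int(tokens[5]),
    }


if __name__ == "__main__":
    for key, value in parse_stub_status(SAMPLE).items():
        print(key, value)
```

The token indexes 1, 3 and 5 correspond to awk's `$2`, `$4` and `$6` in the shell functions, since awk counts fields from one.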
3. Write the Zabbix monitoring configuration file
[root@web01 ~]# cat /etc/zabbix/zabbix_agentd.d/nginx_status.conf
UserParameter=nginx_status[*],/bin/bash /etc/zabbix/zabbix_agentd.d/nginx_monitor.sh $1
[root@web01 ~]# systemctl restart zabbix-agent.service
4. Fetch a value with zabbix_get
[root@m01 ~]# zabbix_get -s 10.0.1.7 -k nginx_status[accepts]
7
5. Import the template
data:image/s3,"s3://crabby-images/a73cf/a73cf040aae5d722c1bdc947577f023f0f951828" alt=""
6. Link the template
data:image/s3,"s3://crabby-images/702d3/702d30f53f9fd84e2c7dea01f7973335b6861195" alt=""
7. View the data
data:image/s3,"s3://crabby-images/24a52/24a52f62e7de62c8120303f858d7c231492a28e1" alt=""
Chapter x Monitoring PHP Status with a Custom Template
1. Enable the status page
[root@web01 ~]# tail -1 /etc/php-fpm.d/www.conf
pm.status_path = /php_status
[root@web01 ~]# cat /etc/nginx/conf.d/status.conf
server {
listen 80;
server_name localhost;
location /nginx_status {
stub_status on;
access_log off;
}
location /php_status {
fastcgi_pass 127.0.0.1:9000;
fastcgi_index index.php;
fastcgi_param SCRIPT_FILENAME html$fastcgi_script_name;
include fastcgi_params;
}
}
[root@web01 ~]# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
[root@web01 ~]# systemctl restart nginx.service php-fpm.service
2. Test access
[root@web01 ~]# curl 127.0.0.1/php_status
pool: www
process manager: dynamic
start time: 08/Aug/2019:22:31:27 +0800
start since: 37
accepted conn: 1
listen queue: 0
max listen queue: 0
listen queue len: 128
idle processes: 4
active processes: 1
total processes: 5
max active processes: 1
max children reached: 0
slow requests: 0
3. Prepare the access script
[root@web01 ~]# cat /etc/zabbix/zabbix_agentd.d/fpm.sh
#!/bin/bash
##################################
# Zabbix monitoring script
#
# php-fpm:
# - anything available via FPM status page
#
##################################
# Contact:
# vincent.viallet@gmail.com
##################################
# ChangeLog:
# 20100922 VV initial creation
##################################
# Zabbix requested parameter
ZBX_REQ_DATA="$1"
ZBX_REQ_DATA_URL="$2"
# Nginx defaults
NGINX_STATUS_DEFAULT_URL="http://localhost/fpm/status"
WGET_BIN="/usr/bin/wget"
#
# Error handling:
# - need to be displayable in Zabbix (avoid NOT_SUPPORTED)
# - items need to be of type "float" (allow negative + float)
#
ERROR_NO_ACCESS_FILE="-0.91"
ERROR_NO_ACCESS="-0.92"
ERROR_WRONG_PARAM="-0.93"
ERROR_DATA="-0.94" # either can not connect / bad host / bad port
# Handle host and port if non-default
if [ ! -z "$ZBX_REQ_DATA_URL" ]; then
URL="$ZBX_REQ_DATA_URL"
else
URL="$NGINX_STATUS_DEFAULT_URL"
fi
# save the nginx stats in a variable for future parsing
NGINX_STATS=$($WGET_BIN -q $URL -O - 2>/dev/null)
# error during retrieve
if [ $? -ne 0 -o -z "$NGINX_STATS" ]; then
echo $ERROR_DATA
exit 1
fi
#
# Extract data from nginx stats
#
#RESULT=$(echo "$NGINX_STATS" | awk 'print $0;match($0, "^'"$ZBX_REQ_DATA"':[[:space:]]+(.*)", a) { print a[1] }')
#RESULT=$(echo "$NGINX_STATS" | grep "$ZBX_REQ_DATA" | awk -F : '{print $2}')
RESULT=$(echo "$NGINX_STATS" | awk -F : -v key="$ZBX_REQ_DATA" '$1 == key { print $2 }')
if [ $? -ne 0 -o -z "$RESULT" ]; then
echo $ERROR_WRONG_PARAM
exit 1
fi
echo $RESULT
exit 0
[root@web01 ~]# bash /etc/zabbix/zabbix_agentd.d/fpm.sh "total processes" http://127.0.0.1/php_status
5
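The script looks up one field of the php-fpm status page per call. The Python sketch below shows the same "name: value" parsing applied to the whole page at once. `parse_fpm_status` is a hypothetical helper, and the sample is a subset of the status output shown in step 2.

```python
# Subset of the php-fpm status output shown in step 2.
SAMPLE = """pool: www
process manager: dynamic
start time: 08/Aug/2019:22:31:27 +0800
idle processes: 4
active processes: 1
total processes: 5
"""


def parse_fpm_status(text: str) -> dict:
    """Split each 'name: value' line on the first colon only."""
    result = {}
    for line in text.splitlines():
        key, separator, value = line.partition(":")
        if separator:
            result[key.strip()] = value.strip()
    return result


if __name__ == "__main__":
    status = parse_fpm_status(SAMPLE)
    print(status["total processes"])
    print(status["start time"])
```

Splitting on the first colon only keeps values that themselves contain colons, such as the start time, in one piece. With `awk -F :` the second field of that line would stop at the next colon.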
4. Prepare the Zabbix configuration file
[root@web01 ~]# cat /etc/zabbix/zabbix_agentd.d/fpm.conf
UserParameter=php-fpm[*],/etc/zabbix/zabbix_agentd.d/fpm.sh "$1" "$2"
[root@web01 ~]# systemctl restart zabbix-agent.service
5. Fetch a value with zabbix_get
[root@m01 ~]# zabbix_get -s 10.0.1.7 -k php-fpm["total processes",http://127.0.0.1/php_status]
5
6. Import the template
After importing, adjust the macro settings in the template.
data:image/s3,"s3://crabby-images/c657d/c657d4041d7d2c884e4086e3ffcee186fc52cd6a" alt=""
Chapter x Web Monitoring
Requirement: monitor the HTTP status code of a page
data:image/s3,"s3://crabby-images/416b0/416b0cd097ef5d4b628c515130132bf0bca19535" alt=""
data:image/s3,"s3://crabby-images/bd770/bd7704fe8a319be8a5bb4e373c195edc6bb4a44d" alt=""
data:image/s3,"s3://crabby-images/bc4b6/bc4b6d6eff6f861272a87e996a05cb585b292e6a" alt=""
Chapter x Troubleshooting Log
Fault 1
Symptom:
The web interface reports that zabbix-server is not running
data:image/s3,"s3://crabby-images/5942b/5942be87dba4cb9bd4026e0b15136a275b24d79e" alt=""
Error log:
34983:20190807:202215.171 database is down: reconnecting in 10 seconds
34983:20190807:202225.172 [Z3001] connection to database 'zabbix' failed: [1045] Access denied for user 'zabbix'@'localhost' (using password: NO)
Cause:
The database password is not set in the zabbix-server configuration file
Solution:
Add the correct database account and password
[root@m01 ~]# grep "^DB" /etc/zabbix/zabbix_server.conf
DBHost=localhost
DBName=zabbix
DBUser=zabbix
DBPassword=zabbix
Fault 2
Symptom: WeChat alerts fail
Error log:
[root@m01 ~]# tail -f /var/log/zabbix/zabbix_server.log
Problem name: Too many TIME_WAIT connections
Host: web01
Severity: Average
Original problem ID: 51
'": Traceback (most recent call last):
File "/usr/lib/zabbix/alertscripts/weixin.py", line 7, in <module>
import requests
ImportError: No module named requests
Cause:
The requests module is missing
Solution:
Install the missing dependency
[root@m01 ~]# yum install python-pip
[root@m01 ~]# pip install --upgrade pip
[root@m01 ~]# pip install requests
Fault 3
Symptom:
Testing an item key with zabbix_get on the server side prints a warning
[root@m01 ~]# zabbix_get -s 10.0.1.7 -k ESTABLISHED
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
2
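Cause:
zabbix_agent runs as the unprivileged zabbix user, and netstat -antp prints a warning when run by a non-root user. Dropping the p option lets the command run cleanly as an ordinary user.
Solution:
Change the command in the monitoring configuration to netstat -ant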