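Monitor whether MySQL master-slave replication is healthy; if it is not, send an SMS or email to the administrator.
Stage 1: Write a daemon script that checks the replication status every 30 seconds.
Stage 2: If replication stops with one of the error numbers 1158, 1159, 1008, 1007, or 1062, skip the error.
Stage 3: Implement the script with arrays (for reading the replication status and for checking the error number).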
[root@slave ~]# mysql -u root -proot -e "show slave status\G"
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.16.1.2    # current MySQL master host
                  Master_User: myslave
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000003
          Read_Master_Log_Pos: 471
               Relay_Log_File: relay-log-bin.000002
                Relay_Log_Pos: 252
        Relay_Master_Log_File: master-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
               Master_SSL_Key:
        Seconds_Behind_Master: 0    # replication delay behind the master, in seconds
Preparation: filter the fields of interest from the status output saved in slave.log.
[root@slave ~]# egrep "_Running|Behind_Master" slave.log
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 0
[root@slave ~]# egrep "_Running|Behind_Master" slave.log | awk '{print $NF}'
Yes
Yes
0
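The filter above picks values by position, so it depends on exactly which field names match the pattern. Newer MySQL versions add a Slave_SQL_Running_State field, which also matches _Running and would shift the positions. A minimal sketch of looking a field up by name instead, assuming the same "Name: value" layout as the output shown above; get_field and the sample text are hypothetical and not part of the original scripts:

```bash
#!/bin/bash
# get_field NAME: read "show slave status\G" style text on stdin and print
# the value of the field whose name is exactly NAME (nothing if absent).
# Only the text up to the next ": " is printed, which is enough for the
# short Yes/No and numeric fields used here.
get_field() {
    awk -v key="$1" -F': ' '
        { name = $1; gsub(/^[ \t]+/, "", name) }   # strip leading indentation
        name == key { print $2; exit }
    '
}

# Hypothetical sample input standing in for real mysql output.
sample='             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 0
               Last_SQL_Errno: 1062'

printf '%s\n' "$sample" | get_field Slave_SQL_Running   # prints: Yes
printf '%s\n' "$sample" | get_field Last_SQL_Errno      # prints: 1062
```

Against a live server, the same function could read from the mysql command in place of the sample text.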
Stage 1: Write a daemon script that checks the replication status every 30 seconds.
#!/bin/bash
# Check the replication status saved in slave.log every 30 seconds.
while true
do
    # array = (Slave_IO_Running Slave_SQL_Running Seconds_Behind_Master)
    array=($(egrep "_Running|Behind_Master" slave.log | awk '{print $NF}'))
    if [ "${array[0]}" = "Yes" ] && [ "${array[1]}" = "Yes" ] && [ "${array[2]}" = "0" ]
    then
        echo "MySQL slave is ok"
    else
        char="MySQL slave is not ok"
        echo "$char"
        echo "$char" | mail -s "$char" 995345781@qq.com
        break
    fi
    sleep 30
done

Result:
[root@slave ~]# bash test.sh
MySQL slave is ok
MySQL slave is ok
Final version:
#!/bin/bash
#Date:2017-7-3
#Author:xcn(baishuchao@yeah.net)
#version 1.0
mysql_cmd="mysql -u root -proot"
# Error numbers that are safe to skip (stage 2).
errorno=(1158 1159 1008 1007 1062)
while true
do
    # array = (Slave_IO_Running Slave_SQL_Running Seconds_Behind_Master Last_SQL_Errno)
    array=($($mysql_cmd -e "show slave status\G" | egrep '_Running|Behind_Master|Last_SQL_Errno' | awk '{print $NF}'))
    if [ "${array[0]}" = "Yes" ] && [ "${array[1]}" = "Yes" ] && [ "${array[2]}" = "0" ]
    then
        echo "MySQL slave is ok"
    else
        for ((i=0; i<${#errorno[*]}; i++))
        do
            if [ "${array[3]}" = "${errorno[$i]}" ]; then
                # Skip the failed event and restart replication.
                $mysql_cmd -e "stop slave; set global sql_slave_skip_counter=1; start slave;"
            fi
        done
        char="MySQL slave is not ok"
        echo "$char"
        echo "$char" | mail -s "$char" 995345781@qq.com
        break
    fi
    sleep 30
done
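The error-number loop in the script can also be written as a small membership test over the array, which keeps the main loop shorter. This is only a sketch of that idea: is_skippable and skip_errnos are hypothetical names, and the list of error numbers is the one given in stage 2.

```bash
#!/bin/bash
# Error numbers from stage 2 that may be skipped.
skip_errnos=(1158 1159 1008 1007 1062)

# is_skippable ERRNO: succeed (return 0) if ERRNO is in skip_errnos.
is_skippable() {
    local code
    for code in "${skip_errnos[@]}"; do
        [ "$1" = "$code" ] && return 0
    done
    return 1
}

# Example: 1062 is in the list, 1045 is not.
for errno in 1062 1045; do
    if is_skippable "$errno"; then
        echo "$errno: skip"
    else
        echo "$errno: alert"
    fi
done
```

In the monitoring loop, the check would be called with the Last_SQL_Errno value, and the skip statement would run only when it succeeds.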
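Tip: this script can be used in production to monitor whether MySQL master-slave replication is healthy. It decides based on the fields matched by '_Running|Behind_Master|Last_SQL_Errno': if the status is abnormal, it checks the error number, skips the error when it is in the list, prints the result, and sends an email or SMS alert to the operations staff.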