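How MHA works
1. MHA uses the statement SELECT 1 As Value to check the health of the master. Once the master goes down, MHA saves the binary log events (binlog events) from the crashed master
2. It identifies the slave that has the most recent updates
3. It applies the differential relay log to the other slaves
4. It applies the binary log events (binlog events) saved from the master
5. It promotes one slave to be the new master
6. It points the other slaves at the new master for replication
The differences i(1) ---> i(2) ---> i(x) are all combined into a single binary log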
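The health check in step 1 can be pictured as a small polling loop. This is only a sketch and not MHA's own code: the function names, the MAX_FAILURES threshold and the use of the mysql client are assumptions made for illustration, while the host and the mhauser account come from the setup described in this article.

```shell
MASTER_HOST=172.31.0.28
PING_INTERVAL=1   # seconds between pings, mirrors ping_interval in app1.cnf
MAX_FAILURES=4    # assumption: declare the master dead after four failed pings in a row

# Hypothetical ping: succeeds only if the master answers the probe query.
ping_master() {
    mysql -h "$MASTER_HOST" -u mhauser -pcentos -e 'SELECT 1 As Value' >/dev/null 2>&1
}

# Polls the master until MAX_FAILURES consecutive pings have failed.
watch_master() {
    failures=0
    while [ "$failures" -lt "$MAX_FAILURES" ]; do
        if ping_master; then
            failures=0
        else
            failures=$((failures + 1))
            echo "Connection failed $failures time(s).."
        fi
        sleep "$PING_INTERVAL"
    done
    echo "Master $MASTER_HOST is not reachable"
}
```

Calling watch_master blocks while the master answers and returns once it stops answering; a real manager would start the failover steps at that point.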
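Note: to keep data loss as small as possible when the master goes down because of hardware failure, it is recommended to configure MySQL semi-synchronous replication together with MHA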
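As a rough illustration of that recommendation, the statements below enable the semi-synchronous plugins that ship with MySQL 8.0.21 (the version used in this article). The two helper function names are made up for this sketch; they only print SQL, so nothing changes until the output is piped into a mysql client.

```shell
# Prints the SQL that enables semi-sync on the master side.
semisync_master_sql() {
    cat <<'SQL'
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SQL
}

# Prints the SQL that enables semi-sync on the slave side and
# restarts the I/O thread so the setting takes effect.
semisync_slave_sql() {
    cat <<'SQL'
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;
SQL
}
```

For example, `semisync_master_sql | mysql -uroot -p` on the master and `semisync_slave_sql | mysql -uroot -p` on each slave. A slave that may be promoted later would probably need the master-side plugin as well.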
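Case study: deploying MHA in practice
Note: the MHA manager reports errors when run on CentOS 8, so CentOS 8 is not recommended for it
Environment: four hosts
172.31.0.17 CentOS7 MHA manager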
172.31.0.28 CentOS8 MySQL8.0 Master
172.31.0.38 CentOS8 MySQL8.0 Slave1
172.31.0.48 CentOS8 MySQL8.0 Slave2
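Install the two packages mha4mysql-manager and mha4mysql-node on the manager node
Notes:
mha4mysql-manager-0.56-0.el6.noarch.rpm does not support CentOS 8; it only supports CentOS 7 and earlier
mha4mysql-manager-0.58-0.el7.centos.noarch.rpm supports MySQL 5.7 and MySQL 8.0, but is not compatible with the MariaDB 10.3.17 shipped with CentOS 8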
[root@centos8 ~]# ls
anaconda-ks.cfg mha4mysql-manager-0.58-0.el7.centos.noarch.rpm mha4mysql-node-0.58-0.el7.centos.noarch.rpm original-ks.cfg
[root@centos8 ~]# yum install mha4mysql-manager-0.58-0.el7.centos.noarch.rpm -y mha4mysql-node-0.58-0.el7.centos.noarch.rpm
Install the mha4mysql-node package on all MySQL servers.
This package supports CentOS 8, 7 and 6
[root@sz-kx-centos8 ~]# yum install -y mha4mysql-node-0.58-0.el7.centos.noarch.rpm
Set up SSH key authentication between all nodes
On the MHA manager
[root@centos8 ~]# yum install rsync -y
[05:47:52 root@centos8 ~]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:GA6eD2oTXm2a30Nq3oo0VjiEUfPy9YS/h9vIjfnkdFo root@centos8.longxuan.vip
The key's randomart image is:
+---[RSA 3072]----+
| ..o |
| o o . |
| . + o o . |
| o O + + |
| . B B S o |
| . + O . o |
| = * .o o + E |
| . + +oo.. % + |
| .o+.o.*.* |
+----[SHA256]-----+
[05:48:02 root@centos8 ~]# ssh-copy-id 127.0.0.1
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host '127.0.0.1 (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:UxQsAjgLsmA4tpc7HO0xU9txsXgxqhyba9KbywIvZTA.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@127.0.0.1's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh '127.0.0.1'"
and check to make sure that only the key(s) you wanted were added.
[05:51:01 root@centos8 ~]# rsync -av .ssh 172.31.0.28:/root/
[05:51:01 root@centos8 ~]# rsync -av .ssh 172.31.0.38:/root/
[05:51:01 root@centos8 ~]# rsync -av .ssh 172.31.0.48:/root/
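Create the configuration file on the manager node
Note: do not leave spaces or other stray characters at the end of any line in this file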
[root@centos8 ~]# mkdir /etc/mastermha/
[root@centos8 ~]# vim /etc/mastermha/app1.cnf
[server default]
user=mhauser
password=centos
manager_workdir=/data/mastermha/app1/
manager_log=/data/mastermha/app1/manager.log
remote_workdir=/data/mastermha/app1/
ssh_user=root
repl_user=repluser
repl_password=123456
ping_interval=1
master_ip_failover_script=/usr/local/bin/master_ip_failover
report_script=/usr/local/bin/sendmail.sh
check_repl_delay=0
master_binlog_dir=/data/mysql/
[server1]
hostname=172.31.0.28
candidate_master=1
[server2]
hostname=172.31.0.38
candidate_master=1
[server3]
hostname=172.31.0.48
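Since trailing whitespace in this file causes trouble, a quick check before starting MHA can help. The helper name below is hypothetical; it relies only on grep and works on any text file.

```shell
# Prints every line (with its line number) that ends in whitespace.
# Returns 1 if at least one such line exists, 0 if the file is clean.
check_trailing_whitespace() {
    file="$1"
    if grep -nE '[[:space:]]+$' "$file"; then
        return 1
    fi
    return 0
}
```

Usage: `check_trailing_whitespace /etc/mastermha/app1.cnf && echo clean`. The character class also matches a carriage return, so a file saved with Windows line endings is flagged too.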
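Notes: which slave takes over as the new master when the master goes down
1. If the logs on all slaves are identical, a new master is chosen by default in the order the servers appear in the configuration file
2. If the logs on the slaves differ, the slave closest to the master is chosen automatically as the new master
3. If a node has been given priority (candidate_master=1), it is preferred. However, if that node is more than 100 MB of logs behind the master, it is not chosen either. Setting check_repl_delay=0 disables this log-lag check and forces the candidate node to be chosen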
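The three rules above can be condensed into a toy selection function. This is a simplified sketch of the rules as stated here, not MHA's actual election code. It assumes the input is one line per slave in configuration-file order, in the form `host candidate_flag received_binlog_position`, with every position referring to the same binlog file.

```shell
# Reads "host candidate pos" lines on stdin and prints the chosen host.
# $1 is check_repl_delay (default 1 = enforce the 100 MB lag limit).
pick_new_master() {
    check_repl_delay="${1:-1}"
    awk -v check="$check_repl_delay" -v limit=104857600 '
        { host[NR] = $1; cand[NR] = $2; pos[NR] = $3 + 0
          if (pos[NR] > latest) latest = pos[NR] }
        END {
            # Rule 3: first candidate in config order that is not too far behind
            for (i = 1; i <= NR; i++)
                if (cand[i] == 1 && (check == 0 || latest - pos[i] <= limit)) {
                    print host[i]; exit
                }
            # Rules 1 and 2: the most up-to-date slave, ties broken by config order
            for (i = 1; i <= NR; i++)
                if (pos[i] == latest) { print host[i]; exit }
        }'
}
```

For example, `printf '172.31.0.38 1 1391\n172.31.0.48 0 1391\n' | pick_new_master` prints 172.31.0.38, because both slaves are at the same position and only the first one is flagged as a candidate.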
Related files and scripts
# Install mail software for alerting
[root@centos8 ~]# yum install postfix mailx -y
# Start the service
[root@centos8 ~]# systemctl start postfix.service
# Configure mail
[root@centos8 ~]# vim /etc/mail.rc
set from=llxuan@163.com
set smtp=smtp.163.com
set smtp-auth-user=llxuan@163.com
set smtp-auth-password=xxxxxxxxxxx # SMTP authorization code
# Alert script
[root@centos8 ~]# cat /usr/local/bin/sendmail.sh
#!/bin/bash
echo "MySQL is down" | mail -s "MHA Warning" llxuan@163.com
Make the script executable
[root@centos8 ~]# chmod +x /usr/local/bin/sendmail.sh
VIP failover script
[06:02:52 root@centos8 ~]# vim /usr/local/bin/master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;

my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);
my $vip       = '172.31.0.100/16';
my $gateway   = '172.31.0.254';
my $interface = 'eth0';
my $key       = "1";
# NOTE: $vip carries the /16 suffix, which arping -s may reject as a source
# address; any such error is hidden by the redirect to /dev/null.
my $ssh_start_vip = "/sbin/ifconfig $interface:$key $vip;/sbin/arping -I $interface -c 3 -s $vip $gateway >/dev/null 2>&1";
my $ssh_stop_vip  = "/sbin/ifconfig $interface:$key down";

# GetOptions needs references to the variables it should fill in.
GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
    if ( $command eq "stop" || $command eq "stopssh" ) {
        # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
        # If you manage master ip address at global catalog database,
        # invalidate orig_master_ip here.
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
        # all arguments are passed.
        # If you manage master ip address at global catalog database,
        # activate new_master_ip here.
        # You can also grant write access (create user, set read_only=0, etc) here.
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        `ssh $ssh_user\@$orig_master_host " $ssh_start_vip "`;
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

# A simple system call that enable the VIP on the new master
# (the @ must be escaped, otherwise Perl tries to interpolate an array)
sub start_vip() {
    `ssh $ssh_user\@$new_master_host " $ssh_start_vip "`;
}

# A simple system call that disable the VIP on the old_master
sub stop_vip() {
    `ssh $ssh_user\@$orig_master_host " $ssh_stop_vip "`;
}

sub usage {
    print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
Make the script executable
[root@centos8 ~]# chmod +x /usr/local/bin/master_ip_failover
Set up the master
[root@sz-kx-centos8 ~]# yum install mysql-server -y
[root@sz-kx-centos8 ~]# mkdir /data/mysql/
[root@sz-kx-centos8 ~]# chown mysql.mysql /data/mysql/
[root@sz-kx-centos8 ~]# vim /etc/my.cnf
[mysqld]
server-id=28
log-bin=/data/mysql/mysql-bin
skip-name-resolve=1
general-log
[root@sz-kx-centos8 ~]# systemctl restart mysqld
# Look up the current binary log file and position
mysql> show master logs;
# Create the replication user and grant it privileges
mysql> create user repluser@'172.31.0.%' identified by '123456';
Query OK, 0 rows affected (0.00 sec)
mysql> grant replication slave on *.* to repluser@'172.31.0.%';
Query OK, 0 rows affected (0.01 sec)
# Create the MHA user and grant it privileges
mysql> create user mhauser@'172.31.0.%' identified by 'centos';
Query OK, 0 rows affected (0.00 sec)
mysql> grant all on *.* to mhauser@'172.31.0.%';
Query OK, 0 rows affected (0.01 sec)
# Add the VIP as a labeled interface alias (eth0:1)
[root@sz-kx-centos8 ~]# ifconfig eth0:1 172.31.0.100/16
[root@sz-kx-centos8 ~]# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.31.0.18 netmask 255.255.0.0 broadcast 172.31.255.255
inet6 fe80::20c:29ff:fe43:49b prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:43:04:9b txqueuelen 1000 (Ethernet)
RX packets 42588 bytes 55076155 (52.5 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 20092 bytes 1443183 (1.3 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.31.0.100 netmask 255.255.0.0 broadcast 172.31.255.255
ether 00:0c:29:43:04:9b txqueuelen 1000 (Ethernet)
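Set up the two slaves
Note: run these steps on both slaves and give each one its own unique server-id (the value shown below is from one of them)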
[root@centos8 ~]# yum install mysql-server -y
[root@centos8 ~]# mkdir /data/mysql -p
[root@centos8 ~]# chown mysql.mysql /data/mysql/
[root@centos8 ~]# vim /etc/my.cnf
[mysqld]
server-id=48
log-bin=/data/mysql/mysql-bin
read-only
relay_log_purge=0
skip_name_resolve=1
general_log
[root@centos8 ~]# systemctl start mysqld
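# Point the slave at the master's binary log. Note: if replication is set up again later, use the master's current binlog file and position, not the earlier ones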
CHANGE MASTER TO
MASTER_HOST='172.31.0.28',
MASTER_USER='repluser',
MASTER_PASSWORD='123456',
MASTER_PORT=3306,
MASTER_LOG_FILE='mysql-bin.000002',
MASTER_LOG_POS=156;
mysql> start slave;
Query OK, 0 rows affected (0.05 sec)
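Before moving on to the MHA checks, it may be worth confirming that both replication threads are actually running on each slave. The helper below is a hypothetical sketch that parses the text of SHOW SLAVE STATUS\G (field names as in MySQL 8.0.21) from standard input.

```shell
# Exits 0 only if both Slave_IO_Running and Slave_SQL_Running report Yes.
replication_running() {
    awk -F': *' '
        /Slave_IO_Running:/  { io  = $2 }
        /Slave_SQL_Running:/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }'
}
```

Usage on a slave: `mysql -e 'SHOW SLAVE STATUS\G' | replication_running && echo "replication OK"`.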
Check the MHA environment
# SSH trust check
[root@centos8 ~]# masterha_check_ssh --conf=/etc/mastermha/app1.cnf
Sat May 22 06:32:13 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat May 22 06:32:13 2021 - [info] Reading application default configuration from /etc/mastermha/app1.cnf..
Sat May 22 06:32:13 2021 - [info] Reading server configuration from /etc/mastermha/app1.cnf..
Sat May 22 06:32:13 2021 - [info] Starting SSH connection tests..
Sat May 22 06:32:14 2021 - [debug]
Sat May 22 06:32:13 2021 - [debug] Connecting via SSH from root@172.31.0.18(172.31.0.18:22) to root@172.31.0.38(172.31.0.38:22)..
Sat May 22 06:32:13 2021 - [debug] ok.
Sat May 22 06:32:13 2021 - [debug] Connecting via SSH from root@172.31.0.18(172.31.0.18:22) to root@172.31.0.48(172.31.0.48:22)..
Warning: Permanently added '172.31.0.48' (ECDSA) to the list of known hosts.
Sat May 22 06:32:14 2021 - [debug] ok.
Sat May 22 06:32:14 2021 - [debug]
Sat May 22 06:32:13 2021 - [debug] Connecting via SSH from root@172.31.0.38(172.31.0.38:22) to root@172.31.0.18(172.31.0.18:22)..
Sat May 22 06:32:14 2021 - [debug] ok.
Sat May 22 06:32:14 2021 - [debug] Connecting via SSH from root@172.31.0.38(172.31.0.38:22) to root@172.31.0.48(172.31.0.48:22)..
Sat May 22 06:32:14 2021 - [debug] ok.
Sat May 22 06:32:15 2021 - [debug]
Sat May 22 06:32:14 2021 - [debug] Connecting via SSH from root@172.31.0.48(172.31.0.48:22) to root@172.31.0.18(172.31.0.18:22)..
Sat May 22 06:32:14 2021 - [debug] ok.
Sat May 22 06:32:14 2021 - [debug] Connecting via SSH from root@172.31.0.48(172.31.0.48:22) to root@172.31.0.38(172.31.0.38:22)..
Sat May 22 06:32:15 2021 - [debug] ok.
Sat May 22 06:32:15 2021 - [info] All SSH connection tests passed successfully.
Use of uninitialized value in exit at /usr/bin/masterha_check_ssh line 44.
# Replication check
[root@centos8 ~]# masterha_check_repl --conf=/etc/mastermha/app1.cnf
Sat May 22 18:00:02 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat May 22 18:00:02 2021 - [info] Reading application default configuration from /etc/mastermha/app1.cnf..
Sat May 22 18:00:02 2021 - [info] Reading server configuration from /etc/mastermha/app1.cnf..
Sat May 22 18:00:02 2021 - [info] Starting SSH connection tests..
Sat May 22 18:00:03 2021 - [debug]
Sat May 22 18:00:02 2021 - [debug] Connecting via SSH from root@172.31.0.28(172.31.0.28:22) to root@172.31.0.48(172.31.0.48:22)..
Sat May 22 18:00:02 2021 - [debug] ok.
Sat May 22 18:00:02 2021 - [debug] Connecting via SSH from root@172.31.0.28(172.31.0.28:22) to root@172.31.0.38(172.31.0.38:22)..
Sat May 22 18:00:02 2021 - [debug] ok.
Sat May 22 18:00:03 2021 - [debug]
Sat May 22 18:00:02 2021 - [debug] Connecting via SSH from root@172.31.0.48(172.31.0.48:22) to root@172.31.0.28(172.31.0.28:22)..
Sat May 22 18:00:02 2021 - [debug] ok.
Sat May 22 18:00:02 2021 - [debug] Connecting via SSH from root@172.31.0.48(172.31.0.48:22) to root@172.31.0.38(172.31.0.38:22)..
Sat May 22 18:00:03 2021 - [debug] ok.
Sat May 22 18:00:04 2021 - [debug]
Sat May 22 18:00:03 2021 - [debug] Connecting via SSH from root@172.31.0.38(172.31.0.38:22) to root@172.31.0.28(172.31.0.28:22)..
Sat May 22 18:00:03 2021 - [debug] ok.
Sat May 22 18:00:03 2021 - [debug] Connecting via SSH from root@172.31.0.38(172.31.0.38:22) to root@172.31.0.48(172.31.0.48:22)..
Sat May 22 18:00:03 2021 - [debug] ok.
Sat May 22 18:00:04 2021 - [info] All SSH connection tests passed successfully.
[root@localhost ~]# masterha_check_repl --conf=/etc/mastermha/app1.cnf
Sat May 22 18:00:08 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat May 22 18:00:08 2021 - [info] Reading application default configuration from /etc/mastermha/app1.cnf..
Sat May 22 18:00:08 2021 - [info] Reading server configuration from /etc/mastermha/app1.cnf..
Sat May 22 18:00:08 2021 - [info] MHA::MasterMonitor version 0.58.
Sat May 22 18:00:09 2021 - [info] GTID failover mode = 0
Sat May 22 18:00:09 2021 - [info] Dead Servers:
Sat May 22 18:00:09 2021 - [info] Alive Servers:
Sat May 22 18:00:09 2021 - [info] 172.31.0.28(172.31.0.28:3306)
Sat May 22 18:00:09 2021 - [info] 172.31.0.48(172.31.0.48:3306)
Sat May 22 18:00:09 2021 - [info] 172.31.0.38(172.31.0.38:3306)
Sat May 22 18:00:09 2021 - [info] Alive Slaves:
Sat May 22 18:00:09 2021 - [info] 172.31.0.48(172.31.0.48:3306) Version=8.0.21 (oldest major version between slaves) log-bin:enabled
Sat May 22 18:00:09 2021 - [info] Replicating from 172.31.0.28(172.31.0.28:3306)
Sat May 22 18:00:09 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Sat May 22 18:00:09 2021 - [info] 172.31.0.38(172.31.0.38:3306) Version=8.0.21 (oldest major version between slaves) log-bin:enabled
Sat May 22 18:00:09 2021 - [info] Replicating from 172.31.0.28(172.31.0.28:3306)
Sat May 22 18:00:09 2021 - [info] Current Alive Master: 172.31.0.28(172.31.0.28:3306)
Sat May 22 18:00:09 2021 - [info] Checking slave configurations..
Sat May 22 18:00:09 2021 - [info] Checking replication filtering settings..
Sat May 22 18:00:09 2021 - [info] binlog_do_db= , binlog_ignore_db=
Sat May 22 18:00:09 2021 - [info] Replication filtering check ok.
Sat May 22 18:00:09 2021 - [info] GTID (with auto-pos) is not supported
Sat May 22 18:00:09 2021 - [info] Starting SSH connection tests..
Sat May 22 18:00:12 2021 - [info] All SSH connection tests passed successfully.
Sat May 22 18:00:12 2021 - [info] Checking MHA Node version..
Sat May 22 18:00:12 2021 - [info] Version check ok.
Sat May 22 18:00:12 2021 - [info] Checking SSH publickey authentication settings on the current master..
Sat May 22 18:00:13 2021 - [info] HealthCheck: SSH to 172.31.0.28 is reachable.
Sat May 22 18:00:13 2021 - [info] Master MHA Node version is 0.58.
Sat May 22 18:00:13 2021 - [info] Checking recovery script configurations on 172.31.0.28(172.31.0.28:3306)..
Sat May 22 18:00:13 2021 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/ --output_file=/data/mastermha/app1//save_binary_logs_test --manager_version=0.58 --start_file=mysql-bin.000002
Sat May 22 18:00:13 2021 - [info] Connecting to root@172.31.0.28(172.31.0.28:22)..
Creating /data/mastermha/app1 if not exists.. Creating directory /data/mastermha/app1.. done.
ok.
Checking output directory is accessible or not..
ok.
Binlog found at /data/mysql/, up to mysql-bin.000002
Sat May 22 18:00:13 2021 - [info] Binlog setting check done.
Sat May 22 18:00:13 2021 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sat May 22 18:00:13 2021 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mhauser' --slave_host=172.31.0.48 --slave_ip=172.31.0.48 --slave_port=3306 --workdir=/data/mastermha/app1/ --target_version=8.0.21 --manager_version=0.58 --relay_dir=/var/lib/mysql --current_relay_log=centos8-relay-bin.000002 --slave_pass=xxx
Sat May 22 18:00:13 2021 - [info] Connecting to root@172.31.0.48(172.31.0.48:22)..
Creating directory /data/mastermha/app1/.. done.
Checking slave recovery environment settings..
Relay log found at /var/lib/mysql, up to centos8-relay-bin.000002
Temporary relay log file is /var/lib/mysql/centos8-relay-bin.000002
Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sat May 22 18:00:13 2021 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mhauser' --slave_host=172.31.0.38 --slave_ip=172.31.0.38 --slave_port=3306 --workdir=/data/mastermha/app1/ --target_version=8.0.21 --manager_version=0.58 --relay_dir=/var/lib/mysql --current_relay_log=centos8-relay-bin.000002 --slave_pass=xxx
Sat May 22 18:00:13 2021 - [info] Connecting to root@172.31.0.38(172.31.0.38:22)..
Creating directory /data/mastermha/app1/.. done.
Checking slave recovery environment settings..
Relay log found at /var/lib/mysql, up to centos8-relay-bin.000002
Temporary relay log file is /var/lib/mysql/centos8-relay-bin.000002
Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sat May 22 18:00:14 2021 - [info] Slaves settings check done.
Sat May 22 18:00:14 2021 - [info]
172.31.0.28(172.31.0.28:3306) (current master)
+--172.31.0.48(172.31.0.48:3306)
+--172.31.0.38(172.31.0.38:3306)
Sat May 22 18:00:14 2021 - [info] Checking replication health on 172.31.0.48..
Sat May 22 18:00:14 2021 - [info] ok.
Sat May 22 18:00:14 2021 - [info] Checking replication health on 172.31.0.38..
Sat May 22 18:00:14 2021 - [info] ok.
Sat May 22 18:00:14 2021 - [info] Checking master_ip_failover_script status:
Sat May 22 18:00:14 2021 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=172.31.0.28 --orig_master_ip=172.31.0.28 --orig_master_port=3306
IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 172.31.0.100/16;/sbin/arping -I eth0 -c 3 -s 172.31.0.100/16 172.31.0.254 >/dev/null 2>&1===
Checking the Status of the script.. OK
Sat May 22 18:00:14 2021 - [info] OK.
Sat May 22 18:00:14 2021 - [warning] shutdown_script is not defined.
Sat May 22 18:00:14 2021 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
# Check the status
[root@centos8 ~]# masterha_check_status --conf=/etc/mastermha/app1.cnf
app1 is stopped(2:NOT_RUNNING).
# Start the manager (it runs in the foreground, so use another terminal for the commands that follow)
[root@centos8 ~]# masterha_manager --conf=/etc/mastermha/app1.cnf &> /dev/null
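The command above keeps the terminal occupied until the manager exits. One common alternative is to start it in the background with nohup. The wrapper below is only a sketch: the function name and the choice of output file are assumptions, and only the --conf option already used in this article is passed.

```shell
# Starts masterha_manager in the background, detached from the terminal.
# $1 = configuration file, $2 = file that collects stdout and stderr.
# Prints the PID of the background process.
start_mha_manager() {
    conf="$1"
    out="$2"
    nohup masterha_manager --conf="$conf" >"$out" 2>&1 &
    echo $!
}
```

Usage: `start_mha_manager /etc/mastermha/app1.cnf /data/mastermha/app1/manager.out`, then confirm with masterha_check_status as shown below.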
# On the master, the health checks show up in the general log
[root@sz-kx-centos8 ~]# tail -f /var/lib/mysql/centos8.log
2021-05-22T18:05:00.408005Z 24 Query SELECT 1 As Value
2021-05-22T18:05:01.408492Z 24 Query SELECT 1 As Value
2021-05-22T18:05:02.409002Z 24 Query SELECT 1 As Value
2021-05-22T18:05:03.409469Z 24 Query SELECT 1 As Value
2021-05-22T18:05:04.410620Z 24 Query SELECT 1 As Value
2021-05-22T18:05:05.411095Z 24 Query SELECT 1 As Value
# Check the status
[root@localhost ~]# masterha_check_status --conf=/etc/mastermha/app1.cnf
app1 (pid:27237) is running(0:PING_OK), master:172.31.0.28
Simulate a failure
# After the master goes down, the MHA manager process exits automatically
# Follow the log
[root@localhost ~]# tail /data/mastermha/app1/manager.log -f
Sat May 22 18:08:32 2021 - [warning] Got error on MySQL select ping: 1053 (Server shutdown in progress)
Sat May 22 18:08:32 2021 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/ --output_file=/data/mastermha/app1//save_binary_logs_test --manager_version=0.58 --binlog_prefix=mysql-bin
Sat May 22 18:08:32 2021 - [info] HealthCheck: SSH to 172.31.0.28 is reachable.
Sat May 22 18:08:33 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.31.0.28' (111))
Sat May 22 18:08:33 2021 - [warning] Connection failed 2 time(s)..
Sat May 22 18:08:34 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.31.0.28' (111))
Sat May 22 18:08:34 2021 - [warning] Connection failed 3 time(s)..
Sat May 22 18:08:35 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.31.0.28' (111))
Sat May 22 18:08:35 2021 - [warning] Connection failed 4 time(s)..
Sat May 22 18:08:35 2021 - [warning] Master is not reachable from health checker!
Sat May 22 18:08:35 2021 - [warning] Master 172.31.0.28(172.31.0.28:3306) is not reachable!
Sat May 22 18:08:35 2021 - [warning] SSH is reachable.
Sat May 22 18:08:35 2021 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mastermha/app1.cnf again, and trying to connect to all servers to check server status..
Sat May 22 18:08:35 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat May 22 18:08:35 2021 - [info] Reading application default configuration from /etc/mastermha/app1.cnf..
Sat May 22 18:08:35 2021 - [info] Reading server configuration from /etc/mastermha/app1.cnf..
Sat May 22 18:08:36 2021 - [info] GTID failover mode = 0
Sat May 22 18:08:36 2021 - [info] Dead Servers:
Sat May 22 18:08:36 2021 - [info] 172.31.0.28(172.31.0.28:3306)
Sat May 22 18:08:36 2021 - [info] Alive Servers:
Sat May 22 18:08:36 2021 - [info] 172.31.0.48(172.31.0.48:3306)
Sat May 22 18:08:36 2021 - [info] 172.31.0.38(172.31.0.38:3306)
Sat May 22 18:08:36 2021 - [info] Alive Slaves:
Sat May 22 18:08:36 2021 - [info] 172.31.0.48(172.31.0.48:3306) Version=8.0.21 (oldest major version between slaves) log-bin:enabled
Sat May 22 18:08:36 2021 - [info] Replicating from 172.31.0.28(172.31.0.28:3306)
Sat May 22 18:08:36 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Sat May 22 18:08:36 2021 - [info] 172.31.0.38(172.31.0.38:3306) Version=8.0.21 (oldest major version between slaves) log-bin:enabled
Sat May 22 18:08:36 2021 - [info] Replicating from 172.31.0.28(172.31.0.28:3306)
Sat May 22 18:08:36 2021 - [info] Checking slave configurations..
Sat May 22 18:08:36 2021 - [info] Checking replication filtering settings..
Sat May 22 18:08:36 2021 - [info] Replication filtering check ok.
Sat May 22 18:08:36 2021 - [info] Master is down!
Sat May 22 18:08:36 2021 - [info] Terminating monitoring script.
Sat May 22 18:08:36 2021 - [info] Got exit code 20 (Master dead).
Sat May 22 18:08:36 2021 - [info] MHA::MasterFailover version 0.58.
Sat May 22 18:08:36 2021 - [info] Starting master failover.
Sat May 22 18:08:36 2021 - [info]
Sat May 22 18:08:36 2021 - [info] * Phase 1: Configuration Check Phase..
Sat May 22 18:08:36 2021 - [info]
Sat May 22 18:08:37 2021 - [info] GTID failover mode = 0
Sat May 22 18:08:37 2021 - [info] Dead Servers:
Sat May 22 18:08:37 2021 - [info] 172.31.0.28(172.31.0.28:3306)
Sat May 22 18:08:37 2021 - [info] Checking master reachability via MySQL(double check)...
Sat May 22 18:08:37 2021 - [info] ok.
Sat May 22 18:08:37 2021 - [info] Alive Servers:
Sat May 22 18:08:37 2021 - [info] 172.31.0.48(172.31.0.48:3306)
Sat May 22 18:08:37 2021 - [info] 172.31.0.38(172.31.0.38:3306)
Sat May 22 18:08:37 2021 - [info] Alive Slaves:
Sat May 22 18:08:37 2021 - [info] 172.31.0.48(172.31.0.48:3306) Version=8.0.21 (oldest major version between slaves) log-bin:enabled
Sat May 22 18:08:37 2021 - [info] Replicating from 172.31.0.28(172.31.0.28:3306)
Sat May 22 18:08:37 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Sat May 22 18:08:37 2021 - [info] 172.31.0.38(172.31.0.38:3306) Version=8.0.21 (oldest major version between slaves) log-bin:enabled
Sat May 22 18:08:37 2021 - [info] Replicating from 172.31.0.28(172.31.0.28:3306)
Sat May 22 18:08:37 2021 - [info] Starting Non-GTID based failover.
Sat May 22 18:08:37 2021 - [info]
Sat May 22 18:08:37 2021 - [info] ** Phase 1: Configuration Check Phase completed.
Sat May 22 18:08:37 2021 - [info]
Sat May 22 18:08:37 2021 - [info] * Phase 2: Dead Master Shutdown Phase..
Sat May 22 18:08:37 2021 - [info]
Sat May 22 18:08:37 2021 - [info] Forcing shutdown so that applications never connect to the current master..
Sat May 22 18:08:37 2021 - [info] Executing master IP deactivation script:
Sat May 22 18:08:37 2021 - [info] /usr/local/bin/master_ip_failover --orig_master_host=172.31.0.28 --orig_master_ip=172.31.0.28 --orig_master_port=3306 --command=stopssh --ssh_user=root
IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 172.31.0.100/16;/sbin/arping -I eth0 -c 3 -s 172.31.0.100/16 172.31.0.254 >/dev/null 2>&1===
Disabling the VIP on old master: 172.31.0.28
Sat May 22 18:08:37 2021 - [info] done.
Sat May 22 18:08:37 2021 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Sat May 22 18:08:37 2021 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Sat May 22 18:08:37 2021 - [info]
Sat May 22 18:08:37 2021 - [info] * Phase 3: Master Recovery Phase..
Sat May 22 18:08:37 2021 - [info]
Sat May 22 18:08:37 2021 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Sat May 22 18:08:37 2021 - [info]
Sat May 22 18:08:37 2021 - [info] The latest binary log file/position on all slaves is mysql-bin.000002:1391
Sat May 22 18:08:37 2021 - [info] Latest slaves (Slaves that received relay log files to the latest):
Sat May 22 18:08:37 2021 - [info] 172.31.0.48(172.31.0.48:3306) Version=8.0.21 (oldest major version between slaves) log-bin:enabled
Sat May 22 18:08:37 2021 - [info] Replicating from 172.31.0.28(172.31.0.28:3306)
Sat May 22 18:08:37 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Sat May 22 18:08:37 2021 - [info] 172.31.0.38(172.31.0.38:3306) Version=8.0.21 (oldest major version between slaves) log-bin:enabled
Sat May 22 18:08:37 2021 - [info] Replicating from 172.31.0.28(172.31.0.28:3306)
Sat May 22 18:08:37 2021 - [info] The oldest binary log file/position on all slaves is mysql-bin.000002:1391
Sat May 22 18:08:37 2021 - [info] Oldest slaves:
Sat May 22 18:08:37 2021 - [info] 172.31.0.48(172.31.0.48:3306) Version=8.0.21 (oldest major version between slaves) log-bin:enabled
Sat May 22 18:08:37 2021 - [info] Replicating from 172.31.0.28(172.31.0.28:3306)
Sat May 22 18:08:37 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Sat May 22 18:08:37 2021 - [info] 172.31.0.38(172.31.0.38:3306) Version=8.0.21 (oldest major version between slaves) log-bin:enabled
Sat May 22 18:08:37 2021 - [info] Replicating from 172.31.0.28(172.31.0.28:3306)
Sat May 22 18:08:37 2021 - [info]
Sat May 22 18:08:37 2021 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Sat May 22 18:08:37 2021 - [info]
Sat May 22 18:08:37 2021 - [info] Fetching dead master's binary logs..
Sat May 22 18:08:37 2021 - [info] Executing command on the dead master 172.31.0.28(172.31.0.28:3306): save_binary_logs --command=save --start_file=mysql-bin.000002 --start_pos=1391 --binlog_dir=/data/mysql/ --output_file=/data/mastermha/app1//saved_master_binlog_from_172.31.0.28_3306_20210522180836.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.58
Creating /data/mastermha/app1 if not exists.. ok.
Concat binary/relay logs from mysql-bin.000002 pos 1391 to mysql-bin.000002 EOF into /data/mastermha/app1//saved_master_binlog_from_172.31.0.28_3306_20210522180836.binlog ..
Binlog Checksum enabled
Dumping binlog format description event, from position 0 to 156.. ok.
No need to dump effective binlog data from /data/mysql//mysql-bin.000002 (pos starts 1391, filesize 1391). Skipping.
Binlog Checksum enabled
/data/mastermha/app1//saved_master_binlog_from_172.31.0.28_3306_20210522180836.binlog has no effective data events.
Event not exists.
Sat May 22 18:08:38 2021 - [info] Additional events were not found from the orig master. No need to save.
Sat May 22 18:08:38 2021 - [info]
Sat May 22 18:08:38 2021 - [info] * Phase 3.3: Determining New Master Phase..
Sat May 22 18:08:38 2021 - [info]
Sat May 22 18:08:38 2021 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Sat May 22 18:08:38 2021 - [info] All slaves received relay logs to the same position. No need to resync each other.
Sat May 22 18:08:38 2021 - [info] Searching new master from slaves..
Sat May 22 18:08:38 2021 - [info] Candidate masters from the configuration file:
Sat May 22 18:08:38 2021 - [info] 172.31.0.48(172.31.0.48:3306) Version=8.0.21 (oldest major version between slaves) log-bin:enabled
Sat May 22 18:08:38 2021 - [info] Replicating from 172.31.0.28(172.31.0.28:3306)
Sat May 22 18:08:38 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Sat May 22 18:08:38 2021 - [info] Non-candidate masters:
Sat May 22 18:08:38 2021 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Sat May 22 18:08:38 2021 - [info] New master is 172.31.0.48(172.31.0.48:3306)
Sat May 22 18:08:38 2021 - [info] Starting master failover..
Sat May 22 18:08:38 2021 - [info]
From:
172.31.0.28(172.31.0.28:3306) (current master)
+--172.31.0.48(172.31.0.48:3306)
+--172.31.0.38(172.31.0.38:3306)
To:
172.31.0.48(172.31.0.48:3306) (new master)
+--172.31.0.38(172.31.0.38:3306)
Sat May 22 18:08:38 2021 - [info]
Sat May 22 18:08:38 2021 - [info] * Phase 3.4: New Master Diff Log Generation Phase..
Sat May 22 18:08:38 2021 - [info]
Sat May 22 18:08:38 2021 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Sat May 22 18:08:38 2021 - [info]
Sat May 22 18:08:38 2021 - [info] * Phase 3.5: Master Log Apply Phase..
Sat May 22 18:08:38 2021 - [info]
Sat May 22 18:08:38 2021 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Sat May 22 18:08:38 2021 - [info] Starting recovery on 172.31.0.48(172.31.0.48:3306)..
Sat May 22 18:08:38 2021 - [info] This server has all relay logs. Waiting all logs to be applied..
Sat May 22 18:08:38 2021 - [info] done.
Sat May 22 18:08:38 2021 - [info] All relay logs were successfully applied.
Sat May 22 18:08:38 2021 - [info] Getting new master's binlog name and position..
Sat May 22 18:08:38 2021 - [info] mysql-bin.000002:1426
Sat May 22 18:08:38 2021 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.31.0.48', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=1426, MASTER_USER='repluser', MASTER_PASSWORD='xxx';
Sat May 22 18:08:38 2021 - [info] Executing master IP activate script:
Sat May 22 18:08:38 2021 - [info] /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=172.31.0.28 --orig_master_ip=172.31.0.28 --orig_master_port=3306 --new_master_host=172.31.0.48 --new_master_ip=172.31.0.48 --new_master_port=3306 --new_master_user='mhauser' --new_master_password=xxx
Unknown option: new_master_user
Unknown option: new_master_password
IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 172.31.0.100/16;/sbin/arping -I eth0 -c 3 -s 172.31.0.100/16 172.31.0.254 >/dev/null 2>&1===
Enabling the VIP - 172.31.0.100/16 on the new master - 172.31.0.48
Sat May 22 18:08:38 2021 - [info] OK.
Sat May 22 18:08:38 2021 - [info] Setting read_only=0 on 172.31.0.48(172.31.0.48:3306)..
Sat May 22 18:08:38 2021 - [info] ok.
Sat May 22 18:08:38 2021 - [info] ** Finished master recovery successfully.
Sat May 22 18:08:38 2021 - [info] * Phase 3: Master Recovery Phase completed.
Sat May 22 18:08:38 2021 - [info]
Sat May 22 18:08:38 2021 - [info] * Phase 4: Slaves Recovery Phase..
Sat May 22 18:08:38 2021 - [info]
Sat May 22 18:08:38 2021 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Sat May 22 18:08:38 2021 - [info]
Sat May 22 18:08:38 2021 - [info] -- Slave diff file generation on host 172.31.0.38(172.31.0.38:3306) started, pid: 27571. Check tmp log /data/mastermha/app1//172.31.0.38_3306_20210522180836.log if it takes time..
Sat May 22 18:08:39 2021 - [info]
Sat May 22 18:08:39 2021 - [info] Log messages from 172.31.0.38 ...
Sat May 22 18:08:39 2021 - [info]
Sat May 22 18:08:38 2021 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Sat May 22 18:08:39 2021 - [info] End of log messages from 172.31.0.38.
Sat May 22 18:08:39 2021 - [info] -- 172.31.0.38(172.31.0.38:3306) has the latest relay log events.
Sat May 22 18:08:39 2021 - [info] Generating relay diff files from the latest slave succeeded.
Sat May 22 18:08:39 2021 - [info]
Sat May 22 18:08:39 2021 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Sat May 22 18:08:39 2021 - [info]
Sat May 22 18:08:39 2021 - [info] -- Slave recovery on host 172.31.0.38(172.31.0.38:3306) started, pid: 27573. Check tmp log /data/mastermha/app1//172.31.0.38_3306_20210522180836.log if it takes time..
Sat May 22 18:08:40 2021 - [info]
Sat May 22 18:08:40 2021 - [info] Log messages from 172.31.0.38 ...
Sat May 22 18:08:40 2021 - [info]
Sat May 22 18:08:39 2021 - [info] Starting recovery on 172.31.0.38(172.31.0.38:3306)..
Sat May 22 18:08:39 2021 - [info] This server has all relay logs. Waiting all logs to be applied..
Sat May 22 18:08:39 2021 - [info] done.
Sat May 22 18:08:39 2021 - [info] All relay logs were successfully applied.
Sat May 22 18:08:39 2021 - [info] Resetting slave 172.31.0.38(172.31.0.38:3306) and starting replication from the new master 172.31.0.48(172.31.0.48:3306)..
Sat May 22 18:08:39 2021 - [info] Executed CHANGE MASTER.
Sat May 22 18:08:39 2021 - [info] Slave started.
Sat May 22 18:08:40 2021 - [info] End of log messages from 172.31.0.38.
Sat May 22 18:08:40 2021 - [info] -- Slave recovery on host 172.31.0.38(172.31.0.38:3306) succeeded.
Sat May 22 18:08:40 2021 - [info] All new slave servers recovered successfully.
Sat May 22 18:08:40 2021 - [info]
Sat May 22 18:08:40 2021 - [info] * Phase 5: New master cleanup phase..
Sat May 22 18:08:40 2021 - [info]
Sat May 22 18:08:40 2021 - [info] Resetting slave info on the new master..
Sat May 22 18:08:40 2021 - [info] 172.31.0.48: Resetting slave info succeeded.
Sat May 22 18:08:40 2021 - [info] Master failover to 172.31.0.48(172.31.0.48:3306) completed successfully.
Sat May 22 18:08:40 2021 - [info]
----- Failover Report -----
app1: MySQL Master failover 172.31.0.28(172.31.0.28:3306) to 172.31.0.48(172.31.0.48:3306) succeeded
Master 172.31.0.28(172.31.0.28:3306) is down!
Check MHA Manager logs at localhost.localdomain:/data/mastermha/app1/manager.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on 172.31.0.28(172.31.0.28:3306)
The latest slave 172.31.0.48(172.31.0.48:3306) has all relay logs for recovery.
Selected 172.31.0.48(172.31.0.48:3306) as a new master.
172.31.0.48(172.31.0.48:3306): OK: Applying all logs succeeded.
172.31.0.48(172.31.0.48:3306): OK: Activated master IP address.
172.31.0.38(172.31.0.38:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
172.31.0.38(172.31.0.38:3306): OK: Applying all logs succeeded. Slave started, replicating from 172.31.0.48(172.31.0.48:3306)
172.31.0.48(172.31.0.48:3306): Resetting slave info succeeded.
Master failover to 172.31.0.48(172.31.0.48:3306) completed successfully.
Sat May 22 18:08:40 2021 - [info] Sending mail..
sh: /usr/local/bin/sendmail.sh: No such file or directory
Sat May 22 18:08:40 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln2089] Failed to send mail with return code 127:0
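The failover itself succeeded, but the final report step failed: return code 127 is the shell's "command not found" status, and /usr/local/bin/sendmail.sh does not exist on the manager. The sketch below is one way to catch this before a real failover. It assumes the scripts are referenced in the MHA configuration file through `key=value` lines whose key ends in `_script` (for example `report_script` and `master_ip_failover_script`); the function name `check_mha_scripts` is hypothetical.

```shell
# Hypothetical helper: print every script path configured in an MHA cnf
# file that is missing or not executable. Returns 1 if any is found.
check_mha_scripts() {
    conf="$1"
    missing=0
    # Take the value of each "<something>_script=<path>" line
    # (assumes no spaces around "=" and no arguments after the path).
    for path in $(sed -n 's/^[a-z_]*_script=//p' "$conf"); do
        if [ ! -x "$path" ]; then
            echo "missing or not executable: $path"
            missing=1
        fi
    done
    return "$missing"
}

# Usage, with the config path used in this walkthrough:
# check_mha_scripts /etc/mastermha/app1.cnf
```

Running such a check on the manager after editing the configuration would have reported sendmail.sh before the failover was triggered.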
Check the status again
[root@localhost ~]# masterha_check_status --conf=/etc/mastermha/app1.cnf
app1 is stopped(2:NOT_RUNNING).
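masterha_manager exits once a failover has completed, which is why the status is now NOT_RUNNING. If a wrapper script needs to branch on that state, a minimal sketch is to extract the state name from the status line. The function name `mha_state` is hypothetical, and the `(N:NAME)` line format is assumed from the output shown above.

```shell
# Hypothetical helper: extract NAME from a status line containing "(N:NAME)".
mha_state() {
    printf '%s\n' "$1" | sed -n 's/.*([0-9][0-9]*:\([A-Z_]*\)).*/\1/p'
}

# Usage:
# line=$(masterha_check_status --conf=/etc/mastermha/app1.cnf)
# if [ "$(mha_state "$line")" = "NOT_RUNNING" ]; then
#     echo "MHA manager is not running; restart it after repairing the topology"
# fi
```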
The health-check entries that were being traced in the original master's log also stop
[root@centos8 ~]# tail -f /var/lib/mysql/centos8.log
Verify that the VIP has moved to the new master
[root@centos8 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:0c:29:16:9a:81 brd ff:ff:ff:ff:ff:ff
inet 172.31.0.48/16 brd 172.31.255.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet 172.31.0.100/16 brd 172.31.255.255 scope global secondary eth0:1
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe16:9a81/64 scope link
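The same check can be scripted instead of read by eye. The sketch below reads `ip a`-style text on standard input and succeeds only if the given address is bound; the function name `has_vip` is hypothetical, and the VIP and interface name are the ones used in this walkthrough.

```shell
# Hypothetical helper: succeed if the address list on stdin contains
# the given IPv4 address (matched as "inet <addr>/").
has_vip() {
    vip="$1"
    grep -qF "inet $vip/"
}

# Usage on the new master:
# ip -4 addr show dev eth0 | has_vip 172.31.0.100 && echo "VIP is active here"
```

Matching on the trailing `/` keeps 172.31.0.10 from being mistaken for 172.31.0.100.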
Troubleshooting:
# masterha_check_repl reports an error when checking replication
[root@centos8 ~]# masterha_check_repl --conf=/etc/mastermha/app1.cnf
Sat May 22 19:11:34 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat May 22 19:11:34 2021 - [info] Reading application default configuration from /etc/mastermha/app1.cnf..
Sat May 22 19:11:34 2021 - [info] Reading server configuration from /etc/mastermha/app1.cnf..
Sat May 22 19:11:34 2021 - [info] MHA::MasterMonitor version 0.58.
Sat May 22 19:11:36 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. Redundant argument in sprintf at /usr/share/perl5/vendor_perl/MHA/NodeUtil.pm line 201.
Sat May 22 19:11:36 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Sat May 22 19:11:36 2021 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!
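Approach:
In most cases the cause is that replication was not set up correctly. First make sure the data on the master matches the data on the slaves, that the configuration files on the master and on the candidate master are correct, that semi-synchronous replication is enabled on both, that the master has granted the replication user the privileges the slaves need, and that the slaves are configured accordingly.
mha4mysql does not yet support MySQL 8.0, so the source code has to be patched (the patch was not applied in this run, because the environment was CentOS 8, and MHA does not seem to work well on CentOS 8)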
[root@centos8 ~]# grep -rn 'sub parse_mysql_major_version($)' /usr/share/perl5/vendor_perl/MHA/
/usr/share/perl5/vendor_perl/MHA/NodeUtil.pm:199:sub parse_mysql_major_version($)
# Original code
#sub parse_mysql_major_version($) {
#  my $str = shift;
#  my $result = sprintf( '%03d%03d', $str =~ m/(\d+)/g );
#  return $result;
#}
# Modified code
sub parse_mysql_major_version($) {
  my $str = shift;
  # Keep only the first two numeric components (major.minor)
  $str =~ /(\d+)\.(\d+)/;
  my $strmajor = "$1.$2";
  my $result = sprintf( '%03d%03d', $strmajor =~ m/(\d+)/g );
  return $result;
}
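The "Redundant argument in sprintf" error is consistent with the original function handing every digit group in the version string to a format that has only two slots: a version such as 8.0.25 yields three groups. The patched function keeps only major and minor. The sketch below reproduces that logic in shell purely as an illustration; it is not MHA code, and it assumes a version string of the form `major.minor[.patch][-suffix]` with no leading zeros.

```shell
# Illustration only: zero-pad major and minor to three digits each,
# ignoring the patch level and any suffix.
parse_mysql_major_version() {
    version="$1"
    major="${version%%.*}"        # text before the first dot
    rest="${version#*.}"          # text after the first dot
    minor="${rest%%[!0-9]*}"      # leading digits of the remainder
    printf '%03d%03d\n' "$major" "$minor"
}

parse_mysql_major_version 8.0.25            # prints 008000
parse_mysql_major_version 10.3.17-MariaDB   # prints 010003
```

Because the result has a fixed width, versions can be compared as plain strings or integers.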
mha4mysql installation fails on CentOS 7
--> Finished Dependency Resolution
Error: Package: mha4mysql-manager-0.58-0.el7.centos.noarch (/mha4mysql-manager-0.58-0.el7.centos.
Requires: perl(Log::Dispatch)
Error: Package: mha4mysql-manager-0.58-0.el7.centos.noarch (/mha4mysql-manager-0.58-0.el7.centos.
Requires: perl(Parallel::ForkManager)
Error: Package: mha4mysql-manager-0.58-0.el7.centos.noarch (/mha4mysql-manager-0.58-0.el7.centos.
Requires: perl(Log::Dispatch::File)
Error: Package: mha4mysql-manager-0.58-0.el7.centos.noarch (/mha4mysql-manager-0.58-0.el7.centos.
Requires: perl(Log::Dispatch::Screen)
You could try using --skip-broken to work around the problem
You could try running: rpm -Va --nofiles --nodigest
Solution:
[root@localhost ~]# yum install epel-release -y
# The missing Perl modules are provided by EPEL; reinstall the packages
[root@localhost ~]# yum install mha4mysql-*.rpm -y