zoukankan      html  css  js  c++  java
  • MySQL高可用方案 MHA之二 master_ip_failover

    异步主从复制架构
    master:
    10.150.20.90 ed3jrdba90
    slave:
    10.15.20.97 ed3jrdba97
    10.150.20.132 ed3jrdba132
    manager:
    10.150.20.95 ed3jrdba95
    #新增VIP
    vip:10.150.20.200

    四台机器的系统情况:
    OS:CentOS7.3
    MySQL:5.7.21
    MHA:0.58
    网卡名:ens3

    mha manager节点

    1:配置app1.cnf文件
    添加master_ip_failover_script的文件路径,mysql master失败时执行的切换脚本。
    #vi /etc/mysql_mha/app1.cnf
    #自动failover时候的切换脚本
    master_ip_failover_script= /usr/local/bin/master_ip_failover

    [root@dev05 ~]# cat /etc/mysql_mha/app1.cnf

    [server default]
    manager_log=/data/mysql_mha/app1-manager.log
    manager_workdir=/data/mysql_mha/app1
    master_binlog_dir=/data/mysql_33061/logs
    master_ip_failover_script=/usr/local/bin/master_ip_failover
    password=mha_monitor
    ping_interval=5
    remote_workdir=/data/mysql_mha/app1
    repl_password=replicator
    repl_user=replicator
    shutdown_script=""
    ssh_user=root
    user=mha_monitor

    [server1]
    hostname=10.150.20.90
    port=33061

    [server1]
    hostname=10.150.20.97
    port=33061

    [server3]
    hostname=10.150.20.132
    port=33061

    编辑master_ip_failover脚本文件:没有使用 keepalived ,通过脚本的方式管理vip

    # cp /usr/local/bin/master_ip_failover /usr/local/bin/master_ip_failover.bak
    # vi /usr/local/bin/master_ip_failover

    #!/usr/bin/env perl
    use strict;
    use warnings FATAL => 'all';
    
    use Getopt::Long;
    
    my (
    $command, $ssh_user, $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip, $new_master_port
    );
    #############################添加内容部分#########################################
    my $vip = '10.150.20.200';
    my $brdc = '10.150.20.255';
    my $ifdev = 'ens3';
    my $key = '1';
    my $ssh_start_vip = "/usr/sbin/ip addr add $vip/24 brd $brdc dev $ifdev label $ifdev:$key;/usr/sbin/arping -q -A -c 1 -I $ifdev $vip;iptables -F;";
    my $ssh_stop_vip = "/usr/sbin/ip addr del $vip/24 dev $ifdev label $ifdev:$key";
    ##################################################################################
    GetOptions(
    'command=s' => $command,
    'ssh_user=s' => $ssh_user,
    'orig_master_host=s' => $orig_master_host,
    'orig_master_ip=s' => $orig_master_ip,
    'orig_master_port=i' => $orig_master_port,
    'new_master_host=s' => $new_master_host,
    'new_master_ip=s' => $new_master_ip,
    'new_master_port=i' => $new_master_port,
    );
    
    exit &main();
    
    sub main {
    
    print "
    
    IN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===
    
    ";
    
    if ( $command eq "stop" || $command eq "stopssh" ) {
    
    my $exit_code = 1;
    eval {
    print "Disabling the VIP on old master: $orig_master_host 
    ";
    &stop_vip();
    $exit_code = 0;
    };
    if ($@) {
    warn "Got Error: $@
    ";
    exit $exit_code;
    }
    exit $exit_code;
    }
    elsif ( $command eq "start" ) {
    
    my $exit_code = 10;
    eval {
    print "Enabling the VIP - $vip on the new master - $new_master_host 
    ";
    &start_vip();
    $exit_code = 0;
    };
    if ($@) {
    warn $@;
    exit $exit_code;
    }
    exit $exit_code;
    }
    elsif ( $command eq "status" ) {
    print "Checking the Status of the script.. OK 
    ";
    exit 0;
    }
    else {
    &usage();
    exit 1;
    }
    }
    sub start_vip() {
    `ssh $ssh_user@$new_master_host " $ssh_start_vip "`;
    }
    # A simple system call that disable the VIP on the old_master
    sub stop_vip() {
    `ssh $ssh_user@$orig_master_host " $ssh_stop_vip "`;
    }
    
    sub usage {
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port
    ";
    }
    master_ip_failover

    更换ip后,一定要执行下arping

    检查复制环境ssh
    # masterha_check_ssh --conf=/etc/mysql_mha/app1.cnf
    Wed Dec 12 14:43:27 2018 - [info] Reading default configuration from /etc/masterha_default.cnf..
    Wed Dec 12 14:43:27 2018 - [info] Reading application default configuration from /etc/mysql_mha/app1.cnf..
    Wed Dec 12 14:43:27 2018 - [info] Reading server configuration from /etc/mysql_mha/app1.cnf..
    Wed Dec 12 14:43:27 2018 - [info] Starting SSH connection tests..
    Wed Dec 12 14:43:28 2018 - [debug]
    Wed Dec 12 14:43:27 2018 - [debug] Connecting via SSH from root@10.150.20.90(10.150.20.90:22) to root@10.150.20.97(10.150.20.97:22)..
    Wed Dec 12 14:43:27 2018 - [debug] ok.
    Wed Dec 12 14:43:27 2018 - [debug] Connecting via SSH from root@10.150.20.90(10.150.20.90:22) to root@10.150.20.132(10.150.20.132:22)..
    Wed Dec 12 14:43:27 2018 - [debug] ok.
    Wed Dec 12 14:43:28 2018 - [debug]
    Wed Dec 12 14:43:27 2018 - [debug] Connecting via SSH from root@10.150.20.97(10.150.20.97:22) to root@10.150.20.90(10.150.20.90:22)..
    Wed Dec 12 14:43:27 2018 - [debug] ok.
    Wed Dec 12 14:43:27 2018 - [debug] Connecting via SSH from root@10.150.20.97(10.150.20.97:22) to root@10.150.20.132(10.150.20.132:22)..
    Wed Dec 12 14:43:28 2018 - [debug] ok.
    Wed Dec 12 14:43:29 2018 - [debug]
    Wed Dec 12 14:43:28 2018 - [debug] Connecting via SSH from root@10.150.20.132(10.150.20.132:22) to root@10.150.20.90(10.150.20.90:22)..
    Wed Dec 12 14:43:28 2018 - [debug] ok.
    Wed Dec 12 14:43:28 2018 - [debug] Connecting via SSH from root@10.150.20.132(10.150.20.132:22) to root@10.150.20.97(10.150.20.97:22)..
    Wed Dec 12 14:43:28 2018 - [debug] ok.
    Wed Dec 12 14:43:29 2018 - [info] All SSH connection tests passed successfully.
    检查整个复制环境
    # masterha_check_repl --conf=/etc/mysql_mha/app1.cnf
    Wed Dec 12 14:44:35 2018 - [info] Slaves settings check done.
    Wed Dec 12 14:44:35 2018 - [info]
    10.150.20.90(10.150.20.90:33061) (current master)
    +--10.150.20.97(10.150.20.97:33061)
    +--10.150.20.132(10.150.20.132:33061)

    Wed Dec 12 14:44:35 2018 - [info] Checking replication health on 10.150.20.97..
    Wed Dec 12 14:44:35 2018 - [info] ok.
    Wed Dec 12 14:44:35 2018 - [info] Checking replication health on 10.150.20.132..
    Wed Dec 12 14:44:35 2018 - [info] ok.
    Wed Dec 12 14:44:35 2018 - [info] Checking master_ip_failover_script status:
    Wed Dec 12 14:44:35 2018 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.150.20.90 --orig_master_ip=10.150.20.90 --orig_master_port=33061
    Wed Dec 12 14:44:35 2018 - [info] OK.
    Wed Dec 12 14:44:35 2018 - [warning] shutdown_script is not defined.
    Wed Dec 12 14:44:35 2018 - [info] Got exit code 0 (Not master dead).

    MySQL Replication Health is OK.

    启动 mha manager
    # nohup masterha_manager --conf=/etc/mysql_mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/mysql_mha/app1-manager.log 2>&1 &
    查看 manager status
    # masterha_check_status --conf=/etc/mysql_mha/app1.cnf
    查看 manager log
    # tail -n 1000 -f /var/log/masterha/app1-manager.log

    验证 failover

    在主库qa05.010150020090.yz节点,进行vip绑定:
    [root@qa05 ~]#ip addr add 10.150.20.200/24 brd 10.150.20.255 dev ens3 label ens3:1
    [root@qa05 ~]#/usr/sbin/arping -q -A -c 1 -I ens3 10.150.20.200

    #vip解绑:
    # ip addr del 10.150.20.200/24 dev ens3 label ens3:1

    模拟故障,在qa05.010150020090.yz上 kill 掉 mysqld 进程
    [root@qa05 ~]## ps -ef|grep -i mysql
    mysql 3114 1 0 Aug06 ? 00:00:51 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid
    root 15551 10466 0 Aug06 pts/1 00:00:00 mysql
    root 25521 21213 0 03:52 pts/2 00:00:00 grep --color=auto -i mysql
    [root@qa05 ~]## kill -9 27593 26101

    观察 mha manager 之前打开的日志输出
    [root@dev05 ~]# tail -n 1000 -f /data/mysql_mha/app1-manager.log

    Wed Dec 12 14:54:10 2018 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
    Wed Dec 12 14:54:10 2018 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql_33061/logs --output_file=/data/mysql_mha/app1/save_binary_logs_test --manager_version=0.58 --binlog_prefix=mysql-bin
    Wed Dec 12 14:54:10 2018 - [info] HealthCheck: SSH to 10.150.20.90 is reachable.
    Wed Dec 12 14:54:15 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.150.20.90' (111))
    Wed Dec 12 14:54:15 2018 - [warning] Connection failed 2 time(s)..
    Wed Dec 12 14:54:20 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.150.20.90' (111))
    Wed Dec 12 14:54:20 2018 - [warning] Connection failed 3 time(s)..
    Wed Dec 12 14:54:25 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.150.20.90' (111))
    Wed Dec 12 14:54:25 2018 - [warning] Connection failed 4 time(s)..
    Wed Dec 12 14:54:25 2018 - [warning] Master is not reachable from health checker!
    Wed Dec 12 14:54:25 2018 - [warning] Master 10.150.20.90(10.150.20.90:33061) is not reachable!
    Wed Dec 12 14:54:25 2018 - [warning] SSH is reachable.
    Wed Dec 12 14:54:25 2018 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mysql_mha/app1.cnf again, and trying to connect to all servers to check server status..
    Wed Dec 12 14:54:25 2018 - [info] Reading default configuration from /etc/masterha_default.cnf..
    Wed Dec 12 14:54:25 2018 - [info] Reading application default configuration from /etc/mysql_mha/app1.cnf..
    Wed Dec 12 14:54:25 2018 - [info] Reading server configuration from /etc/mysql_mha/app1.cnf..
    Wed Dec 12 14:54:26 2018 - [info] GTID failover mode = 0
    Wed Dec 12 14:54:26 2018 - [info] Dead Servers:
    Wed Dec 12 14:54:26 2018 - [info] 10.150.20.90(10.150.20.90:33061)
    Wed Dec 12 14:54:26 2018 - [info] Alive Servers:
    Wed Dec 12 14:54:26 2018 - [info] 10.150.20.97(10.150.20.97:33061)
    Wed Dec 12 14:54:26 2018 - [info] 10.150.20.132(10.150.20.132:33061)
    Wed Dec 12 14:54:26 2018 - [info] Alive Slaves:
    Wed Dec 12 14:54:26 2018 - [info] 10.150.20.97(10.150.20.97:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
    Wed Dec 12 14:54:26 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
    Wed Dec 12 14:54:26 2018 - [info] 10.150.20.132(10.150.20.132:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
    Wed Dec 12 14:54:26 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
    Wed Dec 12 14:54:26 2018 - [info] Checking slave configurations..
    Wed Dec 12 14:54:26 2018 - [info] read_only=1 is not set on slave 10.150.20.97(10.150.20.97:33061).
    Wed Dec 12 14:54:26 2018 - [warning] relay_log_purge=0 is not set on slave 10.150.20.97(10.150.20.97:33061).
    Wed Dec 12 14:54:26 2018 - [info] read_only=1 is not set on slave 10.150.20.132(10.150.20.132:33061).
    Wed Dec 12 14:54:26 2018 - [info] Checking replication filtering settings..
    Wed Dec 12 14:54:26 2018 - [info] Replication filtering check ok.
    Wed Dec 12 14:54:26 2018 - [info] Master is down!
    Wed Dec 12 14:54:26 2018 - [info] Terminating monitoring script.
    Wed Dec 12 14:54:26 2018 - [info] Got exit code 20 (Master dead).
    Wed Dec 12 14:54:26 2018 - [info] MHA::MasterFailover version 0.58.
    Wed Dec 12 14:54:26 2018 - [info] Starting master failover.
    Wed Dec 12 14:54:26 2018 - [info] 
    Wed Dec 12 14:54:26 2018 - [info] * Phase 1: Configuration Check Phase..
    Wed Dec 12 14:54:26 2018 - [info] 
    Wed Dec 12 14:54:27 2018 - [info] GTID failover mode = 0
    Wed Dec 12 14:54:27 2018 - [info] Dead Servers:
    Wed Dec 12 14:54:27 2018 - [info] 10.150.20.90(10.150.20.90:33061)
    Wed Dec 12 14:54:27 2018 - [info] Checking master reachability via MySQL(double check)...
    Wed Dec 12 14:54:27 2018 - [info] ok.
    Wed Dec 12 14:54:27 2018 - [info] Alive Servers:
    Wed Dec 12 14:54:27 2018 - [info] 10.150.20.97(10.150.20.97:33061)
    Wed Dec 12 14:54:27 2018 - [info] 10.150.20.132(10.150.20.132:33061)
    Wed Dec 12 14:54:27 2018 - [info] Alive Slaves:
    Wed Dec 12 14:54:27 2018 - [info] 10.150.20.97(10.150.20.97:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
    Wed Dec 12 14:54:27 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
    Wed Dec 12 14:54:27 2018 - [info] 10.150.20.132(10.150.20.132:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
    Wed Dec 12 14:54:27 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
    Wed Dec 12 14:54:27 2018 - [info] Starting Non-GTID based failover.
    Wed Dec 12 14:54:27 2018 - [info] 
    Wed Dec 12 14:54:27 2018 - [info] ** Phase 1: Configuration Check Phase completed.
    Wed Dec 12 14:54:27 2018 - [info] 
    Wed Dec 12 14:54:27 2018 - [info] * Phase 2: Dead Master Shutdown Phase..
    Wed Dec 12 14:54:27 2018 - [info] 
    Wed Dec 12 14:54:27 2018 - [info] Forcing shutdown so that applications never connect to the current master..
    Wed Dec 12 14:54:27 2018 - [info] Executing master IP deactivation script:
    Wed Dec 12 14:54:27 2018 - [info] /usr/local/bin/master_ip_failover --orig_master_host=10.150.20.90 --orig_master_ip=10.150.20.90 --orig_master_port=33061 --command=stopssh --ssh_user=root 
    Wed Dec 12 14:54:27 2018 - [info] done.
    Wed Dec 12 14:54:27 2018 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
    Wed Dec 12 14:54:27 2018 - [info] * Phase 2: Dead Master Shutdown Phase completed.
    Wed Dec 12 14:54:27 2018 - [info] 
    Wed Dec 12 14:54:27 2018 - [info] * Phase 3: Master Recovery Phase..
    Wed Dec 12 14:54:27 2018 - [info] 
    Wed Dec 12 14:54:27 2018 - [info] * Phase 3.1: Getting Latest Slaves Phase..
    Wed Dec 12 14:54:27 2018 - [info] 
    Wed Dec 12 14:54:27 2018 - [info] The latest binary log file/position on all slaves is mysql-bin.000006:154
    Wed Dec 12 14:54:27 2018 - [info] Latest slaves (Slaves that received relay log files to the latest):
    Wed Dec 12 14:54:27 2018 - [info] 10.150.20.97(10.150.20.97:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
    Wed Dec 12 14:54:27 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
    Wed Dec 12 14:54:27 2018 - [info] 10.150.20.132(10.150.20.132:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
    Wed Dec 12 14:54:27 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
    Wed Dec 12 14:54:27 2018 - [info] The oldest binary log file/position on all slaves is mysql-bin.000006:154
    Wed Dec 12 14:54:27 2018 - [info] Oldest slaves:
    Wed Dec 12 14:54:27 2018 - [info] 10.150.20.97(10.150.20.97:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
    Wed Dec 12 14:54:27 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
    Wed Dec 12 14:54:27 2018 - [info] 10.150.20.132(10.150.20.132:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
    Wed Dec 12 14:54:27 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
    Wed Dec 12 14:54:27 2018 - [info] 
    Wed Dec 12 14:54:27 2018 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
    Wed Dec 12 14:54:27 2018 - [info] 
    Wed Dec 12 14:54:27 2018 - [info] Fetching dead master's binary logs..
    Wed Dec 12 14:54:27 2018 - [info] Executing command on the dead master 10.150.20.90(10.150.20.90:33061): save_binary_logs --command=save --start_file=mysql-bin.000006 --start_pos=154 --binlog_dir=/data/mysql_33061/logs --output_file=/data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181212145426.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.58
    Creating /data/mysql_mha/app1 if not exists.. ok.
    Concat binary/relay logs from mysql-bin.000006 pos 154 to mysql-bin.000006 EOF into /data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181212145426.binlog ..
    Binlog Checksum enabled
    Dumping binlog format description event, from position 0 to 154.. ok.
    No need to dump effective binlog data from /data/mysql_33061/logs/mysql-bin.000006 (pos starts 154, filesize 154). Skipping.
    Binlog Checksum enabled
    /data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181212145426.binlog has no effective data events.
    Event not exists.
    Wed Dec 12 14:54:28 2018 - [info] Additional events were not found from the orig master. No need to save.
    Wed Dec 12 14:54:28 2018 - [info] 
    Wed Dec 12 14:54:28 2018 - [info] * Phase 3.3: Determining New Master Phase..
    Wed Dec 12 14:54:28 2018 - [info] 
    Wed Dec 12 14:54:28 2018 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
    Wed Dec 12 14:54:28 2018 - [info] All slaves received relay logs to the same position. No need to resync each other.
    Wed Dec 12 14:54:28 2018 - [info] Searching new master from slaves..
    Wed Dec 12 14:54:28 2018 - [info] Candidate masters from the configuration file:
    Wed Dec 12 14:54:28 2018 - [info] Non-candidate masters:
    Wed Dec 12 14:54:28 2018 - [info] New master is 10.150.20.97(10.150.20.97:33061)
    Wed Dec 12 14:54:28 2018 - [info] Starting master failover..
    Wed Dec 12 14:54:28 2018 - [info] 
    From:
    10.150.20.90(10.150.20.90:33061) (current master)
    +--10.150.20.97(10.150.20.97:33061)
    +--10.150.20.132(10.150.20.132:33061)
    
    To:
    10.150.20.97(10.150.20.97:33061) (new master)
    +--10.150.20.132(10.150.20.132:33061)
    Wed Dec 12 14:54:28 2018 - [info] 
    Wed Dec 12 14:54:28 2018 - [info] * Phase 3.4: New Master Diff Log Generation Phase..
    Wed Dec 12 14:54:28 2018 - [info] 
    Wed Dec 12 14:54:28 2018 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
    Wed Dec 12 14:54:28 2018 - [info] 
    Wed Dec 12 14:54:28 2018 - [info] * Phase 3.5: Master Log Apply Phase..
    Wed Dec 12 14:54:28 2018 - [info] 
    Wed Dec 12 14:54:28 2018 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
    Wed Dec 12 14:54:28 2018 - [info] Starting recovery on 10.150.20.97(10.150.20.97:33061)..
    Wed Dec 12 14:54:28 2018 - [info] This server has all relay logs. Waiting all logs to be applied.. 
    Wed Dec 12 14:54:28 2018 - [info] done.
    Wed Dec 12 14:54:28 2018 - [info] All relay logs were successfully applied.
    Wed Dec 12 14:54:28 2018 - [info] Getting new master's binlog name and position..
    Wed Dec 12 14:54:28 2018 - [info] mysql-bin.000010:2774
    Wed Dec 12 14:54:28 2018 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.150.20.97', MASTER_PORT=33061, MASTER_LOG_FILE='mysql-bin.000010', MASTER_LOG_POS=2774, MASTER_USER='replicator', MASTER_PASSWORD='xxx';
    Wed Dec 12 14:54:28 2018 - [info] Executing master IP activate script:
    Wed Dec 12 14:54:28 2018 - [info] /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=10.150.20.90 --orig_master_ip=10.150.20.90 --orig_master_port=33061 --new_master_host=10.150.20.97 --new_master_ip=10.150.20.97 --new_master_port=33061 --new_master_user='mha_monitor' --new_master_password=xxx
    Set read_only=0 on the new master.
    Creating app user on the new master..
    Wed Dec 12 14:54:28 2018 - [info] OK.
    Wed Dec 12 14:54:28 2018 - [info] ** Finished master recovery successfully.
    Wed Dec 12 14:54:28 2018 - [info] * Phase 3: Master Recovery Phase completed.
    Wed Dec 12 14:54:28 2018 - [info] 
    Wed Dec 12 14:54:28 2018 - [info] * Phase 4: Slaves Recovery Phase..
    Wed Dec 12 14:54:28 2018 - [info] 
    Wed Dec 12 14:54:28 2018 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
    Wed Dec 12 14:54:28 2018 - [info] 
    Wed Dec 12 14:54:28 2018 - [info] -- Slave diff file generation on host 10.150.20.132(10.150.20.132:33061) started, pid: 25117. Check tmp log /data/mysql_mha/app1/10.150.20.132_33061_20181212145426.log if it takes time..
    Wed Dec 12 14:54:29 2018 - [info] 
    Wed Dec 12 14:54:29 2018 - [info] Log messages from 10.150.20.132 ...
    Wed Dec 12 14:54:29 2018 - [info] 
    Wed Dec 12 14:54:28 2018 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
    Wed Dec 12 14:54:29 2018 - [info] End of log messages from 10.150.20.132.
    Wed Dec 12 14:54:29 2018 - [info] -- 10.150.20.132(10.150.20.132:33061) has the latest relay log events.
    Wed Dec 12 14:54:29 2018 - [info] Generating relay diff files from the latest slave succeeded.
    Wed Dec 12 14:54:29 2018 - [info] 
    Wed Dec 12 14:54:29 2018 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
    Wed Dec 12 14:54:29 2018 - [info] 
    Wed Dec 12 14:54:29 2018 - [info] -- Slave recovery on host 10.150.20.132(10.150.20.132:33061) started, pid: 25119. Check tmp log /data/mysql_mha/app1/10.150.20.132_33061_20181212145426.log if it takes time..
    Wed Dec 12 14:54:30 2018 - [info] 
    Wed Dec 12 14:54:30 2018 - [info] Log messages from 10.150.20.132 ...
    Wed Dec 12 14:54:30 2018 - [info] 
    Wed Dec 12 14:54:29 2018 - [info] Starting recovery on 10.150.20.132(10.150.20.132:33061)..
    Wed Dec 12 14:54:29 2018 - [info] This server has all relay logs. Waiting all logs to be applied.. 
    Wed Dec 12 14:54:29 2018 - [info] done.
    Wed Dec 12 14:54:29 2018 - [info] All relay logs were successfully applied.
    Wed Dec 12 14:54:29 2018 - [info] Resetting slave 10.150.20.132(10.150.20.132:33061) and starting replication from the new master 10.150.20.97(10.150.20.97:33061)..
    Wed Dec 12 14:54:29 2018 - [info] Executed CHANGE MASTER.
    Wed Dec 12 14:54:29 2018 - [info] Slave started.
    Wed Dec 12 14:54:30 2018 - [info] End of log messages from 10.150.20.132.
    Wed Dec 12 14:54:30 2018 - [info] -- Slave recovery on host 10.150.20.132(10.150.20.132:33061) succeeded.
    Wed Dec 12 14:54:30 2018 - [info] All new slave servers recovered successfully.
    Wed Dec 12 14:54:30 2018 - [info] 
    Wed Dec 12 14:54:30 2018 - [info] * Phase 5: New master cleanup phase..
    Wed Dec 12 14:54:30 2018 - [info] 
    Wed Dec 12 14:54:30 2018 - [info] Resetting slave info on the new master..
    Wed Dec 12 14:54:30 2018 - [info] 10.150.20.97: Resetting slave info succeeded.
    Wed Dec 12 14:54:30 2018 - [info] Master failover to 10.150.20.97(10.150.20.97:33061) completed successfully.
    Wed Dec 12 14:54:30 2018 - [info] Deleted server1 entry from /etc/mysql_mha/app1.cnf .
    Wed Dec 12 14:54:30 2018 - [info]
    
    ----- Failover Report -----
    
    app1: MySQL Master failover 10.150.20.90(10.150.20.90:33061) to 10.150.20.97(10.150.20.97:33061) succeeded
    
    Master 10.150.20.90(10.150.20.90:33061) is down!
    
    Check MHA Manager logs at dev05.010150020095.yz:/data/mysql_mha/app1-manager.log for details.
    
    Started automated(non-interactive) failover.
    Invalidated master IP address on 10.150.20.90(10.150.20.90:33061)
    The latest slave 10.150.20.97(10.150.20.97:33061) has all relay logs for recovery.
    Selected 10.150.20.97(10.150.20.97:33061) as a new master.
    10.150.20.97(10.150.20.97:33061): OK: Applying all logs succeeded.
    10.150.20.97(10.150.20.97:33061): OK: Activated master IP address.
    10.150.20.132(10.150.20.132:33061): This host has the latest relay log events.
    Generating relay diff files from the latest slave succeeded.
    10.150.20.132(10.150.20.132:33061): OK: Applying all logs succeeded. Slave started, replicating from 10.150.20.97(10.150.20.97:33061)
    10.150.20.97(10.150.20.97:33061): Resetting slave info succeeded.
    Master failover to 10.150.20.97(10.150.20.97:33061) completed successfully.
    app1-manager.log

    从日志,可以看出new master切换至10.150.20.97,此时manager节点mha manager关闭

    [root@dev05 ~]# masterha_check_status --conf=/etc/mysql_mha/app1.cnf

    app1 is stopped(2:NOT_RUNNING).

    而新主qa06.010150020097.yz,vip绑定到ens3网卡上
    [root@qa06 ~]# ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    valid_lft forever preferred_lft forever
    2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 54:52:00:49:48:92 brd ff:ff:ff:ff:ff:ff
    inet 10.150.20.97/24 brd 10.150.20.255 scope global ens3
    valid_lft forever preferred_lft forever
    inet 10.150.20.200/24 brd 10.150.20.255 scope global secondary ens3:1
    valid_lft forever preferred_lft forever
    3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
    link/ether 02:42:7f:36:38:fe brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
    valid_lft forever preferred_lft forever

    此时的mha manager节点的配置文件app1.cnf被修改为:
    [root@dev05 ~]#cat /etc/mysql_mha/app1.cnf
    [server default]
    manager_log=/data/mysql_mha/app1-manager.log
    manager_workdir=/data/mysql_mha/app1
    master_binlog_dir=/data/mysql_33061/logs
    master_ip_failover_script=/usr/local/bin/master_ip_failover
    password=mha_monitor
    ping_interval=5
    remote_workdir=/data/mysql_mha/app1
    repl_password=replicator
    repl_user=replicator
    shutdown_script=""
    ssh_user=root
    user=mha_monitor

    [server2]
    hostname=10.150.20.97
    port=33061

    [server3]
    hostname=10.150.20.132
    port=33061


    重新编辑app1.cnf
    [root@dev05 ~]#cat /etc/mysql_mha/app1.cnf
    [server default]
    manager_log=/data/mysql_mha/app1-manager.log
    manager_workdir=/data/mysql_mha/app1
    master_binlog_dir=/data/mysql_33061/logs
    master_ip_failover_script=/usr/local/bin/master_ip_failover
    password=mha_monitor
    ping_interval=5
    remote_workdir=/data/mysql_mha/app1
    repl_password=replicator
    repl_user=replicator
    shutdown_script=""
    ssh_user=root
    user=mha_monitor

    [server1]
    hostname=10.150.20.97
    port=33061
    [server2]
    hostname=10.150.20.90
    port=33061
    [server3]
    hostname=10.150.20.132
    port=33061

    重启qa05.010150020090.yz的MySQL,搭建主从,指向新主
    mysql> change master to
    master_host='10.150.20.97',
    master_user='replicator',
    master_password='replicator',
    master_port=33061,
    master_log_file='mysql-bin.000010',
    master_log_pos=2774;
    mysql> start slave;

    检测复制环境
    # masterha_check_repl --conf=/etc/mysql_mha/app1.cnf

    Wed Dec 12 15:06:56 2018 - [info] Reading default configuration from /etc/masterha_default.cnf..
    Wed Dec 12 15:06:56 2018 - [info] Reading application default configuration from /etc/mysql_mha/app1.cnf..
    Wed Dec 12 15:06:56 2018 - [info] Reading server configuration from /etc/mysql_mha/app1.cnf..
    Wed Dec 12 15:06:56 2018 - [info] MHA::MasterMonitor version 0.58.
    Wed Dec 12 15:06:57 2018 - [info] GTID failover mode = 0
    Wed Dec 12 15:06:57 2018 - [info] Dead Servers:
    Wed Dec 12 15:06:57 2018 - [info] Alive Servers:
    Wed Dec 12 15:06:57 2018 - [info] 10.150.20.97(10.150.20.97:33061)
    Wed Dec 12 15:06:57 2018 - [info] 10.150.20.90(10.150.20.90:33061)
    Wed Dec 12 15:06:57 2018 - [info] 10.150.20.132(10.150.20.132:33061)
    Wed Dec 12 15:06:57 2018 - [info] Alive Slaves:
    Wed Dec 12 15:06:57 2018 - [info] 10.150.20.90(10.150.20.90:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
    Wed Dec 12 15:06:57 2018 - [info] Replicating from 10.150.20.97(10.150.20.97:33061)
    Wed Dec 12 15:06:57 2018 - [info] 10.150.20.132(10.150.20.132:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
    Wed Dec 12 15:06:57 2018 - [info] Replicating from 10.150.20.97(10.150.20.97:33061)
    Wed Dec 12 15:06:57 2018 - [info] Current Alive Master: 10.150.20.97(10.150.20.97:33061)
    Wed Dec 12 15:06:57 2018 - [info] Checking slave configurations..
    Wed Dec 12 15:06:57 2018 - [info] read_only=1 is not set on slave 10.150.20.90(10.150.20.90:33061).
    Wed Dec 12 15:06:57 2018 - [warning] relay_log_purge=0 is not set on slave 10.150.20.90(10.150.20.90:33061).
    Wed Dec 12 15:06:57 2018 - [info] read_only=1 is not set on slave 10.150.20.132(10.150.20.132:33061).
    Wed Dec 12 15:06:57 2018 - [info] Checking replication filtering settings..
    Wed Dec 12 15:06:57 2018 - [info] binlog_do_db= , binlog_ignore_db= 
    Wed Dec 12 15:06:57 2018 - [info] Replication filtering check ok.
    Wed Dec 12 15:06:57 2018 - [info] GTID (with auto-pos) is not supported
    Wed Dec 12 15:06:57 2018 - [info] Starting SSH connection tests..
    Wed Dec 12 15:06:59 2018 - [info] All SSH connection tests passed successfully.
    Wed Dec 12 15:06:59 2018 - [info] Checking MHA Node version..
    Wed Dec 12 15:07:00 2018 - [info] Version check ok.
    Wed Dec 12 15:07:00 2018 - [info] Checking SSH publickey authentication settings on the current master..
    Wed Dec 12 15:07:00 2018 - [info] HealthCheck: SSH to 10.150.20.97 is reachable.
    Wed Dec 12 15:07:00 2018 - [info] Master MHA Node version is 0.58.
    Wed Dec 12 15:07:00 2018 - [info] Checking recovery script configurations on 10.150.20.97(10.150.20.97:33061)..
    Wed Dec 12 15:07:00 2018 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql_33061/logs --output_file=/data/mysql_mha/app1/save_binary_logs_test --manager_version=0.58 --start_file=mysql-bin.000010 
    Wed Dec 12 15:07:00 2018 - [info] Connecting to root@10.150.20.97(10.150.20.97:22).. 
    Creating /data/mysql_mha/app1 if not exists.. ok.
    Checking output directory is accessible or not..
    ok.
    Binlog found at /data/mysql_33061/logs, up to mysql-bin.000010
    Wed Dec 12 15:07:01 2018 - [info] Binlog setting check done.
    Wed Dec 12 15:07:01 2018 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
    Wed Dec 12 15:07:01 2018 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha_monitor' --slave_host=10.150.20.90 --slave_ip=10.150.20.90 --slave_port=33061 --workdir=/data/mysql_mha/app1 --target_version=5.7.21-log --manager_version=0.58 --relay_log_info=/data/mysql_33061/logs/relay-log.info --relay_dir=/data/mysql_33061/data/ --slave_pass=xxx
    Wed Dec 12 15:07:01 2018 - [info] Connecting to root@10.150.20.90(10.150.20.90:22).. 
    Checking slave recovery environment settings..
    Opening /data/mysql_33061/logs/relay-log.info ... ok.
    Relay log found at /data/mysql_33061/logs, up to relaylog.000002
    Temporary relay log file is /data/mysql_33061/logs/relaylog.000002
    Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
    Testing mysql connection and privileges..
    mysql: [Warning] Using a password on the command line interface can be insecure.
    done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
    Wed Dec 12 15:07:01 2018 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha_monitor' --slave_host=10.150.20.132 --slave_ip=10.150.20.132 --slave_port=33061 --workdir=/data/mysql_mha/app1 --target_version=5.7.21-log --manager_version=0.58 --relay_log_info=/data/mysql_33061/logs/relay-log.info --relay_dir=/data/mysql_33061/data/ --slave_pass=xxx
    Wed Dec 12 15:07:01 2018 - [info] Connecting to root@10.150.20.132(10.150.20.132:22).. 
    Checking slave recovery environment settings..
    Opening /data/mysql_33061/logs/relay-log.info ... ok.
    Relay log found at /data/mysql_33061/data, up to cgdb-relay-bin.000002
    Temporary relay log file is /data/mysql_33061/data/cgdb-relay-bin.000002
    Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
    Testing mysql connection and privileges..
    mysql: [Warning] Using a password on the command line interface can be insecure.
    done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
    Wed Dec 12 15:07:01 2018 - [info] Slaves settings check done.
    Wed Dec 12 15:07:01 2018 - [info] 
    10.150.20.97(10.150.20.97:33061) (current master)
    +--10.150.20.90(10.150.20.90:33061)
    +--10.150.20.132(10.150.20.132:33061)
    
    Wed Dec 12 15:07:01 2018 - [info] Checking replication health on 10.150.20.90..
    Wed Dec 12 15:07:01 2018 - [info] ok.
    Wed Dec 12 15:07:01 2018 - [info] Checking replication health on 10.150.20.132..
    Wed Dec 12 15:07:01 2018 - [info] ok.
    Wed Dec 12 15:07:01 2018 - [info] Checking master_ip_failover_script status:
    Wed Dec 12 15:07:01 2018 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.150.20.97 --orig_master_ip=10.150.20.97 --orig_master_port=33061 
    Wed Dec 12 15:07:02 2018 - [info] OK.
    Wed Dec 12 15:07:02 2018 - [warning] shutdown_script is not defined.
    Wed Dec 12 15:07:02 2018 - [info] Got exit code 0 (Not master dead).
    
    MySQL Replication Health is OK.
    复制环境

    小结:
    1:搭建MHA时,vip绑定需要自行绑定到主库;当主库发生failover,vip会绑定到新主
    2:发生master_ip_failover之后,mha监控程序自动断掉;
    3:vip绑定:
    # ip addr add 10.150.20.200/24 brd 10.150.20.255 dev ens3 label ens3:1
    # /usr/sbin/arping -q -A -c 1 -I ens3 10.150.20.200
    vip解绑:
    # ip addr del 10.150.20.200/24 dev ens3 label ens3:1
    4:关闭mha监控程序为:
    # masterha_stop --conf=/etc/mysql_mha/app1.cnf
    Stopped app1 successfully.

    5:failover的过程,基本为以下步骤:
    1).配置文件检查阶段,这个阶段会检查整个集群配置文件
    2).宕机的master处理,这个阶段包括虚拟ip摘除操作,主机关机操作
    3).复制dead master和最新slave相差的relay log,并保存到MHA Manger具体的目录下
    4).识别含有最新更新的slave
    5).应用从master保存的二进制日志事件(binlog events)
    6).提升一个slave为新的master进行复制
    7).使其他的slave连接新的master进行复制

  • 相关阅读:
    aps.net 图形验证码(转)
    js浮点数计算问题 + 金额大写转换
    meta标签总结
    Asp.net Session 保存到MySql中
    css3实现边框圆角样式
    iOS开发之NSOperation & NSOperationQueue
    iOS开发之多线程
    iOS开发之Block
    iOS开发之核心动画(Core Animation)
    iOS开发之CALayer
  • 原文地址:https://www.cnblogs.com/elontian/p/10095199.html
Copyright © 2011-2022 走看看