zoukankan      html  css  js  c++  java
  • MySQL高可用方案 MHA之二 master_ip_failover

    异步主从复制架构
    master:
    10.150.20.90 ed3jrdba90
    slave:
    10.15.20.97 ed3jrdba97
    10.150.20.132 ed3jrdba132
    manager:
    10.150.20.95 ed3jrdba95
    #新增VIP
    vip:10.150.20.200

    四台机器的系统情况:
    OS:CentOS7.3
    MySQL:5.7.21
    MHA:0.58
    网卡名:ens3

    mha manager节点

    1:配置app1.cnf文件
    添加master_ip_failover_script的文件路径,mysql master失败时执行的切换脚本。
    #vi /etc/mysql_mha/app1.cnf
    #自动failover时候的切换脚本
    master_ip_failover_script= /usr/local/bin/master_ip_failover

    [root@dev05 ~]# cat /etc/mysql_mha/app1.cnf

    [server default]
    manager_log=/data/mysql_mha/app1-manager.log
    manager_workdir=/data/mysql_mha/app1
    master_binlog_dir=/data/mysql_33061/logs
    master_ip_failover_script=/usr/local/bin/master_ip_failover
    password=mha_monitor
    ping_interval=5
    remote_workdir=/data/mysql_mha/app1
    repl_password=replicator
    repl_user=replicator
    shutdown_script=""
    ssh_user=root
    user=mha_monitor

    [server1]
    hostname=10.150.20.90
    port=33061

    [server1]
    hostname=10.150.20.97
    port=33061

    [server3]
    hostname=10.150.20.132
    port=33061

    编辑master_ip_failover脚本文件:没有使用 keepalived ,通过脚本的方式管理vip

    # cp /usr/local/bin/master_ip_failover /usr/local/bin/master_ip_failover.bak
    # vi /usr/local/bin/master_ip_failover

    #!/usr/bin/env perl
    use strict;
    use warnings FATAL => 'all';
    
    use Getopt::Long;
    
    my (
    $command, $ssh_user, $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip, $new_master_port
    );
    #############################添加内容部分#########################################
    my $vip = '10.150.20.200';
    my $brdc = '10.150.20.255';
    my $ifdev = 'ens3';
    my $key = '1';
    my $ssh_start_vip = "/usr/sbin/ip addr add $vip/24 brd $brdc dev $ifdev label $ifdev:$key;/usr/sbin/arping -q -A -c 1 -I $ifdev $vip;iptables -F;";
    my $ssh_stop_vip = "/usr/sbin/ip addr del $vip/24 dev $ifdev label $ifdev:$key";
    ##################################################################################
    GetOptions(
    'command=s' => $command,
    'ssh_user=s' => $ssh_user,
    'orig_master_host=s' => $orig_master_host,
    'orig_master_ip=s' => $orig_master_ip,
    'orig_master_port=i' => $orig_master_port,
    'new_master_host=s' => $new_master_host,
    'new_master_ip=s' => $new_master_ip,
    'new_master_port=i' => $new_master_port,
    );
    
    exit &main();
    
    sub main {
    
    print "
    
    IN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===
    
    ";
    
    if ( $command eq "stop" || $command eq "stopssh" ) {
    
    my $exit_code = 1;
    eval {
    print "Disabling the VIP on old master: $orig_master_host 
    ";
    &stop_vip();
    $exit_code = 0;
    };
    if ($@) {
    warn "Got Error: $@
    ";
    exit $exit_code;
    }
    exit $exit_code;
    }
    elsif ( $command eq "start" ) {
    
    my $exit_code = 10;
    eval {
    print "Enabling the VIP - $vip on the new master - $new_master_host 
    ";
    &start_vip();
    $exit_code = 0;
    };
    if ($@) {
    warn $@;
    exit $exit_code;
    }
    exit $exit_code;
    }
    elsif ( $command eq "status" ) {
    print "Checking the Status of the script.. OK 
    ";
    exit 0;
    }
    else {
    &usage();
    exit 1;
    }
    }
    sub start_vip() {
    `ssh $ssh_user@$new_master_host " $ssh_start_vip "`;
    }
    # A simple system call that disable the VIP on the old_master
    sub stop_vip() {
    `ssh $ssh_user@$orig_master_host " $ssh_stop_vip "`;
    }
    
    sub usage {
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port
    ";
    }
    master_ip_failover

    更换ip后,一定要执行下arping

    检查复制环境ssh
    # masterha_check_ssh --conf=/etc/mysql_mha/app1.cnf
    Wed Dec 12 14:43:27 2018 - [info] Reading default configuration from /etc/masterha_default.cnf..
    Wed Dec 12 14:43:27 2018 - [info] Reading application default configuration from /etc/mysql_mha/app1.cnf..
    Wed Dec 12 14:43:27 2018 - [info] Reading server configuration from /etc/mysql_mha/app1.cnf..
    Wed Dec 12 14:43:27 2018 - [info] Starting SSH connection tests..
    Wed Dec 12 14:43:28 2018 - [debug]
    Wed Dec 12 14:43:27 2018 - [debug] Connecting via SSH from root@10.150.20.90(10.150.20.90:22) to root@10.150.20.97(10.150.20.97:22)..
    Wed Dec 12 14:43:27 2018 - [debug] ok.
    Wed Dec 12 14:43:27 2018 - [debug] Connecting via SSH from root@10.150.20.90(10.150.20.90:22) to root@10.150.20.132(10.150.20.132:22)..
    Wed Dec 12 14:43:27 2018 - [debug] ok.
    Wed Dec 12 14:43:28 2018 - [debug]
    Wed Dec 12 14:43:27 2018 - [debug] Connecting via SSH from root@10.150.20.97(10.150.20.97:22) to root@10.150.20.90(10.150.20.90:22)..
    Wed Dec 12 14:43:27 2018 - [debug] ok.
    Wed Dec 12 14:43:27 2018 - [debug] Connecting via SSH from root@10.150.20.97(10.150.20.97:22) to root@10.150.20.132(10.150.20.132:22)..
    Wed Dec 12 14:43:28 2018 - [debug] ok.
    Wed Dec 12 14:43:29 2018 - [debug]
    Wed Dec 12 14:43:28 2018 - [debug] Connecting via SSH from root@10.150.20.132(10.150.20.132:22) to root@10.150.20.90(10.150.20.90:22)..
    Wed Dec 12 14:43:28 2018 - [debug] ok.
    Wed Dec 12 14:43:28 2018 - [debug] Connecting via SSH from root@10.150.20.132(10.150.20.132:22) to root@10.150.20.97(10.150.20.97:22)..
    Wed Dec 12 14:43:28 2018 - [debug] ok.
    Wed Dec 12 14:43:29 2018 - [info] All SSH connection tests passed successfully.
    检查整个复制环境
    # masterha_check_repl --conf=/etc/mysql_mha/app1.cnf
    Wed Dec 12 14:44:35 2018 - [info] Slaves settings check done.
    Wed Dec 12 14:44:35 2018 - [info]
    10.150.20.90(10.150.20.90:33061) (current master)
    +--10.150.20.97(10.150.20.97:33061)
    +--10.150.20.132(10.150.20.132:33061)

    Wed Dec 12 14:44:35 2018 - [info] Checking replication health on 10.150.20.97..
    Wed Dec 12 14:44:35 2018 - [info] ok.
    Wed Dec 12 14:44:35 2018 - [info] Checking replication health on 10.150.20.132..
    Wed Dec 12 14:44:35 2018 - [info] ok.
    Wed Dec 12 14:44:35 2018 - [info] Checking master_ip_failover_script status:
    Wed Dec 12 14:44:35 2018 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.150.20.90 --orig_master_ip=10.150.20.90 --orig_master_port=33061
    Wed Dec 12 14:44:35 2018 - [info] OK.
    Wed Dec 12 14:44:35 2018 - [warning] shutdown_script is not defined.
    Wed Dec 12 14:44:35 2018 - [info] Got exit code 0 (Not master dead).

    MySQL Replication Health is OK.

    启动 mha manager
    # nohup masterha_manager --conf=/etc/mysql_mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/mysql_mha/app1-manager.log 2>&1 &
    查看 manager status
    # masterha_check_status --conf=/etc/mysql_mha/app1.cnf
    查看 manager log
    # tail -n 1000 -f /var/log/masterha/app1-manager.log

    验证 failover

    在主库qa05.010150020090.yz节点,进行vip绑定:
    [root@qa05 ~]#ip addr add 10.150.20.200/24 brd 10.150.20.255 dev ens3 label ens3:1
    [root@qa05 ~]#/usr/sbin/arping -q -A -c 1 -I ens3 10.150.20.200

    #vip解绑:
    # ip addr del 10.150.20.200/24 dev ens3 label ens3:1

    模拟故障,在qa05.010150020090.yz上 kill 掉 mysqld 进程
    [root@qa05 ~]## ps -ef|grep -i mysql
    mysql 3114 1 0 Aug06 ? 00:00:51 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid
    root 15551 10466 0 Aug06 pts/1 00:00:00 mysql
    root 25521 21213 0 03:52 pts/2 00:00:00 grep --color=auto -i mysql
    [root@qa05 ~]## kill -9 27593 26101

    观察 mha manager 之前打开的日志输出
    [root@dev05 ~]# tail -n 1000 -f /data/mysql_mha/app1-manager.log

    Wed Dec 12 14:54:10 2018 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
    Wed Dec 12 14:54:10 2018 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql_33061/logs --output_file=/data/mysql_mha/app1/save_binary_logs_test --manager_version=0.58 --binlog_prefix=mysql-bin
    Wed Dec 12 14:54:10 2018 - [info] HealthCheck: SSH to 10.150.20.90 is reachable.
    Wed Dec 12 14:54:15 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.150.20.90' (111))
    Wed Dec 12 14:54:15 2018 - [warning] Connection failed 2 time(s)..
    Wed Dec 12 14:54:20 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.150.20.90' (111))
    Wed Dec 12 14:54:20 2018 - [warning] Connection failed 3 time(s)..
    Wed Dec 12 14:54:25 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.150.20.90' (111))
    Wed Dec 12 14:54:25 2018 - [warning] Connection failed 4 time(s)..
    Wed Dec 12 14:54:25 2018 - [warning] Master is not reachable from health checker!
    Wed Dec 12 14:54:25 2018 - [warning] Master 10.150.20.90(10.150.20.90:33061) is not reachable!
    Wed Dec 12 14:54:25 2018 - [warning] SSH is reachable.
    Wed Dec 12 14:54:25 2018 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mysql_mha/app1.cnf again, and trying to connect to all servers to check server status..
    Wed Dec 12 14:54:25 2018 - [info] Reading default configuration from /etc/masterha_default.cnf..
    Wed Dec 12 14:54:25 2018 - [info] Reading application default configuration from /etc/mysql_mha/app1.cnf..
    Wed Dec 12 14:54:25 2018 - [info] Reading server configuration from /etc/mysql_mha/app1.cnf..
    Wed Dec 12 14:54:26 2018 - [info] GTID failover mode = 0
    Wed Dec 12 14:54:26 2018 - [info] Dead Servers:
    Wed Dec 12 14:54:26 2018 - [info] 10.150.20.90(10.150.20.90:33061)
    Wed Dec 12 14:54:26 2018 - [info] Alive Servers:
    Wed Dec 12 14:54:26 2018 - [info] 10.150.20.97(10.150.20.97:33061)
    Wed Dec 12 14:54:26 2018 - [info] 10.150.20.132(10.150.20.132:33061)
    Wed Dec 12 14:54:26 2018 - [info] Alive Slaves:
    Wed Dec 12 14:54:26 2018 - [info] 10.150.20.97(10.150.20.97:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
    Wed Dec 12 14:54:26 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
    Wed Dec 12 14:54:26 2018 - [info] 10.150.20.132(10.150.20.132:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
    Wed Dec 12 14:54:26 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
    Wed Dec 12 14:54:26 2018 - [info] Checking slave configurations..
    Wed Dec 12 14:54:26 2018 - [info] read_only=1 is not set on slave 10.150.20.97(10.150.20.97:33061).
    Wed Dec 12 14:54:26 2018 - [warning] relay_log_purge=0 is not set on slave 10.150.20.97(10.150.20.97:33061).
    Wed Dec 12 14:54:26 2018 - [info] read_only=1 is not set on slave 10.150.20.132(10.150.20.132:33061).
    Wed Dec 12 14:54:26 2018 - [info] Checking replication filtering settings..
    Wed Dec 12 14:54:26 2018 - [info] Replication filtering check ok.
    Wed Dec 12 14:54:26 2018 - [info] Master is down!
    Wed Dec 12 14:54:26 2018 - [info] Terminating monitoring script.
    Wed Dec 12 14:54:26 2018 - [info] Got exit code 20 (Master dead).
    Wed Dec 12 14:54:26 2018 - [info] MHA::MasterFailover version 0.58.
    Wed Dec 12 14:54:26 2018 - [info] Starting master failover.
    Wed Dec 12 14:54:26 2018 - [info] 
    Wed Dec 12 14:54:26 2018 - [info] * Phase 1: Configuration Check Phase..
    Wed Dec 12 14:54:26 2018 - [info] 
    Wed Dec 12 14:54:27 2018 - [info] GTID failover mode = 0
    Wed Dec 12 14:54:27 2018 - [info] Dead Servers:
    Wed Dec 12 14:54:27 2018 - [info] 10.150.20.90(10.150.20.90:33061)
    Wed Dec 12 14:54:27 2018 - [info] Checking master reachability via MySQL(double check)...
    Wed Dec 12 14:54:27 2018 - [info] ok.
    Wed Dec 12 14:54:27 2018 - [info] Alive Servers:
    Wed Dec 12 14:54:27 2018 - [info] 10.150.20.97(10.150.20.97:33061)
    Wed Dec 12 14:54:27 2018 - [info] 10.150.20.132(10.150.20.132:33061)
    Wed Dec 12 14:54:27 2018 - [info] Alive Slaves:
    Wed Dec 12 14:54:27 2018 - [info] 10.150.20.97(10.150.20.97:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
    Wed Dec 12 14:54:27 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
    Wed Dec 12 14:54:27 2018 - [info] 10.150.20.132(10.150.20.132:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
    Wed Dec 12 14:54:27 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
    Wed Dec 12 14:54:27 2018 - [info] Starting Non-GTID based failover.
    Wed Dec 12 14:54:27 2018 - [info] 
    Wed Dec 12 14:54:27 2018 - [info] ** Phase 1: Configuration Check Phase completed.
    Wed Dec 12 14:54:27 2018 - [info] 
    Wed Dec 12 14:54:27 2018 - [info] * Phase 2: Dead Master Shutdown Phase..
    Wed Dec 12 14:54:27 2018 - [info] 
    Wed Dec 12 14:54:27 2018 - [info] Forcing shutdown so that applications never connect to the current master..
    Wed Dec 12 14:54:27 2018 - [info] Executing master IP deactivation script:
    Wed Dec 12 14:54:27 2018 - [info] /usr/local/bin/master_ip_failover --orig_master_host=10.150.20.90 --orig_master_ip=10.150.20.90 --orig_master_port=33061 --command=stopssh --ssh_user=root 
    Wed Dec 12 14:54:27 2018 - [info] done.
    Wed Dec 12 14:54:27 2018 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
    Wed Dec 12 14:54:27 2018 - [info] * Phase 2: Dead Master Shutdown Phase completed.
    Wed Dec 12 14:54:27 2018 - [info] 
    Wed Dec 12 14:54:27 2018 - [info] * Phase 3: Master Recovery Phase..
    Wed Dec 12 14:54:27 2018 - [info] 
    Wed Dec 12 14:54:27 2018 - [info] * Phase 3.1: Getting Latest Slaves Phase..
    Wed Dec 12 14:54:27 2018 - [info] 
    Wed Dec 12 14:54:27 2018 - [info] The latest binary log file/position on all slaves is mysql-bin.000006:154
    Wed Dec 12 14:54:27 2018 - [info] Latest slaves (Slaves that received relay log files to the latest):
    Wed Dec 12 14:54:27 2018 - [info] 10.150.20.97(10.150.20.97:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
    Wed Dec 12 14:54:27 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
    Wed Dec 12 14:54:27 2018 - [info] 10.150.20.132(10.150.20.132:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
    Wed Dec 12 14:54:27 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
    Wed Dec 12 14:54:27 2018 - [info] The oldest binary log file/position on all slaves is mysql-bin.000006:154
    Wed Dec 12 14:54:27 2018 - [info] Oldest slaves:
    Wed Dec 12 14:54:27 2018 - [info] 10.150.20.97(10.150.20.97:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
    Wed Dec 12 14:54:27 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
    Wed Dec 12 14:54:27 2018 - [info] 10.150.20.132(10.150.20.132:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
    Wed Dec 12 14:54:27 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
    Wed Dec 12 14:54:27 2018 - [info] 
    Wed Dec 12 14:54:27 2018 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
    Wed Dec 12 14:54:27 2018 - [info] 
    Wed Dec 12 14:54:27 2018 - [info] Fetching dead master's binary logs..
    Wed Dec 12 14:54:27 2018 - [info] Executing command on the dead master 10.150.20.90(10.150.20.90:33061): save_binary_logs --command=save --start_file=mysql-bin.000006 --start_pos=154 --binlog_dir=/data/mysql_33061/logs --output_file=/data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181212145426.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.58
    Creating /data/mysql_mha/app1 if not exists.. ok.
    Concat binary/relay logs from mysql-bin.000006 pos 154 to mysql-bin.000006 EOF into /data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181212145426.binlog ..
    Binlog Checksum enabled
    Dumping binlog format description event, from position 0 to 154.. ok.
    No need to dump effective binlog data from /data/mysql_33061/logs/mysql-bin.000006 (pos starts 154, filesize 154). Skipping.
    Binlog Checksum enabled
    /data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181212145426.binlog has no effective data events.
    Event not exists.
    Wed Dec 12 14:54:28 2018 - [info] Additional events were not found from the orig master. No need to save.
    Wed Dec 12 14:54:28 2018 - [info] 
    Wed Dec 12 14:54:28 2018 - [info] * Phase 3.3: Determining New Master Phase..
    Wed Dec 12 14:54:28 2018 - [info] 
    Wed Dec 12 14:54:28 2018 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
    Wed Dec 12 14:54:28 2018 - [info] All slaves received relay logs to the same position. No need to resync each other.
    Wed Dec 12 14:54:28 2018 - [info] Searching new master from slaves..
    Wed Dec 12 14:54:28 2018 - [info] Candidate masters from the configuration file:
    Wed Dec 12 14:54:28 2018 - [info] Non-candidate masters:
    Wed Dec 12 14:54:28 2018 - [info] New master is 10.150.20.97(10.150.20.97:33061)
    Wed Dec 12 14:54:28 2018 - [info] Starting master failover..
    Wed Dec 12 14:54:28 2018 - [info] 
    From:
    10.150.20.90(10.150.20.90:33061) (current master)
    +--10.150.20.97(10.150.20.97:33061)
    +--10.150.20.132(10.150.20.132:33061)
    
    To:
    10.150.20.97(10.150.20.97:33061) (new master)
    +--10.150.20.132(10.150.20.132:33061)
    Wed Dec 12 14:54:28 2018 - [info] 
    Wed Dec 12 14:54:28 2018 - [info] * Phase 3.4: New Master Diff Log Generation Phase..
    Wed Dec 12 14:54:28 2018 - [info] 
    Wed Dec 12 14:54:28 2018 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
    Wed Dec 12 14:54:28 2018 - [info] 
    Wed Dec 12 14:54:28 2018 - [info] * Phase 3.5: Master Log Apply Phase..
    Wed Dec 12 14:54:28 2018 - [info] 
    Wed Dec 12 14:54:28 2018 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
    Wed Dec 12 14:54:28 2018 - [info] Starting recovery on 10.150.20.97(10.150.20.97:33061)..
    Wed Dec 12 14:54:28 2018 - [info] This server has all relay logs. Waiting all logs to be applied.. 
    Wed Dec 12 14:54:28 2018 - [info] done.
    Wed Dec 12 14:54:28 2018 - [info] All relay logs were successfully applied.
    Wed Dec 12 14:54:28 2018 - [info] Getting new master's binlog name and position..
    Wed Dec 12 14:54:28 2018 - [info] mysql-bin.000010:2774
    Wed Dec 12 14:54:28 2018 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.150.20.97', MASTER_PORT=33061, MASTER_LOG_FILE='mysql-bin.000010', MASTER_LOG_POS=2774, MASTER_USER='replicator', MASTER_PASSWORD='xxx';
    Wed Dec 12 14:54:28 2018 - [info] Executing master IP activate script:
    Wed Dec 12 14:54:28 2018 - [info] /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=10.150.20.90 --orig_master_ip=10.150.20.90 --orig_master_port=33061 --new_master_host=10.150.20.97 --new_master_ip=10.150.20.97 --new_master_port=33061 --new_master_user='mha_monitor' --new_master_password=xxx
    Set read_only=0 on the new master.
    Creating app user on the new master..
    Wed Dec 12 14:54:28 2018 - [info] OK.
    Wed Dec 12 14:54:28 2018 - [info] ** Finished master recovery successfully.
    Wed Dec 12 14:54:28 2018 - [info] * Phase 3: Master Recovery Phase completed.
    Wed Dec 12 14:54:28 2018 - [info] 
    Wed Dec 12 14:54:28 2018 - [info] * Phase 4: Slaves Recovery Phase..
    Wed Dec 12 14:54:28 2018 - [info] 
    Wed Dec 12 14:54:28 2018 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
    Wed Dec 12 14:54:28 2018 - [info] 
    Wed Dec 12 14:54:28 2018 - [info] -- Slave diff file generation on host 10.150.20.132(10.150.20.132:33061) started, pid: 25117. Check tmp log /data/mysql_mha/app1/10.150.20.132_33061_20181212145426.log if it takes time..
    Wed Dec 12 14:54:29 2018 - [info] 
    Wed Dec 12 14:54:29 2018 - [info] Log messages from 10.150.20.132 ...
    Wed Dec 12 14:54:29 2018 - [info] 
    Wed Dec 12 14:54:28 2018 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
    Wed Dec 12 14:54:29 2018 - [info] End of log messages from 10.150.20.132.
    Wed Dec 12 14:54:29 2018 - [info] -- 10.150.20.132(10.150.20.132:33061) has the latest relay log events.
    Wed Dec 12 14:54:29 2018 - [info] Generating relay diff files from the latest slave succeeded.
    Wed Dec 12 14:54:29 2018 - [info] 
    Wed Dec 12 14:54:29 2018 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
    Wed Dec 12 14:54:29 2018 - [info] 
    Wed Dec 12 14:54:29 2018 - [info] -- Slave recovery on host 10.150.20.132(10.150.20.132:33061) started, pid: 25119. Check tmp log /data/mysql_mha/app1/10.150.20.132_33061_20181212145426.log if it takes time..
    Wed Dec 12 14:54:30 2018 - [info] 
    Wed Dec 12 14:54:30 2018 - [info] Log messages from 10.150.20.132 ...
    Wed Dec 12 14:54:30 2018 - [info] 
    Wed Dec 12 14:54:29 2018 - [info] Starting recovery on 10.150.20.132(10.150.20.132:33061)..
    Wed Dec 12 14:54:29 2018 - [info] This server has all relay logs. Waiting all logs to be applied.. 
    Wed Dec 12 14:54:29 2018 - [info] done.
    Wed Dec 12 14:54:29 2018 - [info] All relay logs were successfully applied.
    Wed Dec 12 14:54:29 2018 - [info] Resetting slave 10.150.20.132(10.150.20.132:33061) and starting replication from the new master 10.150.20.97(10.150.20.97:33061)..
    Wed Dec 12 14:54:29 2018 - [info] Executed CHANGE MASTER.
    Wed Dec 12 14:54:29 2018 - [info] Slave started.
    Wed Dec 12 14:54:30 2018 - [info] End of log messages from 10.150.20.132.
    Wed Dec 12 14:54:30 2018 - [info] -- Slave recovery on host 10.150.20.132(10.150.20.132:33061) succeeded.
    Wed Dec 12 14:54:30 2018 - [info] All new slave servers recovered successfully.
    Wed Dec 12 14:54:30 2018 - [info] 
    Wed Dec 12 14:54:30 2018 - [info] * Phase 5: New master cleanup phase..
    Wed Dec 12 14:54:30 2018 - [info] 
    Wed Dec 12 14:54:30 2018 - [info] Resetting slave info on the new master..
    Wed Dec 12 14:54:30 2018 - [info] 10.150.20.97: Resetting slave info succeeded.
    Wed Dec 12 14:54:30 2018 - [info] Master failover to 10.150.20.97(10.150.20.97:33061) completed successfully.
    Wed Dec 12 14:54:30 2018 - [info] Deleted server1 entry from /etc/mysql_mha/app1.cnf .
    Wed Dec 12 14:54:30 2018 - [info]
    
    ----- Failover Report -----
    
    app1: MySQL Master failover 10.150.20.90(10.150.20.90:33061) to 10.150.20.97(10.150.20.97:33061) succeeded
    
    Master 10.150.20.90(10.150.20.90:33061) is down!
    
    Check MHA Manager logs at dev05.010150020095.yz:/data/mysql_mha/app1-manager.log for details.
    
    Started automated(non-interactive) failover.
    Invalidated master IP address on 10.150.20.90(10.150.20.90:33061)
    The latest slave 10.150.20.97(10.150.20.97:33061) has all relay logs for recovery.
    Selected 10.150.20.97(10.150.20.97:33061) as a new master.
    10.150.20.97(10.150.20.97:33061): OK: Applying all logs succeeded.
    10.150.20.97(10.150.20.97:33061): OK: Activated master IP address.
    10.150.20.132(10.150.20.132:33061): This host has the latest relay log events.
    Generating relay diff files from the latest slave succeeded.
    10.150.20.132(10.150.20.132:33061): OK: Applying all logs succeeded. Slave started, replicating from 10.150.20.97(10.150.20.97:33061)
    10.150.20.97(10.150.20.97:33061): Resetting slave info succeeded.
    Master failover to 10.150.20.97(10.150.20.97:33061) completed successfully.
    app1-manager.log

    从日志,可以看出new master切换至10.150.20.97,此时manager节点mha manager关闭

    [root@dev05 ~]# masterha_check_status --conf=/etc/mysql_mha/app1.cnf

    app1 is stopped(2:NOT_RUNNING).

    而新主qa06.010150020097.yz,vip绑定到ens3网卡上
    [root@qa06 ~]# ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    valid_lft forever preferred_lft forever
    2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 54:52:00:49:48:92 brd ff:ff:ff:ff:ff:ff
    inet 10.150.20.97/24 brd 10.150.20.255 scope global ens3
    valid_lft forever preferred_lft forever
    inet 10.150.20.200/24 brd 10.150.20.255 scope global secondary ens3:1
    valid_lft forever preferred_lft forever
    3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
    link/ether 02:42:7f:36:38:fe brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
    valid_lft forever preferred_lft forever

    此时的mha manager节点的配置文件app1.cnf被修改为:
    [root@dev05 ~]#cat /etc/mysql_mha/app1.cnf
    [server default]
    manager_log=/data/mysql_mha/app1-manager.log
    manager_workdir=/data/mysql_mha/app1
    master_binlog_dir=/data/mysql_33061/logs
    master_ip_failover_script=/usr/local/bin/master_ip_failover
    password=mha_monitor
    ping_interval=5
    remote_workdir=/data/mysql_mha/app1
    repl_password=replicator
    repl_user=replicator
    shutdown_script=""
    ssh_user=root
    user=mha_monitor

    [server2]
    hostname=10.150.20.97
    port=33061

    [server3]
    hostname=10.150.20.132
    port=33061


    重新编辑app1.cnf
    [root@dev05 ~]#cat /etc/mysql_mha/app1.cnf
    [server default]
    manager_log=/data/mysql_mha/app1-manager.log
    manager_workdir=/data/mysql_mha/app1
    master_binlog_dir=/data/mysql_33061/logs
    master_ip_failover_script=/usr/local/bin/master_ip_failover
    password=mha_monitor
    ping_interval=5
    remote_workdir=/data/mysql_mha/app1
    repl_password=replicator
    repl_user=replicator
    shutdown_script=""
    ssh_user=root
    user=mha_monitor

    [server1]
    hostname=10.150.20.97
    port=33061
    [server2]
    hostname=10.150.20.90
    port=33061
    [server3]
    hostname=10.150.20.132
    port=33061

    重启qa05.010150020090.yz的MySQL,搭建主从,指向新主
    mysql> change master to
    master_host='10.150.20.97',
    master_user='replicator',
    master_password='replicator',
    master_port=33061,
    master_log_file='mysql-bin.000010',
    master_log_pos=2774;
    mysql> start slave;

    检测复制环境
    # masterha_check_repl --conf=/etc/mysql_mha/app1.cnf

    Wed Dec 12 15:06:56 2018 - [info] Reading default configuration from /etc/masterha_default.cnf..
    Wed Dec 12 15:06:56 2018 - [info] Reading application default configuration from /etc/mysql_mha/app1.cnf..
    Wed Dec 12 15:06:56 2018 - [info] Reading server configuration from /etc/mysql_mha/app1.cnf..
    Wed Dec 12 15:06:56 2018 - [info] MHA::MasterMonitor version 0.58.
    Wed Dec 12 15:06:57 2018 - [info] GTID failover mode = 0
    Wed Dec 12 15:06:57 2018 - [info] Dead Servers:
    Wed Dec 12 15:06:57 2018 - [info] Alive Servers:
    Wed Dec 12 15:06:57 2018 - [info] 10.150.20.97(10.150.20.97:33061)
    Wed Dec 12 15:06:57 2018 - [info] 10.150.20.90(10.150.20.90:33061)
    Wed Dec 12 15:06:57 2018 - [info] 10.150.20.132(10.150.20.132:33061)
    Wed Dec 12 15:06:57 2018 - [info] Alive Slaves:
    Wed Dec 12 15:06:57 2018 - [info] 10.150.20.90(10.150.20.90:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
    Wed Dec 12 15:06:57 2018 - [info] Replicating from 10.150.20.97(10.150.20.97:33061)
    Wed Dec 12 15:06:57 2018 - [info] 10.150.20.132(10.150.20.132:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
    Wed Dec 12 15:06:57 2018 - [info] Replicating from 10.150.20.97(10.150.20.97:33061)
    Wed Dec 12 15:06:57 2018 - [info] Current Alive Master: 10.150.20.97(10.150.20.97:33061)
    Wed Dec 12 15:06:57 2018 - [info] Checking slave configurations..
    Wed Dec 12 15:06:57 2018 - [info] read_only=1 is not set on slave 10.150.20.90(10.150.20.90:33061).
    Wed Dec 12 15:06:57 2018 - [warning] relay_log_purge=0 is not set on slave 10.150.20.90(10.150.20.90:33061).
    Wed Dec 12 15:06:57 2018 - [info] read_only=1 is not set on slave 10.150.20.132(10.150.20.132:33061).
    Wed Dec 12 15:06:57 2018 - [info] Checking replication filtering settings..
    Wed Dec 12 15:06:57 2018 - [info] binlog_do_db= , binlog_ignore_db= 
    Wed Dec 12 15:06:57 2018 - [info] Replication filtering check ok.
    Wed Dec 12 15:06:57 2018 - [info] GTID (with auto-pos) is not supported
    Wed Dec 12 15:06:57 2018 - [info] Starting SSH connection tests..
    Wed Dec 12 15:06:59 2018 - [info] All SSH connection tests passed successfully.
    Wed Dec 12 15:06:59 2018 - [info] Checking MHA Node version..
    Wed Dec 12 15:07:00 2018 - [info] Version check ok.
    Wed Dec 12 15:07:00 2018 - [info] Checking SSH publickey authentication settings on the current master..
    Wed Dec 12 15:07:00 2018 - [info] HealthCheck: SSH to 10.150.20.97 is reachable.
    Wed Dec 12 15:07:00 2018 - [info] Master MHA Node version is 0.58.
    Wed Dec 12 15:07:00 2018 - [info] Checking recovery script configurations on 10.150.20.97(10.150.20.97:33061)..
    Wed Dec 12 15:07:00 2018 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql_33061/logs --output_file=/data/mysql_mha/app1/save_binary_logs_test --manager_version=0.58 --start_file=mysql-bin.000010 
    Wed Dec 12 15:07:00 2018 - [info] Connecting to root@10.150.20.97(10.150.20.97:22).. 
    Creating /data/mysql_mha/app1 if not exists.. ok.
    Checking output directory is accessible or not..
    ok.
    Binlog found at /data/mysql_33061/logs, up to mysql-bin.000010
    Wed Dec 12 15:07:01 2018 - [info] Binlog setting check done.
    Wed Dec 12 15:07:01 2018 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
    Wed Dec 12 15:07:01 2018 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha_monitor' --slave_host=10.150.20.90 --slave_ip=10.150.20.90 --slave_port=33061 --workdir=/data/mysql_mha/app1 --target_version=5.7.21-log --manager_version=0.58 --relay_log_info=/data/mysql_33061/logs/relay-log.info --relay_dir=/data/mysql_33061/data/ --slave_pass=xxx
    Wed Dec 12 15:07:01 2018 - [info] Connecting to root@10.150.20.90(10.150.20.90:22).. 
    Checking slave recovery environment settings..
    Opening /data/mysql_33061/logs/relay-log.info ... ok.
    Relay log found at /data/mysql_33061/logs, up to relaylog.000002
    Temporary relay log file is /data/mysql_33061/logs/relaylog.000002
    Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
    Testing mysql connection and privileges..
    mysql: [Warning] Using a password on the command line interface can be insecure.
    done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
    Wed Dec 12 15:07:01 2018 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha_monitor' --slave_host=10.150.20.132 --slave_ip=10.150.20.132 --slave_port=33061 --workdir=/data/mysql_mha/app1 --target_version=5.7.21-log --manager_version=0.58 --relay_log_info=/data/mysql_33061/logs/relay-log.info --relay_dir=/data/mysql_33061/data/ --slave_pass=xxx
    Wed Dec 12 15:07:01 2018 - [info] Connecting to root@10.150.20.132(10.150.20.132:22).. 
    Checking slave recovery environment settings..
    Opening /data/mysql_33061/logs/relay-log.info ... ok.
    Relay log found at /data/mysql_33061/data, up to cgdb-relay-bin.000002
    Temporary relay log file is /data/mysql_33061/data/cgdb-relay-bin.000002
    Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
    Testing mysql connection and privileges..
    mysql: [Warning] Using a password on the command line interface can be insecure.
    done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
    Wed Dec 12 15:07:01 2018 - [info] Slaves settings check done.
    Wed Dec 12 15:07:01 2018 - [info] 
    10.150.20.97(10.150.20.97:33061) (current master)
    +--10.150.20.90(10.150.20.90:33061)
    +--10.150.20.132(10.150.20.132:33061)
    
    Wed Dec 12 15:07:01 2018 - [info] Checking replication health on 10.150.20.90..
    Wed Dec 12 15:07:01 2018 - [info] ok.
    Wed Dec 12 15:07:01 2018 - [info] Checking replication health on 10.150.20.132..
    Wed Dec 12 15:07:01 2018 - [info] ok.
    Wed Dec 12 15:07:01 2018 - [info] Checking master_ip_failover_script status:
    Wed Dec 12 15:07:01 2018 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.150.20.97 --orig_master_ip=10.150.20.97 --orig_master_port=33061 
    Wed Dec 12 15:07:02 2018 - [info] OK.
    Wed Dec 12 15:07:02 2018 - [warning] shutdown_script is not defined.
    Wed Dec 12 15:07:02 2018 - [info] Got exit code 0 (Not master dead).
    
    MySQL Replication Health is OK.
    复制环境

    小结:
    1:搭建MHA时,vip绑定需要自行绑定到主库;当主库发生failover,vip会绑定到新主
    2:发生master_ip_failover之后,mha监控程序自动断掉;
    3:vip绑定:
    # ip addr add 10.150.20.200/24 brd 10.150.20.255 dev ens3 label ens3:1
    # /usr/sbin/arping -q -A -c 1 -I ens3 10.150.20.200
    vip解绑:
    # ip addr del 10.150.20.200/24 dev ens3 label ens3:1
    4:关闭mha监控程序为:
    # masterha_stop --conf=/etc/mysql_mha/app1.cnf
    Stopped app1 successfully.

    5:failover的过程,基本为以下步骤:
    1).配置文件检查阶段,这个阶段会检查整个集群配置文件
    2).宕机的master处理,这个阶段包括虚拟ip摘除操作,主机关机操作
    3).复制dead master和最新slave相差的relay log,并保存到MHA Manger具体的目录下
    4).识别含有最新更新的slave
    5).应用从master保存的二进制日志事件(binlog events)
    6).提升一个slave为新的master进行复制
    7).使其他的slave连接新的master进行复制

  • 相关阅读:
    java操作生成jar包 和写入jar包
    jboss配置jndi连接池
    windows 域的LDAP查询相关举例
    LDAP error Code 及解决方法
    HDU 6417
    CF1299D Around the World
    codechef Chef and The Colored Grid
    Educational Codeforces Round 82 (Rated for Div. 2)
    CF1237F Balanced Domino Placements
    CF1254E Send Tree to Charlie
  • 原文地址:https://www.cnblogs.com/elontian/p/10095199.html
Copyright © 2011-2022 走看看