  • MHA: Manual Failover Procedure (Traditional Replication & GTID Replication)

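    This article only walks through the manual failover procedure. For an introduction to MHA, see: MySQL高可用架构之MHA (MySQL High-Availability Architecture: MHA)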

    1. Basic Environment

    1.1 Replication Topology

    VMware 10.0 + CentOS 6.9 + MySQL 5.7.21

    ROLE   HOSTNAME  BASEDIR           DATADIR                     IP              PORT
    Node1  ZST1      /usr/local/mysql  /data/mysql/mysql3307/data  192.168.85.132  3307
    Node2  ZST2      /usr/local/mysql  /data/mysql/mysql3307/data  192.168.85.133  3307
    Node3  ZST3      /usr/local/mysql  /data/mysql/mysql3307/data  192.168.85.134  3307

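    Both setups use a one-master, two-slave topology, Node1->{Node2, Node3}. Traditional replication is built on row-based binlogs with file/position coordinates; GTID replication is built on row-based binlogs with GTIDs.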

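    1.2 MHA Configuration Files

    This article uses MHA 0.56, with both the manager and node packages installed on Node1, Node2, and Node3.
    The MHA configuration files are as follows: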

    # Global configuration file: /etc/masterha/masterha_default.conf
    [root@ZST1 masterha]# cat masterha_default.conf 
    [server default]
    #MySQL user and password
    user=mydba
    password=mysql5721
    
    #System SSH user
    ssh_user=root
    
    #Replication user
    repl_user=repl
    repl_password=repl
    
    #Monitoring
    ping_interval=5
    #shutdown_script=/etc/masterha/send_report.sh
    
    #Scripts invoked during switchover
    master_ip_failover_script=/etc/masterha/master_ip_failover
    master_ip_online_change_script=/etc/masterha/master_ip_online_change
    
    log_level=debug
    [root@ZST1 masterha]# 
    
    
    # Configuration file for cluster 1: /etc/masterha/app1.conf
    [root@ZST1 masterha]# cat app1.conf 
    [server default]
    #MHA manager working directory
    manager_workdir=/var/log/masterha/app1
    manager_log=/var/log/masterha/app1/app1.log
    remote_workdir=/var/log/masterha/app1
    
    [server1]
    hostname=192.168.85.132
    port=3307
    master_binlog_dir=/data/mysql/mysql3307/logs
    candidate_master=1
    check_repl_delay=0
    
    [server2]
    hostname=192.168.85.133
    port=3307
    master_binlog_dir=/data/mysql/mysql3307/logs
    candidate_master=1
    check_repl_delay=0
    
    [server3]
    hostname=192.168.85.134
    port=3307
    master_binlog_dir=/data/mysql/mysql3307/logs
    candidate_master=1
    check_repl_delay=0
    [root@ZST1 masterha]# 
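The per-cluster file is plain INI, so its server list can be inspected programmatically before a switch. The following is a minimal Python sketch, not how MHA itself reads the file: it loads the app1.conf text shown above with the standard-library `configparser` and lists each `[serverN]` block together with its `candidate_master` flag. The helper name `list_servers` and the embedded `APP1_CONF` string are illustrative assumptions.

```python
import configparser

# Text copied from the app1.conf listing above (comments omitted).
APP1_CONF = """
[server default]
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/app1.log
remote_workdir=/var/log/masterha/app1

[server1]
hostname=192.168.85.132
port=3307
master_binlog_dir=/data/mysql/mysql3307/logs
candidate_master=1
check_repl_delay=0

[server2]
hostname=192.168.85.133
port=3307
master_binlog_dir=/data/mysql/mysql3307/logs
candidate_master=1
check_repl_delay=0

[server3]
hostname=192.168.85.134
port=3307
master_binlog_dir=/data/mysql/mysql3307/logs
candidate_master=1
check_repl_delay=0
"""


def list_servers(conf_text):
    """Return (section, host, port, is_candidate) for every [serverN] block."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    servers = []
    for section in parser.sections():
        if section == "server default":
            continue  # shared settings, not a server entry
        servers.append((
            section,
            parser.get(section, "hostname"),
            parser.getint(section, "port"),
            parser.get(section, "candidate_master", fallback="0") == "1",
        ))
    return servers


for row in list_servers(APP1_CONF):
    print(row)
```

In this setup all three servers are flagged as candidates, which is why the new master can later be chosen freely with --new_master_host.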

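    1.3 Test Data

    Stop the io_thread on the slaves one at a time while continuing to write to the master. This simulates both master-slave and slave-slave data inconsistency.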

    #First, empty the table
    mydba@192.168.85.132,3307 [replcrash]> truncate table py_user;
    
    #Node1 writes the first row
    mydba@192.168.85.132,3307 [replcrash]> insert into py_user(name,add_time,server_id) select left(uuid(),32),now(),@@server_id;
    #Node3 stops its io_thread
    mydba@192.168.85.134,3307 [replcrash]> stop slave io_thread;
    
    #Node1 writes the second row
    mydba@192.168.85.132,3307 [replcrash]> insert into py_user(name,add_time,server_id) select left(uuid(),32),now(),@@server_id;
    #Node2 stops its io_thread
    mydba@192.168.85.133,3307 [replcrash]> stop slave io_thread;
    
    #Node1 writes the third row
    mydba@192.168.85.132,3307 [replcrash]> insert into py_user(name,add_time,server_id) select left(uuid(),32),now(),@@server_id;
    
    # Final rows on each node
    #Node1 has three rows
    mydba@192.168.85.132,3307 [replcrash]> select * from py_user;
    +-----+----------------------------------+---------------------+-----------+
    | uid | name                             | add_time            | server_id |
    +-----+----------------------------------+---------------------+-----------+
    |   1 | 153dc6bf-325d-11e8-88e6-000c29c1 | 2018-03-28 15:53:20 | 1323307   |
    |   2 | 272f15ee-325d-11e8-88e6-000c29c1 | 2018-03-28 15:53:50 | 1323307   |
    |   3 | 2d8900cc-325d-11e8-88e6-000c29c1 | 2018-03-28 15:54:01 | 1323307   |
    +-----+----------------------------------+---------------------+-----------+
    3 rows in set (0.00 sec)
    mydba@192.168.85.132,3307 [replcrash]> show master status;
    +------------------+----------+--------------+------------------+-------------------+
    | File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
    +------------------+----------+--------------+------------------+-------------------+
    | mysql-bin.000004 |     1303 |              |                  |                   |
    +------------------+----------+--------------+------------------+-------------------+
    1 row in set (0.00 sec)
    #Node2 has two rows
    mydba@192.168.85.133,3307 [replcrash]> select * from py_user;
    +-----+----------------------------------+---------------------+-----------+
    | uid | name                             | add_time            | server_id |
    +-----+----------------------------------+---------------------+-----------+
    |   1 | 153dc6bf-325d-11e8-88e6-000c29c1 | 2018-03-28 15:53:20 | 1323307   |
    |   2 | 272f15ee-325d-11e8-88e6-000c29c1 | 2018-03-28 15:53:50 | 1323307   |
    +-----+----------------------------------+---------------------+-----------+
    2 rows in set (0.00 sec)
    mydba@192.168.85.133,3307 [replcrash]> show master status;
    +------------------+----------+--------------+------------------+-------------------+
    | File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
    +------------------+----------+--------------+------------------+-------------------+
    | mysql-bin.000007 |     8859 |              |                  |                   |
    +------------------+----------+--------------+------------------+-------------------+
    1 row in set (0.00 sec)
    #Node3 has one row
    mydba@192.168.85.134,3307 [replcrash]> select * from py_user;
    +-----+----------------------------------+---------------------+-----------+
    | uid | name                             | add_time            | server_id |
    +-----+----------------------------------+---------------------+-----------+
    |   1 | 153dc6bf-325d-11e8-88e6-000c29c1 | 2018-03-28 15:53:20 | 1323307   |
    +-----+----------------------------------+---------------------+-----------+
    1 row in set (0.00 sec)
    mydba@192.168.85.134,3307 [replcrash]> show master status;
    +------------------+----------+--------------+------------------+-------------------+
    | File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
    +------------------+----------+--------------+------------------+-------------------+
    | mysql-bin.000002 |    10322 |              |                  |                   |
    +------------------+----------+--------------+------------------+-------------------+
    1 row in set (0.00 sec)

    Clearly, slave Node3 lags behind slave Node2, and slave Node2 lags behind master Node1.
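The divergence follows directly from the order of writes and io_thread stops. As a toy model only (it assumes a slave receives every write instantly while its io_thread is running, which real replication does not guarantee), the sequence above can be replayed in Python. The function name `simulate` and the step encoding are illustrative assumptions.

```python
def simulate(steps):
    """Replay inserts and io_thread stops; return the uids each node ends up with."""
    rows = {"Node1": [], "Node2": [], "Node3": []}
    io_running = {"Node2": True, "Node3": True}
    for action, arg in steps:
        if action == "insert":
            rows["Node1"].append(arg)  # the master always has the row
            for slave, running in io_running.items():
                if running:  # only slaves still pulling binlog receive it
                    rows[slave].append(arg)
        elif action == "stop_io":
            io_running[arg] = False
    return rows


# Same order as the test-data listing: insert, stop Node3, insert, stop Node2, insert.
STEPS = [
    ("insert", 1),
    ("stop_io", "Node3"),
    ("insert", 2),
    ("stop_io", "Node2"),
    ("insert", 3),
]

result = simulate(STEPS)
for node, uids in result.items():
    print(node, uids)
```

The result matches the three SELECT outputs: Node1 holds uids 1 to 3, Node2 holds 1 and 2, and Node3 holds only 1.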

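    2. Manual Failover under Traditional Replication

    Manual failover applies when the master has gone down but mha_manager is not running; in that case the switch can be performed by hand.

    2.1 Manual Failover

    • Shut down the database service on Node1

    # Shut down the database service on Node1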
    mydba@192.168.85.132,3307 [replcrash]> shutdown;
    
    # Replication status on Node2 and Node3
    mydba@192.168.85.133,3307 [replcrash]> pager cat | egrep 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Running'
    PAGER set to 'cat | egrep 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Running''
    mydba@192.168.85.133,3307 [replcrash]> show slave status\G
                  Master_Log_File: mysql-bin.000004
              Read_Master_Log_Pos: 973
            Relay_Master_Log_File: mysql-bin.000004
                 Slave_IO_Running: No
                Slave_SQL_Running: Yes
              Exec_Master_Log_Pos: 973
          Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
    1 row in set (0.00 sec)
    mydba@192.168.85.133,3307 [replcrash]> 
    
    mydba@192.168.85.134,3307 [replcrash]> pager cat | egrep 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Running'
    PAGER set to 'cat | egrep 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Running''
    mydba@192.168.85.134,3307 [replcrash]> show slave status\G
                  Master_Log_File: mysql-bin.000004
              Read_Master_Log_Pos: 643
            Relay_Master_Log_File: mysql-bin.000004
                 Slave_IO_Running: No
                Slave_SQL_Running: Yes
              Exec_Master_Log_Pos: 643
          Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
    1 row in set (0.00 sec)
    mydba@192.168.85.134,3307 [replcrash]> 

    At this point it makes no difference whether the slaves' io_thread is started: the master is already down, so the io_thread cannot connect to it anyway.
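What does matter at this point is how far each slave got before the master died. The following Python sketch ranks the two slaves by the binlog coordinate they have received, using the Master_Log_File and Read_Master_Log_Pos values from the outputs above. It only illustrates the comparison and is not MHA's own code; the names `received_coordinate` and `latest_and_oldest` are illustrative assumptions.

```python
# Values copied from the SHOW SLAVE STATUS outputs above.
SLAVE_STATUS = {
    "192.168.85.133": {"Master_Log_File": "mysql-bin.000004", "Read_Master_Log_Pos": 973},
    "192.168.85.134": {"Master_Log_File": "mysql-bin.000004", "Read_Master_Log_Pos": 643},
}


def received_coordinate(status):
    """Turn a slave status into a sortable (file sequence, position) pair."""
    # The numeric suffix of the binlog file name orders the files.
    seq = int(status["Master_Log_File"].rsplit(".", 1)[1])
    return (seq, status["Read_Master_Log_Pos"])


def latest_and_oldest(statuses):
    """Return the hosts that received the most and the least from the master."""
    ranked = sorted(statuses, key=lambda host: received_coordinate(statuses[host]))
    return ranked[-1], ranked[0]


latest, oldest = latest_and_oldest(SLAVE_STATUS)
print("latest:", latest, "oldest:", oldest)
```

Here Node2 (192.168.85.133) is the latest slave at mysql-bin.000004:973 and Node3 (192.168.85.134) the oldest at mysql-bin.000004:643, so promoting Node3 requires it to first catch up on what Node2 and the dead master still hold.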
    • Run the manual failover script, specifying Node3 as the new master

    # Manual failover away from the failed Node1
    [root@ZST3 app1]# masterha_master_switch --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf --dead_master_host=192.168.85.132 --dead_master_port=3307 --master_state=dead --new_master_host=192.168.85.134 --new_master_port=3307 --ignore_last_failover

    The replication topology before the switch is Node1->{Node2, Node3}; after the manual failover it becomes Node3->{Node2}.
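Because the command line is long and easy to mistype under pressure, it can help to assemble it from named parameters. The sketch below builds the same masterha_master_switch invocation in Python using only the options that appear in the command above; the function name `build_failover_command` and its parameter names are illustrative assumptions.

```python
import shlex


def build_failover_command(dead_host, dead_port, new_host, new_port,
                           global_conf="/etc/masterha/masterha_default.conf",
                           conf="/etc/masterha/app1.conf"):
    """Assemble the manual-failover argv from the options used in this article."""
    return [
        "masterha_master_switch",
        f"--global_conf={global_conf}",
        f"--conf={conf}",
        f"--dead_master_host={dead_host}",
        f"--dead_master_port={dead_port}",
        "--master_state=dead",
        f"--new_master_host={new_host}",
        f"--new_master_port={new_port}",
        "--ignore_last_failover",
    ]


argv = build_failover_command("192.168.85.132", 3307, "192.168.85.134", 3307)
# shlex.join (Python 3.8+) renders the list as a shell-safe command line.
print(shlex.join(argv))
```

The printed line is only meant for review or copy-paste: the tool prompts for confirmation interactively, so it is best run by hand in a terminal.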

    2.2 Switchover Flow

    Log output of the manual failover

    # Manual failover
    [root@ZST3 app1]# masterha_master_switch --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf --dead_master_host=192.168.85.132 --dead_master_port=3307 --master_state=dead --new_master_host=192.168.85.134 --new_master_port=3307 --ignore_last_failover
    --dead_master_ip=<dead_master_ip> is not set. Using 192.168.85.132.
    Wed Mar 28 16:01:07 2018 - [info] Reading default configuration from /etc/masterha/masterha_default.conf..
    Wed Mar 28 16:01:07 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
    Wed Mar 28 16:01:07 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
    Wed Mar 28 16:01:07 2018 - [info] MHA::MasterFailover version 0.56.
    Wed Mar 28 16:01:07 2018 - [info] Starting master failover.
    Wed Mar 28 16:01:07 2018 - [info] 
    ==================== 1. Configuration Check Phase, Start ====================
    Wed Mar 28 16:01:07 2018 - [info] * Phase 1: Configuration Check Phase..
    Wed Mar 28 16:01:07 2018 - [info] 
    Wed Mar 28 16:01:08 2018 - [debug] Connecting to servers..
    Wed Mar 28 16:01:09 2018 - [debug]  Connected to: 192.168.85.133(192.168.85.133:3307), user=mydba
    Wed Mar 28 16:01:09 2018 - [debug]  Number of slave worker threads on host 192.168.85.133(192.168.85.133:3307): 0
    Wed Mar 28 16:01:09 2018 - [debug]  Connected to: 192.168.85.134(192.168.85.134:3307), user=mydba
    Wed Mar 28 16:01:09 2018 - [debug]  Number of slave worker threads on host 192.168.85.134(192.168.85.134:3307): 0
    Wed Mar 28 16:01:09 2018 - [debug]  Comparing MySQL versions..
    Wed Mar 28 16:01:09 2018 - [debug]   Comparing MySQL versions done.
    Wed Mar 28 16:01:09 2018 - [debug] Connecting to servers done.
    Wed Mar 28 16:01:09 2018 - [info] GTID failover mode = 0
    Wed Mar 28 16:01:09 2018 - [info] Dead Servers:
    Wed Mar 28 16:01:09 2018 - [info]   192.168.85.132(192.168.85.132:3307)
    Wed Mar 28 16:01:09 2018 - [info] Checking master reachability via MySQL(double check)...
    Wed Mar 28 16:01:09 2018 - [info]  ok.
    Wed Mar 28 16:01:09 2018 - [info] Alive Servers:
    Wed Mar 28 16:01:09 2018 - [info]   192.168.85.133(192.168.85.133:3307)
    Wed Mar 28 16:01:09 2018 - [info]   192.168.85.134(192.168.85.134:3307)
    Wed Mar 28 16:01:09 2018 - [info] Alive Slaves:
    Wed Mar 28 16:01:09 2018 - [info]   192.168.85.133(192.168.85.133:3307)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
    Wed Mar 28 16:01:09 2018 - [debug]    Relay log info repository: FILE
    Wed Mar 28 16:01:09 2018 - [info]     Replicating from 192.168.85.132(192.168.85.132:3307)
    Wed Mar 28 16:01:09 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
    Wed Mar 28 16:01:09 2018 - [info]   192.168.85.134(192.168.85.134:3307)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
    Wed Mar 28 16:01:09 2018 - [debug]    Relay log info repository: FILE
    Wed Mar 28 16:01:09 2018 - [info]     Replicating from 192.168.85.132(192.168.85.132:3307)
    Wed Mar 28 16:01:09 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
    ******************** Confirm whether to proceed ********************
    Master 192.168.85.132(192.168.85.132:3307) is dead. Proceed? (yes/NO): yes
    Wed Mar 28 16:01:30 2018 - [info] Starting Non-GTID based failover.
    Wed Mar 28 16:01:30 2018 - [info] 
    Wed Mar 28 16:01:30 2018 - [info] ** Phase 1: Configuration Check Phase completed.
    ==================== 1. Configuration Check Phase, End ====================
    Wed Mar 28 16:01:30 2018 - [info] 
    ==================== 2. Dead Master Shutdown Phase, Start ====================
    Wed Mar 28 16:01:30 2018 - [info] * Phase 2: Dead Master Shutdown Phase..
    Wed Mar 28 16:01:30 2018 - [info] 
    Wed Mar 28 16:01:30 2018 - [debug]  Stopping IO thread on 192.168.85.133(192.168.85.133:3307)..
    Wed Mar 28 16:01:30 2018 - [debug]  Stopping IO thread on 192.168.85.134(192.168.85.134:3307)..
    Wed Mar 28 16:01:30 2018 - [debug]  Stop IO thread on 192.168.85.134(192.168.85.134:3307) done.
    Wed Mar 28 16:01:30 2018 - [debug]  Stop IO thread on 192.168.85.133(192.168.85.133:3307) done.
    Wed Mar 28 16:01:30 2018 - [debug] SSH connection test to 192.168.85.132, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=5, timeout 5
    Wed Mar 28 16:01:30 2018 - [info] HealthCheck: SSH to 192.168.85.132 is reachable.
    Wed Mar 28 16:01:30 2018 - [info] Forcing shutdown so that applications never connect to the current master..
    Wed Mar 28 16:01:30 2018 - [info] Executing master IP deactivation script:
    Wed Mar 28 16:01:30 2018 - [info]   /etc/masterha/master_ip_failover --orig_master_host=192.168.85.132 --orig_master_ip=192.168.85.132 --orig_master_port=3307 --command=stopssh --ssh_user=root  
    Wed Mar 28 16:01:30 2018 - [info]  done.
    Wed Mar 28 16:01:30 2018 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
    Wed Mar 28 16:01:30 2018 - [info] * Phase 2: Dead Master Shutdown Phase completed.
    ==================== 2. Dead Master Shutdown Phase, End ====================
    Wed Mar 28 16:01:30 2018 - [info] 
    ==================== 3. New Master Recovery Phase, Start ====================
    Wed Mar 28 16:01:30 2018 - [info] * Phase 3: Master Recovery Phase..
    Wed Mar 28 16:01:30 2018 - [info] 
    ==================== 3.1 Get the Latest Slave ====================
    ******************** Latest slave. Purpose 1: supply the relay logs that other slaves are missing. Purpose 2: provide the starting point for saving the dead master's binlog ********************
    Wed Mar 28 16:01:30 2018 - [info] * Phase 3.1: Getting Latest Slaves Phase..
    Wed Mar 28 16:01:30 2018 - [info] 
    Wed Mar 28 16:01:30 2018 - [debug] Fetching current slave status..
    Wed Mar 28 16:01:30 2018 - [debug]  Fetching current slave status done.
    Wed Mar 28 16:01:30 2018 - [info] The latest binary log file/position on all slaves is mysql-bin.000004:973
    Wed Mar 28 16:01:30 2018 - [info] Latest slaves (Slaves that received relay log files to the latest):
    Wed Mar 28 16:01:30 2018 - [info]   192.168.85.133(192.168.85.133:3307)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
    Wed Mar 28 16:01:30 2018 - [debug]    Relay log info repository: FILE
    Wed Mar 28 16:01:30 2018 - [info]     Replicating from 192.168.85.132(192.168.85.132:3307)
    Wed Mar 28 16:01:30 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
    Wed Mar 28 16:01:30 2018 - [info] The oldest binary log file/position on all slaves is mysql-bin.000004:643
    Wed Mar 28 16:01:30 2018 - [info] Oldest slaves:
    Wed Mar 28 16:01:30 2018 - [info]   192.168.85.134(192.168.85.134:3307)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
    Wed Mar 28 16:01:30 2018 - [debug]    Relay log info repository: FILE
    Wed Mar 28 16:01:30 2018 - [info]     Replicating from 192.168.85.132(192.168.85.132:3307)
    Wed Mar 28 16:01:30 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
    Wed Mar 28 16:01:30 2018 - [info] 
    ==================== 3.2 Save the Dead Master's Binlog ====================
    Wed Mar 28 16:01:30 2018 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
    Wed Mar 28 16:01:30 2018 - [info] 
    Wed Mar 28 16:01:30 2018 - [info] Fetching dead master's binary logs..
    ******************** Runs on the dead master; takes the portion after the latest slave's position ********************
    Wed Mar 28 16:01:30 2018 - [info] Executing command on the dead master 192.168.85.132(192.168.85.132:3307): save_binary_logs --command=save --start_file=mysql-bin.000004  --start_pos=973 --binlog_dir=/data/mysql/mysql3307/logs --output_file=/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --debug 
      Creating /var/log/masterha/app1 if not exists..    ok.
     Concat binary/relay logs from mysql-bin.000004 pos 973 to mysql-bin.000004 EOF into /var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog ..
    parse_init_headers: file=mysql-bin.000004 event_type=15 server_id=1323307 length=119 nextmpos=123 prevrelay=4 cur(post)relay=123
     Binlog Checksum enabled
    parse_init_headers: file=mysql-bin.000004 event_type=35 server_id=1323307 length=31 nextmpos=154 prevrelay=123 cur(post)relay=154
     Got previous gtids log event: 154.
    parse_init_headers: file=mysql-bin.000004 event_type=34 server_id=1323307 length=65 nextmpos=219 prevrelay=154 cur(post)relay=219
      Dumping binlog format description event, from position 0 to 154.. ok.
      Dumping effective binlog data from /data/mysql/mysql3307/logs/mysql-bin.000004 position 973 to tail(1326).. ok.
    parse_init_headers: file=saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog event_type=15 server_id=1323307 length=119 nextmpos=123 prevrelay=4 cur(post)relay=123
     Binlog Checksum enabled
    parse_init_headers: file=saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog event_type=35 server_id=1323307 length=31 nextmpos=154 prevrelay=123 cur(post)relay=154
     Got previous gtids log event: 154.
    parse_init_headers: file=saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog event_type=34 server_id=1323307 length=65 nextmpos=1038 prevrelay=154 cur(post)relay=219
     Concat succeeded.
    saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog                                                                                                  100%  507     0.5KB/s   00:00    
    ******************** scp the saved master binlog to the working directory on the node running mha-manager / the manual failover ********************
    Wed Mar 28 16:01:31 2018 - [info] scp from root@192.168.85.132:/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog to local:/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog succeeded.
    Wed Mar 28 16:01:31 2018 - [debug] SSH connection test to 192.168.85.133, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=5, timeout 5
    Wed Mar 28 16:01:31 2018 - [info] HealthCheck: SSH to 192.168.85.133 is reachable.
    Wed Mar 28 16:01:37 2018 - [debug] SSH connection test to 192.168.85.134, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=5, timeout 5
    Wed Mar 28 16:01:38 2018 - [info] HealthCheck: SSH to 192.168.85.134 is reachable.
    Wed Mar 28 16:01:38 2018 - [info] 
    ==================== 3.3 Elect the New Master ====================
    Wed Mar 28 16:01:38 2018 - [info] * Phase 3.3: Determining New Master Phase..
    Wed Mar 28 16:01:38 2018 - [info] 
    ******************** Check whether the latest slave holds the relay logs that the other slaves are missing ********************
    Wed Mar 28 16:01:38 2018 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
    Wed Mar 28 16:01:38 2018 - [info] Checking whether 192.168.85.133 has relay logs from the oldest position..
    Wed Mar 28 16:01:38 2018 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=mysql-bin.000004 --latest_rmlp=973 --target_mlf=mysql-bin.000004 --target_rmlp=643 --server_id=1333307 --workdir=/var/log/masterha/app1 --timestamp=20180328160107 --manager_version=0.56 --relay_log_info=/data/mysql/mysql3307/data/relay-log.info  --relay_dir=/data/mysql/mysql3307/data/  --debug  :
        Opening /data/mysql/mysql3307/data/relay-log.info ... ok.
        Relay log found at /data/mysql/mysql3307/data, up to relay-bin.000005
     Fast relay log position search succeeded.
     Target relay log file/position found. start_file:relay-bin.000005, start_pos:856.
    Target relay log FOUND!
    Wed Mar 28 16:01:39 2018 - [info] OK. 192.168.85.133 has all relay logs.
    Wed Mar 28 16:01:39 2018 - [info] 192.168.85.134 can be new master.
    Wed Mar 28 16:01:39 2018 - [info] New master is 192.168.85.134(192.168.85.134:3307)
    Wed Mar 28 16:01:39 2018 - [info] Starting master failover..
    Wed Mar 28 16:01:39 2018 - [info] 
    From:
    192.168.85.132(192.168.85.132:3307) (current master)
     +--192.168.85.133(192.168.85.133:3307)
     +--192.168.85.134(192.168.85.134:3307)
    
    To:
    192.168.85.134(192.168.85.134:3307) (new master)
     +--192.168.85.133(192.168.85.133:3307)
    
    ******************** Confirm whether to perform the switch ********************
    Starting master switch from 192.168.85.132(192.168.85.132:3307) to 192.168.85.134(192.168.85.134:3307)? (yes/NO): yes
    Wed Mar 28 16:01:42 2018 - [info] New master decided manually is 192.168.85.134(192.168.85.134:3307)
    Wed Mar 28 16:01:42 2018 - [info] 
    Wed Mar 28 16:01:42 2018 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
    Wed Mar 28 16:01:42 2018 - [info] 
    ******************** On the latest slave, generate the relay-log diff between the new master and the latest slave ********************
    Wed Mar 28 16:01:42 2018 - [info] Server 192.168.85.134 received relay logs up to: mysql-bin.000004:643
    Wed Mar 28 16:01:42 2018 - [info] Need to get diffs from the latest slave(192.168.85.133) up to: mysql-bin.000004:973 (using the latest slave's relay logs)
    Wed Mar 28 16:01:43 2018 - [info] Connecting to the latest slave host 192.168.85.133, generating diff relay log files..
    Wed Mar 28 16:01:43 2018 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=192.168.85.134 --latest_mlf=mysql-bin.000004 --latest_rmlp=973 --target_mlf=mysql-bin.000004 --target_rmlp=643 --server_id=1333307 --diff_file_readtolatest=/var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog --workdir=/var/log/masterha/app1 --timestamp=20180328160107 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --relay_log_info=/data/mysql/mysql3307/data/relay-log.info  --relay_dir=/data/mysql/mysql3307/data/  --debug 
    Wed Mar 28 16:01:45 2018 - [info] 
        Opening /data/mysql/mysql3307/data/relay-log.info ... ok.
        Relay log found at /data/mysql/mysql3307/data, up to relay-bin.000005
     Fast relay log position search succeeded.
     Target relay log file/position found. start_file:relay-bin.000005, start_pos:856.
     Concat binary/relay logs from relay-bin.000005 pos 856 to relay-bin.000005 EOF into /var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog ..
    parse_init_headers: file=relay-bin.000005 event_type=15 server_id=1333307 length=119 nextmpos=123 prevrelay=4 cur(post)relay=123
     Binlog Checksum enabled
    parse_init_headers: file=relay-bin.000005 event_type=35 server_id=1333307 length=31 nextmpos=154 prevrelay=123 cur(post)relay=154
     Got previous gtids log event: 154.
    parse_init_headers: file=relay-bin.000005 event_type=4 server_id=1323307 length=47 nextmpos=0 prevrelay=154 cur(post)relay=201
    parse_init_headers: file=relay-bin.000005 event_type=15 server_id=1323307 length=119 nextmpos=123 prevrelay=201 cur(post)relay=320
     Binlog Checksum enabled
    parse_init_headers: file=relay-bin.000005 event_type=4 server_id=0 length=47 nextmpos=367 prevrelay=320 cur(post)relay=367
    parse_init_headers: file=relay-bin.000005 event_type=34 server_id=1323307 length=65 nextmpos=219 prevrelay=367 cur(post)relay=432
      Dumping binlog format description event, from position 0 to 367.. ok.
      Dumping effective binlog data from /data/mysql/mysql3307/data/relay-bin.000005 position 856 to tail(1186).. ok.
    parse_init_headers: file=relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog event_type=15 server_id=1333307 length=119 nextmpos=123 prevrelay=4 cur(post)relay=123
     Binlog Checksum enabled
    parse_init_headers: file=relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog event_type=35 server_id=1333307 length=31 nextmpos=154 prevrelay=123 cur(post)relay=154
     Got previous gtids log event: 154.
    parse_init_headers: file=relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog event_type=4 server_id=1323307 length=47 nextmpos=0 prevrelay=154 cur(post)relay=201
    parse_init_headers: file=relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog event_type=15 server_id=1323307 length=119 nextmpos=123 prevrelay=201 cur(post)relay=320
     Binlog Checksum enabled
    parse_init_headers: file=relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog event_type=4 server_id=0 length=47 nextmpos=367 prevrelay=320 cur(post)relay=367
    parse_init_headers: file=relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog event_type=34 server_id=1323307 length=65 nextmpos=708 prevrelay=367 cur(post)relay=432
     Concat succeeded.
     Generating diff relay log succeeded. Saved at /var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog .
    ******************** scp the generated relay-log diff to the new master's working directory ********************
     scp ZST2:/var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog to root@192.168.85.134(22) succeeded.
    Wed Mar 28 16:01:45 2018 - [info]  Generating diff files succeeded.
    Wed Mar 28 16:01:45 2018 - [info] Sending binlog..
    saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog                                                                                                  100%  507     0.5KB/s   00:00    
    ******************** scp the dead master's binlog from the working directory on the mha-manager / manual failover node to the new master's working directory ********************
    Wed Mar 28 16:01:45 2018 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog to root@192.168.85.134:/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog succeeded.
    Wed Mar 28 16:01:45 2018 - [info] 
    ==================== 3.4 New Master Applies the Differential Logs ====================
    Wed Mar 28 16:01:45 2018 - [info] * Phase 3.4: Master Log Apply Phase..
    Wed Mar 28 16:01:45 2018 - [info] 
    Wed Mar 28 16:01:45 2018 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
    Wed Mar 28 16:01:45 2018 - [info] Starting recovery on 192.168.85.134(192.168.85.134:3307)..
    Wed Mar 28 16:01:45 2018 - [info]  Generating diffs succeeded.
    ******************** Wait for the new master to finish applying its own relay log ********************
    Wed Mar 28 16:01:45 2018 - [info] Waiting until all relay logs are applied.
    Wed Mar 28 16:01:45 2018 - [info]  done.
    Wed Mar 28 16:01:45 2018 - [debug]  Stopping SQL thread on 192.168.85.134(192.168.85.134:3307)..
    Wed Mar 28 16:01:45 2018 - [debug]   done.
    Wed Mar 28 16:01:45 2018 - [info] Getting slave status..
    Wed Mar 28 16:01:45 2018 - [info] This slave(192.168.85.134)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000004:643). No need to recover from Exec_Master_Log_Pos.
    Wed Mar 28 16:01:45 2018 - [debug] Current max_allowed_packet is 4194304.
    Wed Mar 28 16:01:45 2018 - [debug] Tentatively setting max_allowed_packet to 1GB succeeded.
    Wed Mar 28 16:01:45 2018 - [info] Connecting to the target slave host 192.168.85.134, running recover script..
    ******************** The new master applies, in order, the relay-log diff from the latest slave and then the binlog saved from the dead master ********************
    Wed Mar 28 16:01:45 2018 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mydba' --slave_host=192.168.85.134 --slave_ip=192.168.85.134  --slave_port=3307 --apply_files=/var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog,/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog --workdir=/var/log/masterha/app1 --target_version=5.7.21-log --timestamp=20180328160107 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --debug  --slave_pass=xxx
    Wed Mar 28 16:01:46 2018 - [info] 
    ******************** Merge all missing relay-log and binlog pieces into total_binlog ********************
     Concat all apply files to /var/log/masterha/app1/total_binlog_for_192.168.85.134_3307.20180328160107.binlog ..
     Copying the first binlog file /var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog to /var/log/masterha/app1/total_binlog_for_192.168.85.134_3307.20180328160107.binlog.. ok.
      Dumping binlog head events (rotate events), skipping format description events from /var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog.. parse_init_headers: file=saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog event_type=15 server_id=1323307 length=119 nextmpos=123 prevrelay=4 cur(post)relay=123
     Binlog Checksum enabled
    parse_init_headers: file=saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog event_type=35 server_id=1323307 length=31 nextmpos=154 prevrelay=123 cur(post)relay=154
     Got previous gtids log event: 154.
    parse_init_headers: file=saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog event_type=34 server_id=1323307 length=65 nextmpos=1038 prevrelay=154 cur(post)relay=219
    dumped up to pos 154. ok.
     /var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog has effective binlog events from pos 154.
      Dumping effective binlog data from /var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog position 154 to tail(507).. ok.
     Concat succeeded.
    All apply target binary logs are concatinated at /var/log/masterha/app1/total_binlog_for_192.168.85.134_3307.20180328160107.binlog .
    MySQL client version is 5.7.21. Using --binary-mode.
    Applying differential binary/relay log files /var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog,/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog on 192.168.85.134:3307. This may take long time...
    Applying log files succeeded.
    Wed Mar 28 16:01:46 2018 - [debug] Setting max_allowed_packet back to 4194304 succeeded.
    Wed Mar 28 16:01:46 2018 - [info]  All relay logs were successfully applied.
    ******************** After applying all relay logs and binlogs, get the new master's current position ********************
    Wed Mar 28 16:01:46 2018 - [info] Getting new master's binlog name and position..
    Wed Mar 28 16:01:46 2018 - [info]  mysql-bin.000002:10948
    Wed Mar 28 16:01:46 2018 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.85.134', MASTER_PORT=3307, MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=10948, MASTER_USER='repl', MASTER_PASSWORD='xxx';
    ******************** Bring up the virtual IP so the new master can serve traffic ********************
    Wed Mar 28 16:01:46 2018 - [info] Executing master IP activate script:
    Wed Mar 28 16:01:46 2018 - [info]   /etc/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.85.132 --orig_master_ip=192.168.85.132 --orig_master_port=3307 --new_master_host=192.168.85.134 --new_master_ip=192.168.85.134 --new_master_port=3307 --new_master_user='mydba' --new_master_password='mysql5721'  
    Set read_only=0 on the new master.
    Wed Mar 28 16:01:52 2018 - [info]  OK.
    Wed Mar 28 16:01:52 2018 - [info] ** Finished master recovery successfully.
    Wed Mar 28 16:01:52 2018 - [info] * Phase 3: Master Recovery Phase completed.
    ==================== 3. New Master Recovery Phase, End ====================
    Wed Mar 28 16:01:52 2018 - [info] 
    ==================== 4. Slave Recovery Phase, Start ====================
    ******************** Slave recovery mirrors the new master's: first obtain the relay-log diff against the latest slave, then fetch the dead master's binlog ********************
    Wed Mar 28 16:01:52 2018 - [info] * Phase 4: Slaves Recovery Phase..
    Wed Mar 28 16:01:52 2018 - [info] 
    ==================== 4.1. Generate the diff log between the latest slave and each slave ====================
    Wed Mar 28 16:01:52 2018 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
    Wed Mar 28 16:01:52 2018 - [info] 
    Wed Mar 28 16:01:52 2018 - [info] -- Slave diff file generation on host 192.168.85.133(192.168.85.133:3307) started, pid: 3488. Check tmp log /var/log/masterha/app1/192.168.85.133_3307_20180328160107.log if it takes time..
    Wed Mar 28 16:01:52 2018 - [info] 
    Wed Mar 28 16:01:52 2018 - [info] Log messages from 192.168.85.133 ...
    Wed Mar 28 16:01:52 2018 - [info] 
    Wed Mar 28 16:01:52 2018 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
    Wed Mar 28 16:01:52 2018 - [info] End of log messages from 192.168.85.133.
    Wed Mar 28 16:01:52 2018 - [info] -- 192.168.85.133(192.168.85.133:3307) has the latest relay log events.
    Wed Mar 28 16:01:52 2018 - [info] Generating relay diff files from the latest slave succeeded.
    Wed Mar 28 16:01:52 2018 - [info] 
    ==================== 4.2. Slaves apply the diff logs ====================
    Wed Mar 28 16:01:52 2018 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
    Wed Mar 28 16:01:52 2018 - [info] 
    Wed Mar 28 16:01:52 2018 - [info] -- Slave recovery on host 192.168.85.133(192.168.85.133:3307) started, pid: 3490. Check tmp log /var/log/masterha/app1/192.168.85.133_3307_20180328160107.log if it takes time..
    saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog                                                                                                  100%  507     0.5KB/s   00:00    
    Wed Mar 28 16:01:54 2018 - [debug] Explicitly disabled relay_log_purge.
    Wed Mar 28 16:01:54 2018 - [info] 
    Wed Mar 28 16:01:54 2018 - [info] Log messages from 192.168.85.133 ...
    Wed Mar 28 16:01:54 2018 - [info] 
    Wed Mar 28 16:01:52 2018 - [info] Sending binlog..
    ******************** scp the dead master's binlog from the working directory on the MHA manager node (or wherever the manual failover is run) to the slave's working directory ********************
    Wed Mar 28 16:01:53 2018 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog to root@192.168.85.133:/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog succeeded.
    Wed Mar 28 16:01:53 2018 - [info] Starting recovery on 192.168.85.133(192.168.85.133:3307)..
    Wed Mar 28 16:01:53 2018 - [info]  Generating diffs succeeded.
    Wed Mar 28 16:01:53 2018 - [info] Waiting until all relay logs are applied.
    Wed Mar 28 16:01:53 2018 - [info]  done.
    Wed Mar 28 16:01:53 2018 - [debug]  Stopping SQL thread on 192.168.85.133(192.168.85.133:3307)..
    Wed Mar 28 16:01:53 2018 - [debug]   done.
    Wed Mar 28 16:01:53 2018 - [info] Getting slave status..
    Wed Mar 28 16:01:53 2018 - [info] This slave(192.168.85.133)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000004:973). No need to recover from Exec_Master_Log_Pos.
    Wed Mar 28 16:01:53 2018 - [debug] Current max_allowed_packet is 4194304.
    Wed Mar 28 16:01:53 2018 - [debug] Tentatively setting max_allowed_packet to 1GB succeeded.
    Wed Mar 28 16:01:53 2018 - [info] Connecting to the target slave host 192.168.85.133, running recover script..
    ******************** The slave applies, in order, the relay logs it is missing relative to the latest slave, then the binlog saved from the dead master ********************
    Wed Mar 28 16:01:53 2018 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mydba' --slave_host=192.168.85.133 --slave_ip=192.168.85.133  --slave_port=3307 --apply_files=/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog --workdir=/var/log/masterha/app1 --target_version=5.7.21-log --timestamp=20180328160107 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --debug  --slave_pass=xxx
    Wed Mar 28 16:01:54 2018 - [info] 
    MySQL client version is 5.7.21. Using --binary-mode.
    Applying differential binary/relay log files /var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog on 192.168.85.133:3307. This may take long time...
    Applying log files succeeded.
    Wed Mar 28 16:01:54 2018 - [debug] Setting max_allowed_packet back to 4194304 succeeded.
    Wed Mar 28 16:01:54 2018 - [info]  All relay logs were successfully applied.
    Wed Mar 28 16:01:54 2018 - [info]  Resetting slave 192.168.85.133(192.168.85.133:3307) and starting replication from the new master 192.168.85.134(192.168.85.134:3307)..
    Wed Mar 28 16:01:54 2018 - [debug]  Stopping slave IO/SQL thread on 192.168.85.133(192.168.85.133:3307)..
    Wed Mar 28 16:01:54 2018 - [debug]   done.
    Wed Mar 28 16:01:54 2018 - [info]  Executed CHANGE MASTER.
    Wed Mar 28 16:01:54 2018 - [debug]  Starting slave IO/SQL thread on 192.168.85.133(192.168.85.133:3307)..
    Wed Mar 28 16:01:54 2018 - [debug]   done.
    Wed Mar 28 16:01:54 2018 - [info]  Slave started.
    Wed Mar 28 16:01:54 2018 - [info] End of log messages from 192.168.85.133.
    Wed Mar 28 16:01:54 2018 - [info] -- Slave recovery on host 192.168.85.133(192.168.85.133:3307) succeeded.
    Wed Mar 28 16:01:54 2018 - [info] All new slave servers recovered successfully.
    ==================== 4. Slave recovery phase, End ====================
    Wed Mar 28 16:01:54 2018 - [info] 
    ==================== 5. New master cleanup phase, Start ====================
    Wed Mar 28 16:01:54 2018 - [info] * Phase 5: New master cleanup phase..
    Wed Mar 28 16:01:54 2018 - [info] 
    Wed Mar 28 16:01:54 2018 - [info] Resetting slave info on the new master..
    Wed Mar 28 16:01:54 2018 - [debug]  Clearing slave info..
    Wed Mar 28 16:01:54 2018 - [debug]  Stopping slave IO/SQL thread on 192.168.85.134(192.168.85.134:3307)..
    Wed Mar 28 16:01:54 2018 - [debug]   done.
    Wed Mar 28 16:01:54 2018 - [debug]  SHOW SLAVE STATUS shows new master does not replicate from anywhere. OK.
    Wed Mar 28 16:01:54 2018 - [info]  192.168.85.134: Resetting slave info succeeded.
    ==================== 5. New master cleanup phase, End ====================
    Wed Mar 28 16:01:54 2018 - [info] Master failover to 192.168.85.134(192.168.85.134:3307) completed successfully.
    Wed Mar 28 16:01:54 2018 - [debug]  Disconnected from 192.168.85.133(192.168.85.133:3307)
    Wed Mar 28 16:01:54 2018 - [debug]  Disconnected from 192.168.85.134(192.168.85.134:3307)
    Wed Mar 28 16:01:54 2018 - [info] 
    
    ----- Failover Report -----
    
    app1: MySQL Master failover 192.168.85.132(192.168.85.132:3307) to 192.168.85.134(192.168.85.134:3307) succeeded
    
    Master 192.168.85.132(192.168.85.132:3307) is down!
    
    Check MHA Manager logs at ZST3 for details.
    
    Started manual(interactive) failover.
    Invalidated master IP address on 192.168.85.132(192.168.85.132:3307)
    The latest slave 192.168.85.133(192.168.85.133:3307) has all relay logs for recovery.
    Selected 192.168.85.134(192.168.85.134:3307) as a new master.
    192.168.85.134(192.168.85.134:3307): OK: Applying all logs succeeded.
    192.168.85.134(192.168.85.134:3307): OK: Activated master IP address.
    192.168.85.133(192.168.85.133:3307): This host has the latest relay log events.
    Generating relay diff files from the latest slave succeeded.
    192.168.85.133(192.168.85.133:3307): OK: Applying all logs succeeded. Slave started, replicating from 192.168.85.134(192.168.85.134:3307)
    192.168.85.134(192.168.85.134:3307): Resetting slave info succeeded.
    Master failover to 192.168.85.134(192.168.85.134:3307) completed successfully.
    [root@ZST3 app1]# 

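    Manual failover flow

    Manual failover (traditional replication)
    1. Configuration check: connect to each instance, check service status, check the replication topology
    2. Dead master shutdown: stop the IO thread on every slave, then remove the virtual IP from the dead master (stopssh)
    3. New master recovery
        3.1. Get the latest slave
            It supplies the data that the new master and the other slaves are missing, and its position is the starting point for saving the dead master's binlog
        3.2. Save the dead master's binlog
            Run save_binary_logs on the dead master (only the part after the latest slave's position)
            scp the resulting binlog to the working directory where the manual failover is running
        3.3. Elect the new master
            Check whether the latest slave still holds the relay logs that the oldest slave is missing
            Decide on the new master and print the topology before and after the switch
            Generate the relay-log diff between the latest slave and the new master, and copy it to the new master's working directory
            scp the dead master's binlog from the manual failover working directory to the new master's working directory
        3.4. The new master applies the diff logs
            Wait for the new master to finish applying its own relay log
            Apply, in order, the relay logs missing relative to the latest slave, then the binlog saved from the dead master
            All of these missing relay logs and binlogs are concatenated into total_binlog
            Get the new master's binlog file:pos; the other slaves will start replicating from this position
            Bind the virtual IP so that the new master can serve clients
    4. Recovery of the other slaves
        4.1. Generate the diff logs
            Generate the relay-log diff between the latest slave and this slave and copy it to the slave's working directory; scp the dead master's binlog from the manual failover working directory to the slave's working directory
        4.2. The slave applies the diff logs
            Wait for the slave to finish applying its own relay log; apply, in order, the relay logs missing relative to the latest slave, then the binlog saved from the dead master; repoint the slave's replication to the new master
        4.3. If there are several slaves, repeat the steps above for each one
    5. New master cleanup: clear the old replication info with STOP SLAVE; RESET SLAVE ALL;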
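The flow above can be sketched as a small planning function. This is only an illustration of the ordering, not MHA's code: `Slave`, `plan_recovery`, and the step names are hypothetical, and the sketch assumes every position lies in the same binlog file. The positions come from this article's run: 973 is the latest slave's Read_Master_Log_Pos reported in the log, 1326 is the end_log_pos of the last event in the saved master binlog, and 643 for the lagging slave is inferred from the first event of the relay diff (end_log_pos 708 minus its 65-byte length).

```python
from dataclasses import dataclass


@dataclass
class Slave:
    host: str
    read_pos: int  # Read_Master_Log_Pos within the dead master's last binlog


def plan_recovery(slaves, master_end_pos):
    """Return (latest slave host, {host: [(source, from_pos, to_pos), ...]})."""
    latest = max(slaves, key=lambda s: s.read_pos)
    plan = {}
    for s in slaves:
        steps = []
        # Step 1: events the latest slave received but this server did not.
        if s.read_pos < latest.read_pos:
            steps.append(("relay_diff_from_latest_slave", s.read_pos, latest.read_pos))
        # Step 2: events that never left the dead master.
        if latest.read_pos < master_end_pos:
            steps.append(("saved_master_binlog", latest.read_pos, master_end_pos))
        plan[s.host] = steps
    return latest.host, plan


# Positions taken (or inferred) from this article's run; see the note above.
slaves = [Slave("192.168.85.133", 973), Slave("192.168.85.134", 643)]
latest_host, plan = plan_recovery(slaves, master_end_pos=1326)
print("latest slave:", latest_host)
for host, steps in plan.items():
    print(host, steps)
```

The result matches what the log shows: 192.168.85.134 needs both the relay diff and the saved master binlog, while 192.168.85.133 only needs the saved master binlog.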

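    2.3. Files in the working directories

    The failover has to fill in missing data, and in doing so it produces several kinds of files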

    # Dead master
    [root@ZST1 app1]# ll
    total 4
    -rw-r--r-- 1 root root 507 Mar 28 16:01 saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog
    [root@ZST1 app1]# 
    Dead Master

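    saved_master_binlog_from_**: the binlog diff between the dead master and the latest slave; generated on the dead master, then copied to the working directory of the MHA manager node (or of the manual failover)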

    # Latest slave
    [root@ZST2 app1]# ll
    total 12
    -rw-r--r--. 1 root root  697 Mar 28 16:01 relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog
    -rw-r--r--. 1 root root 2867 Mar 28 16:01 relay_log_apply_for_192.168.85.133_3307_20180328160107_err.log
    -rw-r--r--. 1 root root  507 Mar 28 16:01 saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog
    [root@ZST2 app1]# 
    Latest Slave

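    relay_from_read_to_latest_**: the relay-log diff between the latest slave and another slave; generated on the latest slave, then copied to the corresponding slave
    saved_master_binlog_from_**: copied from the manager node; it originates on the dead master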

    # New master
    [root@ZST3 app1]# ll
    total 16
    -rw-r--r--. 1 root root    0 Mar 28 16:01 app1.failover.complete
    -rw-r--r--. 1 root root  697 Mar 28 16:01 relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog
    -rw-r--r--. 1 root root 3629 Mar 28 16:01 relay_log_apply_for_192.168.85.134_3307_20180328160107_err.log
    -rw-r--r--. 1 root root  507 Mar 28 16:01 saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog
    -rw-r--r--. 1 root root 1050 Mar 28 16:01 total_binlog_for_192.168.85.134_3307.20180328160107.binlog
    [root@ZST3 app1]# 
    New Master

    relay_from_read_to_latest_**: copied from the latest slave
    saved_master_binlog_from_**: copied from the manager node; it originates on the dead master
    total_binlog_for_**: the concatenation of all the missing relay-log and binlog events
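The naming scheme can be checked mechanically. The sketch below classifies file names from a working directory by prefix and pulls out the host, port, and timestamp embedded in them. The patterns are an assumption derived only from the file names listed in this article (note that total_binlog_for_ uses a dot, not an underscore, before the timestamp); `PATTERNS` and `classify` are hypothetical names, not part of MHA.

```python
import re

# Hypothetical patterns, derived only from the file names listed in this article.
PATTERNS = {
    "saved_master_binlog": re.compile(
        r"^saved_master_binlog_from_(?P<host>[\d.]+)_(?P<port>\d+)_(?P<ts>\d{14})\.binlog$"
    ),
    "relay_diff": re.compile(
        r"^relay_from_read_to_latest_(?P<host>[\d.]+)_(?P<port>\d+)_(?P<ts>\d{14})\.binlog$"
    ),
    "total_binlog": re.compile(
        r"^total_binlog_for_(?P<host>[\d.]+)_(?P<port>\d+)\.(?P<ts>\d{14})\.binlog$"
    ),
}


def classify(filename):
    """Return (kind, host, port, timestamp), or None for unrecognised names."""
    for kind, pattern in PATTERNS.items():
        m = pattern.match(filename)
        if m:
            return kind, m.group("host"), int(m.group("port")), m.group("ts")
    return None


# File names as listed on the new master (ZST3).
for name in [
    "app1.failover.complete",
    "relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog",
    "saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog",
    "total_binlog_for_192.168.85.134_3307.20180328160107.binlog",
]:
    print(name, "->", classify(name))
```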
    • Decode the diff logs to inspect the events they contain

    # Relay-log diff between the latest slave and the other slave
    [root@ZST3 app1]# mysqlbinlog -vv --base64-output=decode-rows relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog
    /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
    /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
    DELIMITER /*!*/;
    # at 4
    #180328 15:41:18 server id 1333307  end_log_pos 123 CRC32 0x152b7e41    Start: binlog v 4, server v 5.7.21-log created 180328 15:41:18
    # This Format_description_event appears in a relay log and was generated by the slave thread.
    # at 123
    #180328 15:41:18 server id 1333307  end_log_pos 154 CRC32 0x5ea2e9c6    Previous-GTIDs
    # [empty]
    # at 154
    #700101  8:00:00 server id 1323307  end_log_pos 0 CRC32 0x2076d50b      Rotate to mysql-bin.000004  pos: 4
    # at 201
    #180328 15:49:33 server id 1323307  end_log_pos 123 CRC32 0x9b1488de    Start: binlog v 4, server v 5.7.21-log created 180328 15:49:33 at startup
    ROLLBACK/*!*/;
    # at 320
    #180328 15:41:18 server id 0  end_log_pos 367 CRC32 0x838279dd  Rotate to mysql-bin.000004  pos: 154
    # at 367
    #180328 15:53:50 server id 1323307  end_log_pos 708 CRC32 0x9fba3aa7    Anonymous_GTID  last_committed=2        sequence_number=3       rbr_only=yes
    /*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
    SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
    # at 432
    #180328 15:53:50 server id 1323307  end_log_pos 793 CRC32 0x112f5399    Query   thread_id=2     exec_time=0     error_code=0
    SET TIMESTAMP=1522223630/*!*/;
    SET @@session.pseudo_thread_id=2/*!*/;
    SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
    SET @@session.sql_mode=1436549152/*!*/;
    SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
    /*!C utf8 *//*!*/;
    SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
    SET @@session.time_zone='SYSTEM'/*!*/;
    SET @@session.lc_time_names=0/*!*/;
    SET @@session.collation_database=DEFAULT/*!*/;
    BEGIN
    /*!*/;
    # at 517
    #180328 15:53:50 server id 1323307  end_log_pos 856 CRC32 0x890cf300    Table_map: `replcrash`.`py_user` mapped to number 108
    # at 580
    #180328 15:53:50 server id 1323307  end_log_pos 942 CRC32 0xccb038f5    Write_rows: table id 108 flags: STMT_END_F
    ### INSERT INTO `replcrash`.`py_user`
    ### SET
    ###   @1=2 /* INT meta=0 nullable=0 is_null=0 */
    ###   @2='272f15ee-325d-11e8-88e6-000c29c1' /* VARSTRING(96) meta=96 nullable=1 is_null=0 */
    ###   @3='2018-03-28 15:53:50' /* DATETIME(0) meta=0 nullable=1 is_null=0 */
    ###   @4='1323307' /* VARSTRING(30) meta=30 nullable=1 is_null=0 */
    # at 666
    #180328 15:53:50 server id 1323307  end_log_pos 973 CRC32 0xbfda64ba    Xid = 31
    COMMIT/*!*/;
    SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
    DELIMITER ;
    # End of log file
    /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
    /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
    [root@ZST3 app1]# 
    
    
    # Binlog diff between the dead master and the latest slave
    [root@ZST3 app1]# mysqlbinlog -vv --base64-output=decode-rows saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog
    /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
    /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
    DELIMITER /*!*/;
    # at 4
    #180328 15:49:33 server id 1323307  end_log_pos 123 CRC32 0x9b1488de    Start: binlog v 4, server v 5.7.21-log created 180328 15:49:33 at startup
    ROLLBACK/*!*/;
    # at 123
    #180328 15:49:33 server id 1323307  end_log_pos 154 CRC32 0x37f9307d    Previous-GTIDs
    # [empty]
    # at 154
    #180328 15:54:01 server id 1323307  end_log_pos 1038 CRC32 0x74680cfa   Anonymous_GTID  last_committed=3        sequence_number=4       rbr_only=yes
    /*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
    SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
    # at 219
    #180328 15:54:01 server id 1323307  end_log_pos 1123 CRC32 0x3774a1d0   Query   thread_id=2     exec_time=0     error_code=0
    SET TIMESTAMP=1522223641/*!*/;
    SET @@session.pseudo_thread_id=2/*!*/;
    SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
    SET @@session.sql_mode=1436549152/*!*/;
    SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
    /*!C utf8 *//*!*/;
    SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
    SET @@session.time_zone='SYSTEM'/*!*/;
    SET @@session.lc_time_names=0/*!*/;
    SET @@session.collation_database=DEFAULT/*!*/;
    BEGIN
    /*!*/;
    # at 304
    #180328 15:54:01 server id 1323307  end_log_pos 1186 CRC32 0x1468e6b1   Table_map: `replcrash`.`py_user` mapped to number 108
    # at 367
    #180328 15:54:01 server id 1323307  end_log_pos 1272 CRC32 0x79523051   Write_rows: table id 108 flags: STMT_END_F
    ### INSERT INTO `replcrash`.`py_user`
    ### SET
    ###   @1=3 /* INT meta=0 nullable=0 is_null=0 */
    ###   @2='2d8900cc-325d-11e8-88e6-000c29c1' /* VARSTRING(96) meta=96 nullable=1 is_null=0 */
    ###   @3='2018-03-28 15:54:01' /* DATETIME(0) meta=0 nullable=1 is_null=0 */
    ###   @4='1323307' /* VARSTRING(30) meta=30 nullable=1 is_null=0 */
    # at 453
    #180328 15:54:01 server id 1323307  end_log_pos 1303 CRC32 0xb93ce981   Xid = 32
    COMMIT/*!*/;
    # at 484
    #180328 15:57:10 server id 1323307  end_log_pos 1326 CRC32 0x577dc41e   Stop
    SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
    DELIMITER ;
    # End of log file
    /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
    /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
    [root@ZST3 app1]# 
    
    
    # All missing relay-log and binlog events combined
    [root@ZST3 app1]# mysqlbinlog -vv --base64-output=decode-rows total_binlog_for_192.168.85.134_3307.20180328160107.binlog
    /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
    /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
    DELIMITER /*!*/;
    # at 4
    #180328 15:41:18 server id 1333307  end_log_pos 123 CRC32 0x152b7e41    Start: binlog v 4, server v 5.7.21-log created 180328 15:41:18
    # This Format_description_event appears in a relay log and was generated by the slave thread.
    # at 123
    #180328 15:41:18 server id 1333307  end_log_pos 154 CRC32 0x5ea2e9c6    Previous-GTIDs
    # [empty]
    # at 154
    #700101  8:00:00 server id 1323307  end_log_pos 0 CRC32 0x2076d50b      Rotate to mysql-bin.000004  pos: 4
    # at 201
    #180328 15:49:33 server id 1323307  end_log_pos 123 CRC32 0x9b1488de    Start: binlog v 4, server v 5.7.21-log created 180328 15:49:33 at startup
    ROLLBACK/*!*/;
    # at 320
    #180328 15:41:18 server id 0  end_log_pos 367 CRC32 0x838279dd  Rotate to mysql-bin.000004  pos: 154
    # at 367
    #180328 15:53:50 server id 1323307  end_log_pos 708 CRC32 0x9fba3aa7    Anonymous_GTID  last_committed=2        sequence_number=3       rbr_only=yes
    /*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
    SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
    # at 432
    #180328 15:53:50 server id 1323307  end_log_pos 793 CRC32 0x112f5399    Query   thread_id=2     exec_time=0     error_code=0
    SET TIMESTAMP=1522223630/*!*/;
    SET @@session.pseudo_thread_id=2/*!*/;
    SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
    SET @@session.sql_mode=1436549152/*!*/;
    SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
    /*!C utf8 *//*!*/;
    SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
    SET @@session.time_zone='SYSTEM'/*!*/;
    SET @@session.lc_time_names=0/*!*/;
    SET @@session.collation_database=DEFAULT/*!*/;
    BEGIN
    /*!*/;
    # at 517
    #180328 15:53:50 server id 1323307  end_log_pos 856 CRC32 0x890cf300    Table_map: `replcrash`.`py_user` mapped to number 108
    # at 580
    #180328 15:53:50 server id 1323307  end_log_pos 942 CRC32 0xccb038f5    Write_rows: table id 108 flags: STMT_END_F
    ### INSERT INTO `replcrash`.`py_user`
    ### SET
    ###   @1=2 /* INT meta=0 nullable=0 is_null=0 */
    ###   @2='272f15ee-325d-11e8-88e6-000c29c1' /* VARSTRING(96) meta=96 nullable=1 is_null=0 */
    ###   @3='2018-03-28 15:53:50' /* DATETIME(0) meta=0 nullable=1 is_null=0 */
    ###   @4='1323307' /* VARSTRING(30) meta=30 nullable=1 is_null=0 */
    # at 666
    #180328 15:53:50 server id 1323307  end_log_pos 973 CRC32 0xbfda64ba    Xid = 31
    COMMIT/*!*/;
    # at 697
    #180328 15:54:01 server id 1323307  end_log_pos 1038 CRC32 0x74680cfa   Anonymous_GTID  last_committed=3        sequence_number=4       rbr_only=yes
    /*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
    SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
    # at 762
    #180328 15:54:01 server id 1323307  end_log_pos 1123 CRC32 0x3774a1d0   Query   thread_id=2     exec_time=0     error_code=0
    SET TIMESTAMP=1522223641/*!*/;
    BEGIN
    /*!*/;
    # at 847
    #180328 15:54:01 server id 1323307  end_log_pos 1186 CRC32 0x1468e6b1   Table_map: `replcrash`.`py_user` mapped to number 108
    # at 910
    #180328 15:54:01 server id 1323307  end_log_pos 1272 CRC32 0x79523051   Write_rows: table id 108 flags: STMT_END_F
    ### INSERT INTO `replcrash`.`py_user`
    ### SET
    ###   @1=3 /* INT meta=0 nullable=0 is_null=0 */
    ###   @2='2d8900cc-325d-11e8-88e6-000c29c1' /* VARSTRING(96) meta=96 nullable=1 is_null=0 */
    ###   @3='2018-03-28 15:54:01' /* DATETIME(0) meta=0 nullable=1 is_null=0 */
    ###   @4='1323307' /* VARSTRING(30) meta=30 nullable=1 is_null=0 */
    # at 996
    #180328 15:54:01 server id 1323307  end_log_pos 1303 CRC32 0xb93ce981   Xid = 32
    COMMIT/*!*/;
    # at 1027
    #180328 15:57:10 server id 1323307  end_log_pos 1326 CRC32 0x577dc41e   Stop
    SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
    DELIMITER ;
    # End of log file
    /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
    /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
    [root@ZST3 app1]# 
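The file sizes and `# at` offsets in the three listings are consistent with total_binlog being the relay diff followed by the saved master binlog with its first 154 bytes (the Format_description and Previous-GTIDs events) dropped. That reading is inferred from the numbers shown in this article rather than from MHA's source. The arithmetic can be checked as follows; the constant and function names are illustrative.

```python
RELAY_DIFF_SIZE = 697     # relay_from_read_to_latest_*.binlog, bytes (from ll)
SAVED_BINLOG_SIZE = 507   # saved_master_binlog_from_*.binlog, bytes (from ll)
TOTAL_BINLOG_SIZE = 1050  # total_binlog_for_*.binlog, bytes (from ll)
SAVED_HEADER_SIZE = 154   # offset of the first Anonymous_GTID event in the saved binlog


def offset_in_total(saved_offset):
    """Map an event offset in the saved master binlog to its offset in total_binlog."""
    return RELAY_DIFF_SIZE + (saved_offset - SAVED_HEADER_SIZE)


# "# at N" offsets of the second transaction, as printed by mysqlbinlog.
saved_offsets = [154, 219, 304, 367, 453, 484]
total_offsets = [697, 762, 847, 910, 996, 1027]

mapped = [offset_in_total(o) for o in saved_offsets]
print("offsets line up:", mapped == total_offsets)
print(
    "sizes line up:",
    RELAY_DIFF_SIZE + SAVED_BINLOG_SIZE - SAVED_HEADER_SIZE == TOTAL_BINLOG_SIZE,
)
```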

    After the manual failover the topology is Node3->{Node2}, and the missing data was filled in automatically

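    3. Manual failover under GTID replication

    3.1. MHA configuration changes

    In GTID mode MHA needs a [binlog*] section, which can point either to a dedicated binlog server or to the master's own binlog directory. Without [binlog*], MHA does not fetch binlogs from the master even when the master host is still reachable, so any events that have not yet reached a slave are lost

    # Append the binlog server section to the end of app1.conf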
    [root@ZST1 masterha]# cat app1.conf 
    ...
    [binlog1]
    hostname=192.168.85.132
    master_binlog_dir=/data/mysql/mysql3307/logs
    no_master=1
    [root@ZST1 masterha]# 
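Because a missing [binlog*] section silently loses unsent events, it may be worth checking the conf before a GTID failover. A minimal sketch of such a pre-flight check follows. It is not part of MHA, it assumes Python's `configparser` is an adequate reader for this conf format, and `SAMPLE_CONF` is a trimmed sample modelled on the app1.conf shown in this article.

```python
import configparser

# Trimmed sample modelled on the app1.conf shown in this article.
SAMPLE_CONF = """\
[server default]
manager_workdir=/var/log/masterha/app1

[server1]
hostname=192.168.85.132
port=3307

[binlog1]
hostname=192.168.85.132
master_binlog_dir=/data/mysql/mysql3307/logs
no_master=1
"""


def binlog_sections(conf_text):
    """Return the names of all [binlog*] sections found in the conf text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return [name for name in parser.sections() if name.startswith("binlog")]


found = binlog_sections(SAMPLE_CONF)
if found:
    print("binlog sections:", ", ".join(found))
else:
    print("no [binlog*] section: unsent master binlog events would not be saved")
```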

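    3.2. Manual failover

    With a one-master, two-slave topology built on Row+GTID, Node1->{Node2, Node3}: regenerate the test data, stop the MySQL service on Node1, then run the manual failover script

    # GTID + manual failover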
    [root@ZST1 masterha]# masterha_master_switch --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf --dead_master_host=192.168.85.132 --dead_master_port=3307 --master_state=dead --new_master_host=192.168.85.134 --new_master_port=3307 --ignore_last_failover
    --dead_master_ip=<dead_master_ip> is not set. Using 192.168.85.132.
    Thu Mar 29 15:00:32 2018 - [info] Reading default configuration from /etc/masterha/masterha_default.conf..
    Thu Mar 29 15:00:32 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
    Thu Mar 29 15:00:32 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
    Thu Mar 29 15:00:32 2018 - [info] MHA::MasterFailover version 0.56.
    Thu Mar 29 15:00:32 2018 - [info] Starting master failover.
    Thu Mar 29 15:00:32 2018 - [info] 
    ==================== 1. Configuration check phase, Start ====================
    Thu Mar 29 15:00:32 2018 - [info] * Phase 1: Configuration Check Phase..
    Thu Mar 29 15:00:32 2018 - [info] 
    Thu Mar 29 15:00:32 2018 - [debug] SSH connection test to 192.168.85.132, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=5, timeout 5
    Thu Mar 29 15:00:32 2018 - [info] HealthCheck: SSH to 192.168.85.132 is reachable.
    Thu Mar 29 15:00:32 2018 - [info] Binlog server 192.168.85.132 is reachable.
    Thu Mar 29 15:00:32 2018 - [debug] Connecting to servers..
    Thu Mar 29 15:00:32 2018 - [debug]  Connected to: 192.168.85.133(192.168.85.133:3307), user=mydba
    Thu Mar 29 15:00:32 2018 - [debug]  Number of slave worker threads on host 192.168.85.133(192.168.85.133:3307): 0
    Thu Mar 29 15:00:32 2018 - [debug]  Connected to: 192.168.85.134(192.168.85.134:3307), user=mydba
    Thu Mar 29 15:00:32 2018 - [debug]  Number of slave worker threads on host 192.168.85.134(192.168.85.134:3307): 0
    Thu Mar 29 15:00:32 2018 - [debug]  Comparing MySQL versions..
    Thu Mar 29 15:00:32 2018 - [debug]   Comparing MySQL versions done.
    Thu Mar 29 15:00:32 2018 - [debug] Connecting to servers done.
    Thu Mar 29 15:00:32 2018 - [info] GTID failover mode = 1
    Thu Mar 29 15:00:32 2018 - [info] Dead Servers:
    Thu Mar 29 15:00:32 2018 - [info]   192.168.85.132(192.168.85.132:3307)
    Thu Mar 29 15:00:32 2018 - [info] Checking master reachability via MySQL(double check)...
    Thu Mar 29 15:00:32 2018 - [info]  ok.
    Thu Mar 29 15:00:32 2018 - [info] Alive Servers:
    Thu Mar 29 15:00:32 2018 - [info]   192.168.85.133(192.168.85.133:3307)
    Thu Mar 29 15:00:32 2018 - [info]   192.168.85.134(192.168.85.134:3307)
    Thu Mar 29 15:00:32 2018 - [info] Alive Slaves:
    Thu Mar 29 15:00:32 2018 - [info]   192.168.85.133(192.168.85.133:3307)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
    Thu Mar 29 15:00:32 2018 - [info]     GTID ON
    Thu Mar 29 15:00:32 2018 - [debug]    Relay log info repository: FILE
    Thu Mar 29 15:00:32 2018 - [info]     Replicating from 192.168.85.132(192.168.85.132:3307)
    Thu Mar 29 15:00:32 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
    Thu Mar 29 15:00:32 2018 - [info]   192.168.85.134(192.168.85.134:3307)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
    Thu Mar 29 15:00:32 2018 - [info]     GTID ON
    Thu Mar 29 15:00:32 2018 - [debug]    Relay log info repository: FILE
    Thu Mar 29 15:00:32 2018 - [info]     Replicating from 192.168.85.132(192.168.85.132:3307)
    Thu Mar 29 15:00:32 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
    ******************** Choose whether to proceed ********************
    Master 192.168.85.132(192.168.85.132:3307) is dead. Proceed? (yes/NO): yes
    Thu Mar 29 15:00:34 2018 - [info] Starting GTID based failover.
    Thu Mar 29 15:00:34 2018 - [info] 
    Thu Mar 29 15:00:34 2018 - [info] ** Phase 1: Configuration Check Phase completed.
    ==================== 1. Configuration check phase, End ====================
    Thu Mar 29 15:00:34 2018 - [info] 
    ==================== 2. Dead master shutdown phase, Start ====================
    Thu Mar 29 15:00:34 2018 - [info] * Phase 2: Dead Master Shutdown Phase..
    Thu Mar 29 15:00:34 2018 - [info] 
    Thu Mar 29 15:00:34 2018 - [debug] SSH connection test to 192.168.85.132, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=5, timeout 5
    Thu Mar 29 15:00:34 2018 - [debug]  Stopping IO thread on 192.168.85.134(192.168.85.134:3307)..
    Thu Mar 29 15:00:34 2018 - [debug]  Stopping IO thread on 192.168.85.133(192.168.85.133:3307)..
    Thu Mar 29 15:00:34 2018 - [debug]  Stop IO thread on 192.168.85.133(192.168.85.133:3307) done.
    Thu Mar 29 15:00:34 2018 - [debug]  Stop IO thread on 192.168.85.134(192.168.85.134:3307) done.
    Thu Mar 29 15:00:34 2018 - [info] HealthCheck: SSH to 192.168.85.132 is reachable.
    Thu Mar 29 15:00:35 2018 - [info] Forcing shutdown so that applications never connect to the current master..
    Thu Mar 29 15:00:35 2018 - [info] Executing master IP deactivation script:
    Thu Mar 29 15:00:35 2018 - [info]   /etc/masterha/master_ip_failover --orig_master_host=192.168.85.132 --orig_master_ip=192.168.85.132 --orig_master_port=3307 --command=stopssh --ssh_user=root  
    Thu Mar 29 15:00:35 2018 - [info]  done.
    Thu Mar 29 15:00:35 2018 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
    Thu Mar 29 15:00:35 2018 - [info] * Phase 2: Dead Master Shutdown Phase completed.
    ==================== 2. Dead master shutdown phase, End ====================
    Thu Mar 29 15:00:35 2018 - [info] 
    ==================== 3. New master recovery phase, Start ====================
    Thu Mar 29 15:00:35 2018 - [info] * Phase 3: Master Recovery Phase..
    Thu Mar 29 15:00:35 2018 - [info] 
    ==================== 3.1. Get the latest slave ====================
    ******************** The latest slave supplies the data the new master is missing, and its position is the starting point for saving the dead master's binlog ********************
    Thu Mar 29 15:00:35 2018 - [info] * Phase 3.1: Getting Latest Slaves Phase..
    Thu Mar 29 15:00:35 2018 - [info] 
    Thu Mar 29 15:00:35 2018 - [debug] Fetching current slave status..
    Thu Mar 29 15:00:35 2018 - [debug]  Fetching current slave status done.
    Thu Mar 29 15:00:35 2018 - [info] The latest binary log file/position on all slaves is mysql-bin.000009:1013
    Thu Mar 29 15:00:35 2018 - [info] Retrieved Gtid Set: 90b30799-9215-11e7-8645-000c29c1025c:8-11
    Thu Mar 29 15:00:35 2018 - [info] Latest slaves (Slaves that received relay log files to the latest):
    Thu Mar 29 15:00:35 2018 - [info]   192.168.85.133(192.168.85.133:3307)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
    Thu Mar 29 15:00:35 2018 - [info]     GTID ON
    Thu Mar 29 15:00:35 2018 - [debug]    Relay log info repository: FILE
    Thu Mar 29 15:00:35 2018 - [info]     Replicating from 192.168.85.132(192.168.85.132:3307)
    Thu Mar 29 15:00:35 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
    Thu Mar 29 15:00:35 2018 - [info] The oldest binary log file/position on all slaves is mysql-bin.000009:683
    Thu Mar 29 15:00:35 2018 - [info] Retrieved Gtid Set: 90b30799-9215-11e7-8645-000c29c1025c:8-10
    Thu Mar 29 15:00:35 2018 - [info] Oldest slaves:
    Thu Mar 29 15:00:35 2018 - [info]   192.168.85.134(192.168.85.134:3307)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
    Thu Mar 29 15:00:35 2018 - [info]     GTID ON
    Thu Mar 29 15:00:35 2018 - [debug]    Relay log info repository: FILE
    Thu Mar 29 15:00:35 2018 - [info]     Replicating from 192.168.85.132(192.168.85.132:3307)
    Thu Mar 29 15:00:35 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
    Thu Mar 29 15:00:35 2018 - [info] 
    ==================== 3.3. Elect the new master ====================
    Thu Mar 29 15:00:35 2018 - [info] * Phase 3.3: Determining New Master Phase..
    Thu Mar 29 15:00:35 2018 - [info] 
    Thu Mar 29 15:00:35 2018 - [info] 192.168.85.134 can be new master.
    Thu Mar 29 15:00:35 2018 - [info] New master is 192.168.85.134(192.168.85.134:3307)
    Thu Mar 29 15:00:35 2018 - [info] Starting master failover..
    Thu Mar 29 15:00:35 2018 - [info] 
    From:
    192.168.85.132(192.168.85.132:3307) (current master)
     +--192.168.85.133(192.168.85.133:3307)
     +--192.168.85.134(192.168.85.134:3307)
    
    To:
    192.168.85.134(192.168.85.134:3307) (new master)
     +--192.168.85.133(192.168.85.133:3307)
    
    ******************** Choose whether to perform the switch ********************
    Starting master switch from 192.168.85.132(192.168.85.132:3307) to 192.168.85.134(192.168.85.134:3307)? (yes/NO): yes
    Thu Mar 29 15:00:47 2018 - [info] New master decided manually is 192.168.85.134(192.168.85.134:3307)
    Thu Mar 29 15:00:47 2018 - [info] 
    Thu Mar 29 15:00:47 2018 - [info] * Phase 3.3: New Master Recovery Phase..
    Thu Mar 29 15:00:47 2018 - [info] 
    ******************** Wait for the new master to finish applying its own relay log ********************
    Thu Mar 29 15:00:47 2018 - [info]  Waiting all logs to be applied.. 
    Thu Mar 29 15:00:47 2018 - [info]   done.
    Thu Mar 29 15:00:47 2018 - [debug]  Stopping slave IO/SQL thread on 192.168.85.134(192.168.85.134:3307)..
    Thu Mar 29 15:00:47 2018 - [debug]   done.
    Thu Mar 29 15:00:47 2018 - [info]  Replicating from the latest slave 192.168.85.133(192.168.85.133:3307) and waiting to apply..
    ******************** Wait for the latest slave to finish applying its own relay log ********************
    Thu Mar 29 15:00:47 2018 - [info]  Waiting all logs to be applied on the latest slave.. 
    ******************** CHANGE MASTER the new master to the latest slave so that it fills in the missing data ********************
    Thu Mar 29 15:00:47 2018 - [info]  Resetting slave 192.168.85.134(192.168.85.134:3307) and starting replication from the new master 192.168.85.133(192.168.85.133:3307)..
    Thu Mar 29 15:00:47 2018 - [debug]  Stopping slave IO/SQL thread on 192.168.85.134(192.168.85.134:3307)..
    Thu Mar 29 15:00:47 2018 - [debug]   done.
    Thu Mar 29 15:00:47 2018 - [info]  Executed CHANGE MASTER.
    Thu Mar 29 15:00:47 2018 - [debug]  Starting slave IO/SQL thread on 192.168.85.134(192.168.85.134:3307)..
    Thu Mar 29 15:00:48 2018 - [debug]   done.
    Thu Mar 29 15:00:48 2018 - [info]  Slave started.
    Thu Mar 29 15:00:48 2018 - [info]  Waiting to execute all relay logs on 192.168.85.134(192.168.85.134:3307)..
    Thu Mar 29 15:00:48 2018 - [info]  master_pos_wait(mysql-bin.000009:3095) completed on 192.168.85.134(192.168.85.134:3307). Executed 0 events.
    Thu Mar 29 15:00:48 2018 - [info]   done.
    Thu Mar 29 15:00:48 2018 - [debug]  Stopping SQL thread on 192.168.85.134(192.168.85.134:3307)..
    Thu Mar 29 15:00:48 2018 - [debug]   done.
    Thu Mar 29 15:00:48 2018 - [info]   done.
    Thu Mar 29 15:00:48 2018 - [info] -- Saving binlog from host 192.168.85.132 started, pid: 6161
    Thu Mar 29 15:00:48 2018 - [info] 
    Thu Mar 29 15:00:48 2018 - [info] Log messages from 192.168.85.132 ...
    Thu Mar 29 15:00:48 2018 - [info] 
    ******************** Runs on the failed master/binlog server: save the binlog portion after the latest slave's position ********************
    Thu Mar 29 15:00:48 2018 - [info] Fetching binary logs from binlog server 192.168.85.132..
    Thu Mar 29 15:00:48 2018 - [info] Executing binlog save command: save_binary_logs --command=save --start_file=mysql-bin.000009  --start_pos=1013 --output_file=/var/log/masterha/app1/saved_binlog_binlog1_20180329150032.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.21-log  --debug  --binlog_dir=/data/mysql/mysql3307/logs 
      Creating /var/log/masterha/app1 if not exists..    ok.
     Concat binary/relay logs from mysql-bin.000009 pos 1013 to mysql-bin.000009 EOF into /var/log/masterha/app1/saved_binlog_binlog1_20180329150032.binlog ..
    Executing command: mysqlbinlog --start-position=1013  /data/mysql/mysql3307/logs/mysql-bin.000009 >> /var/log/masterha/app1/saved_binlog_binlog1_20180329150032.binlog
     Concat succeeded.
    ******************** scp the saved binlog to the working directory where the manual failover runs ********************
    Thu Mar 29 15:00:48 2018 - [info] scp from root@192.168.85.132:/var/log/masterha/app1/saved_binlog_binlog1_20180329150032.binlog to local:/var/log/masterha/app1/saved_binlog_192.168.85.132_binlog1_20180329150032.binlog succeeded.
    Thu Mar 29 15:00:48 2018 - [info] End of log messages from 192.168.85.132.
    Thu Mar 29 15:00:48 2018 - [info] Saved mysqlbinlog size from 192.168.85.132 is 2373 bytes.
    Thu Mar 29 15:00:48 2018 - [info] Applying differential binlog /var/log/masterha/app1/saved_binlog_192.168.85.132_binlog1_20180329150032.binlog ..
    Thu Mar 29 15:00:48 2018 - [info] Differential log apply from binlog server succeeded.
    ******************** The new master has applied the binlog; record its current position ********************
    Thu Mar 29 15:00:48 2018 - [info] Getting new master''s binlog name and position..
    Thu Mar 29 15:00:48 2018 - [info]  mysql-bin.000004:3408
    Thu Mar 29 15:00:48 2018 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.85.134', MASTER_PORT=3307, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
    Thu Mar 29 15:00:48 2018 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000004, 3408, 90b30799-9215-11e7-8645-000c29c1025c:1-12
    ******************** Activate the virtual IP; the new master can now serve clients ********************
    Thu Mar 29 15:00:48 2018 - [info] Executing master IP activate script:
    Thu Mar 29 15:00:48 2018 - [info]   /etc/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.85.132 --orig_master_ip=192.168.85.132 --orig_master_port=3307 --new_master_host=192.168.85.134 --new_master_ip=192.168.85.134 --new_master_port=3307 --new_master_user='mydba' --new_master_password='mysql5721'  
    Set read_only=0 on the new master.
    RTNETLINK answers: Cannot assign requested address
    RTNETLINK answers: File exists
    Thu Mar 29 15:00:49 2018 - [info]  OK.
    Thu Mar 29 15:00:49 2018 - [info] ** Finished master recovery successfully.
    Thu Mar 29 15:00:49 2018 - [info] * Phase 3: Master Recovery Phase completed.
    ==================== 3. New master recovery phase, End ====================
    Thu Mar 29 15:00:49 2018 - [info] 
    ==================== 4. Slave recovery phase, Start ====================
    Thu Mar 29 15:00:49 2018 - [info] * Phase 4: Slaves Recovery Phase..
    Thu Mar 29 15:00:49 2018 - [info] 
    Thu Mar 29 15:00:49 2018 - [info] 
    ==================== 4.1. Slaves directly CHANGE MASTER TO the new master ====================
    Thu Mar 29 15:00:49 2018 - [info] * Phase 4.1: Starting Slaves in parallel..
    Thu Mar 29 15:00:49 2018 - [info] 
    Thu Mar 29 15:00:49 2018 - [info] -- Slave recovery on host 192.168.85.133(192.168.85.133:3307) started, pid: 6201. Check tmp log /var/log/masterha/app1/192.168.85.133_3307_20180329150032.log if it takes time..
    Thu Mar 29 15:00:50 2018 - [info] 
    Thu Mar 29 15:00:50 2018 - [info] Log messages from 192.168.85.133 ...
    Thu Mar 29 15:00:50 2018 - [info] 
    Thu Mar 29 15:00:49 2018 - [info]  Resetting slave 192.168.85.133(192.168.85.133:3307) and starting replication from the new master 192.168.85.134(192.168.85.134:3307)..
    Thu Mar 29 15:00:49 2018 - [debug]  Stopping slave IO/SQL thread on 192.168.85.133(192.168.85.133:3307)..
    Thu Mar 29 15:00:49 2018 - [debug]   done.
    Thu Mar 29 15:00:49 2018 - [info]  Executed CHANGE MASTER.
    Thu Mar 29 15:00:49 2018 - [debug]  Starting slave IO/SQL thread on 192.168.85.133(192.168.85.133:3307)..
    Thu Mar 29 15:00:50 2018 - [debug]   done.
    Thu Mar 29 15:00:50 2018 - [info]  Slave started.
    Thu Mar 29 15:00:50 2018 - [info]  gtid_wait(90b30799-9215-11e7-8645-000c29c1025c:1-12) completed on 192.168.85.133(192.168.85.133:3307). Executed 0 events.
    Thu Mar 29 15:00:50 2018 - [info] End of log messages from 192.168.85.133.
    Thu Mar 29 15:00:50 2018 - [info] -- Slave on host 192.168.85.133(192.168.85.133:3307) started.
    Thu Mar 29 15:00:50 2018 - [info] All new slave servers recovered successfully.
    ==================== 4. Slave recovery phase, End ====================
    Thu Mar 29 15:00:50 2018 - [info] 
    ==================== 5. New master cleanup phase, Start ====================
    Thu Mar 29 15:00:50 2018 - [info] * Phase 5: New master cleanup phase..
    Thu Mar 29 15:00:50 2018 - [info] 
    Thu Mar 29 15:00:50 2018 - [info] Resetting slave info on the new master..
    Thu Mar 29 15:00:50 2018 - [debug]  Clearing slave info..
    Thu Mar 29 15:00:50 2018 - [debug]  Stopping slave IO/SQL thread on 192.168.85.134(192.168.85.134:3307)..
    Thu Mar 29 15:00:50 2018 - [debug]   done.
    Thu Mar 29 15:00:50 2018 - [debug]  SHOW SLAVE STATUS shows new master does not replicate from anywhere. OK.
    Thu Mar 29 15:00:50 2018 - [info]  192.168.85.134: Resetting slave info succeeded.
    ==================== 5. New master cleanup phase, End ====================
    Thu Mar 29 15:00:50 2018 - [info] Master failover to 192.168.85.134(192.168.85.134:3307) completed successfully.
    Thu Mar 29 15:00:50 2018 - [debug]  Disconnected from 192.168.85.133(192.168.85.133:3307)
    Thu Mar 29 15:00:50 2018 - [debug]  Disconnected from 192.168.85.134(192.168.85.134:3307)
    Thu Mar 29 15:00:50 2018 - [info] 
    
    ----- Failover Report -----
    
    app1: MySQL Master failover 192.168.85.132(192.168.85.132:3307) to 192.168.85.134(192.168.85.134:3307) succeeded
    
    Master 192.168.85.132(192.168.85.132:3307) is down!
    
    Check MHA Manager logs at ZST1 for details.
    
    Started manual(interactive) failover.
    Invalidated master IP address on 192.168.85.132(192.168.85.132:3307)
    Selected 192.168.85.134(192.168.85.134:3307) as a new master.
    192.168.85.134(192.168.85.134:3307): OK: Applying all logs succeeded.
    192.168.85.134(192.168.85.134:3307): OK: Activated master IP address.
    192.168.85.133(192.168.85.133:3307): OK: Slave started, replicating from 192.168.85.134(192.168.85.134:3307)
    192.168.85.134(192.168.85.134:3307): Resetting slave info succeeded.
    Master failover to 192.168.85.134(192.168.85.134:3307) completed successfully.
    [root@ZST1 masterha]#
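The two lines that matter most when reviewing a GTID failover log are the save_binary_logs command (where the Dead Master's binlog is read from) and the "Master Recovery succeeded" line (where the new master ended up). The sketch below pulls both out with regular expressions. The sample text is copied from the log above; the function name `summarize` and the dictionary keys are illustrative choices, not part of MHA.

```python
import re

# Two lines copied verbatim from the failover log above.
LOG = """\
Thu Mar 29 15:00:48 2018 - [info] Executing binlog save command: save_binary_logs --command=save --start_file=mysql-bin.000009  --start_pos=1013 --output_file=/var/log/masterha/app1/saved_binlog_binlog1_20180329150032.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.21-log  --debug  --binlog_dir=/data/mysql/mysql3307/logs
Thu Mar 29 15:00:48 2018 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000004, 3408, 90b30799-9215-11e7-8645-000c29c1025c:1-12
"""

# Start coordinates passed to save_binary_logs.
SAVE_RE = re.compile(r"--start_file=(\S+)\s+--start_pos=(\d+)")
# Final binlog file, position and executed GTID set of the new master.
DONE_RE = re.compile(
    r"Master Recovery succeeded\. File:Pos:Exec_Gtid_Set: (\S+), (\d+), (\S+)"
)


def summarize(log_text):
    """Return the binlog save start point and the new master's final coordinates."""
    summary = {}
    for line in log_text.splitlines():
        m = SAVE_RE.search(line)
        if m:
            summary["save_from"] = (m.group(1), int(m.group(2)))
        m = DONE_RE.search(line)
        if m:
            summary["new_master"] = (m.group(1), int(m.group(2)), m.group(3))
    return summary


if __name__ == "__main__":
    print(summarize(LOG))
```

In practice the same function could be pointed at the contents of the manager log file instead of the embedded sample.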

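    Manual failover flow

    Manual failover (GTID)
    1. Configuration check: connect to each instance, check service status, and verify the replication topology
    2. Failed master shutdown: stop the IO thread on each slave and remove the virtual IP from the failed master (stopssh)
    3. New master recovery
        3.1. Identify the latest slave
            It supplies the data the new master is missing, and its position is the starting point for saving the failed master's binlog
        3.2. Elect the new master
            Determine the new master and obtain the topology before and after the switch
        3.3. Recover the new master
            3.3.1. Close the gap between the new master and the latest slave
                Wait for the new master to apply its own relay log; wait for the latest slave to apply its own relay log; CHANGE MASTER the new master to the latest slave to fill in the missing data
            3.3.2. Close the gap between the new master and the failed master
                Run save_binary_logs on the failed master/binlog server; scp the resulting binlog to the working directory where the manual failover runs; the new master applies the binlog and its current position is recorded; bind the virtual IP so the new master can serve clients
    4. Recover the other slaves
        4.1. Reset replication: RESET SLAVE; CHANGE MASTER TO the new master;
        4.2. If there are multiple slaves, repeat the step above for each
    5. New master cleanup: clear the old replication info with STOP SLAVE; RESET SLAVE ALL;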
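Step 3.1 depends on knowing which slave has received the most from the failed master. A minimal sketch of that comparison follows, assuming "latest" means the furthest (Master_Log_File, Read_Master_Log_Pos) pair from SHOW SLAVE STATUS. The helper names are hypothetical, the position for 192.168.85.133 is taken from the --start_pos in the save_binary_logs command in the log, and the position for 192.168.85.134 is a made-up placeholder.

```python
import re


def binlog_key(file_name, pos):
    """Order binlog coordinates by numeric file suffix, then by byte offset."""
    seq = int(re.search(r"\.(\d+)$", file_name).group(1))
    return (seq, pos)


def pick_latest_slave(slaves):
    """slaves maps host -> (Master_Log_File, Read_Master_Log_Pos)."""
    return max(slaves, key=lambda host: binlog_key(*slaves[host]))


# 192.168.85.133 at mysql-bin.000009:1013 is consistent with the log;
# the value for 192.168.85.134 is a placeholder for a slave that is behind.
slaves = {
    "192.168.85.133": ("mysql-bin.000009", 1013),
    "192.168.85.134": ("mysql-bin.000009", 723),
}

if __name__ == "__main__":
    print(pick_latest_slave(slaves))
```

Comparing the numeric suffix rather than the raw file name avoids a wrong ordering if the suffix ever grows beyond its zero-padded width.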

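    3.3. Differences between manual failover under traditional and GTID replication

    To get a detailed switchover log, it is recommended to:
    • Set log_level=debug in the MHA configuration file
    • Simulate data differences among Node1, Node2, and Node3
    • Run the test twice, choosing Node2 and then Node3 as the new master
    For manual failover (GTID), also enable the general log so you can see how data is filled in between the new master and the Latest Slave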

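    Comparison (traditional replication vs. GTID replication)

    Is the missing data filled in?
        Traditional: As long as the master's host itself is still up, all data is filled in by default.
        GTID: The master/binlog server must be listed in a [binlog*] section of the configuration file for the Dead Master's differential log to be recovered; otherwise data is only applied up to the Latest Slave.
    How the data is filled in
        Traditional: The new master and the other slaves pull the Latest Slave's relay log.
            1. The new master and every other slave generate the relay-log difference from the Latest Slave and apply it (file relay_from_read_to_latest_**).
            2. The new master and the other slaves then apply the binlog difference between the Latest Slave and the Dead Master (file saved_master_binlog_from_**).
        GTID: The new master pulls the Latest Slave's binlog.
            1. The new master runs CHANGE MASTER TO the Latest Slave to fill in the difference from the Latest Slave.
            2. Once the new master has caught up with the Latest Slave, save_binary_logs generates the binlog difference from the Dead Master, which is then applied (file saved_binlog_binlog1_**).
            3. The other slaves do not need to apply any differential log; they simply CHANGE MASTER TO the new master.
    Generated files
        Traditional:
            relay_from_read_to_latest_**: relay-log difference between the latest slave and another slave; generated on the latest slave, then copied to the corresponding slave.
            saved_master_binlog_from_**: binlog difference between the failed master and the latest slave; generated on the failed master, copied first to the working directory where the manual failover runs, then to the other slaves.
            These files can be parsed with mysqlbinlog.
        GTID:
            saved_binlog_binlog1_**: binlog difference between the failed master and the latest slave; generated on the failed master/binlog server, then copied to the working directory where the manual failover runs.
            This file could not be parsed with mysqlbinlog. Perhaps I was doing it wrong, though the commands that produce the files do differ slightly: in the log above the GTID-mode file is written by mysqlbinlog itself (--handle_raw_binlog=0), so it is likely already decoded text.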

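    In a GTID environment, save_binary_logs is used only for the Dead Master's data (the master is down, so nothing can CHANGE MASTER to it). Everything else is filled in by running CHANGE MASTER TO and letting the replication threads do the work, so the process no longer depends on the Latest Slave's relay log.
    Overall, MHA is somewhat heavyweight in a GTID environment; if you are comfortable doing so, you can script the procedure yourself:
    Identify Latest_Slave -> New_Master: CHANGE MASTER TO Latest_Slave -> mysqlbinlog ./binlogserver/binlog --start-position > New_Master -> Other_Slave: CHANGE MASTER TO New_Master
    With enhanced semi-synchronous replication, the Dead_Master's binlog has almost certainly all been delivered to the Latest_Slave, which makes failover simpler still.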
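The do-it-yourself sequence above can be sketched as a plan builder that only produces the statements and commands in order and executes nothing. This is a minimal sketch under stated assumptions: the function names are hypothetical, the CHANGE MASTER statement mirrors the one MHA prints in the log (password masked as 'xxx'), the mysql client invocation omits credentials, and waiting for the new master to catch up is left to the operator.

```python
def change_master_sql(host, port, user="repl"):
    """GTID auto-positioning CHANGE MASTER, same shape as the statement MHA logs."""
    return (
        f"CHANGE MASTER TO MASTER_HOST='{host}', MASTER_PORT={port}, "
        f"MASTER_AUTO_POSITION=1, MASTER_USER='{user}', MASTER_PASSWORD='xxx';"
    )


def build_plan(latest_slave, new_master, other_slaves,
               binlog_path, start_pos, port=3307):
    """Return ordered (target_host, action) pairs for a manual GTID failover."""
    plan = []
    if new_master != latest_slave:
        # Step 1: let the new master catch up from the latest slave.
        plan.append((new_master, change_master_sql(latest_slave, port)))
        # The operator must wait for catch-up before moving to the next step.
        plan.append((new_master, "START SLAVE;"))
    # Step 2: apply whatever the dead master/binlog server has beyond that point.
    plan.append((
        new_master,
        f"mysqlbinlog --start-position={start_pos} {binlog_path} "
        f"| mysql -h {new_master} -P {port}",
    ))
    # Step 3: repoint every other slave at the new master.
    for slave in other_slaves:
        plan.append((slave, change_master_sql(new_master, port)))
        plan.append((slave, "START SLAVE;"))
    # Step 4: clear the new master's own replication settings.
    plan.append((new_master, "STOP SLAVE; RESET SLAVE ALL;"))
    return plan


if __name__ == "__main__":
    steps = build_plan(
        latest_slave="192.168.85.133",
        new_master="192.168.85.134",
        other_slaves=["192.168.85.133"],
        binlog_path="/data/mysql/mysql3307/logs/mysql-bin.000009",
        start_pos=1013,
    )
    for host, action in steps:
        print(f"[{host}] {action}")
```

Keeping the plan as data makes it easy to review before anything touches a server; a real script would still need error handling and a check that each step finished.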

  • Original article: https://www.cnblogs.com/Uest/p/8665478.html