  • MHA automatic failover process

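    MHA (Master High Availability; the project's full name is mysql-master-ha) can, once configured, complete an automatic master failover within 10-30 seconds. The failover proceeds as follows: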

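    1. Check the master's status by running "SELECT 1 As Value" once per second. When the master stops responding, MHA re-checks 3 more times; if there is still no response, it treats the master as shut down and runs SELECT 1 As Value one final time to confirm that the master is down.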
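The detection logic in step 1 can be modelled as below. This is a simplified Python sketch, not MHA's own Perl code: ping is a hypothetical callable standing in for running SELECT 1 As Value on the master, and the one-second interval and three re-checks are taken from the description above.

```python
import time
from typing import Callable


def master_is_dead(ping: Callable[[], bool],
                   interval: float = 1.0,
                   retries: int = 3,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Monitor the master and decide whether it is down.

    ping is a hypothetical stand-in for running "SELECT 1 As Value"
    against the master; it returns True when the query succeeds.
    Returns True if the master is judged dead, False if it answers
    again during the re-checks.
    """
    # Normal monitoring: one ping per interval until a ping fails.
    while ping():
        sleep(interval)
    # First failure: re-check a fixed number of times.
    for _ in range(retries):
        sleep(interval)
        if ping():
            return False
    # Final confirmation ping before declaring the master dead.
    sleep(interval)
    return not ping()
```

Passing a no-op sleep makes the function easy to exercise with a scripted sequence of ping results instead of a live server.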

     

    2. Check whether the master's host is reachable over SSH.

     

    3. Log the message "Connecting to a master server failed" and start reading the configuration files masterha_default.conf and app1.conf.
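Step 3 reads two files: a global defaults file and a per-application file. The Python sketch below shows one plausible way such layered settings resolve, using configparser. The section and parameter names ([server default], [serverN], candidate_master, manager_log, ssh_user) follow MHA's configuration conventions, but the file contents are invented for illustration, and the override order (application file over global file, [serverN] over [server default]) is a simplifying assumption rather than MHA's own parser.

```python
import configparser

# Hypothetical file contents, for illustration only.
GLOBAL_CONF = """
[server default]
user=mha
ssh_user=root
ping_interval=1
"""

APP_CONF = """
[server default]
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/app1.log
master_ip_failover_script=/etc/masterha/master_ip_failover

[server1]
hostname=192.168.118.62
candidate_master=1

[server2]
hostname=192.168.118.64
ssh_user=admin
"""


def load_servers(global_text: str, app_text: str) -> dict[str, dict[str, str]]:
    """Merge the global and per-application settings for every server."""
    parser = configparser.ConfigParser(interpolation=None)
    # Reading the app file second lets its values override the global file.
    parser.read_string(global_text)
    parser.read_string(app_text)
    defaults = dict(parser["server default"])
    servers = {}
    for name in parser.sections():
        if name == "server default":
            continue
        merged = dict(defaults)        # start from [server default]
        merged.update(parser[name])    # per-server keys win
        servers[name] = merged
    return servers
```

Here server2 ends up with ssh_user=admin while server1 inherits ssh_user=root from the global file.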

     

    4. Determine the failover mode: [info] GTID failover mode = 1

     

    5. Report which servers in the topology are alive and which are dead:

    Fri Jul  1 13:35:33 2016 - [info] Dead Servers:
    Fri Jul  1 13:35:33 2016 - [info]   192.168.118.63(192.168.118.63:3306)
    Fri Jul  1 13:35:33 2016 - [info] Alive Servers:
    Fri Jul  1 13:35:33 2016 - [info]   192.168.118.62(192.168.118.62:3306)
    Fri Jul  1 13:35:33 2016 - [info]   192.168.118.64(192.168.118.64:3306)

     

    6. Check each surviving instance's version, whether GTID is enabled, whether read_only is set, and the replication filtering settings:

    Fri Jul  1 13:35:33 2016 - [info] Alive Slaves:
    Fri Jul  1 13:35:33 2016 - [info]   192.168.118.62(192.168.118.62:3306)  Version=5.6.28-log (oldest major version between slaves) log-bin:enabled
    Fri Jul  1 13:35:33 2016 - [info]     GTID ON
    Fri Jul  1 13:35:33 2016 - [info]     Replicating from 192.168.118.63(192.168.118.63:3306)
    Fri Jul  1 13:35:33 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
    Fri Jul  1 13:35:33 2016 - [info]   192.168.118.64(192.168.118.64:3306)  Version=5.6.28-log (oldest major version between slaves) log-bin:enabled
    Fri Jul  1 13:35:33 2016 - [info]     GTID ON
    Fri Jul  1 13:35:33 2016 - [info]     Replicating from 192.168.118.63(192.168.118.63:3306)
    Fri Jul  1 13:35:33 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
    Fri Jul  1 13:35:33 2016 - [info] Checking slave configurations..
    Fri Jul  1 13:35:33 2016 - [info]  read_only=1 is not set on slave 192.168.118.62(192.168.118.62:3306).
    Fri Jul  1 13:35:33 2016 - [info]  read_only=1 is not set on slave 192.168.118.64(192.168.118.64:3306).
    Fri Jul  1 13:35:33 2016 - [info] Checking replication filtering settings..
    Fri Jul  1 13:35:33 2016 - [info]  Replication filtering check ok.
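The checks in step 6 amount to comparing a few settings across the alive slaves. A minimal Python sketch, assuming each slave's settings have already been fetched into the hypothetical AliveSlave record (the record and function names are illustrative, not part of MHA):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AliveSlave:
    host: str
    gtid_on: bool
    read_only: bool
    # Hypothetical representation of the server's filtering rules,
    # e.g. {"replicate-ignore-db=test"}.
    filters: frozenset[str] = frozenset()


def check_slave_configurations(slaves: list[AliveSlave]) -> dict[str, object]:
    """Summarise the step-6 checks for the alive slaves."""
    return {
        "all_gtid_on": all(s.gtid_on for s in slaves),
        # Only reported, matching the [info] lines in the log above.
        "read_only_not_set": [s.host for s in slaves if not s.read_only],
        # Passes only when every slave has exactly the same rules.
        "filtering_ok": len({s.filters for s in slaves}) <= 1,
    }
```

Fed the two slaves from the log (GTID on, read_only off, no filters), the sketch reports both hosts under read_only_not_set and a passing filtering check.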

     

    7. What follows is the failover process itself, on top of GTID-based replication.

    (1) Configuration check phase:

    [info] ** Phase 1: Configuration Check Phase completed.

     The checks are issued as the following queries (as recorded in the MySQL general query log):

     7 Query     SELECT @@global.server_id As Value
     7 Query     SELECT VERSION() AS Value                            # In GTID mode the version must be at least 5.6; otherwise at least 5.0.45
     7 Query     SELECT @@global.gtid_mode As Value                   # GTID is supported from MHA 0.56 onward; earlier versions do not support it
     7 Query     SHOW GLOBAL VARIABLES LIKE 'log_bin'                 # The binary log must be enabled
     7 Query     SHOW MASTER STATUS
     7 Query     SELECT @@global.datadir AS Value
     7 Query     SELECT @@global.slave_parallel_workers AS Value      # Whether the slave uses multi-threaded (parallel) replication; I have not yet worked out how this setting affects failover and need to study it further
     7 Query     SHOW SLAVE STATUS
     7 Query     SELECT @@global.read_only As Value                   # The read_only setting; a slave promoted to new master must have it set to 0
     7 Query     SELECT @@global.relay_log_purge As Value             # Whether relay logs are purged automatically (enabled by default)
     7 Query     SELECT @@global.relay_log_info_repository AS Value   # Whether the relay log position info is stored in a FILE or a TABLE (FILE by default)
     7 Query     SELECT @@global.datadir AS Value                     # The data directory
     7 Query     SELECT @@global.relay_log_info_file AS Value         # The relay log info file name, needed later when relay logs are applied between slaves
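The version, GTID and binlog rules in the comments above can be expressed as a small Python check. The thresholds (5.6 for GTID mode, 5.0.45 otherwise, MHA 0.56 for GTID support) come from those comments; the variables dict is a hypothetical container for the query results, and the message strings are invented.

```python
def version_key(version: str) -> tuple[int, ...]:
    """Turn a VERSION() string such as '5.6.28-log' into (5, 6, 28)."""
    return tuple(int(part) for part in version.split("-")[0].split("."))


def configuration_problems(variables: dict[str, str],
                           mha_version: tuple[int, int]) -> list[str]:
    """Apply the version, GTID and binlog rules from the comments above.

    variables holds hypothetical results of the queries, keyed by
    'version', 'gtid_mode' and 'log_bin'.
    """
    problems = []
    gtid = variables.get("gtid_mode", "OFF").upper() == "ON"
    if gtid and mha_version < (0, 56):
        problems.append("GTID requires MHA 0.56 or later")
    # Tuples compare element by element, so (5, 6, 28) >= (5, 6).
    minimum = (5, 6) if gtid else (5, 0, 45)
    if version_key(variables["version"]) < minimum:
        problems.append("MySQL version too old for this mode")
    if variables.get("log_bin", "OFF").upper() != "ON":
        problems.append("log_bin must be enabled")
    return problems
```

Comparing versions as integer tuples avoids the classic string-comparison bug where "5.6.9" would sort after "5.6.28".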

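     Notes: (1) By default, a slave's relay logs are deleted automatically once the SQL thread has executed them. However, MHA may need these relay logs to recover other slaves, so automatic relay log purging must be disabled and old relay logs purged periodically instead.

            (2) The replication filtering rules (binlog-do-db, replicate-ignore-db, etc.) must be the same on all servers. MHA checks the filtering rules at startup; if they differ, MHA does not start monitoring or failover.

            (3) The master info and relay log info (master.info and relay.info) must be stored as files, not as tables.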
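Notes (1) and (3) can be checked from the global variables in the same way. A minimal Python sketch; the variables dict is a hypothetical container for SELECT @@global.<name> results, and the messages are illustrative:

```python
def relay_log_problems(variables: dict[str, str]) -> list[str]:
    """Check notes (1) and (3): relay log purging and info repositories.

    variables holds hypothetical results of SELECT @@global.<name> queries.
    """
    problems = []
    # Note (1): automatic purging must be off so relay logs survive for recovery.
    if variables.get("relay_log_purge") == "1":
        problems.append("relay_log_purge should be 0")
    # Note (3): replication position info must live in files, not tables.
    for name in ("master_info_repository", "relay_log_info_repository"):
        if variables.get(name, "FILE").upper() != "FILE":
            problems.append(f"{name} must be FILE")
    return problems
```

For the periodic cleanup mentioned in note (1), the MHA Node package includes a purge_relay_logs command, typically run from cron.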

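    (2) Dead master shutdown phase: the old master is shut down completely, so that a master that has not really stopped cannot cause split-brain.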

    [info] * Phase 2: Dead Master Shutdown Phase..

    The shutdown command actually run is:

    /etc/masterha/master_ip_failover --orig_master_host=192.168.118.3 --orig_master_ip=192.168.118.3 --orig_master_port=3306 --command=stopssh --ssh_user=root  

    When the shutdown completes, MHA reports:

    [info] * Phase 2: Dead Master Shutdown Phase completed.
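Phases 2 and 3 both invoke the site-provided /etc/masterha/master_ip_failover script, differing only in --command and the host options. The Python helper below assembles such an argument list from the options shown in the log and renders it shell-quoted; hook_argv and run_hook are hypothetical names, and nothing is executed unless dry_run is False.

```python
import shlex
import subprocess


def hook_argv(script: str, options: list[tuple[str, object]]) -> list[str]:
    """Turn (name, value) pairs into '--name=value' arguments, keeping their order."""
    return [script] + [f"--{name}={value}" for name, value in options]


def run_hook(argv: list[str], dry_run: bool = True) -> str:
    """Return the shell-quoted command; only execute it when dry_run is False."""
    if not dry_run:
        # A non-zero exit status raises subprocess.CalledProcessError.
        subprocess.run(argv, check=True)
    return shlex.join(argv)


stop = hook_argv("/etc/masterha/master_ip_failover", [
    ("orig_master_host", "192.168.118.3"),
    ("orig_master_ip", "192.168.118.3"),
    ("orig_master_port", 3306),
    ("command", "stopssh"),
    ("ssh_user", "root"),
])
print(run_hook(stop))
```

Passing a list to subprocess.run bypasses the shell, so a password containing spaces or quotes needs no manual quoting; shlex.join is only used to display the equivalent command line.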

     

    (3) Master recovery phase:

    [info] * Phase 3: Master Recovery Phase..

     

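    i)   Identify the slave with the most recent relay log.

    ii)  Determine the new master.

          If a candidate master is set in the configuration file, that preset instance becomes the new master directly; if none is set, the slave with the most recent relay log is chosen.

    iii) After the new master is determined, MHA first sets sql_log_bin=0 (a session setting that keeps the events it applies out of the binary log) and lets the other slaves catch up.

          It takes the slave with the most recent relay log, sets sql_log_bin=0 on it, and applies that latest relay log on the remaining slaves so that all of them reach the same data state; it then sets sql_log_bin=1 to resume binary logging. Semi-synchronous replication can be used to avoid losing transactions when the old master's host cannot be reached over SSH.

          Once all data is consistent, MHA runs SHOW MASTER STATUS to get the new master's binary log position, and runs CHANGE MASTER on the other slaves to set up the new replication links: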

    Fri Jul  1 13:35:33 2016 - [info] Getting new master's binlog name and position..
    Fri Jul  1 13:35:33 2016 - [info] mysql-bin.000004:191
    Fri Jul  1 13:35:33 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.118.2', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
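The selection rule in steps i) and ii) and the CHANGE MASTER statement in the log can be sketched together in Python. Assumptions: a slave's progress is compared by the Master_Log_File and Read_Master_Log_Pos columns of SHOW SLAVE STATUS (GTID-set comparison is not modelled); when several slaves have candidate_master set, as in the step 6 log, this sketch breaks the tie by picking the most up-to-date one, which is a simplification; SlaveState, choose_new_master and change_master_sql are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SlaveState:
    host: str
    master_log_file: str       # Master_Log_File from SHOW SLAVE STATUS
    read_master_log_pos: int   # Read_Master_Log_Pos from SHOW SLAVE STATUS
    candidate_master: bool = False


def position_key(slave: SlaveState) -> tuple[int, int]:
    """Order slaves by how much of the old master's binlog they have received."""
    index = int(slave.master_log_file.rsplit(".", 1)[1])
    return (index, slave.read_master_log_pos)


def choose_new_master(slaves: list[SlaveState]) -> SlaveState:
    """Prefer candidate masters; otherwise take the most up-to-date slave."""
    candidates = [s for s in slaves if s.candidate_master]
    return max(candidates or slaves, key=position_key)


def sql_quote(value: str) -> str:
    """Quote a string literal, escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def change_master_sql(host: str, port: int, user: str, password: str,
                      log_file: Optional[str] = None,
                      log_pos: Optional[int] = None) -> str:
    """Build CHANGE MASTER TO; GTID auto-positioning unless coordinates are given."""
    parts = [f"MASTER_HOST={sql_quote(host)}", f"MASTER_PORT={port}"]
    if log_file is None:
        parts.append("MASTER_AUTO_POSITION=1")
    else:
        # Non-GTID mode: coordinates come from SHOW MASTER STATUS on the new master.
        parts += [f"MASTER_LOG_FILE={sql_quote(log_file)}", f"MASTER_LOG_POS={log_pos}"]
    parts += [f"MASTER_USER={sql_quote(user)}", f"MASTER_PASSWORD={sql_quote(password)}"]
    return "CHANGE MASTER TO " + ", ".join(parts) + ";"
```

With MASTER_AUTO_POSITION=1 a slave finds its starting point by GTID, and MySQL does not accept MASTER_LOG_FILE/MASTER_LOG_POS in the same statement, which is why the statement in the log carries no coordinates even though MHA printed mysql-bin.000004:191.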

    The command actually executed for the switch is:

    /etc/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.118.3 --orig_master_ip=192.168.118.3 --orig_master_port=3306 --new_master_host=192.168.118.2 --new_master_ip=192.168.118.2 --new_master_port=3306 --new_master_user='user' --new_master_password='password'  

    When this phase succeeds, MHA reports:

    Fri Jul  1 13:35:33 2016 - [info] ** Finished master recovery successfully.
    Fri Jul  1 13:35:33 2016 - [info] * Phase 3: Master Recovery Phase completed.

     

    (4) Slave recovery phase:

    [info] * Phase 4: Slaves Recovery Phase..

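    MHA first stops the IO thread, waits for the SQL thread to finish applying what has already been received, runs STOP SLAVE, clears the old replication settings (RESET SLAVE), runs CHANGE MASTER to point the slave at the new master, and finally runs START SLAVE: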

    34 Query     SHOW SLAVE STATUS
    34 Query     STOP SLAVE IO_THREAD
    34 Query     SHOW SLAVE STATUS
    34 Query     SHOW SLAVE STATUS
    34 Query     STOP SLAVE
    34 Query     SHOW SLAVE STATUS
    34 Query     RESET SLAVE
    34 Query     CHANGE MASTER TO MASTER_HOST = '192.168.118.62' MASTER_USER = 'repl' MASTER_PASSWORD = <secret> MASTER_PORT = 3306
    34 Query     START SLAVE
    35 Connect Out       repl@192.168.118.62:3306
    34 Query     SHOW SLAVE STATUS
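The slave recovery sequence in the log can be reproduced with a small Python sketch. execute and fetch_status are hypothetical callables standing in for a MySQL connection. Deciding that the SQL thread has caught up by comparing the SHOW SLAVE STATUS columns Relay_Master_Log_File/Exec_Master_Log_Pos with Master_Log_File/Read_Master_Log_Pos is a common check and an assumption here, not necessarily MHA's exact test.

```python
from typing import Callable, Mapping


def sql_thread_caught_up(status: Mapping[str, object]) -> bool:
    """True once the SQL thread has executed everything the IO thread read.

    status is one row of SHOW SLAVE STATUS as a mapping of column names.
    """
    return (status["Relay_Master_Log_File"] == status["Master_Log_File"]
            and status["Exec_Master_Log_Pos"] == status["Read_Master_Log_Pos"])


def recover_slave(execute: Callable[[str], None],
                  fetch_status: Callable[[], Mapping[str, object]],
                  change_master: str,
                  wait: Callable[[], None] = lambda: None) -> None:
    """Repoint one slave at the new master, in the order seen in the log."""
    execute("STOP SLAVE IO_THREAD")          # stop receiving from the dead master
    while not sql_thread_caught_up(fetch_status()):
        wait()                               # e.g. sleep briefly between polls
    execute("STOP SLAVE")
    execute("RESET SLAVE")                   # clear the old replication position
    execute(change_master)                   # point at the new master
    execute("START SLAVE")
```

Injecting execute and fetch_status keeps the ordering logic testable with plain lists and dicts instead of a live server.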

     

    (5) New master cleanup phase: clear the slave information on the newly elected master.

    [info] * Phase 5: New master cleanup phase..

    reset slave all;

    9 Query     STOP SLAVE
    9 Query     SHOW SLAVE STATUS
    9 Query     RESET SLAVE /*!50516 ALL */
    9 Query     SHOW SLAVE STATUS

     

     At this point the whole failover is complete, and MHA prints a final failover report:

    ----- Failover Report -----
    
    app1: MySQL Master failover 192.168.118.3(192.168.118.3:3306) to 192.168.118.2(192.168.118.2:3306) succeeded
    
    Master 192.168.118.3(192.168.118.3:3306) is down!
    
    Check MHA Manager logs at localhost.localdomain:/var/log/masterha/app1/app1.log for details.
    
    Started automated(non-interactive) failover.
    Invalidated master IP address on 192.168.118.3(192.168.118.3:3306)
    Selected 192.168.118.2(192.168.118.2:3306) as a new master.
    192.168.118.2(192.168.118.2:3306): OK: Applying all logs succeeded.
    192.168.118.2(192.168.118.2:3306): OK: Activated master IP address.
    192.168.118.4(192.168.118.4:3306): OK: Slave started, replicating from 192.168.118.2(192.168.118.2:3306)
    192.168.118.2(192.168.118.2:3306): Resetting slave info succeeded.
    Master failover to 192.168.118.2(192.168.118.2:3306) completed successfully.
  • Original article: https://www.cnblogs.com/qierdan/p/5633459.html