  • MHA automatic failover process

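    MHA stands for Master High Availability (the project is also referred to as mysql-master-ha). Once configured, it can complete an automatic master failover in roughly 10 to 30 seconds. The failover proceeds as follows:

    1. Check the master's state. MHA runs "SELECT 1 As Value" once per second; when the master stops responding, it repeats the check 3 times. If there is still no response, it treats the master as down, shuts it down, and runs SELECT 1 As Value once more to confirm that the master is really off.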
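
    The retry logic in step 1 can be sketched roughly as follows. This is only an illustration, not MHA's own code: the function name master_is_dead and the ping callable (which stands in for running SELECT 1 As Value against the master) are assumptions made for the example.

```python
import time
from typing import Callable


def master_is_dead(ping: Callable[[], bool], retries: int = 3,
                   interval_s: float = 1.0) -> bool:
    """Return True only if the first probe and all retries fail."""
    if ping():
        return False
    for _ in range(retries):
        time.sleep(interval_s)
        if ping():
            return False  # the master answered on a retry: no failover
    return True


# Simulated probes (hypothetical); interval set to 0 so the demo does not wait.
never_answers = lambda: False
replies = iter([False, False, True])
recovers_on_second_retry = lambda: next(replies)

print(master_is_dead(never_answers, interval_s=0.0))             # True
print(master_is_dead(recovers_on_second_retry, interval_s=0.0))  # False
```

    Passing the probe in as a callable keeps the retry policy separate from how the connection to the master is actually made.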

     

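    2. Check whether the master's host is reachable over SSH.

    3. Log the message "Connecting to a master server failed" and start reading the configuration files masterha_default.conf and app1.conf.

    4. Determine the replication failover mode: [info] GTID failover mode = 1

    5. Report which servers in the topology are dead and which are alive: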

    Fri Jul  1 13:35:33 2016 - [info] Dead Servers:
    Fri Jul  1 13:35:33 2016 - [info]   192.168.118.63(192.168.118.63:3306)
    Fri Jul  1 13:35:33 2016 - [info] Alive Servers:
    Fri Jul  1 13:35:33 2016 - [info]   192.168.118.62(192.168.118.62:3306)
    Fri Jul  1 13:35:33 2016 - [info]   192.168.118.64(192.168.118.64:3306)

     

    6. Check each surviving instance's version, whether GTID is enabled, whether read_only is set, and the replication filtering rules:

    Fri Jul  1 13:35:33 2016 - [info] Alive Slaves:
    Fri Jul  1 13:35:33 2016 - [info]   192.168.118.62(192.168.118.62:3306)  Version=5.6.28-log (oldest major version between slaves) log-bin:enabled
    Fri Jul  1 13:35:33 2016 - [info]     GTID ON
    Fri Jul  1 13:35:33 2016 - [info]     Replicating from 192.168.118.63(192.168.118.63:3306)
    Fri Jul  1 13:35:33 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
    Fri Jul  1 13:35:33 2016 - [info]   192.168.118.64(192.168.118.64:3306)  Version=5.6.28-log (oldest major version between slaves) log-bin:enabled
    Fri Jul  1 13:35:33 2016 - [info]     GTID ON
    Fri Jul  1 13:35:33 2016 - [info]     Replicating from 192.168.118.63(192.168.118.63:3306)
    Fri Jul  1 13:35:33 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
    Fri Jul  1 13:35:33 2016 - [info] Checking slave configurations..
    Fri Jul  1 13:35:33 2016 - [info]  read_only=1 is not set on slave 192.168.118.62(192.168.118.62:3306).
    Fri Jul  1 13:35:33 2016 - [info]  read_only=1 is not set on slave 192.168.118.64(192.168.118.64:3306).
    Fri Jul  1 13:35:33 2016 - [info] Checking replication filtering settings..
    Fri Jul  1 13:35:33 2016 - [info]  Replication filtering check ok.

     

    7. Next comes the failover itself, based on GTID replication.

    (1) Configuration check phase

    [info] ** Phase 1: Configuration Check Phase completed.

     The following checks are performed:

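     7 Query     SELECT @@global.server_id As Value
     7 Query     SELECT VERSION() AS Value                              # in GTID mode the version must be at least 5.6; in normal mode, at least 5.0.45
     7 Query     SELECT @@global.gtid_mode As Value                     # GTID is supported starting with MHA 0.56; earlier versions do not support it
     7 Query     SHOW GLOBAL VARIABLES LIKE 'log_bin'                   # binary logging must be enabled
     7 Query     SHOW MASTER STATUS
     7 Query     SELECT @@global.datadir AS Value
     7 Query     SELECT @@global.slave_parallel_workers AS Value        # determines whether the slave uses multi-threaded parallel replication; I have not yet worked out how this parameter affects MHA and need to look into it further
     7 Query     SHOW SLAVE STATUS
     7 Query     SELECT @@global.read_only As Value                     # determines the read_only setting; it must be 0 on the server that becomes the new master
     7 Query     SELECT @@global.relay_log_purge As Value               # determines whether relay logs are purged automatically; by default they are
     7 Query     SELECT @@global.relay_log_info_repository AS Value     # determines whether relay log info is stored as FILE or TABLE; the default is FILE
     7 Query     SELECT @@global.datadir AS Value                       # determines where the data is stored
     7 Query     SELECT @@global.relay_log_info_file AS Value           # determines the relay log info file name, in preparation for applying relay logs between slaves later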

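     Notes: (1) By default, relay logs on a slave are deleted automatically once the SQL thread has executed them. Because those relay logs may be needed to recover other slaves, automatic purging should be disabled (relay_log_purge=0) and old relay logs purged periodically instead.

                (2) The binlog-do-db and replicate-ignore-db settings must be identical on all servers. MHA checks the filtering rules at startup; if they differ, it starts neither monitoring nor failover.

                (3) master.info and relay.info must be stored as FILE, not TABLE.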
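
     The version, binlog, and filter rules described above can be expressed as a small validation routine. This is a sketch under stated assumptions: the dictionary keys (host, version, log_bin, filters) and the function names are made up for the example, and the real check in MHA covers more than this.

```python
def parse_version(version: str) -> tuple:
    """Turn a string such as '5.6.28-log' into (5, 6, 28)."""
    numeric = version.split("-")[0]
    return tuple(int(part) for part in numeric.split("."))


def check_servers(servers: list, gtid_mode: bool) -> list:
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    # Minimum versions as stated in the notes: 5.6 with GTID, 5.0.45 without.
    minimum = (5, 6) if gtid_mode else (5, 0, 45)
    for server in servers:
        if parse_version(server["version"]) < minimum:
            problems.append(f"{server['host']}: version {server['version']} is too old")
        if server["log_bin"] != "ON":
            problems.append(f"{server['host']}: binary logging is disabled")
    # Every server must use the same replication filter rules.
    if len({server["filters"] for server in servers}) > 1:
        problems.append("replication filters differ between servers")
    return problems


# Hypothetical input modelled on the two surviving slaves in the log above.
slaves = [
    {"host": "192.168.118.62", "version": "5.6.28-log", "log_bin": "ON", "filters": ()},
    {"host": "192.168.118.64", "version": "5.6.28-log", "log_bin": "ON", "filters": ()},
]
print(check_servers(slaves, gtid_mode=True))  # []
```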

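    (2) Dead master shutdown phase: the connection to the master is shut down completely, to avoid a split brain caused by a master that is not really off

    [info] * Phase 2: Dead Master Shutdown Phase..

    The actual shutdown command is:

    /etc/masterha/master_ip_failover --orig_master_host=192.168.118.3 --orig_master_ip=192.168.118.3 --orig_master_port=3306 --command=stopssh --ssh_user=root

    When the shutdown is finished, it reports: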

    [info] * Phase 2: Dead Master Shutdown Phase completed.
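
    As a rough illustration of how the shutdown command above is put together, the sketch below assembles the same argument list. The helper name shutdown_argv is hypothetical; the script path and flags are taken from the command shown above, and nothing is executed here.

```python
import shlex


def shutdown_argv(script: str, host: str, ip: str, port: int, ssh_user: str) -> list:
    """Build the argument list for the stopssh call (does not run it)."""
    return [
        script,
        f"--orig_master_host={host}",
        f"--orig_master_ip={ip}",
        f"--orig_master_port={port}",
        "--command=stopssh",
        f"--ssh_user={ssh_user}",
    ]


argv = shutdown_argv("/etc/masterha/master_ip_failover",
                     "192.168.118.3", "192.168.118.3", 3306, "root")
print(shlex.join(argv))
```

    Keeping the arguments as a list rather than one string avoids shell quoting problems if the list is later handed to subprocess.run.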

     

    (3) Master recovery phase:

    [info] * Phase 3: Master Recovery Phase..

     

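    ›1   Identify the slave with the most recent relay log

    ›2   Determine the new master

          If a candidate master is set in the configuration file, that preset instance is chosen directly; if none is set, the slave holding the most recent relay log is chosen.

    ›3   Once the new master is determined, sql_log_bin=0 is set first, so that statements applied during recovery are not written to the binary log while the other slaves catch up

           MHA picks the slave with the most recent relay log, sets sql_log_bin=0 on it (which disables binary logging for that session), and applies that most recent relay log to the remaining slaves, bringing them all to the same level of data consistency. Afterwards it sets sql_log_bin=1 to resume binary logging. Semi-synchronous replication can be used to address the transaction loss that occurs when the master's host cannot be reached over SSH.

          Once all data is consistent, show master status is used to determine the new master's binlog position, and a change master statement is executed on the other slaves to establish the new replication link: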

    Fri Jul 1 13:35:33 2016 - [info] Getting new master's binlog name and position..
    Fri Jul 1 13:35:33 2016 - [info] mysql-bin.000004:191
    Fri Jul 1 13:35:33 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.118.2', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
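
          The selection rule and the statement in the log above can be sketched like this. It is a simplified illustration, not MHA's implementation: the field names candidate_master and relay_pos, the sample positions, and the choice to take the first listed candidate when several are flagged are all assumptions of the example.

```python
def choose_new_master(slaves: list) -> dict:
    """Prefer a configured candidate; otherwise take the most advanced slave."""
    candidates = [s for s in slaves if s["candidate_master"]]
    if candidates:
        return candidates[0]  # assumption: first listed candidate wins
    # relay_pos is a (binlog file, position) pair; zero-padded file names
    # of equal length compare correctly as strings.
    return max(slaves, key=lambda s: s["relay_pos"])


def change_master_sql(host: str, port: int, user: str, password: str) -> str:
    """Format the GTID-style statement in the shape shown in the log."""
    return (f"CHANGE MASTER TO MASTER_HOST='{host}', MASTER_PORT={port}, "
            f"MASTER_AUTO_POSITION=1, MASTER_USER='{user}', "
            f"MASTER_PASSWORD='{password}';")


# Hypothetical slave list; the hosts come from the failover report, the positions are invented.
slaves = [
    {"host": "192.168.118.2", "candidate_master": True, "relay_pos": ("mysql-bin.000003", 120)},
    {"host": "192.168.118.4", "candidate_master": False, "relay_pos": ("mysql-bin.000003", 191)},
]
new_master = choose_new_master(slaves)
print(new_master["host"])
print(change_master_sql(new_master["host"], 3306, "repl", "xxx"))
```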

         The actual command that performs the switch is:

    /etc/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.118.3 --orig_master_ip=192.168.118.3 --orig_master_port=3306 --new_master_host=192.168.118.2 --new_master_ip=192.168.118.2 --new_master_port=3306 --new_master_user='user' --new_master_password='password'

         When this phase succeeds, it reports:

    Fri Jul  1 13:35:33 2016 - [info] ** Finished master recovery successfully.
    Fri Jul  1 13:35:33 2016 - [info] * Phase 3: Master Recovery Phase completed.

     

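    (4) Slaves recovery phase:

    [info] * Phase 4: Slaves Recovery Phase..

    Stop the IO thread first, wait for the SQL thread to finish executing, then stop slave, clear the old slave information, run change master again to point at the new master, and start slave. Done.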

    34 Query     SHOW SLAVE STATUS
    34 Query     STOP SLAVE IO_THREAD
    34 Query     SHOW SLAVE STATUS
    34 Query     SHOW SLAVE STATUS
    34 Query     STOP SLAVE
    34 Query     SHOW SLAVE STATUS
    34 Query     RESET SLAVE
    34 Query     CHANGE MASTER TO MASTER_HOST = '192.168.118.62' MASTER_USER = 'repl' MASTER_PASSWORD = <secret> MASTER_PORT = 3306
    34 Query     START SLAVE
    35 Connect Out       repl@192.168.118.62:3306
    34 Query     SHOW SLAVE STATUS
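
    The order of operations in the query log above can be summarized in a short sketch. The function names are hypothetical, the caught-up test is one plausible way to decide that the SQL thread has finished (using column names from SHOW SLAVE STATUS), and the CHANGE MASTER text follows the GTID form recommended in the Phase 3 log rather than the exact line in this trace.

```python
def sql_thread_caught_up(status: dict) -> bool:
    """True when the SQL thread has executed everything the IO thread read."""
    return (status["Relay_Master_Log_File"] == status["Master_Log_File"]
            and status["Exec_Master_Log_Pos"] == status["Read_Master_Log_Pos"])


def slave_recovery_statements(host: str, port: int, user: str, password: str) -> list:
    """Statements for one slave, in the order described above."""
    return [
        "STOP SLAVE IO_THREAD",  # stop fetching from the dead master
        # ...poll SHOW SLAVE STATUS here until sql_thread_caught_up() is True...
        "STOP SLAVE",
        "RESET SLAVE",           # clear the old replication settings
        (f"CHANGE MASTER TO MASTER_HOST='{host}', MASTER_PORT={port}, "
         f"MASTER_AUTO_POSITION=1, MASTER_USER='{user}', "
         f"MASTER_PASSWORD='{password}'"),
        "START SLAVE",
    ]


# Hypothetical status row: the SQL thread is still behind the IO thread.
behind = {"Master_Log_File": "mysql-bin.000004", "Read_Master_Log_Pos": 191,
          "Relay_Master_Log_File": "mysql-bin.000004", "Exec_Master_Log_Pos": 120}
print(sql_thread_caught_up(behind))  # False

for statement in slave_recovery_statements("192.168.118.62", 3306, "repl", "xxx"):
    print(statement)
```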

     

    (5) Clear the slave information on the newly elected master

    [info] * Phase 5: New master cleanup phase..

    reset slave all;

     9 Query     STOP SLAVE
     9 Query     SHOW SLAVE STATUS
     9 Query     RESET SLAVE /*!50516 ALL */
     9 Query     SHOW SLAVE STATUS

     

     At this point the whole failover is complete. A failover report is printed at the end:

    ----- Failover Report -----
    
    app1: MySQL Master failover 192.168.118.3(192.168.118.3:3306) to 192.168.118.2(192.168.118.2:3306) succeeded
    
    Master 192.168.118.3(192.168.118.3:3306) is down!
    
    Check MHA Manager logs at localhost.localdomain:/var/log/masterha/app1/app1.log for details.
    
    Started automated(non-interactive) failover.
    Invalidated master IP address on 192.168.118.3(192.168.118.3:3306)
    Selected 192.168.118.2(192.168.118.2:3306) as a new master.
    192.168.118.2(192.168.118.2:3306): OK: Applying all logs succeeded.
    192.168.118.2(192.168.118.2:3306): OK: Activated master IP address.
    192.168.118.4(192.168.118.4:3306): OK: Slave started, replicating from 192.168.118.2(192.168.118.2:3306)
    192.168.118.2(192.168.118.2:3306): Resetting slave info succeeded.
    Master failover to 192.168.118.2(192.168.118.2:3306) completed successfully.
  • Original article: https://www.cnblogs.com/xiaoyanger/p/5633459.html