  • Building a MySQL High-Availability Architecture with MHA, Step by Step

    Table of contents:

    (1) Understanding MHA
        (1.1) MHA overview
        (1.2) MHA architecture and how it works
        (1.3) Advantages of using MHA
    (2) Installing MHA from source
        (2.1) Host configuration
        (2.2) Setting up MySQL replication (1 master, 2 slaves, GTID enabled)
        (2.3) Installing MHA Node (on all nodes)
        (2.4) Installing MHA Manager (on the monitor node)
    (3) Configuring MHA (on the MHA Manager node)
        (3.1) SSH trust between nodes
        (3.2) The MHA configuration file
        (3.3) Failover VIP switchover configuration
        (3.4) Online VIP switchover configuration
        (3.5) Verifying the MHA configuration
    (4) MHA Manager scripts
        (4.1) masterha_manager
        (4.2) masterha_master_switch
            (4.2.1) Manual failover
            (4.2.2) Online master switchover
        (4.3) masterha_secondary_check
    (5) Testing MHA
        (5.1) Failover test (master server healthy, database instance down)
        (5.2) Failover test (master server down unexpectedly)
        (5.3) Manual failover test
        (5.4) Manual online switchover test
    (6) Recovering the failed node
    (7) Miscellaneous
        (7.1) A script for periodic relay log purging
        (7.2) Choosing which host becomes the master



    (1) Understanding MHA
    (1.1) MHA overview
    MHA is a set of Perl scripts for maintaining high availability of the master node in MySQL replication. The official description: Master High Availability Manager and tools for MySQL (MHA) for automating master failover and fast master switch.
    Its main functions are (see the sketch below):
    1. failover: when the master node fails, a slave is promoted to master.
    2. switchover: when the master needs to be taken down for maintenance, the master role is switched to another server.
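
    Both operations are ultimately driven by the masterha_master_switch tool covered in section (4.2). As a hedged sketch (host addresses and the configuration file path are the ones used later in this article; adjust them to your environment):

    # Manual failover: the master is already dead
    masterha_master_switch --master_state=dead --conf=/etc/mha/app1.cnf \
        --dead_master_host=192.168.10.11 --new_master_host=192.168.10.12

    # Online switchover: the master is alive (e.g. planned maintenance);
    # --orig_master_is_new_slave turns the old master into a slave afterwards
    masterha_master_switch --master_state=alive --conf=/etc/mha/app1.cnf \
        --new_master_host=192.168.10.12 --orig_master_is_new_slave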

    Why master failover is hard: take a one-master, multi-slave MySQL topology. If the master crashes, you must pick the most up-to-date slave, promote it to be the new master, and then make the other slaves replicate from it. In practice this is a fairly involved process. Even if you can identify the most up-to-date slave, the other slaves may not yet have received all binlog events; if they simply start replicating from the new master, those slaves lose transactions, which causes consistency problems. To avoid this, before the new master goes live you must identify the missing binlog events (those that have not yet reached every slave) and apply them to each slave in turn. This is complex and very hard to perform correctly by hand.
    MHA's goal is to make recovery from a master failure fully automatic and as fast as possible, without requiring any standby machine. Recovery consists of: determining the new master, identifying the differential relay log events among the slaves, applying the necessary events to the new master, and bringing the other slaves in sync so that they start replicating from the new master. Depending on replication lag, MHA can usually complete a failover within 10 to 30 seconds of downtime.
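
    To see what "identifying the most up-to-date slave" means concretely, here is a minimal sketch (the mhaadmin credentials are the account created later in this article, so they are an assumption at this point) that compares the replication positions of the slaves; the slave whose Executed_Gtid_Set covers the others is the best promotion candidate:

    # Run from any host that can reach the slaves; compare the GTID/position
    # fields across slaves to find the most advanced one.
    for h in 192.168.10.12 192.168.10.13; do
        echo "== $h =="
        mysql -h $h -umhaadmin -pmhaadmin -e "SHOW SLAVE STATUS\G" \
            | grep -E 'Master_Log_File|Read_Master_Log_Pos|Executed_Gtid_Set'
    done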

    (1.2) MHA architecture and how it works
    MHA consists of two components, MHA Manager and MHA Node:
    Manager: monitors the MySQL master and controls master failover. One Manager can manage multiple MHA clusters.
    Node: ① saves the binlog of the failed master; ② compares relay logs to find the differences between slaves, determines which node is the most up-to-date, and applies the differential logs; ③ purges relay logs.

    How MHA works:

    1. Verify the replication status and identify the current master. MHA Manager confirms the state of the master and slave nodes; as long as the databases are running normally it keeps monitoring them, but if a slave node becomes abnormal, MHA stops running;
    2. If MHA Manager fails to connect to the master three times in a row, the master is considered to have failed;
    3. MHA then tries to reach the master through the MySQL slaves to double-check its state; if the slaves cannot connect to the master either, the master is confirmed down and master failover is started;
    4. Once failover starts, MHA Manager first checks whether the master is still reachable over SSH. If SSH works, every slave copies the binlog events it has not yet executed from the master and applies them; if SSH does not work either, MHA runs "show slave status" on the slaves to determine which slave has applied the most recent log, and the other slaves copy that slave's relay log and apply it, so that in the end all slaves have applied the log up to the same point in time;
    5. The master-election logic runs, the new master is chosen, and the VIP floats over to it;
    6. The remaining slaves are re-pointed at the new master. (A sketch of starting the Manager daemon that drives this loop follows below.)
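
    In day-to-day operation this monitoring loop is run by the masterha_manager daemon. A minimal, hedged way to start it in the background (the configuration file and paths are the ones set up later in this article):

    # --remove_dead_master_conf drops the failed master from the config after
    # a failover; --ignore_last_failover skips the "failover happened within
    # the last 8 hours" safety check. Adjust paths to your environment.
    nohup masterha_manager --conf=/etc/mha/app1.cnf \
        --remove_dead_master_conf --ignore_last_failover \
        > /mha/mha4mysql-manager-master/app1/log/manager_start.log 2>&1 &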

    (1.3) Advantages of using MHA

    The main advantages of using MHA are:

    • Master failover and slave promotion complete quickly: with small replication lag, failover usually finishes within 10-30 seconds;
    • A master crash does not cause data inconsistency: relay log events are compared across slaves and applied to every host, keeping all slaves consistent;
    • No changes to the existing MySQL configuration are required;
    • No performance penalty: by default MHA sends one simple query to the master every 3 seconds, which consumes very little server capacity;
    • Works with any storage engine.


    (2) Installing MHA from source
    Base environment:

    IP address      Hostname  OS          Role
    192.168.10.5    monitor   CentOS 7.4  MHA Manager host
    192.168.10.11   node1     CentOS 7.4  MySQL replication, master node
    192.168.10.12   node2     CentOS 7.4  MySQL replication, slave node, candidate master
    192.168.10.13   node3     CentOS 7.4  MySQL replication, slave node
    192.168.10.10   /         /           Virtual IP that floats with the master node

    (2.1) Host configuration
    1. Configure the /etc/hosts file

    [root@monitor ~]# cat /etc/hosts
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
    
    192.168.10.5 monitor
    192.168.10.11 node1
    192.168.10.12 node2
    192.168.10.13 node3

    2. Configure the EPEL repository. MHA has dependencies that cannot all be installed from the default CentOS yum repositories, so adding the EPEL repository is recommended

    # Repository for RHEL/CentOS 7
    yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
    
    # Refresh the yum metadata so the new repository takes effect
    yum clean all
    yum makecache


    (2.2) Setting up MySQL replication (1 master, 2 slaves, GTID enabled)
    MySQL version: MySQL 5.7
    MHA does not build the replication environment itself; you have to set up MySQL replication yourself. Here we build GTID-based replication.
    MySQL installation: https://www.cnblogs.com/lijiaman/p/10743102.html
    Setting up GTID-based MySQL replication: https://www.cnblogs.com/lijiaman/p/12315379.html
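
    If you follow the linked posts, the slave-side setup essentially comes down to a GTID-based CHANGE MASTER. A minimal sketch (the replica/replica credentials match the repl_user settings used later in this article and are an assumption here):

    # On each slave (node2/node3): attach to the master with GTID
    # auto-positioning, then start replication.
    mysql -uroot -p -e "
        CHANGE MASTER TO
            MASTER_HOST='192.168.10.11',
            MASTER_PORT=3306,
            MASTER_USER='replica',
            MASTER_PASSWORD='replica',
            MASTER_AUTO_POSITION=1;
        START SLAVE;"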

    The MySQL parameters are configured as follows:

    [mysql]
    # Default character set for the mysql client
    default-character-set=utf8
    
    [mysqld]
    ##### 1. Basic parameters #######
    # Disable DNS resolution; the grant tables can then only use IPs, not hostnames
    skip-name-resolve
    
    # Listen on port 3306
    port = 3306 
    
    # MySQL installation directory
    basedir=/mysql
    
    # MySQL data directory
    datadir=/mysql/data
    
    # Maximum number of connections
    max_connections=1000
    
    # Server-side character set (the compiled-in default is the 8-bit latin1)
    character-set-server=utf8
    
    # Default storage engine for newly created tables
    default-storage-engine=INNODB 
    
    # Table names are not case-sensitive
    lower_case_table_names=1
    
    max_allowed_packet=16M
    
    # Write error log timestamps in the system time zone
    log_timestamps = SYSTEM
    
    # Run mysqld as the root user
    user = root
    
    ##### 2. Replication parameters  #####
    server_id = 1    # must be different on every node
    binlog_format=ROW
    log_bin=/mysql/binlog/node1-bin  
    max_binlog_size=1G
    sync_binlog=1
    innodb_flush_log_at_trx_commit=1
    
    ##### 3. GTID parameters  #####
    gtid_mode=ON
    enforce-gtid-consistency=ON
    
    ##### 4. Parameters required by MHA  #####
    # 0 disables the SQL thread's automatic deletion of a relay log after it has
    # been executed. Under MHA, recovering a lagging slave may depend on another
    # slave's relay logs, so automatic purging must be disabled
    relay_log_purge = 0 
    
    # Slave nodes should be set to read-only
    # read_only = 1

    Note: the parameters that differ across the three nodes are log_bin, server_id and read_only.
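
    Once replication is up, it is worth confirming on every slave that both replication threads are running and GTID mode is active. A minimal check, assuming the root password is at hand:

    # Both threads must report "Yes"; gtid_mode must be ON.
    mysql -uroot -p -e "SHOW SLAVE STATUS\G" \
        | grep -E 'Slave_IO_Running|Slave_SQL_Running|Retrieved_Gtid_Set|Executed_Gtid_Set'
    mysql -uroot -p -e "SHOW GLOBAL VARIABLES LIKE 'gtid_mode';"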


    (2.3) Installing MHA Node (on all nodes)
    MHA can be built from source or installed directly from RPM packages; here we build from source.
    STEP 1: Download the MHA Node source package
    Go to https://github.com/yoshinorim/mha4mysql-node/releases/tag/v0.58 on GitHub and download the latest MHA Node release.

    STEP 2: Install the MHA Node dependencies; this must be done on every node

    # According to the official documentation, MHA Node depends only on this package
    yum install -y perl-DBD-MySQL
    
    # In practice, this package turned out to be needed as well
    yum install -y perl-CPAN

    STEP 3: Install MHA Node

    # Unpack the source package
    $ tar -xzvf mha4mysql-node-0.58.tar.gz
    
    # Build and install MHA Node
    $ perl Makefile.PL
    $ make
    $ make install

    Note: during the actual installation, perl Makefile.PL failed with the following error:
    [root@node1 mha4mysql-node-0.58]# perl Makefile.PL
    Can't locate ExtUtils/MakeMaker.pm in @INC (@INC contains: inc /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at inc/Module/Install/Can.pm line 5.
    BEGIN failed--compilation aborted at inc/Module/Install/Can.pm line 5.
    Compilation failed in require at inc/Module/Install.pm line 307.
    Can't locate ExtUtils/MakeMaker.pm in @INC (@INC contains: inc /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at inc/Module/Install/Makefile.pm line 4.
    BEGIN failed--compilation aborted at inc/Module/Install/Makefile.pm line 4.
    Compilation failed in require at inc/Module/Install.pm line 307.
    Can't locate ExtUtils/MM_Unix.pm in @INC (@INC contains: inc /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at inc/Module/Install/Metadata.pm line 322.

    Solution:

    [root@node1 mha4mysql-node-0.58]# yum install -y perl-CPAN

    STEP 4: Verify the installation

    After installation, the Node scripts appear under /usr/local/bin/ on every node:

    [root@node3 bin]# cd /usr/local/bin/
    [root@node3 bin]# ll
    -r-xr-xr-x. 1 root root 17639 Mar 29 15:04 apply_diff_relay_logs
    -r-xr-xr-x. 1 root root 4807 Mar 29 15:04 filter_mysqlbinlog
    -r-xr-xr-x. 1 root root 8337 Mar 29 15:04 purge_relay_logs
    -r-xr-xr-x. 1 root root 7525 Mar 29 15:04 save_binary_logs
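
    Of these four scripts, purge_relay_logs is the one you will also run periodically yourself, because relay_log_purge=0 means relay logs accumulate (section (7.1) returns to this). A hedged invocation, assuming the mhaadmin account created later in this article:

    # Rotate and delete executed relay logs without blocking the SQL thread;
    # --disable_relay_log_purge keeps relay_log_purge=0 afterwards.
    purge_relay_logs --user=mhaadmin --password=mhaadmin \
        --host=127.0.0.1 --port=3306 \
        --disable_relay_log_purge --workdir=/tmp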

    Full log of the MHA Node installation:

    [root@monitor ~]# tar -xzvf mha4mysql-node-0.58.tar.gz
    mha4mysql-node-0.58/
    mha4mysql-node-0.58/inc/
    mha4mysql-node-0.58/inc/Module/
    mha4mysql-node-0.58/inc/Module/Install/
    mha4mysql-node-0.58/inc/Module/Install/Fetch.pm
    mha4mysql-node-0.58/inc/Module/Install/Metadata.pm
    mha4mysql-node-0.58/inc/Module/Install/AutoInstall.pm
    mha4mysql-node-0.58/inc/Module/Install/Win32.pm
    mha4mysql-node-0.58/inc/Module/Install/WriteAll.pm
    mha4mysql-node-0.58/inc/Module/Install/Can.pm
    mha4mysql-node-0.58/inc/Module/Install/Include.pm
    mha4mysql-node-0.58/inc/Module/Install/Makefile.pm
    mha4mysql-node-0.58/inc/Module/Install/Scripts.pm
    mha4mysql-node-0.58/inc/Module/Install/Base.pm
    mha4mysql-node-0.58/inc/Module/AutoInstall.pm
    mha4mysql-node-0.58/inc/Module/Install.pm
    mha4mysql-node-0.58/debian/
    mha4mysql-node-0.58/debian/compat
    mha4mysql-node-0.58/debian/changelog
    mha4mysql-node-0.58/debian/rules
    mha4mysql-node-0.58/debian/copyright
    mha4mysql-node-0.58/debian/control
    mha4mysql-node-0.58/bin/
    mha4mysql-node-0.58/bin/purge_relay_logs
    mha4mysql-node-0.58/bin/filter_mysqlbinlog
    mha4mysql-node-0.58/bin/save_binary_logs
    mha4mysql-node-0.58/bin/apply_diff_relay_logs
    mha4mysql-node-0.58/AUTHORS
    mha4mysql-node-0.58/MANIFEST
    mha4mysql-node-0.58/t/
    mha4mysql-node-0.58/t/perlcriticrc
    mha4mysql-node-0.58/t/99-perlcritic.t
    mha4mysql-node-0.58/README
    mha4mysql-node-0.58/COPYING
    mha4mysql-node-0.58/META.yml
    mha4mysql-node-0.58/lib/
    mha4mysql-node-0.58/lib/MHA/
    mha4mysql-node-0.58/lib/MHA/BinlogPosFinderElp.pm
    mha4mysql-node-0.58/lib/MHA/BinlogPosFindManager.pm
    mha4mysql-node-0.58/lib/MHA/BinlogPosFinderXid.pm
    mha4mysql-node-0.58/lib/MHA/BinlogPosFinder.pm
    mha4mysql-node-0.58/lib/MHA/BinlogHeaderParser.pm
    mha4mysql-node-0.58/lib/MHA/NodeConst.pm
    mha4mysql-node-0.58/lib/MHA/NodeUtil.pm
    mha4mysql-node-0.58/lib/MHA/SlaveUtil.pm
    mha4mysql-node-0.58/lib/MHA/BinlogManager.pm
    mha4mysql-node-0.58/Makefile.PL
    mha4mysql-node-0.58/rpm/
    mha4mysql-node-0.58/rpm/masterha_node.spec
    [root@monitor ~]#
    [root@monitor ~]#
    [root@monitor ~]# ll
    total 697468
    -rw-------. 1 root root 1325 Oct 24 2019 anaconda-ks.cfg
    -rw-r--r--. 1 root root 119801 Mar 29 15:14 mha4mysql-manager-0.58.tar.gz
    drwxr-xr-x. 8 lijiaman lijiaman 168 Mar 23 2018 mha4mysql-node-0.58
    -rw-r--r--. 1 root root 56220 Mar 29 15:14 mha4mysql-node-0.58.tar.gz
    -rw-r--r--. 1 root root 714022690 Mar 17 16:28 mysql-5.7.27-el7-x86_64.tar.gz
    [root@monitor ~]# cd mha4mysql-
    -bash: cd: mha4mysql-: No such file or directory
    [root@monitor ~]# cd mha4mysql-node-0.58
    [root@monitor mha4mysql-node-0.58]# ls
    AUTHORS bin COPYING debian inc lib Makefile.PL MANIFEST META.yml README rpm t
    [root@monitor mha4mysql-node-0.58]# perl Makefile.PL
    Can't locate ExtUtils/MakeMaker.pm in @INC (@INC contains: inc /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at inc/Module/Install/Can.pm line 5.
    BEGIN failed--compilation aborted at inc/Module/Install/Can.pm line 5.
    Compilation failed in require at inc/Module/Install.pm line 307.
    Can't locate ExtUtils/MakeMaker.pm in @INC (@INC contains: inc /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at inc/Module/Install/Makefile.pm line 4.
    BEGIN failed--compilation aborted at inc/Module/Install/Makefile.pm line 4.
    Compilation failed in require at inc/Module/Install.pm line 307.
    Can't locate ExtUtils/MM_Unix.pm in @INC (@INC contains: inc /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at inc/Module/Install/Metadata.pm line 322.
    [root@monitor mha4mysql-node-0.58]#
    [root@monitor mha4mysql-node-0.58]#
    [root@monitor mha4mysql-node-0.58]# yum install -y perl-CPAN
    Loaded plugins: fastestmirror
    Loading mirror speeds from cached hostfile
    * base: mirrors.cqu.edu.cn
    * epel: mirrors.bfsu.edu.cn
    * extras: mirrors.cqu.edu.cn
    ... (output omitted) ...
    Installed:
    perl-CPAN.noarch 0:1.9800-299.el7_9
    
    Dependency Installed:
    gdbm-devel.x86_64 0:1.10-8.el7 libdb-devel.x86_64 0:5.3.21-25.el7 perl-Digest.noarch 0:1.17-245.el7
    perl-Digest-SHA.x86_64 1:5.85-4.el7 perl-ExtUtils-Install.noarch 0:1.58-299.el7_9 perl-ExtUtils-MakeMaker.noarch 0:6.68-3.el7
    perl-ExtUtils-Manifest.noarch 0:1.61-244.el7 perl-ExtUtils-ParseXS.noarch 1:3.18-3.el7 perl-Test-Harness.noarch 0:3.28-3.el7
    perl-devel.x86_64 4:5.16.3-299.el7_9 perl-local-lib.noarch 0:1.008010-4.el7 pyparsing.noarch 0:1.5.6-9.el7
    systemtap-sdt-devel.x86_64 0:4.0-13.el7
    
    Dependency Updated:
    libdb.x86_64 0:5.3.21-25.el7 libdb-utils.x86_64 0:5.3.21-25.el7
    
    Complete!
    [root@monitor mha4mysql-node-0.58]# perl Makefile.PL
    *** Module::AutoInstall version 1.06
    *** Checking for Perl dependencies...
    [Core Features]
    - DBI ...loaded. (1.627)
    - DBD::mysql ...loaded. (4.023)
    *** Module::AutoInstall configuration finished.
    Checking if your kit is complete...
    Looks good
    Writing Makefile for mha4mysql::node
    [root@monitor mha4mysql-node-0.58]# make
    cp lib/MHA/BinlogManager.pm blib/lib/MHA/BinlogManager.pm
    cp lib/MHA/BinlogPosFindManager.pm blib/lib/MHA/BinlogPosFindManager.pm
    cp lib/MHA/BinlogPosFinderXid.pm blib/lib/MHA/BinlogPosFinderXid.pm
    cp lib/MHA/BinlogHeaderParser.pm blib/lib/MHA/BinlogHeaderParser.pm
    cp lib/MHA/BinlogPosFinder.pm blib/lib/MHA/BinlogPosFinder.pm
    cp lib/MHA/BinlogPosFinderElp.pm blib/lib/MHA/BinlogPosFinderElp.pm
    cp lib/MHA/NodeUtil.pm blib/lib/MHA/NodeUtil.pm
    cp lib/MHA/SlaveUtil.pm blib/lib/MHA/SlaveUtil.pm
    cp lib/MHA/NodeConst.pm blib/lib/MHA/NodeConst.pm
    cp bin/filter_mysqlbinlog blib/script/filter_mysqlbinlog
    /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/filter_mysqlbinlog
    cp bin/apply_diff_relay_logs blib/script/apply_diff_relay_logs
    /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/apply_diff_relay_logs
    cp bin/purge_relay_logs blib/script/purge_relay_logs
    /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/purge_relay_logs
    cp bin/save_binary_logs blib/script/save_binary_logs
    /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/save_binary_logs
    Manifying blib/man1/filter_mysqlbinlog.1
    Manifying blib/man1/apply_diff_relay_logs.1
    Manifying blib/man1/purge_relay_logs.1
    Manifying blib/man1/save_binary_logs.1
    [root@monitor mha4mysql-node-0.58]# make install
    Installing /usr/local/share/perl5/MHA/BinlogManager.pm
    Installing /usr/local/share/perl5/MHA/BinlogPosFindManager.pm
    Installing /usr/local/share/perl5/MHA/BinlogPosFinderXid.pm
    Installing /usr/local/share/perl5/MHA/BinlogHeaderParser.pm
    Installing /usr/local/share/perl5/MHA/BinlogPosFinder.pm
    Installing /usr/local/share/perl5/MHA/BinlogPosFinderElp.pm
    Installing /usr/local/share/perl5/MHA/NodeUtil.pm
    Installing /usr/local/share/perl5/MHA/SlaveUtil.pm
    Installing /usr/local/share/perl5/MHA/NodeConst.pm
    Installing /usr/local/share/man/man1/filter_mysqlbinlog.1
    Installing /usr/local/share/man/man1/apply_diff_relay_logs.1
    Installing /usr/local/share/man/man1/purge_relay_logs.1
    Installing /usr/local/share/man/man1/save_binary_logs.1
    Installing /usr/local/bin/filter_mysqlbinlog
    Installing /usr/local/bin/apply_diff_relay_logs
    Installing /usr/local/bin/purge_relay_logs
    Installing /usr/local/bin/save_binary_logs
    Appending installation info to /usr/lib64/perl5/perllocal.pod
    [root@monitor mha4mysql-node-0.58]#

    (2.4) Installing MHA Manager (on the monitor node)
    STEP 1: Download the MHA Manager source package
    Go to https://github.com/yoshinorim/mha4mysql-manager/releases/tag/v0.58 on GitHub and download the latest MHA Manager release.

    STEP 2: Install the MHA Manager dependencies
    Note that MHA Manager also depends on MHA Node, so MHA Node must be installed on the Manager host as well.

    # Dependency packages listed in the official documentation
    yum install -y perl-DBD-MySQL
    yum install -y perl-Config-Tiny
    yum install -y perl-Log-Dispatch
    yum install -y perl-Parallel-ForkManager

    STEP 3: Install MHA Manager

    # Unpack the source package
    $ tar -zxf mha4mysql-manager-0.58.tar.gz
    
    # Build and install MHA Manager
    $ perl Makefile.PL
    $ make
    $ make install

    STEP 4: Verify the installation
    After installation, the Manager scripts appear under /usr/local/bin/ on the MHA Manager node:

    [root@monitor mha4mysql-manager-0.58]# cd /usr/local/bin/
    [root@monitor bin]# ll
    -r-xr-xr-x. 1 root root 17639 Mar 29 15:17 apply_diff_relay_logs
    -r-xr-xr-x. 1 root root 4807 Mar 29 15:17 filter_mysqlbinlog
    -r-xr-xr-x. 1 root root 1995 Mar 29 15:20 masterha_check_repl
    -r-xr-xr-x. 1 root root 1779 Mar 29 15:20 masterha_check_ssh
    -r-xr-xr-x. 1 root root 1865 Mar 29 15:20 masterha_check_status
    -r-xr-xr-x. 1 root root 3201 Mar 29 15:20 masterha_conf_host
    -r-xr-xr-x. 1 root root 2517 Mar 29 15:20 masterha_manager
    -r-xr-xr-x. 1 root root 2165 Mar 29 15:20 masterha_master_monitor
    -r-xr-xr-x. 1 root root 2373 Mar 29 15:20 masterha_master_switch
    -r-xr-xr-x. 1 root root 5172 Mar 29 15:20 masterha_secondary_check
    -r-xr-xr-x. 1 root root 1739 Mar 29 15:20 masterha_stop
    -r-xr-xr-x. 1 root root 8337 Mar 29 15:17 purge_relay_logs
    -r-xr-xr-x. 1 root root 7525 Mar 29 15:17 save_binary_logs

    Full log of the MHA Manager installation:

    [root@monitor ~]# tar -xzvf mha4mysql-manager-0.58.tar.gz
    mha4mysql-manager-0.58/
    mha4mysql-manager-0.58/inc/
    mha4mysql-manager-0.58/inc/Module/
    mha4mysql-manager-0.58/inc/Module/Install/
    mha4mysql-manager-0.58/inc/Module/Install/Fetch.pm
    mha4mysql-manager-0.58/inc/Module/Install/Metadata.pm
    mha4mysql-manager-0.58/inc/Module/Install/AutoInstall.pm
    mha4mysql-manager-0.58/inc/Module/Install/Win32.pm
    mha4mysql-manager-0.58/inc/Module/Install/WriteAll.pm
    mha4mysql-manager-0.58/inc/Module/Install/Can.pm
    mha4mysql-manager-0.58/inc/Module/Install/Include.pm
    mha4mysql-manager-0.58/inc/Module/Install/Makefile.pm
    mha4mysql-manager-0.58/inc/Module/Install/Scripts.pm
    mha4mysql-manager-0.58/inc/Module/Install/Base.pm
    mha4mysql-manager-0.58/inc/Module/AutoInstall.pm
    mha4mysql-manager-0.58/inc/Module/Install.pm
    mha4mysql-manager-0.58/debian/
    mha4mysql-manager-0.58/debian/compat
    mha4mysql-manager-0.58/debian/changelog
    mha4mysql-manager-0.58/debian/docs
    mha4mysql-manager-0.58/debian/rules
    mha4mysql-manager-0.58/debian/copyright
    mha4mysql-manager-0.58/debian/control
    mha4mysql-manager-0.58/bin/
    mha4mysql-manager-0.58/bin/masterha_check_status
    mha4mysql-manager-0.58/bin/masterha_check_ssh
    mha4mysql-manager-0.58/bin/masterha_master_monitor
    mha4mysql-manager-0.58/bin/masterha_manager
    mha4mysql-manager-0.58/bin/masterha_master_switch
    mha4mysql-manager-0.58/bin/masterha_stop
    mha4mysql-manager-0.58/bin/masterha_secondary_check
    mha4mysql-manager-0.58/bin/masterha_check_repl
    mha4mysql-manager-0.58/bin/masterha_conf_host
    mha4mysql-manager-0.58/AUTHORS
    mha4mysql-manager-0.58/MANIFEST
    mha4mysql-manager-0.58/tests/
    mha4mysql-manager-0.58/tests/intro.txt
    mha4mysql-manager-0.58/tests/t/
    mha4mysql-manager-0.58/tests/t/t_online_3tier.sh
    mha4mysql-manager-0.58/tests/t/t_mm_ro_fail.sh
    mha4mysql-manager-0.58/tests/t/t_large_data_slow.sh
    mha4mysql-manager-0.58/tests/t/t_recover_master_fail.sh
    mha4mysql-manager-0.58/tests/t/t_4tier.sh
    mha4mysql-manager-0.58/tests/t/t_online_slave_sql_stop.sh
    mha4mysql-manager-0.58/tests/t/t_online_3tier_slave.sh
    mha4mysql-manager-0.58/tests/t/grant_nopass.sql
    mha4mysql-manager-0.58/tests/t/t_manual.sh
    mha4mysql-manager-0.58/tests/t/t_mm_noslaves.sh
    mha4mysql-manager-0.58/tests/t/t_online_mm_3tier.sh
    mha4mysql-manager-0.58/tests/t/t_advisory_select.sh
    mha4mysql-manager-0.58/tests/t/mha_test_mm_online.cnf.tmpl
    mha4mysql-manager-0.58/tests/t/t_ignore_nostart.sh
    mha4mysql-manager-0.58/tests/t/t_dual_master_error.sh
    mha4mysql-manager-0.58/tests/t/t_large_data_sql_fail.sh
    mha4mysql-manager-0.58/tests/t/t_mm_subm_dead.sh
    mha4mysql-manager-0.58/tests/t/t_online_3tier_slave_keep.sh
    mha4mysql-manager-0.58/tests/t/mha_test_connect.cnf.tmpl
    mha4mysql-manager-0.58/tests/t/t_latest_recovery3.sh
    mha4mysql-manager-0.58/tests/t/t_save_master_log.sh
    mha4mysql-manager-0.58/tests/t/t_needsync_1_ssh.sh
    mha4mysql-manager-0.58/tests/t/t_mm.sh
    mha4mysql-manager-0.58/tests/t/t_needsync_flush_slave.sh
    mha4mysql-manager-0.58/tests/t/t_online_slave_pass.sh
    mha4mysql-manager-0.58/tests/t/t_ignore_start.sh
    mha4mysql-manager-0.58/tests/t/my.cnf
    mha4mysql-manager-0.58/tests/t/t_large_data_sql_stop.sh
    mha4mysql-manager-0.58/tests/t/t_filter_incorrect.sh
    mha4mysql-manager-0.58/tests/t/start_m.sh
    mha4mysql-manager-0.58/tests/t/t_large_data_bulk.sh
    mha4mysql-manager-0.58/tests/t/master_ip_failover
    mha4mysql-manager-0.58/tests/t/t_large_data.sh
    mha4mysql-manager-0.58/tests/t/t_new_master_heavy_wait.sh
    mha4mysql-manager-0.58/tests/t/t_data_io_error.sh
    mha4mysql-manager-0.58/tests/t/t_large_data_tran.sh
    mha4mysql-manager-0.58/tests/t/t_needsync_1_pass.sh
    mha4mysql-manager-0.58/tests/t/t_save_master_log_pass.sh
    mha4mysql-manager-0.58/tests/t/my-row.cnf
    mha4mysql-manager-0.58/tests/t/t_online_slave.sh
    mha4mysql-manager-0.58/tests/t/t_latest_recovery2.sh
    mha4mysql-manager-0.58/tests/t/t_online_filter.sh
    mha4mysql-manager-0.58/tests/t/stop_s1.sh
    mha4mysql-manager-0.58/tests/t/init.sh
    mha4mysql-manager-0.58/tests/t/t_ignore_recovery1.sh
    mha4mysql-manager-0.58/tests/t/t_online_busy.sh
    mha4mysql-manager-0.58/tests/t/t_keep_relay_log_purge.sh
    mha4mysql-manager-0.58/tests/t/t_needsync_1.sh
    mha4mysql-manager-0.58/tests/t/t_apply_many_logs.sh
    mha4mysql-manager-0.58/tests/t/t_apply_many_logs3.sh
    mha4mysql-manager-0.58/tests/t/t_slave_stop.sh
    mha4mysql-manager-0.58/tests/t/t_slave_incorrect.sh
    mha4mysql-manager-0.58/tests/t/bulk_tran_insert.pl
    mha4mysql-manager-0.58/tests/t/mha_test_multi_online.cnf.tmpl
    mha4mysql-manager-0.58/tests/t/stop_m.sh
    mha4mysql-manager-0.58/tests/t/start_s4.sh
    mha4mysql-manager-0.58/tests/t/t_4tier_subm_dead.sh
    mha4mysql-manager-0.58/tests/t/mha_test_online.cnf.tmpl
    mha4mysql-manager-0.58/tests/t/t_slave_sql_start3.sh
    mha4mysql-manager-0.58/tests/t/run_bg.sh
    mha4mysql-manager-0.58/tests/t/t_needsync_2_pass.sh
    mha4mysql-manager-0.58/tests/t/t_apply_many_logs2.sh
    mha4mysql-manager-0.58/tests/t/mha_test_mm.cnf.tmpl
    mha4mysql-manager-0.58/tests/t/stop_s4.sh
    mha4mysql-manager-0.58/tests/t/t_needsync_1_nocm.sh
    mha4mysql-manager-0.58/tests/t/t_ignore_recovery4.sh
    mha4mysql-manager-0.58/tests/t/t_advisory_connect.sh
    mha4mysql-manager-0.58/tests/t/t_normal_crash.sh
    mha4mysql-manager-0.58/tests/t/t_mm_normal_skip_reset.sh
    mha4mysql-manager-0.58/tests/t/t_slave_sql_start2.sh
    mha4mysql-manager-0.58/tests/t/t_slave_sql_start.sh
    mha4mysql-manager-0.58/tests/t/t_normal_crash_nocm.sh
    mha4mysql-manager-0.58/tests/t/t_mm_3tier_subm_dead.sh
    mha4mysql-manager-0.58/tests/t/mha_test_err1.cnf.tmpl
    mha4mysql-manager-0.58/tests/t/mha_test_reset.cnf.tmpl
    mha4mysql-manager-0.58/tests/t/t_needsync_fail.sh
    mha4mysql-manager-0.58/tests/t/t_needsync_1_nopass.sh
    mha4mysql-manager-0.58/tests/t/start_s1.sh
    mha4mysql-manager-0.58/tests/t/t_needsync_flush.sh
    mha4mysql-manager-0.58/tests/t/run.sh
    mha4mysql-manager-0.58/tests/t/master_ip_failover_blank
    mha4mysql-manager-0.58/tests/t/mha_test.cnf.tmpl
    mha4mysql-manager-0.58/tests/t/t_save_master_log_ssh.sh
    mha4mysql-manager-0.58/tests/t/kill_m.sh
    mha4mysql-manager-0.58/tests/t/t_online_slave_fail.sh
    mha4mysql-manager-0.58/tests/t/t_binary.sh
    mha4mysql-manager-0.58/tests/t/t_needsync_flush3.sh
    mha4mysql-manager-0.58/tests/t/t_recover_slave_fail.sh
    mha4mysql-manager-0.58/tests/t/mha_test_ignore.cnf.tmpl
    mha4mysql-manager-0.58/tests/t/t_ignore_recovery3.sh
    mha4mysql-manager-0.58/tests/t/force_start_m.sh
    mha4mysql-manager-0.58/tests/t/t_recover_slave_ok.sh
    mha4mysql-manager-0.58/tests/t/t_mm_normal.sh
    mha4mysql-manager-0.58/tests/t/start_s2.sh
    mha4mysql-manager-0.58/tests/t/t_online_mm_3tier_slave.sh
    mha4mysql-manager-0.58/tests/t/insert.pl
    mha4mysql-manager-0.58/tests/t/grant.sql
    mha4mysql-manager-0.58/tests/t/stop_s2.sh
    mha4mysql-manager-0.58/tests/t/waitpid
    mha4mysql-manager-0.58/tests/t/t_mm_subm_dead_many.sh
    mha4mysql-manager-0.58/tests/t/t_ignore_recovery2.sh
    mha4mysql-manager-0.58/tests/t/tran_insert.pl
    mha4mysql-manager-0.58/tests/t/insert_binary.pl
    mha4mysql-manager-0.58/tests/t/t_online_mm.sh
    mha4mysql-manager-0.58/tests/t/mha_test_nopass.cnf.tmpl
    mha4mysql-manager-0.58/tests/t/t_needsync_2.sh
    mha4mysql-manager-0.58/tests/t/mha_test_online_pass.cnf.tmpl
    mha4mysql-manager-0.58/tests/t/t_needsync_2_ssh.sh
    mha4mysql-manager-0.58/tests/t/mha_test_multi.cnf.tmpl
    mha4mysql-manager-0.58/tests/t/run_tests
    mha4mysql-manager-0.58/tests/t/mha_test_latest.cnf.tmpl
    mha4mysql-manager-0.58/tests/t/t_online_mm_skip_reset.sh
    mha4mysql-manager-0.58/tests/t/t_online_normal.sh
    mha4mysql-manager-0.58/tests/t/env.sh
    mha4mysql-manager-0.58/tests/t/t_needsync_flush2.sh
    mha4mysql-manager-0.58/tests/t/t_conf.sh
    mha4mysql-manager-0.58/tests/t/t_mm_subm_dead_noslave.sh
    mha4mysql-manager-0.58/tests/t/mha_test_ssh.cnf.tmpl
    mha4mysql-manager-0.58/tests/t/mha_test_err2.cnf.tmpl
    mha4mysql-manager-0.58/tests/t/t_mm_3tier.sh
    mha4mysql-manager-0.58/tests/t/t_no_relay_log.sh
    mha4mysql-manager-0.58/tests/t/change_relay_log_info.sh
    mha4mysql-manager-0.58/tests/t/t_new_master_heavy.sh
    mha4mysql-manager-0.58/tests/t/mha_test_pass.cnf.tmpl
    mha4mysql-manager-0.58/tests/t/t_latest_recovery1.sh
    mha4mysql-manager-0.58/tests/t/t_recover_slave_fail2.sh
    mha4mysql-manager-0.58/tests/t/t_large_data_bulk_slow.sh
    mha4mysql-manager-0.58/tests/t/check
    mha4mysql-manager-0.58/tests/t/t_large_data_slow2.sh
    mha4mysql-manager-0.58/tests/run_suites.sh
    mha4mysql-manager-0.58/t/
    mha4mysql-manager-0.58/t/perlcriticrc
    mha4mysql-manager-0.58/t/99-perlcritic.t
    mha4mysql-manager-0.58/README
    mha4mysql-manager-0.58/samples/
    mha4mysql-manager-0.58/samples/scripts/
    mha4mysql-manager-0.58/samples/scripts/master_ip_failover
    mha4mysql-manager-0.58/samples/scripts/power_manager
    mha4mysql-manager-0.58/samples/scripts/send_report
    mha4mysql-manager-0.58/samples/scripts/master_ip_online_change
    mha4mysql-manager-0.58/samples/conf/
    mha4mysql-manager-0.58/samples/conf/app1.cnf
    mha4mysql-manager-0.58/samples/conf/masterha_default.cnf
    mha4mysql-manager-0.58/COPYING
    mha4mysql-manager-0.58/META.yml
    mha4mysql-manager-0.58/lib/
    mha4mysql-manager-0.58/lib/MHA/
    mha4mysql-manager-0.58/lib/MHA/ManagerAdmin.pm
    mha4mysql-manager-0.58/lib/MHA/Server.pm
    mha4mysql-manager-0.58/lib/MHA/MasterRotate.pm
    mha4mysql-manager-0.58/lib/MHA/Config.pm
    mha4mysql-manager-0.58/lib/MHA/ManagerAdminWrapper.pm
    mha4mysql-manager-0.58/lib/MHA/ServerManager.pm
    mha4mysql-manager-0.58/lib/MHA/HealthCheck.pm
    mha4mysql-manager-0.58/lib/MHA/ManagerConst.pm
    mha4mysql-manager-0.58/lib/MHA/DBHelper.pm
    mha4mysql-manager-0.58/lib/MHA/SSHCheck.pm
    mha4mysql-manager-0.58/lib/MHA/FileStatus.pm
    mha4mysql-manager-0.58/lib/MHA/ManagerUtil.pm
    mha4mysql-manager-0.58/lib/MHA/MasterFailover.pm
    mha4mysql-manager-0.58/lib/MHA/MasterMonitor.pm
    mha4mysql-manager-0.58/Makefile.PL
    mha4mysql-manager-0.58/rpm/
    mha4mysql-manager-0.58/rpm/masterha_manager.spec
    [root@monitor ~]#
    [root@monitor ~]#
    [root@monitor ~]# cd mha4mysql-manager-0.58
    [root@monitor mha4mysql-manager-0.58]# ls
    AUTHORS bin COPYING debian inc lib Makefile.PL MANIFEST META.yml README rpm samples t tests
    [root@monitor mha4mysql-manager-0.58]# perl Makefile.PL
    *** Module::AutoInstall version 1.06
    *** Checking for Perl dependencies...
    [Core Features]
    - DBI ...loaded. (1.627)
    - DBD::mysql ...loaded. (4.023)
    - Time::HiRes ...loaded. (1.9725)
    - Config::Tiny ...loaded. (2.14)
    - Log::Dispatch ...loaded. (2.41)
    - Parallel::ForkManager ...loaded. (1.18)
    - MHA::NodeConst ...loaded. (0.58)
    *** Module::AutoInstall configuration finished.
    Checking if your kit is complete...
    Looks good
    Writing Makefile for mha4mysql::manager
    [root@monitor mha4mysql-manager-0.58]# make
    cp lib/MHA/ManagerUtil.pm blib/lib/MHA/ManagerUtil.pm
    cp lib/MHA/Config.pm blib/lib/MHA/Config.pm
    cp lib/MHA/HealthCheck.pm blib/lib/MHA/HealthCheck.pm
    cp lib/MHA/ServerManager.pm blib/lib/MHA/ServerManager.pm
    cp lib/MHA/ManagerConst.pm blib/lib/MHA/ManagerConst.pm
    cp lib/MHA/FileStatus.pm blib/lib/MHA/FileStatus.pm
    cp lib/MHA/ManagerAdmin.pm blib/lib/MHA/ManagerAdmin.pm
    cp lib/MHA/ManagerAdminWrapper.pm blib/lib/MHA/ManagerAdminWrapper.pm
    cp lib/MHA/MasterFailover.pm blib/lib/MHA/MasterFailover.pm
    cp lib/MHA/MasterRotate.pm blib/lib/MHA/MasterRotate.pm
    cp lib/MHA/MasterMonitor.pm blib/lib/MHA/MasterMonitor.pm
    cp lib/MHA/Server.pm blib/lib/MHA/Server.pm
    cp lib/MHA/SSHCheck.pm blib/lib/MHA/SSHCheck.pm
    cp lib/MHA/DBHelper.pm blib/lib/MHA/DBHelper.pm
    cp bin/masterha_stop blib/script/masterha_stop
    /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_stop
    cp bin/masterha_conf_host blib/script/masterha_conf_host
    /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_conf_host
    cp bin/masterha_check_repl blib/script/masterha_check_repl
    /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_check_repl
    cp bin/masterha_check_status blib/script/masterha_check_status
    /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_check_status
    cp bin/masterha_master_monitor blib/script/masterha_master_monitor
    /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_master_monitor
    cp bin/masterha_check_ssh blib/script/masterha_check_ssh
    /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_check_ssh
    cp bin/masterha_master_switch blib/script/masterha_master_switch
    /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_master_switch
    cp bin/masterha_secondary_check blib/script/masterha_secondary_check
    /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_secondary_check
    cp bin/masterha_manager blib/script/masterha_manager
    /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_manager
    Manifying blib/man1/masterha_stop.1
    Manifying blib/man1/masterha_conf_host.1
    Manifying blib/man1/masterha_check_repl.1
    Manifying blib/man1/masterha_check_status.1
    Manifying blib/man1/masterha_master_monitor.1
    Manifying blib/man1/masterha_check_ssh.1
    Manifying blib/man1/masterha_master_switch.1
    Manifying blib/man1/masterha_secondary_check.1
    Manifying blib/man1/masterha_manager.1
    [root@monitor mha4mysql-manager-0.58]# make install
    Installing /usr/local/share/perl5/MHA/ManagerUtil.pm
    Installing /usr/local/share/perl5/MHA/Config.pm
    Installing /usr/local/share/perl5/MHA/HealthCheck.pm
    Installing /usr/local/share/perl5/MHA/ServerManager.pm
    Installing /usr/local/share/perl5/MHA/ManagerConst.pm
    Installing /usr/local/share/perl5/MHA/FileStatus.pm
    Installing /usr/local/share/perl5/MHA/ManagerAdmin.pm
    Installing /usr/local/share/perl5/MHA/ManagerAdminWrapper.pm
    Installing /usr/local/share/perl5/MHA/MasterFailover.pm
    Installing /usr/local/share/perl5/MHA/MasterRotate.pm
    Installing /usr/local/share/perl5/MHA/MasterMonitor.pm
    Installing /usr/local/share/perl5/MHA/Server.pm
    Installing /usr/local/share/perl5/MHA/SSHCheck.pm
    Installing /usr/local/share/perl5/MHA/DBHelper.pm
    Installing /usr/local/share/man/man1/masterha_stop.1
    Installing /usr/local/share/man/man1/masterha_conf_host.1
    Installing /usr/local/share/man/man1/masterha_check_repl.1
    Installing /usr/local/share/man/man1/masterha_check_status.1
    Installing /usr/local/share/man/man1/masterha_master_monitor.1
    Installing /usr/local/share/man/man1/masterha_check_ssh.1
    Installing /usr/local/share/man/man1/masterha_master_switch.1
    Installing /usr/local/share/man/man1/masterha_secondary_check.1
    Installing /usr/local/share/man/man1/masterha_manager.1
    Installing /usr/local/bin/masterha_stop
    Installing /usr/local/bin/masterha_conf_host
    Installing /usr/local/bin/masterha_check_repl
    Installing /usr/local/bin/masterha_check_status
    Installing /usr/local/bin/masterha_master_monitor
    Installing /usr/local/bin/masterha_check_ssh
    Installing /usr/local/bin/masterha_master_switch
    Installing /usr/local/bin/masterha_secondary_check
    Installing /usr/local/bin/masterha_manager
    Appending installation info to /usr/lib64/perl5/perllocal.pod
    [root@monitor mha4mysql-manager-0.58]#

    (3) Configuring MHA (on the MHA Manager node)
    All MHA configuration is done on the MHA Manager node. Let's configure MHA now.

    (3.1) SSH trust between nodes
    During failover MHA has to copy binlogs/relay logs between hosts, so every node must be reachable from every other node without a password. Configure it as follows:
    STEP 1: Generate an RSA key pair on every node

    # Note: this step must be executed on every node
    [root@monitor ~]# /usr/bin/ssh-keygen -t rsa

    STEP 2: Append the public keys of all nodes to a single authorized_keys file; this can be done on any one node

    [root@monitor ~]# cd .ssh/
    [root@monitor .ssh]# pwd
    /root/.ssh
    ssh 192.168.10.5 cat ~/.ssh/id_rsa.pub >> authorized_keys
    ssh 192.168.10.11 cat ~/.ssh/id_rsa.pub >> authorized_keys
    ssh 192.168.10.12 cat ~/.ssh/id_rsa.pub >> authorized_keys
    ssh 192.168.10.13 cat ~/.ssh/id_rsa.pub >> authorized_keys

    STEP 3: Copy the authorized_keys file to the other nodes

    [root@monitor .ssh]# pwd
    /root/.ssh
    scp authorized_keys root@192.168.10.11:`pwd`
    scp authorized_keys root@192.168.10.12:`pwd`
    scp authorized_keys root@192.168.10.13:`pwd`

    STEP 4: Verify SSH (optional; masterha_check_ssh will verify this later)

    ssh 192.168.10.5 hostname
    ssh 192.168.10.11 hostname
    ssh 192.168.10.12 hostname
    ssh 192.168.10.13 hostname
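
    A slightly stricter variant (my addition, not from the original): -o BatchMode=yes makes ssh fail outright instead of silently falling back to a password prompt, so broken trust is impossible to miss:

    # BatchMode=yes forbids interactive authentication entirely.
    for h in 192.168.10.5 192.168.10.11 192.168.10.12 192.168.10.13; do
        ssh -o BatchMode=yes root@$h hostname || echo "SSH trust to $h is broken"
    done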

    (3.2) The MHA configuration file
    Create the configuration file /etc/mha/app1.cnf by hand, with the following contents:

    [root@monitor ~]# vim /etc/mha/app1.cnf
    ########## 1. Global configuration  ############
    [server default]
    # Manager log file
    manager_log=/mha/mha4mysql-manager-master/app1/log/manager.log
    # Manager working directory
    manager_workdir=/mha/mha4mysql-manager-master/app1
    # Location of the master's binlogs, so that MHA can still find them after
    # the master's MySQL instance has gone down
    master_binlog_dir=/mysql/binlog
    # Script that switches the VIP during automatic failover; must be created by hand
    master_ip_failover_script=/mha/mha4mysql-manager-master/bin/master_ip_failover
    # Script that switches the VIP during a manual online switchover; must be created by hand
    master_ip_online_change_script=/mha/mha4mysql-manager-master/bin/master_ip_online_change
    # MySQL account MHA uses for monitoring and switching. It runs all required
    # administrative commands, such as STOP SLAVE, CHANGE MASTER and RESET SLAVE
    user=mhaadmin
    password="mhaadmin"
    # Replication user and password, used in CHANGE MASTER TO master_user=... on each
    # slave. The user must have the REPLICATION SLAVE privilege on the target host.
    # By default, Master_User from SHOW SLAVE STATUS on the new master (currently
    # still running as a slave) is used.
    repl_user=replica
    repl_password=replica
    # OS-level SSH login user
    ssh_user=root
    # Script that sends an alert after a switchover has taken place
    # report_script=/mha/mha4mysql-manager-master/bin/send_report
    # Double-check the master's status from other nodes inside the cluster
    secondary_check_script=/usr/local/bin/masterha_secondary_check -s node2 -s node3 --user=root --master_host=node1 --master_ip=192.168.10.11 --master_port=3306
    # Script that shuts down the failed host after a failure (mainly to prevent split-brain)
    shutdown_script=""
    # Interval in seconds between pings to the master (5 seconds here); after
    # 3 failed attempts MHA starts automatic failover
    ping_interval=5
    # Working directory on the remote MySQL hosts where the binlog is saved during a switch
    remote_workdir=/tmp
    
    ########## 2. Per-node configuration  ############
    [server1]
    hostname=192.168.10.11
    port=3306
     
    [server2]
    # Candidate master: promoted to master on a switchover, even if this
    # instance is not the most up-to-date slave in the cluster
    candidate_master=1
    # By default MHA will not pick a slave as the new master if it lags behind
    # by more than 100 MB of relay logs; check_repl_delay=0 guarantees that the
    # candidate_master server is promoted regardless of its lag
    check_repl_delay=0
    hostname=192.168.10.12
    port=3306
     
    [server3]
    hostname=192.168.10.13
    port=3306

    Related setup:

    1. Create the directories referenced above on the MHA Manager node

    [root@monitor ~]# mkdir -p /mha/mha4mysql-manager-master/app1/log
    [root@monitor ~]# mkdir -p /mha/mha4mysql-manager-master/bin/

    2. On the MySQL master node, create the mhaadmin database account that MHA uses for administration

    create user 'mhaadmin'@'%' identified WITH mysql_native_password by 'mhaadmin';
    grant all privileges on *.* to 'mhaadmin'@'%';
    flush privileges;
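
    The configuration file also references a replication account (repl_user=replica), which the replication setup linked in section (2.2) is assumed to have created. For completeness, a hedged sketch of creating it on the master:

    # Create the replication account referenced by repl_user in app1.cnf.
    # REPLICATION SLAVE is the minimum privilege replication needs.
    mysql -uroot -p -e "
        CREATE USER 'replica'@'%' IDENTIFIED WITH mysql_native_password BY 'replica';
        GRANT REPLICATION SLAVE ON *.* TO 'replica'@'%';
        FLUSH PRIVILEGES;"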

    (3.3) Failover VIP switchover configuration
    MHA also provides VIP (virtual IP) switchover. While the MySQL master is healthy, the VIP lives on the master server; once the master fails over, the VIP moves to the new master. The benefit: applications connect to the database through the VIP, so they can keep accessing the database even after the master fails. The script's path is set by the master_ip_failover_script parameter in the MHA configuration file.
    STEP 1: Add the VIP failover script on the Manager node

    [root@monitor ~]# vi /mha/mha4mysql-manager-master/bin/master_ip_failover
    #!/usr/bin/env perl
    
    #  Copyright (C) 2011 DeNA Co.,Ltd.
    #  You should have received a copy of the GNU General Public License
    #   along with this program; if not, write to the Free Software
    #  Foundation, Inc.,
    #  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
    
    ## Note: This is a sample script and is not complete. Modify the script based on your environment.
    
    use strict;
    use warnings FATAL => 'all';
    
    use Getopt::Long;
    use MHA::DBHelper;
    
    my (
      $command,        $ssh_user,         $orig_master_host,
      $orig_master_ip, $orig_master_port, $new_master_host,
      $new_master_ip,  $new_master_port,  $new_master_user,
      $new_master_password
    );
    
    # For each deployment, only the following 4 lines need to be changed
    # (ens34 is the NIC carrying the 192.168.10.0/24 subnet in this environment)
    my $vip = '192.168.10.10/24';
    my $key = '1';
    my $ssh_start_vip = "/sbin/ifconfig ens34:$key $vip";
    my $ssh_stop_vip = "/sbin/ifconfig ens34:$key down";
    GetOptions(
      'command=s'             => $command,
      'ssh_user=s'            => $ssh_user,
      'orig_master_host=s'    => $orig_master_host,
      'orig_master_ip=s'      => $orig_master_ip,
      'orig_master_port=i'    => $orig_master_port,
      'new_master_host=s'     => $new_master_host,
      'new_master_ip=s'       => $new_master_ip,
      'new_master_port=i'     => $new_master_port,
      'new_master_user=s'     => $new_master_user,
      'new_master_password=s' => $new_master_password,
    );
    
    exit &main();
    
    sub main {
        print "
    
    IN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===
    
    ";
        if ( $command eq "stop" || $command eq "stopssh" ) {
            my $exit_code = 1;
            eval {
                print "Disabling the VIP on old master: $orig_master_host 
    ";
                &stop_vip();
                $exit_code = 0;
            };
            if ($@) {
                warn "Got Error: $@
    ";
                exit $exit_code;
            }
            exit $exit_code;
        }
        elsif ( $command eq "start" ) {
            my $exit_code = 10;
            eval {
                print "Enabling the VIP - $vip on the new master - $new_master_host 
    ";
                &start_vip();
                $exit_code = 0;
            };
            if ($@) {
                warn $@;
                exit $exit_code;
            }
            exit $exit_code;
        }
        elsif ( $command eq "status" ) {
            print "Checking the Status of the script.. OK 
    ";
            exit 0;
        }
        else {
            &usage();
            exit 1;
        }
    }
    sub start_vip() {
        `ssh $ssh_user@$new_master_host " $ssh_start_vip "`;
    }
    sub stop_vip() {
         return 0  unless  ($ssh_user);
        `ssh $ssh_user@$orig_master_host " $ssh_stop_vip "`;
    }
    sub usage {
      print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port
    ";
    }

    Make the script executable:

    [root@monitor bin]# ll
    total 4
    -rw-r--r--. 1 root root 2725 Mar 29 16:55 master_ip_failover
    [root@monitor bin]# chmod +x master_ip_failover

    Note: for each deployment, only the following 4 lines need to be changed:
    my $vip = '192.168.10.10/24';
    my $key = '1';
    my $ssh_start_vip = "/sbin/ifconfig ens34:$key $vip";
    my $ssh_stop_vip = "/sbin/ifconfig ens34:$key down";
    Where:

    • $vip is the virtual IP we defined; it must be in the same subnet as the MySQL hosts' IPs
    • $key is an arbitrary number; it becomes the alias index of the interface
    • $ssh_start_vip and $ssh_stop_vip contain the "ens34" parameter, which is the NIC the VIP is bound to. Many articles online use "eth*" here, which is the Red Hat naming convention; run ifconfig to see which name applies on your hosts (see the sketch below).
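
    A quick, hedged way to find the interface that carries the replication subnet, so you know what to use in place of ens34:

    # Print the interface name (last field) holding a 192.168.10.x address.
    ip -4 addr show | awk '/inet 192\.168\.10\./ {print $NF, $2}'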

    STEP 2: Bring the VIP up manually on the MySQL master

    # Bring the VIP up manually on the MySQL master
    /sbin/ifconfig ens34:1 192.168.10.10/24

    Interface information before and after adding the VIP:
    [root@node1 ~]# ifconfig
    ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    inet 10.1.45.143 netmask 255.255.255.0 broadcast 10.1.45.255
    inet6 fe80::8e05:c8ea:5953:4213 prefixlen 64 scopeid 0x20<link>
    inet6 fe80::953b:2262:6137:ad20 prefixlen 64 scopeid 0x20<link>
    inet6 fe80::b39e:c76c:b3dc:4d74 prefixlen 64 scopeid 0x20<link>
    ether 00:0c:29:d1:54:5c txqueuelen 1000 (Ethernet)
    RX packets 40322 bytes 6336618 (6.0 MiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 989 bytes 74179 (72.4 KiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

    ens34: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    inet 192.168.10.11 netmask 255.255.255.0 broadcast 192.168.10.255
    inet6 fe80::20c:29ff:fed1:5466 prefixlen 64 scopeid 0x20<link>
    ether 00:0c:29:d1:54:66 txqueuelen 1000 (Ethernet)
    RX packets 2777 bytes 619139 (604.6 KiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 2189 bytes 342114 (334.0 KiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

    lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
    inet 127.0.0.1 netmask 255.0.0.0
    inet6 ::1 prefixlen 128 scopeid 0x10<host>
    loop txqueuelen 1 (Local Loopback)
    RX packets 141 bytes 26024 (25.4 KiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 141 bytes 26024 (25.4 KiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

    [root@node1 ~]# /sbin/ifconfig ens34:1 192.168.10.10/24           # add the virtual IP
    [root@node1 ~]# ifconfig
    ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    inet 10.1.45.143 netmask 255.255.255.0 broadcast 10.1.45.255
    inet6 fe80::8e05:c8ea:5953:4213 prefixlen 64 scopeid 0x20<link>
    inet6 fe80::953b:2262:6137:ad20 prefixlen 64 scopeid 0x20<link>
    inet6 fe80::b39e:c76c:b3dc:4d74 prefixlen 64 scopeid 0x20<link>
    ether 00:0c:29:d1:54:5c txqueuelen 1000 (Ethernet)
    RX packets 44562 bytes 6706687 (6.3 MiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 990 bytes 74239 (72.4 KiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

    ens34: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    inet 192.168.10.11 netmask 255.255.255.0 broadcast 192.168.10.255
    inet6 fe80::20c:29ff:fed1:5466 prefixlen 64 scopeid 0x20<link>
    ether 00:0c:29:d1:54:66 txqueuelen 1000 (Ethernet)
    RX packets 2914 bytes 629516 (614.7 KiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 2300 bytes 354878 (346.5 KiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

    ens34:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    inet 192.168.10.10 netmask 255.255.255.0 broadcast 192.168.10.255
    ether 00:0c:29:d1:54:66 txqueuelen 1000 (Ethernet)

    lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
    inet 127.0.0.1 netmask 255.0.0.0
    inet6 ::1 prefixlen 128 scopeid 0x10<host>
    loop txqueuelen 1 (Local Loopback)
    RX packets 141 bytes 26024 (25.4 KiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 141 bytes 26024 (25.4 KiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
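
    For completeness (my addition, not in the original): if you need to remove the VIP by hand, for example after a test, the reverse command is:

    # Tear down the VIP alias on whichever host currently holds it.
    /sbin/ifconfig ens34:1 down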


    (3.4) Online VIP switchover configuration
    The VIP has to move not only during failover but also during a manual online master switchover. The path of the online switchover script is set by the master_ip_online_change_script parameter in the MHA configuration file.
    Add the online VIP switchover script on the Manager node:

    [root@monitor bin]# vim /mha/mha4mysql-manager-master/bin/master_ip_online_change 
    
    #!/usr/bin/env perl
     
    #  Copyright (C) 2011 DeNA Co.,Ltd.
    #
    #  This program is free software; you can redistribute it and/or modify
    #  it under the terms of the GNU General Public License as published by
    #  the Free Software Foundation; either version 2 of the License, or
    #  (at your option) any later version.
    #
    #  This program is distributed in the hope that it will be useful,
    #  but WITHOUT ANY WARRANTY; without even the implied warranty of
    #  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    #  GNU General Public License for more details.
    #
    #  You should have received a copy of the GNU General Public License
    #   along with this program; if not, write to the Free Software
    #  Foundation, Inc.,
    #  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
     
    ## Note: This is a sample script and is not complete. Modify the script based on your environment.
     
    use strict;
    use warnings FATAL => 'all';
     
    use Getopt::Long;
    use MHA::DBHelper;
    use MHA::NodeUtil;
    use Time::HiRes qw( sleep gettimeofday tv_interval );
    use Data::Dumper;
     
    my $_tstart;
    my $_running_interval = 0.1;
    my (
      $command,              $orig_master_is_new_slave, $orig_master_host,
      $orig_master_ip,       $orig_master_port,         $orig_master_user,
      $orig_master_password, $orig_master_ssh_user,     $new_master_host,
      $new_master_ip,        $new_master_port,          $new_master_user,
      $new_master_password,  $new_master_ssh_user,
    );
    
    # For each deployment, only the following 5 lines need to be changed
    my $vip = '192.168.10.10/24';
    my $key = '1';
    my $ssh_start_vip = "/sbin/ifconfig ens34:$key $vip";
    my $ssh_stop_vip = "/sbin/ifconfig ens34:$key down";
    my $ssh_user = "root";
     
    GetOptions(
      'command=s'                => $command,
      'orig_master_is_new_slave' => $orig_master_is_new_slave,
      'orig_master_host=s'       => $orig_master_host,
      'orig_master_ip=s'         => $orig_master_ip,
      'orig_master_port=i'       => $orig_master_port,
      'orig_master_user=s'       => $orig_master_user,
      'orig_master_password=s'   => $orig_master_password,
      'orig_master_ssh_user=s'   => $orig_master_ssh_user,
      'new_master_host=s'        => $new_master_host,
      'new_master_ip=s'          => $new_master_ip,
      'new_master_port=i'        => $new_master_port,
      'new_master_user=s'        => $new_master_user,
      'new_master_password=s'    => $new_master_password,
      'new_master_ssh_user=s'    => $new_master_ssh_user,
    );
     
    exit &main();
     
    sub current_time_us {
      my ( $sec, $microsec ) = gettimeofday();
      my $curdate = localtime($sec);
      return $curdate . " " . sprintf( "%06d", $microsec );
    }
     
    sub sleep_until {
      my $elapsed = tv_interval($_tstart);
      if ( $_running_interval > $elapsed ) {
        sleep( $_running_interval - $elapsed );
      }
    }
     
    sub get_threads_util {
      my $dbh                    = shift;
      my $my_connection_id       = shift;
      my $running_time_threshold = shift;
      my $type                   = shift;
      $running_time_threshold = 0 unless ($running_time_threshold);
      $type                   = 0 unless ($type);
      my @threads;
     
      my $sth = $dbh->prepare("SHOW PROCESSLIST");
      $sth->execute();
     
      while ( my $ref = $sth->fetchrow_hashref() ) {
        my $id         = $ref->{Id};
        my $user       = $ref->{User};
        my $host       = $ref->{Host};
        my $command    = $ref->{Command};
        my $state      = $ref->{State};
        my $query_time = $ref->{Time};
        my $info       = $ref->{Info};
        $info =~ s/^s*(.*?)s*$/$1/ if defined($info);
        next if ( $my_connection_id == $id );
        next if ( defined($query_time) && $query_time < $running_time_threshold );
        next if ( defined($command)    && $command eq "Binlog Dump" );
        next if ( defined($user)       && $user eq "system user" );
        next
          if ( defined($command)
          && $command eq "Sleep"
          && defined($query_time)
          && $query_time >= 1 );
     
        if ( $type >= 1 ) {
          next if ( defined($command) && $command eq "Sleep" );
          next if ( defined($command) && $command eq "Connect" );
        }
     
        if ( $type >= 2 ) {
          next if ( defined($info) && $info =~ m/^select/i );
          next if ( defined($info) && $info =~ m/^show/i );
        }
     
        push @threads, $ref;
      }
      return @threads;
    }
     
    sub main {
      if ( $command eq "stop" ) {
        ## Gracefully killing connections on the current master
        # 1. Set read_only= 1 on the new master
        # 2. DROP USER so that no app user can establish new connections
        # 3. Set read_only= 1 on the current master
        # 4. Kill current queries
        # * Any database access failure will result in script die.
        my $exit_code = 1;
        eval {
          ## Setting read_only=1 on the new master (to avoid accident)
          my $new_master_handler = new MHA::DBHelper();
     
          # args: hostname, port, user, password, raise_error(die_on_error)_or_not
          $new_master_handler->connect( $new_master_ip, $new_master_port,
            $new_master_user, $new_master_password, 1 );
          print current_time_us() . " Set read_only on the new master.. ";
          $new_master_handler->enable_read_only();
          if ( $new_master_handler->is_read_only() ) {
            print "ok.
    ";
          }
          else {
            die "Failed!
    ";
          }
     
     
          $new_master_handler->disconnect();
     
          # Connecting to the orig master, die if any database error happens
          my $orig_master_handler = new MHA::DBHelper();
          $orig_master_handler->connect( $orig_master_ip, $orig_master_port,
            $orig_master_user, $orig_master_password, 1 );
     
          ## Drop application user so that nobody can connect. Disabling per-session binlog beforehand
          $orig_master_handler->disable_log_bin_local();
          print current_time_us() . " Dropping app user on the orig master..
    ";
          # FIXME_xxx_drop_app_user($orig_master_handler);
     
          ## Waiting for N * 100 milliseconds so that current connections can exit
          my $time_until_read_only = 15;
          $_tstart = [gettimeofday];
          my @threads = get_threads_util( $orig_master_handler->{dbh},
            $orig_master_handler->{connection_id} );
          while ( $time_until_read_only > 0 && $#threads >= 0 ) {
            if ( $time_until_read_only % 5 == 0 ) {
              printf
    "%s Waiting all running %d threads are disconnected.. (max %d milliseconds)
    ",
                current_time_us(), $#threads + 1, $time_until_read_only * 100;
              if ( $#threads < 5 ) {
                print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "
    "
                  foreach (@threads);
              }
            }
            sleep_until();
            $_tstart = [gettimeofday];
            $time_until_read_only--;
            @threads = get_threads_util( $orig_master_handler->{dbh},
              $orig_master_handler->{connection_id} );
          }
     
          ## Setting read_only=1 on the current master so that nobody(except SUPER) can write
          print current_time_us() . " Set read_only=1 on the orig master.. ";
          $orig_master_handler->enable_read_only();
          if ( $orig_master_handler->is_read_only() ) {
            print "ok.
    ";
          }
          else {
            die "Failed!
    ";
          }
     
          ## Waiting for M * 100 milliseconds so that current update queries can complete
          my $time_until_kill_threads = 5;
          @threads = get_threads_util( $orig_master_handler->{dbh},
            $orig_master_handler->{connection_id} );
          while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
            if ( $time_until_kill_threads % 5 == 0 ) {
              printf
    "%s Waiting all running %d queries are disconnected.. (max %d milliseconds)
    ",
                current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
              if ( $#threads < 5 ) {
                print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "
    "
                  foreach (@threads);
              }
            }
            sleep_until();
            $_tstart = [gettimeofday];
            $time_until_kill_threads--;
            @threads = get_threads_util( $orig_master_handler->{dbh},
              $orig_master_handler->{connection_id} );
          }
     
          ## Terminating all threads
          print current_time_us() . " Killing all application threads..
    ";
          $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
          print current_time_us() . " done.
    ";
          $orig_master_handler->enable_log_bin_local();
          $orig_master_handler->disconnect();
     
          ## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
     
          print "Disabling the VIP on old master: $orig_master_host 
    ";
          &stop_vip();      
          $exit_code = 0;
        };
        if ($@) {
          warn "Got Error: $@
    ";
          exit $exit_code;
        }
        exit $exit_code;
      }
      elsif ( $command eq "start" ) {
        ## Activating master ip on the new master
        # 1. Create app user with write privileges
        # 2. Moving backup script if needed
        # 3. Register new master's ip to the catalog database
     
    # We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.
    # If exit code is 0 or 10, MHA does not abort
        my $exit_code = 10;
        eval {
             my $new_master_handler = new MHA::DBHelper();
     
          # args: hostname, port, user, password, raise_error_or_not
            $new_master_handler->connect( $new_master_ip, $new_master_port,
            $new_master_user, $new_master_password, 1 );
          ##################### The commented-out line below would disable binary logging ###########################
            ## Set disable log bin
            #$new_master_handler->disable_log_bin_local();
          #########################################
          print current_time_us() . " Set read_only=0 on the new master.
    ";
          ## Set read_only=0 on the new master
          $new_master_handler->disable_read_only();
         
          #################### Commented out: create the app user and re-enable the binlog
          ## Creating an app user on the new master
          # print current_time_us() . " Creating app user on the new master..
    ";
          # FIXME_xxx_create_app_user($new_master_handler);
          # $new_master_handler->enable_log_bin_local();
          # $new_master_handler->disconnect();
          
          #######################################################################
          print "Enabling the VIP - $vip on the new master - $new_master_host 
    ";
          &start_vip();
          ## Update master ip on the catalog database, etc
          $exit_code = 0;
        };
        if ($@) {
          warn "Got Error: $@
    ";
          exit $exit_code;
        }
        exit $exit_code;
      }
      elsif ( $command eq "status" ) {
     
        # do nothing
        &start_vip();   
         exit 0;
      }
      else {
        &usage();
        exit 1;
      }
    }
     
     
# A simple system call that enables the VIP on the new master
sub start_vip() {
    `ssh $ssh_user\@$new_master_ip \" $ssh_start_vip \"`;
}

# A simple system call that disables the VIP on the old master
sub stop_vip() {
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
  print
"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
  die;
}
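
Before relying on these subroutines, it can help to verify the underlying VIP commands by hand. A quick check (a sketch, assuming the same NIC alias ens34:1 and VIP 192.168.10.10 used throughout this article):

# Bring the VIP up on a candidate master, confirm it, then take it down again
ssh root@192.168.10.12 "/sbin/ifconfig ens34:1 192.168.10.10/24"
ssh root@192.168.10.12 "/sbin/ifconfig ens34:1"
ssh root@192.168.10.12 "/sbin/ifconfig ens34:1 down"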

(3.5) Verifying that the MHA configuration is correct
The MHA configuration check covers two things: SSH mutual trust between the nodes, and the replication status.
STEP 1: Check SSH mutual trust between the nodes

    [root@monitor ~]# masterha_check_ssh -conf=/etc/mha/app1.cnf
    Mon Mar 29 17:07:16 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Mon Mar 29 17:07:16 2021 - [info] Reading application default configuration from /etc/mha/app1.cnf..
    Mon Mar 29 17:07:16 2021 - [info] Reading server configuration from /etc/mha/app1.cnf..
    Mon Mar 29 17:07:16 2021 - [info] Starting SSH connection tests..
    Mon Mar 29 17:07:17 2021 - [debug]
    Mon Mar 29 17:07:16 2021 - [debug] Connecting via SSH from root@192.168.10.11(192.168.10.11:22) to root@192.168.10.12(192.168.10.12:22)..
    Mon Mar 29 17:07:16 2021 - [debug] ok.
    Mon Mar 29 17:07:16 2021 - [debug] Connecting via SSH from root@192.168.10.11(192.168.10.11:22) to root@192.168.10.13(192.168.10.13:22)..
    Mon Mar 29 17:07:17 2021 - [debug] ok.
    Mon Mar 29 17:07:18 2021 - [debug]
    Mon Mar 29 17:07:16 2021 - [debug] Connecting via SSH from root@192.168.10.12(192.168.10.12:22) to root@192.168.10.11(192.168.10.11:22)..
    Warning: Permanently added '192.168.10.11' (ECDSA) to the list of known hosts.
    Mon Mar 29 17:07:17 2021 - [debug] ok.
    Mon Mar 29 17:07:17 2021 - [debug] Connecting via SSH from root@192.168.10.12(192.168.10.12:22) to root@192.168.10.13(192.168.10.13:22)..
    Warning: Permanently added '192.168.10.13' (ECDSA) to the list of known hosts.
    Mon Mar 29 17:07:17 2021 - [debug] ok.
    Mon Mar 29 17:07:18 2021 - [debug]
    Mon Mar 29 17:07:17 2021 - [debug] Connecting via SSH from root@192.168.10.13(192.168.10.13:22) to root@192.168.10.11(192.168.10.11:22)..
    Warning: Permanently added '192.168.10.11' (ECDSA) to the list of known hosts.
    Mon Mar 29 17:07:17 2021 - [debug] ok.
    Mon Mar 29 17:07:17 2021 - [debug] Connecting via SSH from root@192.168.10.13(192.168.10.13:22) to root@192.168.10.12(192.168.10.12:22)..
    Warning: Permanently added '192.168.10.12' (ECDSA) to the list of known hosts.
    Mon Mar 29 17:07:18 2021 - [debug] ok.
    Mon Mar 29 17:07:18 2021 - [info] All SSH connection tests passed successfully.

If the output ends with "successfully", SSH mutual trust between the nodes is working.

STEP 2: Check the replication status

    [root@monitor ~]# masterha_check_repl -conf=/etc/mha/app1.cnf
    Mon Mar 29 17:10:58 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Mon Mar 29 17:10:58 2021 - [info] Reading application default configuration from /etc/mha/app1.cnf..
    Mon Mar 29 17:10:58 2021 - [info] Reading server configuration from /etc/mha/app1.cnf..
    Mon Mar 29 17:10:58 2021 - [info] MHA::MasterMonitor version 0.58.
    Mon Mar 29 17:10:59 2021 - [info] GTID failover mode = 1
    Mon Mar 29 17:10:59 2021 - [info] Dead Servers:
    Mon Mar 29 17:10:59 2021 - [info] Alive Servers:
    Mon Mar 29 17:10:59 2021 - [info] 192.168.10.11(192.168.10.11:3306)
    Mon Mar 29 17:10:59 2021 - [info] 192.168.10.12(192.168.10.12:3306)
    Mon Mar 29 17:10:59 2021 - [info] 192.168.10.13(192.168.10.13:3306)
    Mon Mar 29 17:10:59 2021 - [info] Alive Slaves:
    Mon Mar 29 17:10:59 2021 - [info] 192.168.10.12(192.168.10.12:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
    Mon Mar 29 17:10:59 2021 - [info] GTID ON
    Mon Mar 29 17:10:59 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)
    Mon Mar 29 17:10:59 2021 - [info] Primary candidate for the new Master (candidate_master is set)
    Mon Mar 29 17:10:59 2021 - [info] 192.168.10.13(192.168.10.13:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
    Mon Mar 29 17:10:59 2021 - [info] GTID ON
    Mon Mar 29 17:10:59 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)
    Mon Mar 29 17:10:59 2021 - [info] Current Alive Master: 192.168.10.11(192.168.10.11:3306)
    Mon Mar 29 17:10:59 2021 - [info] Checking slave configurations..
    Mon Mar 29 17:10:59 2021 - [info] Checking replication filtering settings..
    Mon Mar 29 17:10:59 2021 - [info] binlog_do_db= , binlog_ignore_db=
    Mon Mar 29 17:10:59 2021 - [info] Replication filtering check ok.
    Mon Mar 29 17:10:59 2021 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
    Mon Mar 29 17:10:59 2021 - [info] Checking SSH publickey authentication settings on the current master..
    Mon Mar 29 17:10:59 2021 - [info] HealthCheck: SSH to 192.168.10.11 is reachable.
    Mon Mar 29 17:10:59 2021 - [info]
    192.168.10.11(192.168.10.11:3306) (current master)
    +--192.168.10.12(192.168.10.12:3306)
    +--192.168.10.13(192.168.10.13:3306)
    
    Mon Mar 29 17:10:59 2021 - [info] Checking replication health on 192.168.10.12..
    Mon Mar 29 17:10:59 2021 - [info] ok.
    Mon Mar 29 17:10:59 2021 - [info] Checking replication health on 192.168.10.13..
    Mon Mar 29 17:10:59 2021 - [info] ok.
    Mon Mar 29 17:10:59 2021 - [info] Checking master_ip_failover_script status:
    Mon Mar 29 17:10:59 2021 - [info] /mha/mha4mysql-manager-master/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.10.11 --orig_master_ip=192.168.10.11 --orig_master_port=3306
    
    
    IN SCRIPT TEST====/sbin/ifconfig ens34:1 down==/sbin/ifconfig ens34:1 192.168.10.10/24===
    
    Checking the Status of the script.. OK
    Mon Mar 29 17:10:59 2021 - [info] OK.
    Mon Mar 29 17:10:59 2021 - [warning] shutdown_script is not defined.
    Mon Mar 29 17:10:59 2021 - [info] Got exit code 0 (Not master dead).
    
    MySQL Replication Health is OK.

If the output ends with "MySQL Replication Health is OK", replication is healthy.

(四) MHA Manager scripts explained

After MHA Manager is installed, a series of script files is generated. The scripts and their purposes are:

masterha_check_repl: check that MySQL replication is working
masterha_check_ssh: check SSH mutual trust between the MHA node and the MySQL nodes
masterha_check_status: check the running state of masterha_manager
masterha_manager: start MHA Manager
masterha_stop: stop MHA Manager
masterha_master_switch: perform a manual failover or a manual online master switchover
masterha_secondary_check: check the master's status by way of the MySQL slaves
masterha_master_monitor: monitor the MySQL master; rarely used directly
masterha_conf_host: a helper script for adding/removing host entries in the configuration file; rarely used

Below we walk through the most commonly used ones.

(4.1) masterha_manager
Purpose: starts MHA Manager to perform automatic failover. Start it as follows:

# Start MHA Manager in the foreground
masterha_manager --conf=/etc/mha/app1.cnf

# Start MHA Manager in the background
nohup masterha_manager --conf=/etc/mha/app1.cnf &

Other important options (a combined example follows the list):

--ignore_fail_on_start: by default MHA Manager refuses to start a failover if any slave is down, but in some situations you may want the failover to proceed anyway. Add ignore_fail=1 to the relevant server section of the MHA configuration file and pass this option to masterha_manager, and MHA will fail over even while those servers are down.
--ignore_last_failover: if a previous failover failed, MHA will not start another failover within 8 hours, since the underlying problem is likely to recur. The usual way to re-arm failover is to manually delete the $manager_workdir/$(app_name).failover.error file; with --ignore_last_failover set, MHA fails over regardless of the outcome of the most recent failover.
--last_failover_minute=minutes: if a failover completed recently (within 8 hours by default), MHA Manager will not fail over again, on the grounds that another failover probably cannot fix the problem. This option changes that time window.
--remove_dead_master_conf: with this option, MHA Manager removes the failed master's section from the configuration file once the failover completes successfully. Without it the configuration file is left untouched during the switch, and a subsequent start of masterha_manager will stop with the error "there is a dead slave".
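
Combining the options above, a typical unattended start might look like the following sketch (paths and flags as described in this article):

# Start MHA Manager in the background; allow a failover even if the previous
# one failed, and prune the dead master's section from app1.cnf on success
nohup masterha_manager --conf=/etc/mha/app1.cnf --ignore_last_failover --remove_dead_master_conf &

# Confirm that the manager is up and watching the master
masterha_check_status --conf=/etc/mha/app1.cnf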

(4.2) masterha_master_switch
Purpose: used for ① manual failover and ② manual online master switchover.
A manual failover is for when the master is already down; an online switchover moves the master role to another machine so that maintenance can be done on the old one.

(4.2.1) Manual failover
Example of a manual failover:

masterha_master_switch --master_state=dead --conf=/etc/mha/app1.cnf --dead_master_host=192.168.10.11 --ignore_last_failover

Other important options (a fuller example follows the list):
--master_state=dead: required. Accepts "dead" or "alive"; use "dead" for a manual failover and "alive" for an online master switchover.
--dead_master_host=hostname: required. Hostname of the dead master; --dead_master_ip and --dead_master_port may optionally be set as well.
--new_master_host=hostname: optional. Explicitly makes a given node the new MySQL master; if unset, MHA picks the new master according to its election rules.
--new_master_port=port: optional. Listening port of the MySQL instance on the new master; default 3306.
--interactive=(0|1): whether the switch runs interactively. Default 1 (interactive).
--ignore_last_failover: same meaning as for masterha_manager.
--remove_dead_master_conf: same meaning as for masterha_manager.
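
For instance, a fully non-interactive failover that promotes a specific node could be run as follows (a sketch built from the options above):

# Promote 192.168.10.13 with no confirmation prompts, ignoring the result of
# the last failover and pruning the dead master from the configuration file
masterha_master_switch --master_state=dead --conf=/etc/mha/app1.cnf --dead_master_host=192.168.10.11 --new_master_host=192.168.10.13 --interactive=0 --ignore_last_failover --remove_dead_master_conf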

(4.2.2) Online master switchover
Example of an online switchover:

masterha_master_switch --master_state=alive --conf=/etc/mha/app1.cnf --new_master_host=192.168.10.11 --orig_master_is_new_slave

Other important options (a combined example follows the list):
--new_master_host=hostname: hostname of the new master.
--orig_master_is_new_slave: after the switchover, run the previous master as a slave of the new master. Disabled by default.
--running_updates_limit=seconds: abort the switchover if a write query on the master has been running longer than this, or if a slave lags behind the master by more than this.
--remove_orig_master_conf: if the online switchover succeeds, MHA automatically removes the original master's entry from the configuration file.
--skip_lock_all_tables: during an online switchover MHA normally runs "FLUSH TABLES WITH READ LOCK" on the original master to make sure it stops taking updates; this option skips that step (at some risk).
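
As a sketch combining the options above, an online switchover that demotes the old master to a slave and refuses to run when replication is more than 10 seconds behind could be:

# Move the master role to 192.168.10.12 for planned maintenance; the old master
# becomes a slave of the new master, and the switch aborts if lag exceeds 10s
masterha_master_switch --master_state=alive --conf=/etc/mha/app1.cnf --new_master_host=192.168.10.12 --orig_master_is_new_slave --running_updates_limit=10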

(4.3) masterha_secondary_check
Purpose: checks the master's status through extra network routes. It is hooked in through the secondary_check_script parameter in the MHA Manager configuration file. The parameter is optional, but I recommend setting it. Why? The two scenarios below explain (a configuration example follows the list):

• Without secondary_check_script, MHA Manager judges the master's state only by connecting to it directly. If there is a network fault between MHA Manager and the MySQL master, the master can be declared dead even though it is running normally, triggering a spurious failover.
• With secondary_check_script configured, MHA Manager still connects to the master directly first; if that check fails, it connects to the master again through the slave hosts. Only when all three routes fail to reach the master is it declared down, which rules out the misjudgment.
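
Hooking it up takes a single parameter in the manager configuration. The setting below mirrors the check command that appears later in this article's manager logs (node2 and node3 are the two slaves):

# In /etc/mha/app1.cnf, [server default] section
secondary_check_script=/usr/local/bin/masterha_secondary_check -s node2 -s node3 --user=root --master_host=node1 --master_ip=192.168.10.11 --master_port=3306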

     

(五) Testing MHA
For automatic failover, masterha_manager must be running; for a manual failover or a manual master switchover, the manager must be stopped first.

# Start MHA to monitor the MySQL state
    [root@monitor ~]# nohup masterha_manager --conf=/etc/mha/app1.cnf --ignore_last_failover &
    [1] 30989
    [root@monitor ~]# nohup: ignoring input and appending output to ‘nohup.out’
    
# Check the running state of MHA Manager
    [root@monitor ~]# masterha_check_status --conf=/etc/mha/app1.cnf
    app1 (pid:30989) is running(0:PING_OK), master:192.168.10.11
    
# To stop MHA Manager, run:
    [root@monitor ~]# masterha_stop --conf=/etc/mha/app1.cnf
    Stopped app1 successfully.
    [1]+ Exit 1 nohup masterha_manager -conf=/etc/mha/app1.cnf

(5.1) MHA failover test (master server healthy, database instance down)
STEP 1: Start MHA Manager

    [root@monitor ~]# nohup masterha_manager --conf=/etc/mha/app1.cnf --ignore_last_failover &
    [1] 2794
    [root@monitor ~]# nohup: ignoring input and appending output to ‘nohup.out’

STEP 2: Shut down the MySQL instance on the master, node1

    [root@node1 ~]# mysqladmin -uroot -p123456 shutdown
    mysqladmin: [Warning] Using a password on the command line interface can be insecure.
    
    [root@node1 ~]#
    [root@node1 ~]# service mysqld status
    ERROR! MySQL is not running, but lock file (/var/lock/subsys/mysql) exists

STEP 3: Check whether the master failover took place
On node 2, the slave information is gone and the node is now the master:

    [root@node2 ~]# mysql -uroot -p123456
mysql> show slave status\G
Empty set (0.00 sec)

mysql> show master status\G
    *************************** 1. row ***************************
                 File: node2-bin.000005
             Position: 194
         Binlog_Do_DB: 
     Binlog_Ignore_DB: 
    Executed_Gtid_Set: 2db1f74f-8790-11eb-b668-000c29d1545c:1-94186,
    30753d6b-8790-11eb-864f-000c2999ad6c:1-2
    1 row in set (0.00 sec) 

On node 3, the master has automatically changed to node 2:

    [root@node3 ~]# mysql -uroot -p123456
mysql> show slave status\G
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 192.168.10.12
                      Master_User: replica
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: node2-bin.000005
              Read_Master_Log_Pos: 194
                   Relay_Log_File: node3-relay-bin.000007
                    Relay_Log_Pos: 407
            Relay_Master_Log_File: node2-bin.000005
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes

STEP 4: Check whether the virtual IP has floated
The VIP on node 1 has been taken down:

    [root@node1 ~]# ifconfig
    ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    inet 10.1.45.143 netmask 255.255.255.0 broadcast 10.1.45.255
    inet6 fe80::953b:2262:6137:ad20 prefixlen 64 scopeid 0x20<link>
    ether 00:0c:29:d1:54:5c txqueuelen 1000 (Ethernet)
    RX packets 14274 bytes 1210361 (1.1 MiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 20 bytes 1965 (1.9 KiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
    
    ens34: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    inet 192.168.10.11 netmask 255.255.255.0 broadcast 192.168.10.255
    inet6 fe80::20c:29ff:fed1:5466 prefixlen 64 scopeid 0x20<link>
    ether 00:0c:29:d1:54:66 txqueuelen 1000 (Ethernet)
    RX packets 3095 bytes 356811 (348.4 KiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 2254 bytes 378194 (369.3 KiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
    
    lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
    inet 127.0.0.1 netmask 255.0.0.0
    inet6 ::1 prefixlen 128 scopeid 0x10<host>
    loop txqueuelen 1 (Local Loopback)
    RX packets 48 bytes 4194 (4.0 KiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 48 bytes 4194 (4.0 KiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

The VIP has come up automatically on node 2:

    [root@node2 ~]# ifconfig
    ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    inet 10.1.45.51 netmask 255.255.255.0 broadcast 10.1.45.255
    inet6 fe80::953b:2262:6137:ad20 prefixlen 64 scopeid 0x20<link>
    inet6 fe80::b39e:c76c:b3dc:4d74 prefixlen 64 scopeid 0x20<link>
    ether 00:0c:29:99:ad:6c txqueuelen 1000 (Ethernet)
    RX packets 16102 bytes 1371651 (1.3 MiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 23 bytes 2286 (2.2 KiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
    
    ens34: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    inet 192.168.10.12 netmask 255.255.255.0 broadcast 192.168.10.255
    inet6 fe80::20c:29ff:fe99:ad76 prefixlen 64 scopeid 0x20<link>
    ether 00:0c:29:99:ad:76 txqueuelen 1000 (Ethernet)
    RX packets 1474 bytes 153431 (149.8 KiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 1103 bytes 351924 (343.6 KiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
    
    ens34:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    inet 192.168.10.10 netmask 255.255.255.0 broadcast 192.168.10.255
    ether 00:0c:29:99:ad:76 txqueuelen 1000 (Ethernet)
    
    lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
    inet 127.0.0.1 netmask 255.0.0.0
    inet6 ::1 prefixlen 128 scopeid 0x10<host>
    loop txqueuelen 1 (Local Loopback)
    RX packets 0 bytes 0 (0.0 B)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 0 bytes 0 (0.0 B)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

The full log of the switchover is shown below:

############################# MHA Manager startup log -- begin ####################################
Tue Mar 30 09:54:21 2021 - [info] MHA::MasterMonitor version 0.58.
Tue Mar 30 09:54:22 2021 - [info] GTID failover mode = 1 # whether GTID-based failover is enabled
Tue Mar 30 09:54:22 2021 - [info] Dead Servers:
Tue Mar 30 09:54:22 2021 - [info] Alive Servers: # confirm all healthy nodes
Tue Mar 30 09:54:22 2021 - [info] 192.168.10.11(192.168.10.11:3306)
Tue Mar 30 09:54:22 2021 - [info] 192.168.10.12(192.168.10.12:3306)
Tue Mar 30 09:54:22 2021 - [info] 192.168.10.13(192.168.10.13:3306)
Tue Mar 30 09:54:22 2021 - [info] Alive Slaves: # per-slave checks: binlog enabled, GTID enabled, current master, candidate_master setting
Tue Mar 30 09:54:22 2021 - [info] 192.168.10.12(192.168.10.12:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Tue Mar 30 09:54:22 2021 - [info] GTID ON
Tue Mar 30 09:54:22 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)
Tue Mar 30 09:54:22 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Mar 30 09:54:22 2021 - [info] 192.168.10.13(192.168.10.13:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Tue Mar 30 09:54:22 2021 - [info] GTID ON
Tue Mar 30 09:54:22 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)
Tue Mar 30 09:54:22 2021 - [info] Current Alive Master: 192.168.10.11(192.168.10.11:3306)
Tue Mar 30 09:54:22 2021 - [info] Checking slave configurations..
Tue Mar 30 09:54:22 2021 - [info] Checking replication filtering settings..
Tue Mar 30 09:54:22 2021 - [info] binlog_do_db= , binlog_ignore_db=
Tue Mar 30 09:54:22 2021 - [info] Replication filtering check ok.
Tue Mar 30 09:54:22 2021 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Tue Mar 30 09:54:22 2021 - [info] Checking SSH publickey authentication settings on the current master..
Tue Mar 30 09:54:23 2021 - [info] HealthCheck: SSH to 192.168.10.11 is reachable.
Tue Mar 30 09:54:23 2021 - [info] # MySQL replication topology
192.168.10.11(192.168.10.11:3306) (current master)
+--192.168.10.12(192.168.10.12:3306)
+--192.168.10.13(192.168.10.13:3306)

Tue Mar 30 09:54:23 2021 - [info] Checking master_ip_failover_script status: # check the failover VIP script
Tue Mar 30 09:54:23 2021 - [info] /mha/mha4mysql-manager-master/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.10.11 --orig_master_ip=192.168.10.11 --orig_master_port=3306


IN SCRIPT TEST====/sbin/ifconfig ens34:1 down==/sbin/ifconfig ens34:1 192.168.10.10/24===

Checking the Status of the script.. OK
Tue Mar 30 09:54:23 2021 - [info] OK.
Tue Mar 30 09:54:23 2021 - [warning] shutdown_script is not defined.
Tue Mar 30 09:54:23 2021 - [info] Set master ping interval 5 seconds. # how often MHA Manager pings the MySQL master
Tue Mar 30 09:54:23 2021 - [info] Set secondary check script: /usr/local/bin/masterha_secondary_check -s node2 -s node3 --user=root --master_host=node1 --master_ip=192.168.10.11 --master_port=3306 # script that re-checks the master through the slaves
Tue Mar 30 09:54:23 2021 - [info] Starting ping health check on 192.168.10.11(192.168.10.11:3306)..
Tue Mar 30 09:54:23 2021 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond.. # the ping against the master succeeded; keep monitoring
############################# MHA Manager startup log -- end ####################################


############################# MySQL down: MHA Manager starts the master failover -- begin ####################################

###########################################################################################
# 1. MHA Manager finds it can no longer connect to the MySQL master, then re-checks the
# master through the slaves; the master database is unreachable from them too, but SSH still works
###########################################################################################
Tue Mar 30 09:56:18 2021 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Tue Mar 30 09:56:18 2021 - [info] Executing secondary network check script: /usr/local/bin/masterha_secondary_check -s node2 -s node3 --user=root --master_host=node1 --master_ip=192.168.10.11 --master_port=3306 --user=root --master_host=192.168.10.11 --master_ip=192.168.10.11 --master_port=3306 --master_user=mhaadmin --master_password=mhaadmin --ping_type=SELECT
Tue Mar 30 09:56:18 2021 - [info] Executing SSH check script: exit 0
Warning: Permanently added 'node2' (ECDSA) to the list of known hosts.
Tue Mar 30 09:56:18 2021 - [info] HealthCheck: SSH to 192.168.10.11 is reachable.
Monitoring server node2 is reachable, Master is not reachable from node2. OK.
Warning: Permanently added 'node3' (ECDSA) to the list of known hosts.
Monitoring server node3 is reachable, Master is not reachable from node3. OK.
Tue Mar 30 09:56:19 2021 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Tue Mar 30 09:56:23 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.10.11' (111))
Tue Mar 30 09:56:23 2021 - [warning] Connection failed 2 time(s)..
Tue Mar 30 09:56:28 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.10.11' (111))
Tue Mar 30 09:56:28 2021 - [warning] Connection failed 3 time(s)..
Tue Mar 30 09:56:33 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.10.11' (111))
Tue Mar 30 09:56:33 2021 - [warning] Connection failed 4 time(s)..
Tue Mar 30 09:56:33 2021 - [warning] Master is not reachable from health checker!
Tue Mar 30 09:56:33 2021 - [warning] Master 192.168.10.11(192.168.10.11:3306) is not reachable!
Tue Mar 30 09:56:33 2021 - [warning] SSH is reachable.

###########################################################################################
# 2. Check the state of every MySQL node and confirm: master down, slaves healthy.
# Stop the monitoring script and start the master failover
###########################################################################################
Tue Mar 30 09:56:33 2021 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mha/app1.cnf again, and trying to connect to all servers to check server status..
Tue Mar 30 09:56:33 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Mar 30 09:56:33 2021 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Tue Mar 30 09:56:33 2021 - [info] Reading server configuration from /etc/mha/app1.cnf..
Tue Mar 30 09:56:34 2021 - [info] GTID failover mode = 1
Tue Mar 30 09:56:34 2021 - [info] Dead Servers:
Tue Mar 30 09:56:34 2021 - [info] 192.168.10.11(192.168.10.11:3306)
Tue Mar 30 09:56:34 2021 - [info] Alive Servers:
Tue Mar 30 09:56:34 2021 - [info] 192.168.10.12(192.168.10.12:3306)
Tue Mar 30 09:56:34 2021 - [info] 192.168.10.13(192.168.10.13:3306)
Tue Mar 30 09:56:34 2021 - [info] Alive Slaves:
Tue Mar 30 09:56:34 2021 - [info] 192.168.10.12(192.168.10.12:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Tue Mar 30 09:56:34 2021 - [info] GTID ON
Tue Mar 30 09:56:34 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)
Tue Mar 30 09:56:34 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Mar 30 09:56:34 2021 - [info] 192.168.10.13(192.168.10.13:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Tue Mar 30 09:56:34 2021 - [info] GTID ON
Tue Mar 30 09:56:34 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)
Tue Mar 30 09:56:34 2021 - [info] Checking slave configurations..
Tue Mar 30 09:56:34 2021 - [info] Checking replication filtering settings..
Tue Mar 30 09:56:34 2021 - [info] Replication filtering check ok.
Tue Mar 30 09:56:34 2021 - [info] Master is down!
Tue Mar 30 09:56:34 2021 - [info] Terminating monitoring script.
Tue Mar 30 09:56:34 2021 - [info] Got exit code 20 (Master dead).
Tue Mar 30 09:56:34 2021 - [info] MHA::MasterFailover version 0.58.
Tue Mar 30 09:56:34 2021 - [info] Starting master failover.
Tue Mar 30 09:56:34 2021 - [info]

###########################################################################################
# Phase 1: check the master/slave configuration
###########################################################################################
Tue Mar 30 09:56:34 2021 - [info] * Phase 1: Configuration Check Phase..
Tue Mar 30 09:56:34 2021 - [info]
Tue Mar 30 09:56:35 2021 - [info] GTID failover mode = 1
Tue Mar 30 09:56:35 2021 - [info] Dead Servers:
Tue Mar 30 09:56:35 2021 - [info] 192.168.10.11(192.168.10.11:3306)
Tue Mar 30 09:56:35 2021 - [info] Checking master reachability via MySQL(double check)...
Tue Mar 30 09:56:35 2021 - [info] ok.
Tue Mar 30 09:56:35 2021 - [info] Alive Servers:
Tue Mar 30 09:56:35 2021 - [info] 192.168.10.12(192.168.10.12:3306)
Tue Mar 30 09:56:35 2021 - [info] 192.168.10.13(192.168.10.13:3306)
Tue Mar 30 09:56:35 2021 - [info] Alive Slaves:
Tue Mar 30 09:56:35 2021 - [info] 192.168.10.12(192.168.10.12:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Tue Mar 30 09:56:35 2021 - [info] GTID ON
Tue Mar 30 09:56:35 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)
Tue Mar 30 09:56:35 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Mar 30 09:56:35 2021 - [info] 192.168.10.13(192.168.10.13:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Tue Mar 30 09:56:35 2021 - [info] GTID ON
Tue Mar 30 09:56:35 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)
Tue Mar 30 09:56:35 2021 - [info] Starting GTID based failover.
Tue Mar 30 09:56:35 2021 - [info]
Tue Mar 30 09:56:35 2021 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Mar 30 09:56:35 2021 - [info]

###########################################################################################
# Phase 2: ① take down the VIP on the old master;
# ② power off the old master server -- no shutdown_script is configured here, so this is skipped
###########################################################################################
Tue Mar 30 09:56:35 2021 - [info] * Phase 2: Dead Master Shutdown Phase..
Tue Mar 30 09:56:35 2021 - [info]
Tue Mar 30 09:56:35 2021 - [info] Forcing shutdown so that applications never connect to the current master..
Tue Mar 30 09:56:35 2021 - [info] Executing master IP deactivation script:
Tue Mar 30 09:56:35 2021 - [info] /mha/mha4mysql-manager-master/bin/master_ip_failover --orig_master_host=192.168.10.11 --orig_master_ip=192.168.10.11 --orig_master_port=3306 --command=stopssh --ssh_user=root


IN SCRIPT TEST====/sbin/ifconfig ens34:1 down==/sbin/ifconfig ens34:1 192.168.10.10/24===

Disabling the VIP on old master: 192.168.10.11
Tue Mar 30 09:56:35 2021 - [info] done.
Tue Mar 30 09:56:35 2021 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Tue Mar 30 09:56:35 2021 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Tue Mar 30 09:56:35 2021 - [info]

###########################################################################################
# Phase 3: master election
# Phase 3.1: find the latest binlog file/position across the slaves to decide which node has the newest data
# Phase 3.2: elect the new MySQL master; the election rules are covered in section (7.2)
#   "Choosing which host becomes the master". The log prints "Phase 3.3" twice; read the first one as Phase 3.2
# Phase 3.3: recover the new master: apply the differential logs, activate the VIP, and make the node read-write
###########################################################################################
Tue Mar 30 09:56:35 2021 - [info] * Phase 3: Master Recovery Phase..
Tue Mar 30 09:56:35 2021 - [info]
Tue Mar 30 09:56:35 2021 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Tue Mar 30 09:56:35 2021 - [info]
Tue Mar 30 09:56:35 2021 - [info] The latest binary log file/position on all slaves is node1-bin.000011:194
Tue Mar 30 09:56:35 2021 - [info] Retrieved Gtid Set: 2db1f74f-8790-11eb-b668-000c29d1545c:605-94186
Tue Mar 30 09:56:35 2021 - [info] Latest slaves (Slaves that received relay log files to the latest):
Tue Mar 30 09:56:35 2021 - [info] 192.168.10.12(192.168.10.12:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Tue Mar 30 09:56:35 2021 - [info] GTID ON
Tue Mar 30 09:56:35 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)
Tue Mar 30 09:56:35 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Mar 30 09:56:35 2021 - [info] 192.168.10.13(192.168.10.13:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Tue Mar 30 09:56:35 2021 - [info] GTID ON
Tue Mar 30 09:56:35 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)
Tue Mar 30 09:56:35 2021 - [info] The oldest binary log file/position on all slaves is node1-bin.000011:194
Tue Mar 30 09:56:35 2021 - [info] Retrieved Gtid Set: 2db1f74f-8790-11eb-b668-000c29d1545c:605-94186
Tue Mar 30 09:56:35 2021 - [info] Oldest slaves:
Tue Mar 30 09:56:35 2021 - [info] 192.168.10.12(192.168.10.12:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Tue Mar 30 09:56:35 2021 - [info] GTID ON
Tue Mar 30 09:56:35 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)
Tue Mar 30 09:56:35 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Mar 30 09:56:35 2021 - [info] 192.168.10.13(192.168.10.13:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Tue Mar 30 09:56:35 2021 - [info] GTID ON
Tue Mar 30 09:56:35 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)
Tue Mar 30 09:56:35 2021 - [info]
Tue Mar 30 09:56:35 2021 - [info] * Phase 3.3: Determining New Master Phase..
Tue Mar 30 09:56:35 2021 - [info]
Tue Mar 30 09:56:35 2021 - [info] Searching new master from slaves..
Tue Mar 30 09:56:35 2021 - [info] Candidate masters from the configuration file:
Tue Mar 30 09:56:35 2021 - [info] 192.168.10.12(192.168.10.12:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Tue Mar 30 09:56:35 2021 - [info] GTID ON
Tue Mar 30 09:56:35 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)
Tue Mar 30 09:56:35 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Mar 30 09:56:35 2021 - [info] Non-candidate masters:
Tue Mar 30 09:56:35 2021 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Tue Mar 30 09:56:35 2021 - [info] New master is 192.168.10.12(192.168.10.12:3306)
Tue Mar 30 09:56:35 2021 - [info] Starting master failover..
Tue Mar 30 09:56:35 2021 - [info]
From:
192.168.10.11(192.168.10.11:3306) (current master)
+--192.168.10.12(192.168.10.12:3306)
+--192.168.10.13(192.168.10.13:3306)

To:
192.168.10.12(192.168.10.12:3306) (new master)
+--192.168.10.13(192.168.10.13:3306)
Tue Mar 30 09:56:35 2021 - [info]
Tue Mar 30 09:56:35 2021 - [info] * Phase 3.3: New Master Recovery Phase..
Tue Mar 30 09:56:35 2021 - [info]
Tue Mar 30 09:56:35 2021 - [info] Waiting all logs to be applied..
Tue Mar 30 09:56:35 2021 - [info] done.
Tue Mar 30 09:56:35 2021 - [info] Getting new master's binlog name and position..
Tue Mar 30 09:56:35 2021 - [info] node2-bin.000005:194
Tue Mar 30 09:56:35 2021 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.10.12', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='replica', MASTER_PASSWORD='xxx';
Tue Mar 30 09:56:35 2021 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: node2-bin.000005, 194, 2db1f74f-8790-11eb-b668-000c29d1545c:1-94186,
30753d6b-8790-11eb-864f-000c2999ad6c:1-2
Tue Mar 30 09:56:35 2021 - [info] Executing master IP activate script:
Tue Mar 30 09:56:35 2021 - [info] /mha/mha4mysql-manager-master/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.10.11 --orig_master_ip=192.168.10.11 --orig_master_port=3306 --new_master_host=192.168.10.12 --new_master_ip=192.168.10.12 --new_master_port=3306 --new_master_user='mhaadmin' --new_master_password=xxx


IN SCRIPT TEST====/sbin/ifconfig ens34:1 down==/sbin/ifconfig ens34:1 192.168.10.10/24===

Enabling the VIP - 192.168.10.10/24 on the new master - 192.168.10.12
Tue Mar 30 09:56:36 2021 - [info] OK.
Tue Mar 30 09:56:36 2021 - [info] Setting read_only=0 on 192.168.10.12(192.168.10.12:3306)..
Tue Mar 30 09:56:36 2021 - [info] ok.
Tue Mar 30 09:56:36 2021 - [info] ** Finished master recovery successfully.
Tue Mar 30 09:56:36 2021 - [info] * Phase 3: Master Recovery Phase completed.
Tue Mar 30 09:56:36 2021 - [info]

###########################################################################################
# Phase 4: repoint the slaves to the new master
# On each slave: RESET SLAVE --> CHANGE MASTER to the new master --> START SLAVE
###########################################################################################
Tue Mar 30 09:56:36 2021 - [info] * Phase 4: Slaves Recovery Phase..
Tue Mar 30 09:56:36 2021 - [info]
Tue Mar 30 09:56:36 2021 - [info]
Tue Mar 30 09:56:36 2021 - [info] * Phase 4.1: Starting Slaves in parallel..
Tue Mar 30 09:56:36 2021 - [info]
Tue Mar 30 09:56:36 2021 - [info] -- Slave recovery on host 192.168.10.13(192.168.10.13:3306) started, pid: 3995. Check tmp log /mha/mha4mysql-manager-master/app1/192.168.10.13_3306_20210330095634.log if it takes time..
Tue Mar 30 09:56:37 2021 - [info]
Tue Mar 30 09:56:37 2021 - [info] Log messages from 192.168.10.13 ...
Tue Mar 30 09:56:37 2021 - [info]
Tue Mar 30 09:56:36 2021 - [info] Resetting slave 192.168.10.13(192.168.10.13:3306) and starting replication from the new master 192.168.10.12(192.168.10.12:3306)..
Tue Mar 30 09:56:36 2021 - [info] Executed CHANGE MASTER.
Tue Mar 30 09:56:36 2021 - [info] Slave started.
Tue Mar 30 09:56:36 2021 - [info] gtid_wait(2db1f74f-8790-11eb-b668-000c29d1545c:1-94186,
30753d6b-8790-11eb-864f-000c2999ad6c:1-2) completed on 192.168.10.13(192.168.10.13:3306). Executed 3 events.
Tue Mar 30 09:56:37 2021 - [info] End of log messages from 192.168.10.13.
Tue Mar 30 09:56:37 2021 - [info] -- Slave on host 192.168.10.13(192.168.10.13:3306) started.
Tue Mar 30 09:56:37 2021 - [info] All new slave servers recovered successfully.
Tue Mar 30 09:56:37 2021 - [info]

###########################################################################################
# Phase 5: clear the old slave info on the new master
# The new master used to be a slave; even after promotion, "show slave status" would still
# display the stale replication info, so this step wipes it
###########################################################################################
Tue Mar 30 09:56:37 2021 - [info] * Phase 5: New master cleanup phase..
Tue Mar 30 09:56:37 2021 - [info]
Tue Mar 30 09:56:37 2021 - [info] Resetting slave info on the new master..
Tue Mar 30 09:56:37 2021 - [info] 192.168.10.12: Resetting slave info succeeded.
Tue Mar 30 09:56:37 2021 - [info] Master failover to 192.168.10.12(192.168.10.12:3306) completed successfully.
Tue Mar 30 09:56:37 2021 - [info]


###########################################################################################
# Bottom line:
# the failover succeeded
###########################################################################################
----- Failover Report -----

app1: MySQL Master failover 192.168.10.11(192.168.10.11:3306) to 192.168.10.12(192.168.10.12:3306) succeeded

Master 192.168.10.11(192.168.10.11:3306) is down!

Check MHA Manager logs at monitor:/mha/mha4mysql-manager-master/app1/log/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.10.11(192.168.10.11:3306)
Selected 192.168.10.12(192.168.10.12:3306) as a new master.
192.168.10.12(192.168.10.12:3306): OK: Applying all logs succeeded.
192.168.10.12(192.168.10.12:3306): OK: Activated master IP address.
192.168.10.13(192.168.10.13:3306): OK: Slave started, replicating from 192.168.10.12(192.168.10.12:3306)
192.168.10.12(192.168.10.12:3306): Resetting slave info succeeded.
Master failover to 192.168.10.12(192.168.10.12:3306) completed successfully.


(5.2) MHA failover test (master server goes down unexpectedly)
Test environment:
Master:  192.168.10.11
Slave 1: 192.168.10.12
Slave 2: 192.168.10.13


STEP 1: Start MHA Manager

    [root@monitor bin]# nohup masterha_manager --conf=/etc/mha/app1.cnf --ignore_last_failover &
    [1] 22973
    [root@monitor bin]# nohup: ignoring input and appending output to ‘nohup.out’

STEP 2: Power off the master server, node1

    [root@node1 ~]# shutdown -h 0

STEP 3: Check whether the master failover took place
On node 2, the slave information is gone and the node is now the master:

    [root@node2 ~]# mysql -uroot -p123456
mysql> show slave status\G

mysql> show master status\G
*************************** 1. row ***************************
             File: node2-bin.000001
         Position: 154
     Binlog_Do_DB:
 Binlog_Ignore_DB:
    Executed_Gtid_Set: 2db1f74f-8790-11eb-b668-000c29d1545c:1-94186,
    30753d6b-8790-11eb-864f-000c2999ad6c:1-2

On node 3, the master has automatically changed to node 2:

    [root@node3 ~]# mysql -uroot -p123456
mysql> show slave status\G
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 192.168.10.12
                      Master_User: replica
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: node2-bin.000001
              Read_Master_Log_Pos: 154
                   Relay_Log_File: node3-relay-bin.000002
                    Relay_Log_Pos: 367
            Relay_Master_Log_File: node2-bin.000001
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes

STEP 4: Check that the virtual IP has floated; it is now on node 2:

    [root@node2 ~]# ifconfig
    ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    inet 10.1.45.51 netmask 255.255.255.0 broadcast 10.1.45.255
    inet6 fe80::953b:2262:6137:ad20 prefixlen 64 scopeid 0x20<link>
    inet6 fe80::b39e:c76c:b3dc:4d74 prefixlen 64 scopeid 0x20<link>
    ether 00:0c:29:99:ad:6c txqueuelen 1000 (Ethernet)
    RX packets 87548 bytes 7624146 (7.2 MiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 16 bytes 1584 (1.5 KiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
    
    ens34: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    inet 192.168.10.12 netmask 255.255.255.0 broadcast 192.168.10.255
    inet6 fe80::20c:29ff:fe99:ad76 prefixlen 64 scopeid 0x20<link>
    ether 00:0c:29:99:ad:76 txqueuelen 1000 (Ethernet)
    RX packets 225060 bytes 328243958 (313.0 MiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 12997 bytes 1013311 (989.5 KiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
    
    ens34:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    inet 192.168.10.10 netmask 255.255.255.0 broadcast 192.168.10.255
    ether 00:0c:29:99:ad:76 txqueuelen 1000 (Ethernet)

The full log of the switchover is shown below:

[root@monitor ~]# tail -100f /mha/mha4mysql-manager-master/app1/log/manager.log

Tue Mar 30 16:25:15 2021 - [warning] Got timeout on MySQL Ping(SELECT) child process and killed it! at /usr/local/share/perl5/MHA/HealthCheck.pm line 432.
Tue Mar 30 16:25:15 2021 - [info] Executing secondary network check script: /usr/local/bin/masterha_secondary_check -s node2 -s node3 --user=root --master_host=node1 --master_ip=192.168.10.11 --master_port=3306 --user=root --master_host=192.168.10.11 --master_ip=192.168.10.11 --master_port=3306 --master_user=mhaadmin --master_password=mhaadmin --ping_type=SELECT
Tue Mar 30 16:25:15 2021 - [info] Executing SSH check script: exit 0
Tue Mar 30 16:25:20 2021 - [warning] HealthCheck: Got timeout on checking SSH connection to 192.168.10.11! at /usr/local/share/perl5/MHA/HealthCheck.pm line 343.
Tue Mar 30 16:25:20 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.10.11' (4))
Tue Mar 30 16:25:20 2021 - [warning] Connection failed 2 time(s)..
Monitoring server node2 is reachable, Master is not reachable from node2. OK.
Tue Mar 30 16:25:25 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.10.11' (4))
Tue Mar 30 16:25:25 2021 - [warning] Connection failed 3 time(s)..
Monitoring server node3 is reachable, Master is not reachable from node3. OK.
Tue Mar 30 16:25:25 2021 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Tue Mar 30 16:25:30 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.10.11' (4))
Tue Mar 30 16:25:30 2021 - [warning] Connection failed 4 time(s)..
Tue Mar 30 16:25:30 2021 - [warning] Master is not reachable from health checker!
Tue Mar 30 16:25:30 2021 - [warning] Master 192.168.10.11(192.168.10.11:3306) is not reachable!
Tue Mar 30 16:25:30 2021 - [warning] SSH is NOT reachable.
Tue Mar 30 16:25:30 2021 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mha/app1.cnf again, and trying to connect to all servers to check server status..
Tue Mar 30 16:25:30 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Mar 30 16:25:30 2021 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Tue Mar 30 16:25:30 2021 - [info] Reading server configuration from /etc/mha/app1.cnf..
Tue Mar 30 16:25:31 2021 - [info] GTID failover mode = 1
Tue Mar 30 16:25:31 2021 - [info] Dead Servers:
Tue Mar 30 16:25:31 2021 - [info] 192.168.10.11(192.168.10.11:3306)
Tue Mar 30 16:25:31 2021 - [info] Alive Servers:
Tue Mar 30 16:25:31 2021 - [info] 192.168.10.12(192.168.10.12:3306)
Tue Mar 30 16:25:31 2021 - [info] 192.168.10.13(192.168.10.13:3306)
Tue Mar 30 16:25:31 2021 - [info] Alive Slaves:
Tue Mar 30 16:25:31 2021 - [info] 192.168.10.12(192.168.10.12:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Tue Mar 30 16:25:31 2021 - [info] GTID ON
Tue Mar 30 16:25:31 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)
Tue Mar 30 16:25:31 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Mar 30 16:25:31 2021 - [info] 192.168.10.13(192.168.10.13:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Tue Mar 30 16:25:31 2021 - [info] GTID ON
Tue Mar 30 16:25:31 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)
Tue Mar 30 16:25:31 2021 - [info] Checking slave configurations..
Tue Mar 30 16:25:31 2021 - [info] Checking replication filtering settings..
Tue Mar 30 16:25:31 2021 - [info] Replication filtering check ok.
Tue Mar 30 16:25:31 2021 - [info] Master is down!
Tue Mar 30 16:25:31 2021 - [info] Terminating monitoring script.
Tue Mar 30 16:25:31 2021 - [info] Got exit code 20 (Master dead).
Tue Mar 30 16:25:31 2021 - [info] MHA::MasterFailover version 0.58.
Tue Mar 30 16:25:31 2021 - [info] Starting master failover.
Tue Mar 30 16:25:31 2021 - [info]
Tue Mar 30 16:25:31 2021 - [info] * Phase 1: Configuration Check Phase..
Tue Mar 30 16:25:31 2021 - [info]
Tue Mar 30 16:25:32 2021 - [info] GTID failover mode = 1
Tue Mar 30 16:25:32 2021 - [info] Dead Servers:
Tue Mar 30 16:25:32 2021 - [info] 192.168.10.11(192.168.10.11:3306)
Tue Mar 30 16:25:32 2021 - [info] Checking master reachability via MySQL(double check)...
Tue Mar 30 16:25:33 2021 - [info] ok.
Tue Mar 30 16:25:33 2021 - [info] Alive Servers:
Tue Mar 30 16:25:33 2021 - [info] 192.168.10.12(192.168.10.12:3306)
Tue Mar 30 16:25:33 2021 - [info] 192.168.10.13(192.168.10.13:3306)
Tue Mar 30 16:25:33 2021 - [info] Alive Slaves:
Tue Mar 30 16:25:33 2021 - [info] 192.168.10.12(192.168.10.12:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Tue Mar 30 16:25:33 2021 - [info] GTID ON
Tue Mar 30 16:25:33 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)
Tue Mar 30 16:25:33 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Mar 30 16:25:33 2021 - [info] 192.168.10.13(192.168.10.13:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Tue Mar 30 16:25:33 2021 - [info] GTID ON
Tue Mar 30 16:25:33 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)
Tue Mar 30 16:25:33 2021 - [info] Starting GTID based failover.
Tue Mar 30 16:25:33 2021 - [info]
Tue Mar 30 16:25:33 2021 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Mar 30 16:25:33 2021 - [info]
Tue Mar 30 16:25:33 2021 - [info] * Phase 2: Dead Master Shutdown Phase..
Tue Mar 30 16:25:33 2021 - [info]
Tue Mar 30 16:25:33 2021 - [info] Forcing shutdown so that applications never connect to the current master..
Tue Mar 30 16:25:33 2021 - [info] Executing master IP deactivation script:
Tue Mar 30 16:25:33 2021 - [info] /mha/mha4mysql-manager-master/bin/master_ip_failover --orig_master_host=192.168.10.11 --orig_master_ip=192.168.10.11 --orig_master_port=3306 --command=stop


IN SCRIPT TEST====/sbin/ifconfig ens34:1 down==/sbin/ifconfig ens34:1 192.168.10.10/24===

Disabling the VIP on old master: 192.168.10.11
Tue Mar 30 16:25:33 2021 - [info] done.
Tue Mar 30 16:25:33 2021 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Tue Mar 30 16:25:33 2021 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Tue Mar 30 16:25:33 2021 - [info]
Tue Mar 30 16:25:33 2021 - [info] * Phase 3: Master Recovery Phase..
Tue Mar 30 16:25:33 2021 - [info]
Tue Mar 30 16:25:33 2021 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Tue Mar 30 16:25:33 2021 - [info]
Tue Mar 30 16:25:33 2021 - [info] The latest binary log file/position on all slaves is node1-bin.000001:154
Tue Mar 30 16:25:33 2021 - [info] Latest slaves (Slaves that received relay log files to the latest):
Tue Mar 30 16:25:33 2021 - [info] 192.168.10.12(192.168.10.12:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Tue Mar 30 16:25:33 2021 - [info] GTID ON
Tue Mar 30 16:25:33 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)
Tue Mar 30 16:25:33 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Mar 30 16:25:33 2021 - [info] 192.168.10.13(192.168.10.13:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Tue Mar 30 16:25:33 2021 - [info] GTID ON
Tue Mar 30 16:25:33 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)
Tue Mar 30 16:25:33 2021 - [info] The oldest binary log file/position on all slaves is node1-bin.000001:154
Tue Mar 30 16:25:33 2021 - [info] Oldest slaves:
Tue Mar 30 16:25:33 2021 - [info] 192.168.10.12(192.168.10.12:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Tue Mar 30 16:25:33 2021 - [info] GTID ON
Tue Mar 30 16:25:33 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)
Tue Mar 30 16:25:33 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Mar 30 16:25:33 2021 - [info] 192.168.10.13(192.168.10.13:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Tue Mar 30 16:25:33 2021 - [info] GTID ON
Tue Mar 30 16:25:33 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)
Tue Mar 30 16:25:33 2021 - [info]
Tue Mar 30 16:25:33 2021 - [info] * Phase 3.3: Determining New Master Phase..
Tue Mar 30 16:25:33 2021 - [info]
Tue Mar 30 16:25:33 2021 - [info] Searching new master from slaves..
Tue Mar 30 16:25:33 2021 - [info] Candidate masters from the configuration file:
Tue Mar 30 16:25:33 2021 - [info] 192.168.10.12(192.168.10.12:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Tue Mar 30 16:25:33 2021 - [info] GTID ON
Tue Mar 30 16:25:33 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)
Tue Mar 30 16:25:33 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Mar 30 16:25:33 2021 - [info] Non-candidate masters:
Tue Mar 30 16:25:33 2021 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Tue Mar 30 16:25:33 2021 - [info] New master is 192.168.10.12(192.168.10.12:3306)
Tue Mar 30 16:25:33 2021 - [info] Starting master failover..
Tue Mar 30 16:25:33 2021 - [info]
From:
192.168.10.11(192.168.10.11:3306) (current master)
+--192.168.10.12(192.168.10.12:3306)
+--192.168.10.13(192.168.10.13:3306)

To:
192.168.10.12(192.168.10.12:3306) (new master)
+--192.168.10.13(192.168.10.13:3306)
Tue Mar 30 16:25:33 2021 - [info]
Tue Mar 30 16:25:33 2021 - [info] * Phase 3.3: New Master Recovery Phase..
Tue Mar 30 16:25:33 2021 - [info]
Tue Mar 30 16:25:33 2021 - [info] Waiting all logs to be applied..
Tue Mar 30 16:25:33 2021 - [info] done.
Tue Mar 30 16:25:33 2021 - [info] Getting new master's binlog name and position..
Tue Mar 30 16:25:33 2021 - [info] node2-bin.000001:154
Tue Mar 30 16:25:33 2021 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.10.12', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='replica', MASTER_PASSWORD='xxx';
Tue Mar 30 16:25:33 2021 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: node2-bin.000001, 154, 2db1f74f-8790-11eb-b668-000c29d1545c:1-94186,
30753d6b-8790-11eb-864f-000c2999ad6c:1-2
Tue Mar 30 16:25:33 2021 - [info] Executing master IP activate script:
Tue Mar 30 16:25:33 2021 - [info] /mha/mha4mysql-manager-master/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.10.11 --orig_master_ip=192.168.10.11 --orig_master_port=3306 --new_master_host=192.168.10.12 --new_master_ip=192.168.10.12 --new_master_port=3306 --new_master_user='mhaadmin' --new_master_password=xxx


IN SCRIPT TEST====/sbin/ifconfig ens34:1 down==/sbin/ifconfig ens34:1 192.168.10.10/24===

Enabling the VIP - 192.168.10.10/24 on the new master - 192.168.10.12
Tue Mar 30 16:25:33 2021 - [info] OK.
Tue Mar 30 16:25:33 2021 - [info] Setting read_only=0 on 192.168.10.12(192.168.10.12:3306)..
Tue Mar 30 16:25:33 2021 - [info] ok.
Tue Mar 30 16:25:33 2021 - [info] ** Finished master recovery successfully.
Tue Mar 30 16:25:33 2021 - [info] * Phase 3: Master Recovery Phase completed.
Tue Mar 30 16:25:33 2021 - [info]
Tue Mar 30 16:25:33 2021 - [info] * Phase 4: Slaves Recovery Phase..
Tue Mar 30 16:25:33 2021 - [info]
Tue Mar 30 16:25:33 2021 - [info]
Tue Mar 30 16:25:33 2021 - [info] * Phase 4.1: Starting Slaves in parallel..
Tue Mar 30 16:25:33 2021 - [info]
Tue Mar 30 16:25:33 2021 - [info] -- Slave recovery on host 192.168.10.13(192.168.10.13:3306) started, pid: 23239. Check tmp log /mha/mha4mysql-manager-master/app1/192.168.10.13_3306_20210330162531.log if it takes time..
Tue Mar 30 16:25:35 2021 - [info]
Tue Mar 30 16:25:35 2021 - [info] Log messages from 192.168.10.13 ...
Tue Mar 30 16:25:35 2021 - [info]
Tue Mar 30 16:25:33 2021 - [info] Resetting slave 192.168.10.13(192.168.10.13:3306) and starting replication from the new master 192.168.10.12(192.168.10.12:3306)..
Tue Mar 30 16:25:33 2021 - [info] Executed CHANGE MASTER.
Tue Mar 30 16:25:34 2021 - [info] Slave started.
Tue Mar 30 16:25:34 2021 - [info] gtid_wait(2db1f74f-8790-11eb-b668-000c29d1545c:1-94186,
30753d6b-8790-11eb-864f-000c2999ad6c:1-2) completed on 192.168.10.13(192.168.10.13:3306). Executed 0 events.
Tue Mar 30 16:25:35 2021 - [info] End of log messages from 192.168.10.13.
Tue Mar 30 16:25:35 2021 - [info] -- Slave on host 192.168.10.13(192.168.10.13:3306) started.
Tue Mar 30 16:25:35 2021 - [info] All new slave servers recovered successfully.
Tue Mar 30 16:25:35 2021 - [info]
Tue Mar 30 16:25:35 2021 - [info] * Phase 5: New master cleanup phase..
Tue Mar 30 16:25:35 2021 - [info]
Tue Mar 30 16:25:35 2021 - [info] Resetting slave info on the new master..
Tue Mar 30 16:25:35 2021 - [info] 192.168.10.12: Resetting slave info succeeded.
Tue Mar 30 16:25:35 2021 - [info] Master failover to 192.168.10.12(192.168.10.12:3306) completed successfully.
Tue Mar 30 16:25:35 2021 - [info]

----- Failover Report -----

app1: MySQL Master failover 192.168.10.11(192.168.10.11:3306) to 192.168.10.12(192.168.10.12:3306) succeeded

Master 192.168.10.11(192.168.10.11:3306) is down!

Check MHA Manager logs at monitor:/mha/mha4mysql-manager-master/app1/log/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.10.11(192.168.10.11:3306)
Selected 192.168.10.12(192.168.10.12:3306) as a new master.
192.168.10.12(192.168.10.12:3306): OK: Applying all logs succeeded.
192.168.10.12(192.168.10.12:3306): OK: Activated master IP address.
192.168.10.13(192.168.10.13:3306): OK: Slave started, replicating from 192.168.10.12(192.168.10.12:3306)
192.168.10.12(192.168.10.12:3306): Resetting slave info succeeded.
Master failover to 192.168.10.12(192.168.10.12:3306) completed successfully.

(5.3) MHA manual failover test
Test environment:
Master:  192.168.10.11
Slave 1: 192.168.10.12
Slave 2: 192.168.10.13

STEP 1: A manual failover requires that MHA Manager be stopped first

    [root@monitor ~]# masterha_check_status --conf=/etc/mha/app1.cnf
    app1 is stopped(2:NOT_RUNNING).

STEP 2: Stop the MySQL service on the master, node1

    [root@node1 ~]# service mysqld stop

STEP 3: Run the manual failover. By default the master would be switched to 192.168.10.12, since that host has candidate_master=1 configured; here we force the master onto 192.168.10.13 instead

    masterha_master_switch --master_state=dead --conf=/etc/mha/app1.cnf --dead_master_host=192.168.10.11 --new_master_host=192.168.10.13 --ignore_last_failover

STEP 4: On node 2, the master has automatically changed to node 3:

    [root@node2 ~]# mysql -uroot -p123456
mysql> show slave status\G
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 192.168.10.13
                      Master_User: replica
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: node3-bin.000006
              Read_Master_Log_Pos: 194
                   Relay_Log_File: node2-relay-bin.000009
                    Relay_Log_Pos: 407
            Relay_Master_Log_File: node3-bin.000006
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes

STEP 5: Check that the virtual IP has floated; it is now on node 3:

    [root@node3 ~]# ifconfig
    ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    inet 10.1.45.85 netmask 255.255.255.0 broadcast 10.1.45.255
    inet6 fe80::8e05:c8ea:5953:4213 prefixlen 64 scopeid 0x20<link>
    inet6 fe80::953b:2262:6137:ad20 prefixlen 64 scopeid 0x20<link>
    inet6 fe80::b39e:c76c:b3dc:4d74 prefixlen 64 scopeid 0x20<link>
    ether 00:0c:29:25:bd:bb txqueuelen 1000 (Ethernet)
    RX packets 140091 bytes 12147844 (11.5 MiB)
    RX errors 0 dropped 1 overruns 0 frame 0
    TX packets 455 bytes 30047 (29.3 KiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
    
    ens34: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    inet 192.168.10.13 netmask 255.255.255.0 broadcast 192.168.10.255
    inet6 fe80::20c:29ff:fe25:bdc5 prefixlen 64 scopeid 0x20<link>
    ether 00:0c:29:25:bd:c5 txqueuelen 1000 (Ethernet)
    RX packets 6423 bytes 600863 (586.7 KiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 5034 bytes 1318734 (1.2 MiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
    
    ens34:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    inet 192.168.10.10 netmask 255.255.255.0 broadcast 192.168.10.255
    ether 00:0c:29:25:bd:c5 txqueuelen 1000 (Ethernet)

    Manual failover log:

    [root@monitor ~]# masterha_master_switch --master_state=dead --conf=/etc/mha/app1.cnf --dead_master_host=192.168.10.11 --new_master_host=192.168.10.13 --ignore_last_failover
    --dead_master_ip=<dead_master_ip> is not set. Using 192.168.10.11.
    --dead_master_port=<dead_master_port> is not set. Using 3306.
    Tue Mar 30 17:07:51 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Tue Mar 30 17:07:51 2021 - [info] Reading application default configuration from /etc/mha/app1.cnf..
    Tue Mar 30 17:07:51 2021 - [info] Reading server configuration from /etc/mha/app1.cnf..
    Tue Mar 30 17:07:51 2021 - [info] MHA::MasterFailover version 0.58.
    Tue Mar 30 17:07:51 2021 - [info] Starting master failover.
    Tue Mar 30 17:07:51 2021 - [info]
    Tue Mar 30 17:07:51 2021 - [info] * Phase 1: Configuration Check Phase..
    Tue Mar 30 17:07:51 2021 - [info]
    Tue Mar 30 17:07:52 2021 - [info] GTID failover mode = 1
    Tue Mar 30 17:07:52 2021 - [info] Dead Servers:
    Tue Mar 30 17:07:52 2021 - [info] 192.168.10.11(192.168.10.11:3306)
    Tue Mar 30 17:07:52 2021 - [info] Checking master reachability via MySQL(double check)...
    Tue Mar 30 17:07:52 2021 - [info] ok.
    Tue Mar 30 17:07:52 2021 - [info] Alive Servers:
    Tue Mar 30 17:07:52 2021 - [info] 192.168.10.12(192.168.10.12:3306)
    Tue Mar 30 17:07:52 2021 - [info] 192.168.10.13(192.168.10.13:3306)
    Tue Mar 30 17:07:52 2021 - [info] Alive Slaves:
    Tue Mar 30 17:07:52 2021 - [info] 192.168.10.12(192.168.10.12:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
    Tue Mar 30 17:07:52 2021 - [info] GTID ON
    Tue Mar 30 17:07:52 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)
    Tue Mar 30 17:07:52 2021 - [info] Primary candidate for the new Master (candidate_master is set)
    Tue Mar 30 17:07:52 2021 - [info] 192.168.10.13(192.168.10.13:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
    Tue Mar 30 17:07:52 2021 - [info] GTID ON
    Tue Mar 30 17:07:52 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)
    Master 192.168.10.11(192.168.10.11:3306) is dead. Proceed? (yes/NO): yes
    Tue Mar 30 17:07:54 2021 - [info] Starting GTID based failover.
    Tue Mar 30 17:07:54 2021 - [info]
    Tue Mar 30 17:07:54 2021 - [info] ** Phase 1: Configuration Check Phase completed.
    Tue Mar 30 17:07:54 2021 - [info]
    Tue Mar 30 17:07:54 2021 - [info] * Phase 2: Dead Master Shutdown Phase..
    Tue Mar 30 17:07:54 2021 - [info]
    Tue Mar 30 17:07:54 2021 - [info] HealthCheck: SSH to 192.168.10.11 is reachable.
    Tue Mar 30 17:07:54 2021 - [info] Forcing shutdown so that applications never connect to the current master..
    Tue Mar 30 17:07:54 2021 - [info] Executing master IP deactivation script:
    Tue Mar 30 17:07:54 2021 - [info] /mha/mha4mysql-manager-master/bin/master_ip_failover --orig_master_host=192.168.10.11 --orig_master_ip=192.168.10.11 --orig_master_port=3306 --command=stopssh --ssh_user=root


    IN SCRIPT TEST====/sbin/ifconfig ens34:1 down==/sbin/ifconfig ens34:1 192.168.10.10/24===

    Disabling the VIP on old master: 192.168.10.11
    SIOCSIFFLAGS: Cannot assign requested address
    Tue Mar 30 17:07:55 2021 - [info] done.
    Tue Mar 30 17:07:55 2021 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
    Tue Mar 30 17:07:55 2021 - [info] * Phase 2: Dead Master Shutdown Phase completed.
    Tue Mar 30 17:07:55 2021 - [info]
    Tue Mar 30 17:07:55 2021 - [info] * Phase 3: Master Recovery Phase..
    Tue Mar 30 17:07:55 2021 - [info]
    Tue Mar 30 17:07:55 2021 - [info] * Phase 3.1: Getting Latest Slaves Phase..
    Tue Mar 30 17:07:55 2021 - [info]
    Tue Mar 30 17:07:55 2021 - [info] The latest binary log file/position on all slaves is node1-bin.000004:154
    Tue Mar 30 17:07:55 2021 - [info] Latest slaves (Slaves that received relay log files to the latest):
    Tue Mar 30 17:07:55 2021 - [info] 192.168.10.12(192.168.10.12:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
    Tue Mar 30 17:07:55 2021 - [info] GTID ON
    Tue Mar 30 17:07:55 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)
    Tue Mar 30 17:07:55 2021 - [info] Primary candidate for the new Master (candidate_master is set)
    Tue Mar 30 17:07:55 2021 - [info] 192.168.10.13(192.168.10.13:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
    Tue Mar 30 17:07:55 2021 - [info] GTID ON
    Tue Mar 30 17:07:55 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)
    Tue Mar 30 17:07:55 2021 - [info] The oldest binary log file/position on all slaves is node1-bin.000004:154
    Tue Mar 30 17:07:55 2021 - [info] Oldest slaves:
    Tue Mar 30 17:07:55 2021 - [info] 192.168.10.12(192.168.10.12:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
    Tue Mar 30 17:07:55 2021 - [info] GTID ON
    Tue Mar 30 17:07:55 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)
    Tue Mar 30 17:07:55 2021 - [info] Primary candidate for the new Master (candidate_master is set)
    Tue Mar 30 17:07:55 2021 - [info] 192.168.10.13(192.168.10.13:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
    Tue Mar 30 17:07:55 2021 - [info] GTID ON
    Tue Mar 30 17:07:55 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)
    Tue Mar 30 17:07:55 2021 - [info]
    Tue Mar 30 17:07:55 2021 - [info] * Phase 3.3: Determining New Master Phase..
    Tue Mar 30 17:07:55 2021 - [info]
    Tue Mar 30 17:07:55 2021 - [info] 192.168.10.13 can be new master.
    Tue Mar 30 17:07:55 2021 - [info] New master is 192.168.10.13(192.168.10.13:3306)
    Tue Mar 30 17:07:55 2021 - [info] Starting master failover..
    Tue Mar 30 17:07:55 2021 - [info]
    From:
    192.168.10.11(192.168.10.11:3306) (current master)
    +--192.168.10.12(192.168.10.12:3306)
    +--192.168.10.13(192.168.10.13:3306)

    To:
    192.168.10.13(192.168.10.13:3306) (new master)
    +--192.168.10.12(192.168.10.12:3306)

    Starting master switch from 192.168.10.11(192.168.10.11:3306) to 192.168.10.13(192.168.10.13:3306)? (yes/NO): yes
    Tue Mar 30 17:07:56 2021 - [info] New master decided manually is 192.168.10.13(192.168.10.13:3306)
    Tue Mar 30 17:07:56 2021 - [info]
    Tue Mar 30 17:07:56 2021 - [info] * Phase 3.3: New Master Recovery Phase..
    Tue Mar 30 17:07:56 2021 - [info]
    Tue Mar 30 17:07:56 2021 - [info] Waiting all logs to be applied..
    Tue Mar 30 17:07:56 2021 - [info] done.
    Tue Mar 30 17:07:56 2021 - [info] Replicating from the latest slave 192.168.10.12(192.168.10.12:3306) and waiting to apply..
    Tue Mar 30 17:07:56 2021 - [info] Waiting all logs to be applied on the latest slave..
    Tue Mar 30 17:07:56 2021 - [info] Resetting slave 192.168.10.13(192.168.10.13:3306) and starting replication from the new master 192.168.10.12(192.168.10.12:3306)..
    Tue Mar 30 17:07:56 2021 - [info] Executed CHANGE MASTER.
    Tue Mar 30 17:07:57 2021 - [info] Slave started.
    Tue Mar 30 17:07:57 2021 - [info] Waiting to execute all relay logs on 192.168.10.13(192.168.10.13:3306)..
    Tue Mar 30 17:07:57 2021 - [info] master_pos_wait(node2-bin.000001:154) completed on 192.168.10.13(192.168.10.13:3306). Executed 0 events.
    Tue Mar 30 17:07:57 2021 - [info] done.
    Tue Mar 30 17:07:57 2021 - [info] done.
    Tue Mar 30 17:07:57 2021 - [info] Getting new master's binlog name and position..
    Tue Mar 30 17:07:57 2021 - [info] node3-bin.000006:194
    Tue Mar 30 17:07:57 2021 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.10.13', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='replica', MASTER_PASSWORD='xxx';
    Tue Mar 30 17:07:57 2021 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: node3-bin.000006, 194, 2db1f74f-8790-11eb-b668-000c29d1545c:1-94186,
    30753d6b-8790-11eb-864f-000c2999ad6c:1-2,
    32a16250-8790-11eb-b587-000c2925bdbb:1-2
    Tue Mar 30 17:07:57 2021 - [info] Executing master IP activate script:
    Tue Mar 30 17:07:57 2021 - [info] /mha/mha4mysql-manager-master/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.10.11 --orig_master_ip=192.168.10.11 --orig_master_port=3306 --new_master_host=192.168.10.13 --new_master_ip=192.168.10.13 --new_master_port=3306 --new_master_user='mhaadmin' --new_master_password=xxx


    IN SCRIPT TEST====/sbin/ifconfig ens34:1 down==/sbin/ifconfig ens34:1 192.168.10.10/24===

    Enabling the VIP - 192.168.10.10/24 on the new master - 192.168.10.13
    Tue Mar 30 17:07:57 2021 - [info] OK.
    Tue Mar 30 17:07:57 2021 - [info] Setting read_only=0 on 192.168.10.13(192.168.10.13:3306)..
    Tue Mar 30 17:07:57 2021 - [info] ok.
    Tue Mar 30 17:07:57 2021 - [info] ** Finished master recovery successfully.
    Tue Mar 30 17:07:57 2021 - [info] * Phase 3: Master Recovery Phase completed.
    Tue Mar 30 17:07:57 2021 - [info]
    Tue Mar 30 17:07:57 2021 - [info] * Phase 4: Slaves Recovery Phase..
    Tue Mar 30 17:07:57 2021 - [info]
    Tue Mar 30 17:07:57 2021 - [info]
    Tue Mar 30 17:07:57 2021 - [info] * Phase 4.1: Starting Slaves in parallel..
    Tue Mar 30 17:07:57 2021 - [info]
    Tue Mar 30 17:07:57 2021 - [info] -- Slave recovery on host 192.168.10.12(192.168.10.12:3306) started, pid: 25496. Check tmp log /mha/mha4mysql-manager-master/app1/192.168.10.12_3306_20210330170751.log if it takes time..
    Tue Mar 30 17:07:59 2021 - [info]
    Tue Mar 30 17:07:59 2021 - [info] Log messages from 192.168.10.12 ...
    Tue Mar 30 17:07:59 2021 - [info]
    Tue Mar 30 17:07:57 2021 - [info] Resetting slave 192.168.10.12(192.168.10.12:3306) and starting replication from the new master 192.168.10.13(192.168.10.13:3306)..
    Tue Mar 30 17:07:57 2021 - [info] Executed CHANGE MASTER.
    Tue Mar 30 17:07:58 2021 - [info] Slave started.
    Tue Mar 30 17:07:58 2021 - [info] gtid_wait(2db1f74f-8790-11eb-b668-000c29d1545c:1-94186,
    30753d6b-8790-11eb-864f-000c2999ad6c:1-2,
    32a16250-8790-11eb-b587-000c2925bdbb:1-2) completed on 192.168.10.12(192.168.10.12:3306). Executed 0 events.
    Tue Mar 30 17:07:59 2021 - [info] End of log messages from 192.168.10.12.
    Tue Mar 30 17:07:59 2021 - [info] -- Slave on host 192.168.10.12(192.168.10.12:3306) started.
    Tue Mar 30 17:07:59 2021 - [info] All new slave servers recovered successfully.
    Tue Mar 30 17:07:59 2021 - [info]
    Tue Mar 30 17:07:59 2021 - [info] * Phase 5: New master cleanup phase..
    Tue Mar 30 17:07:59 2021 - [info]
    Tue Mar 30 17:07:59 2021 - [info] Resetting slave info on the new master..
    Tue Mar 30 17:07:59 2021 - [info] 192.168.10.13: Resetting slave info succeeded.
    Tue Mar 30 17:07:59 2021 - [info] Master failover to 192.168.10.13(192.168.10.13:3306) completed successfully.
    Tue Mar 30 17:07:59 2021 - [info]

    ----- Failover Report -----

    app1: MySQL Master failover 192.168.10.11(192.168.10.11:3306) to 192.168.10.13(192.168.10.13:3306) succeeded

    Master 192.168.10.11(192.168.10.11:3306) is down!

    Check MHA Manager logs at monitor for details.

    Started manual(interactive) failover.
    Invalidated master IP address on 192.168.10.11(192.168.10.11:3306)
    Selected 192.168.10.13(192.168.10.13:3306) as a new master.
    192.168.10.13(192.168.10.13:3306): OK: Applying all logs succeeded.
    192.168.10.13(192.168.10.13:3306): OK: Activated master IP address.
    192.168.10.12(192.168.10.12:3306): OK: Slave started, replicating from 192.168.10.13(192.168.10.13:3306)
    192.168.10.13(192.168.10.13:3306): Resetting slave info succeeded.
    Master failover to 192.168.10.13(192.168.10.13:3306) completed successfully.

    (5.4) MHA manual online switchover test
    Test environment:
    Master node : 192.168.10.11
    Slave node 1: 192.168.10.12
    Slave node 2: 192.168.10.13

    STEP1: An online switchover must likewise be performed while the MHA Manager is stopped

    [root@monitor ~]# masterha_check_status --conf=/etc/mha/app1.cnf
    app1 is stopped(2:NOT_RUNNING).

    STEP2: Execute the online switchover manually. Here we force the master role over to 192.168.10.13 and turn the previous master, 192.168.10.11, into a slave

    masterha_master_switch --master_state=alive --conf=/etc/mha/app1.cnf --new_master_host=192.168.10.13 --orig_master_is_new_slave

    STEP3: Check node 1: its master has automatically been changed to node 3

    [root@node1 ~]# mysql -uroot -p123456
    mysql> show slave status\G
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 192.168.10.13
                      Master_User: replica
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: node3-bin.000007
              Read_Master_Log_Pos: 194
                   Relay_Log_File: node1-relay-bin.000002
                    Relay_Log_Pos: 367
            Relay_Master_Log_File: node3-bin.000007
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes

    The virtual IP has also floated to node 3

    [root@node3 ~]# ifconfig
    ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    inet 10.1.45.85 netmask 255.255.255.0 broadcast 10.1.45.255
    inet6 fe80::953b:2262:6137:ad20 prefixlen 64 scopeid 0x20<link>
    ether 00:0c:29:25:bd:bb txqueuelen 1000 (Ethernet)
    RX packets 32135 bytes 2922975 (2.7 MiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 79 bytes 5526 (5.3 KiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
    
    ens34: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    inet 192.168.10.13 netmask 255.255.255.0 broadcast 192.168.10.255
    inet6 fe80::20c:29ff:fe25:bdc5 prefixlen 64 scopeid 0x20<link>
    ether 00:0c:29:25:bd:c5 txqueuelen 1000 (Ethernet)
    RX packets 1851 bytes 181301 (177.0 KiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 1496 bytes 575947 (562.4 KiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
    
    ens34:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    inet 192.168.10.10 netmask 255.255.255.0 broadcast 192.168.10.255
    ether 00:0c:29:25:bd:c5 txqueuelen 1000 (Ethernet)

    Online switchover log:

    [root@monitor ~]# masterha_master_switch --master_state=alive --conf=/etc/mha/app1.cnf --new_master_host=192.168.10.13 --orig_master_is_new_slave
    Wed Mar 31 10:10:42 2021 - [info] MHA::MasterRotate version 0.58.
    Wed Mar 31 10:10:42 2021 - [info] Starting online master switch..
    Wed Mar 31 10:10:42 2021 - [info]
    Wed Mar 31 10:10:42 2021 - [info] * Phase 1: Configuration Check Phase..
    Wed Mar 31 10:10:42 2021 - [info]
    Wed Mar 31 10:10:42 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Wed Mar 31 10:10:42 2021 - [info] Reading application default configuration from /etc/mha/app1.cnf..
    Wed Mar 31 10:10:42 2021 - [info] Reading server configuration from /etc/mha/app1.cnf..
    Wed Mar 31 10:10:43 2021 - [info] GTID failover mode = 1
    Wed Mar 31 10:10:43 2021 - [info] Current Alive Master: 192.168.10.11(192.168.10.11:3306)
    Wed Mar 31 10:10:43 2021 - [info] Alive Slaves:
    Wed Mar 31 10:10:43 2021 - [info] 192.168.10.12(192.168.10.12:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
    Wed Mar 31 10:10:43 2021 - [info] GTID ON
    Wed Mar 31 10:10:43 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)
    Wed Mar 31 10:10:43 2021 - [info] Primary candidate for the new Master (candidate_master is set)
    Wed Mar 31 10:10:43 2021 - [info] 192.168.10.13(192.168.10.13:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
    Wed Mar 31 10:10:43 2021 - [info] GTID ON
    Wed Mar 31 10:10:43 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)

    It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.10.11(192.168.10.11:3306)? (Y
    Wed Mar 31 10:10:44 2021 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
    Wed Mar 31 10:10:44 2021 - [info] ok.
    Wed Mar 31 10:10:44 2021 - [info] Checking MHA is not monitoring or doing failover..
    Wed Mar 31 10:10:44 2021 - [info] Checking replication health on 192.168.10.12..
    Wed Mar 31 10:10:44 2021 - [info] ok.
    Wed Mar 31 10:10:44 2021 - [info] Checking replication health on 192.168.10.13..
    Wed Mar 31 10:10:44 2021 - [info] ok.
    Wed Mar 31 10:10:44 2021 - [info] 192.168.10.13 can be new master.
    Wed Mar 31 10:10:44 2021 - [info]
    From:
    192.168.10.11(192.168.10.11:3306) (current master)
    +--192.168.10.12(192.168.10.12:3306)
    +--192.168.10.13(192.168.10.13:3306)

    To:
    192.168.10.13(192.168.10.13:3306) (new master)
    +--192.168.10.12(192.168.10.12:3306)
    +--192.168.10.11(192.168.10.11:3306)

    Starting master switch from 192.168.10.11(192.168.10.11:3306) to 192.168.10.13(192.168.10.13:3306)? (yes/NO): yes
    Wed Mar 31 10:10:46 2021 - [info] Checking whether 192.168.10.13(192.168.10.13:3306) is ok for the new master..
    Wed Mar 31 10:10:46 2021 - [info] ok.
    Wed Mar 31 10:10:46 2021 - [info] 192.168.10.11(192.168.10.11:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules,
    Wed Mar 31 10:10:46 2021 - [info] 192.168.10.11(192.168.10.11:3306): Resetting slave pointing to the dummy host.
    Wed Mar 31 10:10:46 2021 - [info] ** Phase 1: Configuration Check Phase completed.
    Wed Mar 31 10:10:46 2021 - [info]
    Wed Mar 31 10:10:46 2021 - [info] * Phase 2: Rejecting updates Phase..
    Wed Mar 31 10:10:46 2021 - [info]
    Wed Mar 31 10:10:46 2021 - [info] Executing master ip online change script to disable write on the current master:
    Wed Mar 31 10:10:46 2021 - [info] /mha/mha4mysql-manager-master/bin/master_ip_online_change --command=stop --orig_master_host=192.168.10.11 --origr_ip=192.168.10.13 --new_master_port=3306 --new_master_user='mhaadmin' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new
    Wed Mar 31 10:10:46 2021 346151 Set read_only on the new master.. ok.
    Wed Mar 31 10:10:46 2021 351134 Drpping app user on the orig master..
    Wed Mar 31 10:10:46 2021 351874 Waiting all running 2 threads are disconnected.. (max 1500 milliseconds)
    {'Time' => '114','db' => undef,'Id' => '48','User' => 'replica','State' => 'Master has sent all binlog to slave; waiting for more updates','Command'
    {'Time' => '112','db' => undef,'Id' => '49','User' => 'replica','State' => 'Master has sent all binlog to slave; waiting for more updates','Command'
    Wed Mar 31 10:10:46 2021 854267 Waiting all running 2 threads are disconnected.. (max 1000 milliseconds)
    {'Time' => '114','db' => undef,'Id' => '48','User' => 'replica','State' => 'Master has sent all binlog to slave; waiting for more updates','Command'
    {'Time' => '112','db' => undef,'Id' => '49','User' => 'replica','State' => 'Master has sent all binlog to slave; waiting for more updates','Command'
    Wed Mar 31 10:10:47 2021 358649 Waiting all running 2 threads are disconnected.. (max 500 milliseconds)
    {'Time' => '115','db' => undef,'Id' => '48','User' => 'replica','State' => 'Master has sent all binlog to slave; waiting for more updates','Command'
    {'Time' => '113','db' => undef,'Id' => '49','User' => 'replica','State' => 'Master has sent all binlog to slave; waiting for more updates','Command'
    Wed Mar 31 10:10:47 2021 860488 Set read_only=1 on the orig master.. ok.
    Wed Mar 31 10:10:47 2021 862857 Waiting all running 2 queries are disconnected.. (max 500 milliseconds)
    {'Time' => '115','db' => undef,'Id' => '48','User' => 'replica','State' => 'Master has sent all binlog to slave; waiting for more updates','Command'
    {'Time' => '113','db' => undef,'Id' => '49','User' => 'replica','State' => 'Master has sent all binlog to slave; waiting for more updates','Command'
    Wed Mar 31 10:10:48 2021 364167 Killing all application threads..
    Wed Mar 31 10:10:48 2021 365951 done.
    Disabling the VIP on old master: 192.168.10.11
    Wed Mar 31 10:10:48 2021 - [info] ok.
    Wed Mar 31 10:10:48 2021 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
    Wed Mar 31 10:10:48 2021 - [info] Executing FLUSH TABLES WITH READ LOCK..
    Wed Mar 31 10:10:48 2021 - [info] ok.
    Wed Mar 31 10:10:48 2021 - [info] Orig master binlog:pos is node1-bin.000006:154.
    Wed Mar 31 10:10:48 2021 - [info] Waiting to execute all relay logs on 192.168.10.13(192.168.10.13:3306)..
    Wed Mar 31 10:10:48 2021 - [info] master_pos_wait(node1-bin.000006:154) completed on 192.168.10.13(192.168.10.13:3306). Executed 0 events.
    Wed Mar 31 10:10:48 2021 - [info] done.
    Wed Mar 31 10:10:48 2021 - [info] Getting new master's binlog name and position..
    Wed Mar 31 10:10:48 2021 - [info] node3-bin.000007:194
    Wed Mar 31 10:10:48 2021 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.1
    Wed Mar 31 10:10:48 2021 - [info] Executing master ip online change script to allow write on the new master:
    Wed Mar 31 10:10:48 2021 - [info] /mha/mha4mysql-manager-master/bin/master_ip_online_change --command=start --orig_master_host=192.168.10.11 --orier_ip=192.168.10.13 --new_master_port=3306 --new_master_user='mhaadmin' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_ne
    Wed Mar 31 10:10:48 2021 741504 Set read_only=0 on the new master.
    Enabling the VIP - 192.168.10.10/24 on the new master - 192.168.10.13
    Wed Mar 31 10:10:49 2021 - [info] ok.
    Wed Mar 31 10:10:49 2021 - [info]
    Wed Mar 31 10:10:49 2021 - [info] * Switching slaves in parallel..
    Wed Mar 31 10:10:49 2021 - [info]
    Wed Mar 31 10:10:49 2021 - [info] -- Slave switch on host 192.168.10.12(192.168.10.12:3306) started, pid: 4943
    Wed Mar 31 10:10:49 2021 - [info]
    Wed Mar 31 10:10:51 2021 - [info] Log messages from 192.168.10.12 ...
    Wed Mar 31 10:10:51 2021 - [info]
    Wed Mar 31 10:10:49 2021 - [info] Waiting to execute all relay logs on 192.168.10.12(192.168.10.12:3306)..
    Wed Mar 31 10:10:49 2021 - [info] master_pos_wait(node1-bin.000006:154) completed on 192.168.10.12(192.168.10.12:3306). Executed 0 events.
    Wed Mar 31 10:10:49 2021 - [info] done.
    Wed Mar 31 10:10:49 2021 - [info] Resetting slave 192.168.10.12(192.168.10.12:3306) and starting replication from the new master 192.168.10.13(192.
    Wed Mar 31 10:10:49 2021 - [info] Executed CHANGE MASTER.
    Wed Mar 31 10:10:50 2021 - [info] Slave started.
    Wed Mar 31 10:10:51 2021 - [info] End of log messages from 192.168.10.12 ...
    Wed Mar 31 10:10:51 2021 - [info]
    Wed Mar 31 10:10:51 2021 - [info] -- Slave switch on host 192.168.10.12(192.168.10.12:3306) succeeded.
    Wed Mar 31 10:10:51 2021 - [info] Unlocking all tables on the orig master:
    Wed Mar 31 10:10:51 2021 - [info] Executing UNLOCK TABLES..
    Wed Mar 31 10:10:51 2021 - [info] ok.
    Wed Mar 31 10:10:51 2021 - [info] Starting orig master as a new slave..
    Wed Mar 31 10:10:51 2021 - [info] Resetting slave 192.168.10.11(192.168.10.11:3306) and starting replication from the new master 192.168.10.13(192.
    Wed Mar 31 10:10:51 2021 - [info] Executed CHANGE MASTER.
    Wed Mar 31 10:10:52 2021 - [info] Slave started.
    Wed Mar 31 10:10:52 2021 - [info] All new slave servers switched successfully.
    Wed Mar 31 10:10:52 2021 - [info]
    Wed Mar 31 10:10:52 2021 - [info] * Phase 5: New master cleanup phase..
    Wed Mar 31 10:10:52 2021 - [info]
    Wed Mar 31 10:10:52 2021 - [info] 192.168.10.13: Resetting slave info succeeded.
    Wed Mar 31 10:10:52 2021 - [info] Switching master to 192.168.10.13(192.168.10.13:3306) completed successfully.
    [root@monitor ~]#

    (6) Recovering the failed node
    After an automatic or manual failover, the original master has been evicted from the replication cluster. How can it be added back? Two approaches are worth considering:
    1. Take a backup from the new master, restore it onto the original master, and add it back to the cluster as a slave (a minimal sketch of this approach follows below);
    2. In theory the original master is merely behind the current master, so as long as the new master's binlogs are complete, the original master can rejoin the cluster directly as a slave.
    The second approach is demonstrated here.
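    For reference, here is a minimal sketch of approach 1, reusing the credentials and GTID auto-positioning from this setup (the dump path /tmp/full.sql is illustrative):

    # On the new master (e.g. 192.168.10.12): take a consistent full dump
    [root@node2 ~]# mysqldump -uroot -p123456 --single-transaction --all-databases > /tmp/full.sql

    # On the original master: clear the local GTID state, then restore the dump
    # (RESET MASTER is required so the dump's SET @@GLOBAL.GTID_PURGED can apply)
    [root@node1 ~]# mysql -uroot -p123456 -e "RESET MASTER;"
    [root@node1 ~]# mysql -uroot -p123456 < /tmp/full.sql

    # Point the restored instance at the new master and start replicating
    [root@node1 ~]# mysql -uroot -p123456 -e "CHANGE MASTER TO MASTER_HOST='192.168.10.12', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='replica', MASTER_PASSWORD='replica'; START SLAVE;"
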
    Test environment:
    Master node : 192.168.10.11
    Slave node 1: 192.168.10.12
    Slave node 2: 192.168.10.13
    STEP1: Start MHA Manager monitoring

    [root@monitor ~]# nohup masterha_manager --conf=/etc/mha/app1.cnf &
    [1] 5622
    [root@monitor ~]# nohup: ignoring input and appending output to ‘nohup.out’
    
    [root@monitor ~]#
    [root@monitor ~]#
    [root@monitor ~]#
    [root@monitor ~]# masterha_check_status --conf=/etc/mha/app1.cnf
    app1 (pid:5622) is running(0:PING_OK), master:192.168.10.11

    STEP2: Simulate an automatic failover: shut down the master so that the master role switches to node2

    # Reboot the node1 server
    [root@node1 ~]# reboot
    
    # A master switchover takes place; checking a slave shows the master is now node 2
    [root@node3 ~]# mysql -uroot -p123456
    mysql> show slave status\G
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 192.168.10.12
                      Master_User: replica

    STEP3: After node1 restarts, the MySQL instance on it is standalone; it no longer has a replication relationship with the other two nodes.

    [root@node1 ~]# service mysqld start
    Starting MySQL. SUCCESS! 
    [root@node1 ~]# mysql -uroot -p123456
    
    mysql> show master status\G
    *************************** 1. row ***************************
                 File: node1-bin.000008
             Position: 154
         Binlog_Do_DB: 
     Binlog_Ignore_DB: 
    Executed_Gtid_Set: 2db1f74f-8790-11eb-b668-000c29d1545c:1-94186,
    30753d6b-8790-11eb-864f-000c2999ad6c:1-2,
    32a16250-8790-11eb-b587-000c2925bdbb:1-2
    1 row in set (0.00 sec)
    
    mysql> show slave status\G
    Empty set (0.00 sec)

    STEP4: Convert node1 into a slave
    First, look up the replication start statement in the MHA Manager log. Whenever a switchover happens, whether a failover or a manual online switch, the manager log records the CHANGE MASTER statement the slaves should use to start replicating:

    [root@monitor bin]# tail -500f /mha/mha4mysql-manager-master/app1/log/manager.log |grep "CHANGE MASTER"
    Wed Mar 31 13:24:26 2021 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.10.12', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='replica', MASTER_PASSWORD='xxx';
    Wed Mar 31 13:24:26 2021 - [info] Executed CHANGE MASTER.

    Execute the statement on node1 and start replication:

    [root@node1 ~]# mysql -uroot -p123456
    mysql>  CHANGE MASTER TO MASTER_HOST='192.168.10.12', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='replica', MASTER_PASSWORD='replica';
    Query OK, 0 rows affected, 2 warnings (0.01 sec)
    
    mysql> start slave;
    Query OK, 0 rows affected (0.01 sec)
    
    mysql> show slave status\G
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 192.168.10.12
                      Master_User: replica
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: node2-bin.000002
              Read_Master_Log_Pos: 154
                   Relay_Log_File: node1-relay-bin.000002
                    Relay_Log_Pos: 367
            Relay_Master_Log_File: node2-bin.000002
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes

    STEP5 (optional): Promote node1 back to master
    After STEP4, node1 has rejoined the replication topology, which is now:
    Master node : 192.168.10.12
    Slave node 1: 192.168.10.11
    Slave node 2: 192.168.10.13

    To make 192.168.10.11 the master again, perform another online switchover:

    [root@monitor ~]# masterha_master_switch --master_state=alive --conf=/etc/mha/app1.cnf --new_master_host=192.168.10.11 --orig_master_is_new_slave

    The new topology:

    [root@monitor ~]# masterha_check_repl --conf=/etc/mha/app1.cnf
    Wed Mar 31 13:41:03 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Wed Mar 31 13:41:03 2021 - [info] Reading application default configuration from /etc/mha/app1.cnf..
    Wed Mar 31 13:41:03 2021 - [info] Reading server configuration from /etc/mha/app1.cnf..
    Wed Mar 31 13:41:03 2021 - [info] MHA::MasterMonitor version 0.58.
    Wed Mar 31 13:41:04 2021 - [info] GTID failover mode = 1
    Wed Mar 31 13:41:04 2021 - [info] Dead Servers:
    Wed Mar 31 13:41:04 2021 - [info] Alive Servers:
    Wed Mar 31 13:41:04 2021 - [info] 192.168.10.11(192.168.10.11:3306)
    Wed Mar 31 13:41:04 2021 - [info] 192.168.10.12(192.168.10.12:3306)
    Wed Mar 31 13:41:04 2021 - [info] 192.168.10.13(192.168.10.13:3306)
    Wed Mar 31 13:41:04 2021 - [info] Alive Slaves:
    Wed Mar 31 13:41:04 2021 - [info] 192.168.10.12(192.168.10.12:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
    Wed Mar 31 13:41:04 2021 - [info] GTID ON
    Wed Mar 31 13:41:04 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)
    Wed Mar 31 13:41:04 2021 - [info] Primary candidate for the new Master (candidate_master is set)
    Wed Mar 31 13:41:04 2021 - [info] 192.168.10.13(192.168.10.13:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
    Wed Mar 31 13:41:04 2021 - [info] GTID ON
    Wed Mar 31 13:41:04 2021 - [info] Replicating from 192.168.10.11(192.168.10.11:3306)
    Wed Mar 31 13:41:04 2021 - [info] Current Alive Master: 192.168.10.11(192.168.10.11:3306)
    Wed Mar 31 13:41:04 2021 - [info] Checking slave configurations..
    Wed Mar 31 13:41:04 2021 - [info] Checking replication filtering settings..
    Wed Mar 31 13:41:04 2021 - [info] binlog_do_db= , binlog_ignore_db=
    Wed Mar 31 13:41:04 2021 - [info] Replication filtering check ok.
    Wed Mar 31 13:41:04 2021 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
    Wed Mar 31 13:41:04 2021 - [info] Checking SSH publickey authentication settings on the current master..
    Wed Mar 31 13:41:04 2021 - [info] HealthCheck: SSH to 192.168.10.11 is reachable.
    Wed Mar 31 13:41:04 2021 - [info]
    192.168.10.11(192.168.10.11:3306) (current master)
    +--192.168.10.12(192.168.10.12:3306)
    +--192.168.10.13(192.168.10.13:3306)
    
    Wed Mar 31 13:41:04 2021 - [info] Checking replication health on 192.168.10.12..
    Wed Mar 31 13:41:04 2021 - [info] ok.
    Wed Mar 31 13:41:04 2021 - [info] Checking replication health on 192.168.10.13..
    Wed Mar 31 13:41:04 2021 - [info] ok.
    Wed Mar 31 13:41:04 2021 - [info] Checking master_ip_failover_script status:
    Wed Mar 31 13:41:04 2021 - [info] /mha/mha4mysql-manager-master/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.10.11 --orig_master_ip=192.168.10.11 --orig_master_port=3306
    
    
    IN SCRIPT TEST====/sbin/ifconfig ens34:1 down==/sbin/ifconfig ens34:1 192.168.10.10/24===
    
    Checking the Status of the script.. OK
    Wed Mar 31 13:41:04 2021 - [info] OK.
    Wed Mar 31 13:41:04 2021 - [warning] shutdown_script is not defined.
    Wed Mar 31 13:41:04 2021 - [info] Got exit code 0 (Not master dead).
    
    MySQL Replication Health is OK.

    (7) Miscellaneous
    (7.1) Purging relay logs periodically
    When building an MHA cluster, automatic relay log purging must be disabled on every MySQL node, because during a failover the relay logs on the slaves may be needed to recover data.

    mysql> show variables like 'relay_log_purge';
    +-----------------+-------+
    | Variable_name   | Value |
    +-----------------+-------+
    | relay_log_purge | OFF   |
    +-----------------+-------+
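
    If relay_log_purge is still ON on a node, it can be switched off at runtime and persisted in the option file (the my.cnf path is an assumption here):

    mysql> SET GLOBAL relay_log_purge = 0;

    # /etc/my.cnf, in the [mysqld] section
    relay_log_purge = 0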

    How, then, are the slaves' relay logs removed? MHA Node ships a purge_relay_logs script dedicated to cleaning them up.

    [root@node3 bin]# which purge_relay_logs
    /usr/local/bin/purge_relay_logs

    purge_relay_logs works as follows:
    1. Creates hard links to the relay log files (to minimize the performance impact of deleting large files in bulk);
    2. Executes SET GLOBAL relay_log_purge=1; FLUSH LOGS; SET GLOBAL relay_log_purge=0;
    3. Deletes the relay logs (rm -f /path/to/archive_dir/*).

    Usage:

    [root@node3 bin]# purge_relay_logs --help
    Usage:
    purge_relay_logs --user=root --password=rootpass --host=127.0.0.1
    
    See online reference
    (http://code.google.com/p/mysql-master-ha/wiki/Requirements#purge_relay_
    logs_script) for details.

    Key parameters:
    --user : MySQL user name, default root
    --password : MySQL password
    --port : port number
    --host : host name, default 127.0.0.1
    --workdir : where the hard links to the relay logs are created, default /var/tmp. After the script finishes successfully, the hard-linked relay log files are deleted. Because hard links cannot span filesystems, this should point to a location on the same partition as the relay logs.
    --disable_relay_log_purge : by default, if relay_log_purge=1 the script does nothing and exits. With this option, the script sets relay_log_purge to 0, purges the relay logs, and finally sets the parameter back to OFF (0).

    Run a relay log purge manually on a slave:

    [root@node2 data]# /usr/local/bin/purge_relay_logs --user=root --password=123456 --disable_relay_log_purge
    2021-03-31 14:16:29: purge_relay_logs script started.
    Found relay_log.info: /mysql/data/relay-log.info
    Opening /mysql/data/node2-relay-bin.000001 ..
    Opening /mysql/data/node2-relay-bin.000002 ..
    Executing SET GLOBAL relay_log_purge=1; FLUSH LOGS; sleeping a few seconds so that SQL thread can delete older relay log files (if it keeps up); SET GLOBAL relay_log_purge=0; .. ok.
    2021-03-31 14:16:32: All relay log purging operations succeeded.

    In production, schedule it with crontab:

    $ crontab -l
    # purge relay logs at 5am
    0 5 * * * /usr/local/bin/purge_relay_logs --user=root --password=123456 --disable_relay_log_purge >> /tmp/mha_purge_relay_logs.log 2>&1
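
    When the relay logs live on their own partition, a small wrapper keeps the hard links on the same filesystem, per the --workdir note above. The script path, workdir and log file below are hypothetical:

    #!/bin/bash
    # /usr/local/bin/purge_relay.sh - run from cron on each slave.
    # --workdir must be on the same partition as the relay logs (here /mysql/data).
    /usr/local/bin/purge_relay_logs \
        --user=root --password=123456 --host=127.0.0.1 --port=3306 \
        --workdir=/mysql/data/tmp \
        --disable_relay_log_purge >> /tmp/mha_purge_relay_logs.log 2>&1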

    (7.2) Which host is chosen as the new master
    If the hosts pass the health checks, the new master is determined by the following rules:
        * Hosts with candidate_master=1 are considered first.
                * If one of them is the latest slave (the one that received the newest binary log events), it becomes the new master.
                * If several candidates are equally up to date, the new master is decided by the order of the server sections in the configuration file. For example, with sections server1, server2 and server3, if server1 and server3 are both candidate masters and equally up to date, server1 is chosen as the new master.
        * If no host has candidate_master=1 set:
                * the latest slave becomes the new master; if several slaves are equally up to date, the section-order rule applies;
                * if none of the latest slaves can become the new master, one of the non-latest slaves is promoted, again following the section-order rule.
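
    As an illustration of the section-order rule, consider this hypothetical /etc/mha/app1.cnf fragment (the real file is shown earlier in this document; the hostnames are reused from this setup):

    [server1]
    hostname=192.168.10.11
    candidate_master=1

    [server2]
    hostname=192.168.10.12

    [server3]
    hostname=192.168.10.13
    candidate_master=1

    If server1 and server3 are equally up to date, MHA picks server1, simply because its section appears first in the file.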

    [End]
