zoukankan      html  css  js  c++  java
  • MySQL-MHA集群部署(binlog复制)

    MHA的理论知识网上有很多教程,这里不会说明;仅推荐博客链接!

    MHA的理论说明:http://www.ywnds.com/?p=8094

    MHA的安装包需要在google上面下载,或者就是csdn上面花钱下载!

    详细说明怎么搭建MHA

    #四台服务器分配如下
    10.0.102.214         test3            MHA的管理节点
    10.0.102.204         test2            master节点
    10.0.102.179         test1            slave节点(作为备用的管理节点)
    10.0.102.221         mgt01            slave节点

    #这里我们一主两从的架构基于binlog复制,首先需要配置好一主两从的架构。
    #需要注意的是,作为备用主的slave服务器需要开通二进制日志和配置log_slave_updates参数
    #MySQL基于binglog复制过程如下: https://www.cnblogs.com/wxzhe/p/10051114.html
    #部署过程中不会说明怎么搭建MySQL主从架构

    第一步:搭建好主从架构,也就是一主两从的架构。【MHA的官方不支持一主一从,但是传闻阿里修改了源码使其支持一主一从,这里使用官方的结构】

    需要注意的是要在作为备用主的服务器添加如下配置:

    log-bin=                        #开启二进制日志
    log_slave_updates               #把SQL线程的动作写入二进制日志

    第二步:安装MHA

    在MHA的集群的所有服务器上需要安装MHA-node节点,

    [root@mgt01 ~]# yum install epel-release perl-DBD-MySQL perl-CPAN -y             #安装依赖包
    [root@mgt01 src]# ls
    mha4mysql-node-0.56.tar.gz
    [root@mgt01 src]# tar zxvf mha4mysql-node-0.56.tar.gz    -C  ../                 #解压
    [root@mgt01 src]# cd ../
    [root@mgt01 local]# cd mha4mysql-node-0.56/
    [root@mgt01 mha4mysql-node-0.56]# ls
    AUTHORS  bin  COPYING  debian  inc  lib  Makefile.PL  MANIFEST  META.yml  README  rpm  t
    [root@mgt01 mha4mysql-node-0.56]# perl Makefile.PL                               #编译
    [root@mgt01 mha4mysql-node-0.56]# make  & make install                           #安装

    [root@mgt01 ~]# cd /usr/local/bin #安装完成之后,会在/usr/local/bin目录下面生成如下文件
    [root@mgt01 bin]# ls
    apply_diff_relay_logs  filter_mysqlbinlog  purge_relay_logs  save_binary_logs
    [root@mgt01 bin]# ll
    total 44
    -r-xr-xr-x 1 root root 16367 Dec  8 10:29 apply_diff_relay_logs
    -r-xr-xr-x 1 root root  4807 Dec  8 10:29 filter_mysqlbinlog
    -r-xr-xr-x 1 root root  8261 Dec  8 10:29 purge_relay_logs
    -r-xr-xr-x 1 root root  7525 Dec  8 10:29 save_binary_logs

    注意上面的这一步操作,需要在MHA集群的每个节点上都执行!

    安装MHA-manager,也就是MHA集群的管理节点!

    #首先安装MHA-manager需要安装的包
    yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes -y
    #安装MHA-manager
    tar
    zxvf mha4mysql-manager-0.56.tar.gz cd mha4mysql-manager-0.56/ perl Makefile.PL make & make install
    cp -frp samples/scripts/* /usr/local/bin #把这些脚本文件拷贝到/usr/local/bin下面,这样不用再添加环境变量
    master_ip_failover:故障自动切换时对vip管理的脚本,不是必须。如果我们使用keepalived的,我们可以自己编写脚本完成对vip的管理,比如监控mysql,如果mysql异常,我们停止keepalived就行,这样vip就会自动漂移。
    
    master_ip_online_change:在线切换时对vip的管理,不是必须,同样可以自行编写简单的shell完成。
    
    power_manager:故障发生后关闭主机的脚本,不是必须。
    
    send_report:因故障切换后发送报警的脚本,不是必须,可自行编写简单的shell完成。
    脚本说明

    第三步:配置MHA

    配置MHA这一步主要做的就是写MHA的配置文件,然后创建对应的目录!

    在上面的samples/  目录下还有一个目录conf,里面有两个配置文件模板:

    [root@test3 conf]# ls
    app1.cnf  masterha_default.cnf
    [root@test3 conf]# 

    把配置文件模板拷贝到/etc下面:

    mkdir /etc/masterha -p             #在/etc下面创建MHA使用的配置文件的目录【名字可以随意,最好可以标识目录的内容】
    cp * /etc/masterha/

    首先编辑masterha_default.cnf文件

    [root@mha ~]# cat /etc/masterha_default.cnf
    [server default]
    # 设置监控用户mha,需要有授权
    user=mha
    # 设置mysql中root用户的密码,这个密码是前文中创建监控用户的那个密码; 
    password=123456
    # 设置复制环境中的复制用户名; 
    repl_user=repl
    # 设置复制用户的密码; 
    repl_password=123456
    # 设置ssh的登录用户名;
    ssh_user=root
    # 设置ssh的登录端口(不写默认22端口);
    ssh_port=22
    # 设置监控主库,发送ping包的时间间隔,默认是3秒,尝试三次没有回应的时候自动进行failover; 
    ping_interval=3
    # 设置mysql master保存binlog的目录,以便MHA可以找到master的二进制日志;
    master_binlog_dir= /data/mysql/
    # 设置mysql master在发生切换时保存binlog的目录,在mysql master上创建这个目录(不写默认为/var/tmp); 
    remote_workdir=/data/log/masterha
     
    # 一旦MHA到mysql01的监控之间出现问题,MHA Manager将会尝试从mysql02,mysql03登录到mysql01;
    secondary_check_script= masterha_secondary_check -s test1 -s mgt01 --user=root --port=22 --master_host=test2 --master_port=3306
    # 设置自动failover时候的切换脚本(脚本有瑕疵,需要自行修改);
    #master_ip_failover_script=/usr/local/bin/master_ip_failover
    # 设置手动切换时候的切换脚本(脚本有瑕疵,需要自行修改);
    #master_ip_online_change_script=/usr/local/bin/master_ip_online_change
    # 设置发生切换后发送的报警的脚本(可自行编写);
    #report_script=/usr/local/bin/send_report
    # 设置故障发生后关闭故障主机脚本(该脚本的主要作用是关闭主机放在发生脑裂,这里没有使用);
    #shutdown_script=""
    masterha_default.cnf配置参数说明

    上面给出了masterha_default.cnf每个配置参数的说明情况,下面这个是我的配置

    [server default]
    user=root
    password=123456
    ssh_user=root
    ssh_port=22
    ping_interval=3
    repl_user=repl
    repl_password=123456
    
    master_binlog_dir= /data/mysql/
    
    remote_workdir=/data/log/masterha
    
    secondary_check_script= masterha_secondary_check -s test1 -s mgt01 --user=root --port=22 --master_host=test2 --master_port=3306
    
    master_ip_failover_script= /usr/local/bin/master_ip_failover
    # shutdown_script= /script/masterha/power_manager
    report_script= /usr/local/bin/send_report
    # master_ip_online_change_script= /script/masterha/master_ip_online_change

    然后编辑配置app1.conf文件

    只针对单个应用生效,但是app1.cnf的配置参数优先级高于masterha_default.cnf,一般都会在app1.cnf包含masterha_default.cnf所有参数。MHA可以监控多个主从的集群,每个集群的配置文件可以用名字区分,因为这里只有一个集群,因此只有app1.conf一个文件!

    [root@test3 masterha]# cat app1.cnf 
    
    manager_log=/data/log/app1/manager.log
    manager_workdir=/data/log/app1
    master_binlog_dir=/data/mysql
    password=123456
    ping_interval=3
    remote_workdir=/data/log/masterha
    repl_password=123456
    repl_user=repl
    report_script=/usr/local/bin/send_report
    secondary_check_script=masterha_secondary_check -s test1 -s mgt01 --user=root --port=22 --master_host=test2 --master_port=3306
    ssh_port=22
    ssh_user=root
    user=root

    [server1]
    hostname=10.0.102.204
    port=3306
    candidate_master=1

    [server2]
    candidate_master=1
    hostname=10.0.102.179
    port=3306

    [server3]
    hostname=10.0.102.221
    no_master=1
    port=3306

    这个配置文件的参数基本都比较好理解,需要注意的是,配置文件指定的目录都需要另行创建!

    mkdir -p /data/log/masterha
    mkdir /data/log/app1

    candidate_master设置为1时,表示为候选master,如果设置该参数以后,发生主从切换以后将会将此从库提升为主库,即使这个主库不是集群中事件最新的slave。默认情况下如果一个slave落后master 100M的relay logs的话,MHA将不会选择该slave作为一个新的master,因为对于这个slave的恢复需要花费很长时间,通过设置check_repl_delay=0,MHA触发切换在选择一个新的master的时候将会忽略复制延时,这个参数对于设置了candidate_master=1的主机非常有用,因为这个候选主在切换的过程中一定是新的master check_repl_delay=0。

    同样设置为候选master的slave一定要开启二进制日志和log_slave_updates参数!

    设置relay log的清除方式(在每个slave节点上)

    在配置文件中加上relay_log_purge=0,需要重启才能生效!

    注意:MHA在发生切换的过程中,从库的恢复过程中依赖于relay log的相关信息,所以这里要将relay log的自动清除设置为OFF,采用手动清除relay log的方式。在默认情况下,从服务器上的中继日志会在SQL线程执行完毕后被自动删除。但是在MHA环境中,这些中继日志在恢复其他从服务器时可能会被用到,因此需要禁用中继日志的自动删除功能。定期清除中继日志需要考虑到复制延时的问题。在ext3的文件系统下,删除大的文件需要一定的时间,会导致严重的复制延时。为了避免复制延时,需要暂时为中继日志创建硬链接,因为在Linux系统中通过硬链接删除大文件速度会很快。(在mysql数据库中,删除大表时,通常也采用建立硬链接的方式)

    MHA节点中包含了pure_relay_logs命令工具,它可以为中继日志创建硬链接,执行SET GLOBAL relay_log_purge=1,等待几秒钟以便SQL线程切换到新的中继日志,再执行SET GLOBAL relay_log_purge=0。

    pure_relay_logs脚本参数如下所示:

    --user mysql                 #用户名;
    --password mysql             #密码;
    --port                       #端口号;
    --workdir                    #指定创建relay log的硬链接的位置,默认是/var/tmp,由于系统不同分区创建硬链接文件会失败,故需要执行硬链接具体位置,成功执行脚本后,硬链接的中继日志文件被删除;
    --disable_relay_log_purge    #默认情况下,如果relay_log_purge=1,脚本会什么都不清理,自动退出,通过设定这个参数,当relay_log_purge=1的情况下会将relay_log_purge设置为0。清理relay log之后,最后将参数设置为OFF;

    设置定期清理relay脚本

    [root@mgt01 ~]# cat !$
    cat purge_relay.sh
    #!/bin/bash
    user=root
    passwd=123456
    port=3306
    log_dir='/data/masterha/log'
    work_dir='/data'
    purge='/usr/local/bin/purge_relay_logs'
     
    if [ ! -d $log_dir ];then
      mkdir $log_dir -p
    fi
     
    $purge --user=$user --password=$passwd --disable_relay_log_purge --port=$port --workdir=$work_dir >> $log_dir/purge_relay_logs.log 2>&1

    把以上脚本加入定时计划任务:

    [root@mgt01 log]# crontab -l
    * 4 * * * sh /root/purge_relay.sh

    purge_relay_logs脚本删除中继日志不会阻塞SQL线程

    第四步: 设置ssh无密码认证

    MHA的管理节点可以无密码访问集群中的其余节点!

    MySQL集群需要互相之间可以无密码访问!

    ssh无密码访问不再写过程。

    使用MHA检查ssh是否成功

    [root@test3 ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf

    若成功则进行下一步,检查复制

    有一些博客提到:暂时先注释配置文件中master_ip_failover_script= /usr/local/bin/master_ip_failover这个选项,不然这个检查过不去的。但是我测试时候没有注释,也是可以检查成功的】

    [root@test3 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
    [root@test3 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf 
    Sat Dec  8 17:03:38 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Sat Dec  8 17:03:38 2018 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
    Sat Dec  8 17:03:38 2018 - [info] Reading server configuration from /etc/masterha/app1.cnf..
    Sat Dec  8 17:03:38 2018 - [info] MHA::MasterMonitor version 0.56.
    Sat Dec  8 17:03:38 2018 - [info] GTID failover mode = 0
    Sat Dec  8 17:03:38 2018 - [info] Dead Servers:
    Sat Dec  8 17:03:38 2018 - [info] Alive Servers:
    Sat Dec  8 17:03:38 2018 - [info]   10.0.102.204(10.0.102.204:3306)
    Sat Dec  8 17:03:38 2018 - [info]   10.0.102.179(10.0.102.179:3306)
    Sat Dec  8 17:03:38 2018 - [info]   10.0.102.221(10.0.102.221:3306)
    Sat Dec  8 17:03:38 2018 - [info] Alive Slaves:
    Sat Dec  8 17:03:38 2018 - [info]   10.0.102.179(10.0.102.179:3306)  Version=5.7.22-log (oldest major version between slaves) log-bin:enabled
    Sat Dec  8 17:03:38 2018 - [info]     Replicating from 10.0.102.204(10.0.102.204:3306)
    Sat Dec  8 17:03:38 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
    Sat Dec  8 17:03:38 2018 - [info]   10.0.102.221(10.0.102.221:3306)  Version=5.7.22 (oldest major version between slaves) log-bin:disabled
    Sat Dec  8 17:03:38 2018 - [info]     Replicating from 10.0.102.204(10.0.102.204:3306)
    Sat Dec  8 17:03:38 2018 - [info]     Not candidate for the new Master (no_master is set)
    Sat Dec  8 17:03:38 2018 - [info] Current Alive Master: 10.0.102.204(10.0.102.204:3306)
    Sat Dec  8 17:03:38 2018 - [info] Checking slave configurations..
    Sat Dec  8 17:03:38 2018 - [info]  read_only=1 is not set on slave 10.0.102.179(10.0.102.179:3306).
    Sat Dec  8 17:03:38 2018 - [info]  read_only=1 is not set on slave 10.0.102.221(10.0.102.221:3306).
    Sat Dec  8 17:03:38 2018 - [warning]  log-bin is not set on slave 10.0.102.221(10.0.102.221:3306). This host cannot be a master.
    Sat Dec  8 17:03:38 2018 - [info] Checking replication filtering settings..
    Sat Dec  8 17:03:38 2018 - [info]  binlog_do_db= , binlog_ignore_db= 
    Sat Dec  8 17:03:38 2018 - [info]  Replication filtering check ok.
    Sat Dec  8 17:03:38 2018 - [info] GTID (with auto-pos) is not supported
    Sat Dec  8 17:03:38 2018 - [info] Starting SSH connection tests..
    Sat Dec  8 17:03:40 2018 - [info] All SSH connection tests passed successfully.
    Sat Dec  8 17:03:40 2018 - [info] Checking MHA Node version..
    Sat Dec  8 17:03:40 2018 - [info]  Version check ok.
    Sat Dec  8 17:03:40 2018 - [info] Checking SSH publickey authentication settings on the current master..
    Sat Dec  8 17:03:40 2018 - [info] HealthCheck: SSH to 10.0.102.204 is reachable.
    Sat Dec  8 17:03:41 2018 - [info] Master MHA Node version is 0.56.
    Sat Dec  8 17:03:41 2018 - [info] Checking recovery script configurations on 10.0.102.204(10.0.102.204:3306)..
    Sat Dec  8 17:03:41 2018 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql --output_file=/data/log/masterha/save_binary_logs_test --manager_version=0.56 --start_file=test2-bin.000007 
    Sat Dec  8 17:03:41 2018 - [info]   Connecting to root@10.0.102.204(10.0.102.204:22).. 
      Creating /data/log/masterha if not exists..    ok.
      Checking output directory is accessible or not..
       ok.
      Binlog found at /data/mysql, up to test2-bin.000007
    Sat Dec  8 17:03:41 2018 - [info] Binlog setting check done.
    Sat Dec  8 17:03:41 2018 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
    Sat Dec  8 17:03:41 2018 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=10.0.102.179 --slave_ip=10.0.102.179 --slave_port=3306 --workdir=/data/log/masterha --target_version=5.7.22-log --manager_version=0.56 --relay_log_info=/data/mysql/relay-log.info  --relay_dir=/data/mysql/  --slave_pass=xxx
    Sat Dec  8 17:03:41 2018 - [info]   Connecting to root@10.0.102.179(10.0.102.179:22).. 
      Checking slave recovery environment settings..
        Opening /data/mysql/relay-log.info ... ok.
        Relay log found at /data/mysql, up to test1-relay-bin.000002
        Temporary relay log file is /data/mysql/test1-relay-bin.000002
        Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
     done.
        Testing mysqlbinlog output.. done.
        Cleaning up test file(s).. done.
    Sat Dec  8 17:03:41 2018 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=10.0.102.221 --slave_ip=10.0.102.221 --slave_port=3306 --workdir=/data/log/masterha --target_version=5.7.22 --manager_version=0.56 --relay_log_info=/data/mysql/relay-log.info  --relay_dir=/data/mysql/  --slave_pass=xxx
    Sat Dec  8 17:03:41 2018 - [info]   Connecting to root@10.0.102.221(10.0.102.221:22).. 
      Checking slave recovery environment settings..
        Opening /data/mysql/relay-log.info ... ok.
        Relay log found at /data/mysql, up to mgt01-relay-bin.000002
        Temporary relay log file is /data/mysql/mgt01-relay-bin.000002
        Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
     done.
        Testing mysqlbinlog output.. done.
        Cleaning up test file(s).. done.
    Sat Dec  8 17:03:41 2018 - [info] Slaves settings check done.
    Sat Dec  8 17:03:41 2018 - [info] 
    10.0.102.204(10.0.102.204:3306) (current master)
     +--10.0.102.179(10.0.102.179:3306)
     +--10.0.102.221(10.0.102.221:3306)
    
    Sat Dec  8 17:03:41 2018 - [info] Checking replication health on 10.0.102.179..
    Sat Dec  8 17:03:41 2018 - [info]  ok.
    Sat Dec  8 17:03:41 2018 - [info] Checking replication health on 10.0.102.221..
    Sat Dec  8 17:03:41 2018 - [info]  ok.
    Sat Dec  8 17:03:41 2018 - [warning] master_ip_failover_script is not defined.
    Sat Dec  8 17:03:41 2018 - [warning] shutdown_script is not defined.
    Sat Dec  8 17:03:41 2018 - [info] Got exit code 0 (Not master dead).
    
    MySQL Replication Health is OK.详细过程
    详细过程

    遇到过一次是复制检查时,总是会dead servers下面有一个服务器,但是集群里面是正常的,各种都是正常的,后来发现是本地的解析出错!【/etc/hosts文化和ssh目录下面的known_hosts文件,新建的服务器一般不会出现这问题】

    查看MHA-manger的状态

    [root@test3 masterha]# masterha_check_status --conf=/etc/masterha/app1.cnf 
    app1 is stopped(2:NOT_RUNNING).
    [root@test3 masterha]# 

    开启MHa-manager

    [root@test3 masterha]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover &

    #参数说明
    remove_dead_master_conf:设置了这个参数后,如果MHA failover结束后,MHA Manager会自动在配置文件中删除dead master的相关项。如果不设置,
    由于dead master的配置还存在文件中,那么当MHA failover后,当再次restart MHA manager后,会报错(there is a dead slave previous dead master)。
    ignore_last_failover:默认情况下,如果一个或者多个slave down掉了,master monitor进程就会停掉,就算你设置了ignore_fail。如果设置了–ignore_fail_on_start参数,ignore_fail标记了slave挂掉也不会让master monitor进程停掉。

    启动之后查看状态:

    [root@test3 masterha]# masterha_check_status --conf=/etc/masterha/app1.cnf 
    app1 (pid:18866) is running(0:PING_OK), master:10.0.102.204

    如果启动没有报错,那么一个MHA的集群就已经搭建成功!

    关闭MHA-manager可以使用如下命令:

     masterha_stop --conf=/etc/masterha/app1.cnf

    最后:我们进行一个failover测试!

    停掉MySQL主从集群中的主,查看是否会自动切换到从!在测试主从之前最后可以写入一点数据,这里我利用tpcc写入了一些数据!

    ./tpcc_load -h 10.0.102.204 -P 3306 -d tpcc_test -u root -p 123456 -w 3

    tpcc的测试使用:https://www.cnblogs.com/wxzhe/p/10027474.html

    停掉当前的主服务器!

    [root@test2 ~]# service mysqld stop
    Shutting down MySQL............ SUCCESS!

    然后查看MHA的管理日志

    Sat Dec  8 17:21:50 2018 - [info] Executing secondary network check script: masterha_secondary_check -s test1 -s mgt01 --user=root --port=22 --master_host=test2 --master_port=3306  --user=root  --master_host=10.0.102.204  --master_ip=10.0.102.204  --master_port=3306 --master_user=root --master_password=123456 --ping_type=SELECT
    Sat Dec  8 17:21:50 2018 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql --output_file=/data/log/masterha/save_binary_logs_test --manager_version=0.56 --binlog_prefix=test2-bin
    Monitoring server test1 is reachable, Master is not reachable from test1. OK.
    Sat Dec  8 17:21:50 2018 - [info] HealthCheck: SSH to 10.0.102.204 is reachable.
    Monitoring server mgt01 is reachable, Master is not reachable from mgt01. OK.
    Sat Dec  8 17:21:50 2018 - [info] Master is not reachable from all other monitoring servers. Failover should start.
    Sat Dec  8 17:21:53 2018 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
    Sat Dec  8 17:21:53 2018 - [warning] Connection failed 2 time(s)..
    Sat Dec  8 17:21:56 2018 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
    Sat Dec  8 17:21:56 2018 - [warning] Connection failed 3 time(s)..
    Sat Dec  8 17:21:59 2018 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
    Sat Dec  8 17:21:59 2018 - [warning] Connection failed 4 time(s)..
    Sat Dec  8 17:21:59 2018 - [warning] Master is not reachable from health checker!
    Sat Dec  8 17:21:59 2018 - [warning] Master 10.0.102.204(10.0.102.204:3306) is not reachable!
    Sat Dec  8 17:21:59 2018 - [warning] SSH is reachable.
    Sat Dec  8 17:21:59 2018 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status..
    Sat Dec  8 17:21:59 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Sat Dec  8 17:21:59 2018 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
    Sat Dec  8 17:21:59 2018 - [info] Reading server configuration from /etc/masterha/app1.cnf..
    Sat Dec  8 17:21:59 2018 - [info] GTID failover mode = 0
    Sat Dec  8 17:21:59 2018 - [info] Dead Servers:
    Sat Dec  8 17:21:59 2018 - [info]   10.0.102.204(10.0.102.204:3306)
    Sat Dec  8 17:21:59 2018 - [info] Alive Servers:
    Sat Dec  8 17:21:59 2018 - [info]   10.0.102.179(10.0.102.179:3306)
    Sat Dec  8 17:21:59 2018 - [info]   10.0.102.221(10.0.102.221:3306)
    Sat Dec  8 17:21:59 2018 - [info] Alive Slaves:
    Sat Dec  8 17:21:59 2018 - [info]   10.0.102.179(10.0.102.179:3306)  Version=5.7.22-log (oldest major version between slaves) log-bin:enabled
    Sat Dec  8 17:21:59 2018 - [info]     Replicating from 10.0.102.204(10.0.102.204:3306)
    Sat Dec  8 17:21:59 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
    Sat Dec  8 17:21:59 2018 - [info]   10.0.102.221(10.0.102.221:3306)  Version=5.7.22 (oldest major version between slaves) log-bin:disabled
    Sat Dec  8 17:21:59 2018 - [info]     Replicating from 10.0.102.204(10.0.102.204:3306)
    Sat Dec  8 17:21:59 2018 - [info]     Not candidate for the new Master (no_master is set)
    Sat Dec  8 17:21:59 2018 - [info] Checking slave configurations..
    Sat Dec  8 17:21:59 2018 - [info]  read_only=1 is not set on slave 10.0.102.179(10.0.102.179:3306).
    Sat Dec  8 17:21:59 2018 - [info]  read_only=1 is not set on slave 10.0.102.221(10.0.102.221:3306).
    Sat Dec  8 17:21:59 2018 - [warning]  log-bin is not set on slave 10.0.102.221(10.0.102.221:3306). This host cannot be a master.
    Sat Dec  8 17:21:59 2018 - [info] Checking replication filtering settings..
    Sat Dec  8 17:21:59 2018 - [info]  Replication filtering check ok.
    Sat Dec  8 17:21:59 2018 - [info] Master is down!
    Sat Dec  8 17:21:59 2018 - [info] Terminating monitoring script.
    Sat Dec  8 17:21:59 2018 - [info] Got exit code 20 (Master dead).
    Sat Dec  8 17:21:59 2018 - [info] MHA::MasterFailover version 0.56.
    Sat Dec  8 17:21:59 2018 - [info] Starting master failover.
    Sat Dec  8 17:21:59 2018 - [info] 
    Sat Dec  8 17:21:59 2018 - [info] * Phase 1: Configuration Check Phase..
    Sat Dec  8 17:21:59 2018 - [info] 
    Sat Dec  8 17:21:59 2018 - [info] GTID failover mode = 0
    Sat Dec  8 17:21:59 2018 - [info] Dead Servers:
    Sat Dec  8 17:21:59 2018 - [info]   10.0.102.204(10.0.102.204:3306)
    Sat Dec  8 17:21:59 2018 - [info] Checking master reachability via MySQL(double check)...
    Sat Dec  8 17:21:59 2018 - [info]  ok.
    Sat Dec  8 17:21:59 2018 - [info] Alive Servers:
    Sat Dec  8 17:21:59 2018 - [info]   10.0.102.179(10.0.102.179:3306)
    Sat Dec  8 17:21:59 2018 - [info]   10.0.102.221(10.0.102.221:3306)
    Sat Dec  8 17:21:59 2018 - [info] Alive Slaves:
    Sat Dec  8 17:21:59 2018 - [info]   10.0.102.179(10.0.102.179:3306)  Version=5.7.22-log (oldest major version between slaves) log-bin:enabled
    Sat Dec  8 17:21:59 2018 - [info]     Replicating from 10.0.102.204(10.0.102.204:3306)
    Sat Dec  8 17:21:59 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
    Sat Dec  8 17:21:59 2018 - [info]   10.0.102.221(10.0.102.221:3306)  Version=5.7.22 (oldest major version between slaves) log-bin:disabled
    Sat Dec  8 17:21:59 2018 - [info]     Replicating from 10.0.102.204(10.0.102.204:3306)
    Sat Dec  8 17:21:59 2018 - [info]     Not candidate for the new Master (no_master is set)
    Sat Dec  8 17:21:59 2018 - [info] Starting Non-GTID based failover. 
    Sat Dec  8 17:21:59 2018 - [info] 
    Sat Dec  8 17:21:59 2018 - [info] ** Phase 1: Configuration Check Phase completed.
    Sat Dec  8 17:21:59 2018 - [info] 
    Sat Dec  8 17:21:59 2018 - [info] * Phase 2: Dead Master Shutdown Phase..
    Sat Dec  8 17:21:59 2018 - [info] 
    Sat Dec  8 17:21:59 2018 - [info] Forcing shutdown so that applications never connect to the current master..
    Sat Dec  8 17:21:59 2018 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master IP address.
    Sat Dec  8 17:21:59 2018 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
    Sat Dec  8 17:21:59 2018 - [info] * Phase 2: Dead Master Shutdown Phase completed.
    Sat Dec  8 17:21:59 2018 - [info] 
    Sat Dec  8 17:21:59 2018 - [info] * Phase 3: Master Recovery Phase..
    Sat Dec  8 17:21:59 2018 - [info] 
    Sat Dec  8 17:21:59 2018 - [info] * Phase 3.1: Getting Latest Slaves Phase..
    Sat Dec  8 17:21:59 2018 - [info] 
    Sat Dec  8 17:21:59 2018 - [info] The latest binary log file/position on all slaves is test2-bin.000007:154
    Sat Dec  8 17:21:59 2018 - [info] Latest slaves (Slaves that received relay log files to the latest):
    Sat Dec  8 17:21:59 2018 - [info]   10.0.102.179(10.0.102.179:3306)  Version=5.7.22-log (oldest major version between slaves) log-bin:enabled
    Sat Dec  8 17:21:59 2018 - [info]     Replicating from 10.0.102.204(10.0.102.204:3306)
    Sat Dec  8 17:21:59 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
    Sat Dec  8 17:21:59 2018 - [info]   10.0.102.221(10.0.102.221:3306)  Version=5.7.22 (oldest major version between slaves) log-bin:disabled
    Sat Dec  8 17:21:59 2018 - [info]     Replicating from 10.0.102.204(10.0.102.204:3306)
    Sat Dec  8 17:21:59 2018 - [info]     Not candidate for the new Master (no_master is set)
    Sat Dec  8 17:21:59 2018 - [info] The oldest binary log file/position on all slaves is test2-bin.000007:154
    Sat Dec  8 17:21:59 2018 - [info] Oldest slaves:
    Sat Dec  8 17:21:59 2018 - [info]   10.0.102.179(10.0.102.179:3306)  Version=5.7.22-log (oldest major version between slaves) log-bin:enabled
    Sat Dec  8 17:21:59 2018 - [info]     Replicating from 10.0.102.204(10.0.102.204:3306)
    Sat Dec  8 17:21:59 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
    Sat Dec  8 17:21:59 2018 - [info]   10.0.102.221(10.0.102.221:3306)  Version=5.7.22 (oldest major version between slaves) log-bin:disabled
    Sat Dec  8 17:21:59 2018 - [info]     Replicating from 10.0.102.204(10.0.102.204:3306)
    Sat Dec  8 17:21:59 2018 - [info]     Not candidate for the new Master (no_master is set)
    Sat Dec  8 17:21:59 2018 - [info] 
    Sat Dec  8 17:21:59 2018 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
    Sat Dec  8 17:21:59 2018 - [info] 
    Sat Dec  8 17:21:59 2018 - [info] Fetching dead master's binary logs..
    Sat Dec  8 17:21:59 2018 - [info] Executing command on the dead master 10.0.102.204(10.0.102.204:3306): save_binary_logs --command=save --start_file=test2-bin.000007  --start_pos=154 --binlog_dir=/data/mysql --output_file=/data/log/masterha/saved_master_binlog_from_10.0.102.204_3306_20181208172159.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56
      Creating /data/log/masterha if not exists..    ok.
     Concat binary/relay logs from test2-bin.000007 pos 154 to test2-bin.000007 EOF into /data/log/masterha/saved_master_binlog_from_10.0.102.204_3306_20181208172159.binlog ..
     Binlog Checksum enabled
      Dumping binlog format description event, from position 0 to 154.. ok.
      Dumping effective binlog data from /data/mysql/test2-bin.000007 position 154 to tail(177).. ok.
     Binlog Checksum enabled
     Concat succeeded.
    Sat Dec  8 17:22:00 2018 - [info] scp from root@10.0.102.204:/data/log/masterha/saved_master_binlog_from_10.0.102.204_3306_20181208172159.binlog to local:/data/log/app1/saved_master_binlog_from_10.0.102.204_3306_20181208172159.binlog succeeded.
    Sat Dec  8 17:22:00 2018 - [info] HealthCheck: SSH to 10.0.102.179 is reachable.
    Sat Dec  8 17:22:00 2018 - [info] HealthCheck: SSH to 10.0.102.221 is reachable.
    Sat Dec  8 17:22:00 2018 - [info] 
    Sat Dec  8 17:22:00 2018 - [info] * Phase 3.3: Determining New Master Phase..
    Sat Dec  8 17:22:00 2018 - [info] 
    Sat Dec  8 17:22:00 2018 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
    Sat Dec  8 17:22:00 2018 - [info] All slaves received relay logs to the same position. No need to resync each other.
    Sat Dec  8 17:22:00 2018 - [info] Searching new master from slaves..
    Sat Dec  8 17:22:00 2018 - [info]  Candidate masters from the configuration file:
    Sat Dec  8 17:22:00 2018 - [info]   10.0.102.179(10.0.102.179:3306)  Version=5.7.22-log (oldest major version between slaves) log-bin:enabled
    Sat Dec  8 17:22:00 2018 - [info]     Replicating from 10.0.102.204(10.0.102.204:3306)
    Sat Dec  8 17:22:00 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
    Sat Dec  8 17:22:00 2018 - [info]  Non-candidate masters:
    Sat Dec  8 17:22:00 2018 - [info]   10.0.102.221(10.0.102.221:3306)  Version=5.7.22 (oldest major version between slaves) log-bin:disabled
    Sat Dec  8 17:22:00 2018 - [info]     Replicating from 10.0.102.204(10.0.102.204:3306)
    Sat Dec  8 17:22:00 2018 - [info]     Not candidate for the new Master (no_master is set)
    Sat Dec  8 17:22:00 2018 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
    Sat Dec  8 17:22:00 2018 - [info] New master is 10.0.102.179(10.0.102.179:3306)
    Sat Dec  8 17:22:00 2018 - [info] Starting master failover..
    Sat Dec  8 17:22:00 2018 - [info] 
    From:
    10.0.102.204(10.0.102.204:3306) (current master)
     +--10.0.102.179(10.0.102.179:3306)
     +--10.0.102.221(10.0.102.221:3306)
    
    To:
    10.0.102.179(10.0.102.179:3306) (new master)
     +--10.0.102.221(10.0.102.221:3306)
    Sat Dec  8 17:22:00 2018 - [info] 
    Sat Dec  8 17:22:00 2018 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
    Sat Dec  8 17:22:00 2018 - [info] 
    Sat Dec  8 17:22:00 2018 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
    Sat Dec  8 17:22:00 2018 - [info] Sending binlog..
    Sat Dec  8 17:22:01 2018 - [info] scp from local:/data/log/app1/saved_master_binlog_from_10.0.102.204_3306_20181208172159.binlog to root@10.0.102.179:/data/log/masterha/saved_master_binlog_from_10.0.102.204_3306_20181208172159.binlog succeeded.
    Sat Dec  8 17:22:01 2018 - [info] 
    Sat Dec  8 17:22:01 2018 - [info] * Phase 3.4: Master Log Apply Phase..
    Sat Dec  8 17:22:01 2018 - [info] 
    Sat Dec  8 17:22:01 2018 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
    Sat Dec  8 17:22:01 2018 - [info] Starting recovery on 10.0.102.179(10.0.102.179:3306)..
    Sat Dec  8 17:22:01 2018 - [info]  Generating diffs succeeded.
    Sat Dec  8 17:22:01 2018 - [info] Waiting until all relay logs are applied.
    Sat Dec  8 17:22:01 2018 - [info]  done.
    Sat Dec  8 17:22:01 2018 - [info] Getting slave status..
    Sat Dec  8 17:22:01 2018 - [info] This slave(10.0.102.179)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(test2-bin.000007:154). No need to recover from Exec_Master_Log_Pos.
    Sat Dec  8 17:22:01 2018 - [info] Connecting to the target slave host 10.0.102.179, running recover script..
    Sat Dec  8 17:22:01 2018 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='root' --slave_host=10.0.102.179 --slave_ip=10.0.102.179  --slave_port=3306 --apply_files=/data/log/masterha/saved_master_binlog_from_10.0.102.204_3306_20181208172159.binlog --workdir=/data/log/masterha --target_version=5.7.22-log --timestamp=20181208172159 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx
    Sat Dec  8 17:22:01 2018 - [info] 
    MySQL client version is 5.7.22. Using --binary-mode.
    Applying differential binary/relay log files /data/log/masterha/saved_master_binlog_from_10.0.102.204_3306_20181208172159.binlog on 10.0.102.179:3306. This may take long time...
    Applying log files succeeded.
    Sat Dec  8 17:22:01 2018 - [info]  All relay logs were successfully applied.
    Sat Dec  8 17:22:01 2018 - [info] Getting new master's binlog name and position..
    Sat Dec  8 17:22:01 2018 - [info]  test1-bin.000001:154
    Sat Dec  8 17:22:01 2018 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.0.102.179', MASTER_PORT=3306, MASTER_LOG_FILE='test1-bin.000001', MASTER_LOG_POS=154, MASTER_USER='repl', MASTER_PASSWORD='xxx';
    Sat Dec  8 17:22:01 2018 - [warning] master_ip_failover_script is not set. Skipping taking over new master IP address.
    Sat Dec  8 17:22:01 2018 - [info] ** Finished master recovery successfully.
    Sat Dec  8 17:22:01 2018 - [info] * Phase 3: Master Recovery Phase completed.
    Sat Dec  8 17:22:01 2018 - [info] 
    Sat Dec  8 17:22:01 2018 - [info] * Phase 4: Slaves Recovery Phase..
    Sat Dec  8 17:22:01 2018 - [info] 
    Sat Dec  8 17:22:01 2018 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
    Sat Dec  8 17:22:01 2018 - [info] 
    Sat Dec  8 17:22:01 2018 - [info] -- Slave diff file generation on host 10.0.102.221(10.0.102.221:3306) started, pid: 19729. Check tmp log /data/log/app1/10.0.102.221_3306_20181208172159.log if it takes time..
    Sat Dec  8 17:22:01 2018 - [info] 
    Sat Dec  8 17:22:01 2018 - [info] Log messages from 10.0.102.221 ...
    Sat Dec  8 17:22:01 2018 - [info] 
    Sat Dec  8 17:22:01 2018 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
    Sat Dec  8 17:22:01 2018 - [info] End of log messages from 10.0.102.221.
    Sat Dec  8 17:22:01 2018 - [info] -- 10.0.102.221(10.0.102.221:3306) has the latest relay log events.
    Sat Dec  8 17:22:01 2018 - [info] Generating relay diff files from the latest slave succeeded.
    Sat Dec  8 17:22:01 2018 - [info] 
    Sat Dec  8 17:22:01 2018 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
    Sat Dec  8 17:22:01 2018 - [info] 
    Sat Dec  8 17:22:01 2018 - [info] -- Slave recovery on host 10.0.102.221(10.0.102.221:3306) started, pid: 19731. Check tmp log /data/log/app1/10.0.102.221_3306_20181208172159.log if it takes time..
    Sat Dec  8 17:22:01 2018 - [info] 
    Sat Dec  8 17:22:01 2018 - [info] Log messages from 10.0.102.221 ...
    Sat Dec  8 17:22:01 2018 - [info] 
    Sat Dec  8 17:22:01 2018 - [info] Sending binlog..
    Sat Dec  8 17:22:01 2018 - [info] scp from local:/data/log/app1/saved_master_binlog_from_10.0.102.204_3306_20181208172159.binlog to root@10.0.102.221:/data/log/masterha/saved_master_binlog_from_10.0.102.204_3306_20181208172159.binlog succeeded.
    Sat Dec  8 17:22:01 2018 - [info] Starting recovery on 10.0.102.221(10.0.102.221:3306)..
    Sat Dec  8 17:22:01 2018 - [info]  Generating diffs succeeded.
    Sat Dec  8 17:22:01 2018 - [info] Waiting until all relay logs are applied.
    Sat Dec  8 17:22:01 2018 - [info]  done.
    Sat Dec  8 17:22:01 2018 - [info] Getting slave status..
    Sat Dec  8 17:22:01 2018 - [info] This slave(10.0.102.221)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(test2-bin.000007:154). No need to recover from Exec_Master_Log_Pos.
    Sat Dec  8 17:22:01 2018 - [info] Connecting to the target slave host 10.0.102.221, running recover script..
    Sat Dec  8 17:22:01 2018 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='root' --slave_host=10.0.102.221 --slave_ip=10.0.102.221  --slave_port=3306 --apply_files=/data/log/masterha/saved_master_binlog_from_10.0.102.204_3306_20181208172159.binlog --workdir=/data/log/masterha --target_version=5.7.22 --timestamp=20181208172159 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx
    Sat Dec  8 17:22:01 2018 - [info] 
    MySQL client version is 5.7.22. Using --binary-mode.
    Applying differential binary/relay log files /data/log/masterha/saved_master_binlog_from_10.0.102.204_3306_20181208172159.binlog on 10.0.102.221:3306. This may take long time...
    Applying log files succeeded.
    Sat Dec  8 17:22:01 2018 - [info]  All relay logs were successfully applied.
    Sat Dec  8 17:22:01 2018 - [info]  Resetting slave 10.0.102.221(10.0.102.221:3306) and starting replication from the new master 10.0.102.179(10.0.102.179:3306)..
    Sat Dec  8 17:22:01 2018 - [info]  Executed CHANGE MASTER.
    Sat Dec  8 17:22:01 2018 - [info]  Slave started.
    Sat Dec  8 17:22:01 2018 - [info] End of log messages from 10.0.102.221.
    Sat Dec  8 17:22:01 2018 - [info] -- Slave recovery on host 10.0.102.221(10.0.102.221:3306) succeeded.
    Sat Dec  8 17:22:01 2018 - [info] All new slave servers recovered successfully.
    Sat Dec  8 17:22:01 2018 - [info] 
    Sat Dec  8 17:22:01 2018 - [info] * Phase 5: New master cleanup phase..
    Sat Dec  8 17:22:01 2018 - [info] 
    Sat Dec  8 17:22:01 2018 - [info] Resetting slave info on the new master..
    Sat Dec  8 17:22:02 2018 - [info]  10.0.102.179: Resetting slave info succeeded.
    Sat Dec  8 17:22:02 2018 - [info] Master failover to 10.0.102.179(10.0.102.179:3306) completed successfully.
    Sat Dec  8 17:22:02 2018 - [info] Deleted server1 entry from /etc/masterha/app1.cnf .
    Sat Dec  8 17:22:02 2018 - [info] 
    
    ----- Failover Report -----
    
    app1: MySQL Master failover 10.0.102.204(10.0.102.204:3306) to 10.0.102.179(10.0.102.179:3306) succeeded
    
    Master 10.0.102.204(10.0.102.204:3306) is down!
    
    Check MHA Manager logs at test3:/data/log/app1/manager.log for details.
    
    Started automated(non-interactive) failover.
    The latest slave 10.0.102.179(10.0.102.179:3306) has all relay logs for recovery.
    Selected 10.0.102.179(10.0.102.179:3306) as a new master.
    10.0.102.179(10.0.102.179:3306): OK: Applying all logs succeeded.
    10.0.102.221(10.0.102.221:3306): This host has the latest relay log events.
    Generating relay diff files from the latest slave succeeded.
    10.0.102.221(10.0.102.221:3306): OK: Applying all logs succeeded. Slave started, replicating from 10.0.102.179(10.0.102.179:3306)
    10.0.102.179(10.0.102.179:3306): Resetting slave info succeeded.
    Master failover to 10.0.102.179(10.0.102.179:3306) completed successfully.
    Sat Dec  8 17:22:02 2018 - [info] Sending mail..
    Unknown option: conf

    由mha切换日志可以看出,整个故障切换已经完成了。整个过程各个阶段核心切换逻辑简化后如下:

    Phase 1:配置文件检查

    Phase 2:非存活Master关闭服务

    Phase 3:Master恢复

    Phase 3.1:获取与Master延迟最小的Slave节点

    Phase 3.2:生成Master与延迟最小的Slave节点的差异binlog并保存到manager节点

    Phase 3.3:找出新的New Master,如果New Master不是最新的Slave节点,那么需要生成它们之间的差异Relay log

    Phase 3.4: New Master恢复差异Relay log,随后获取Master binlog位点信息,然后恢复差异binlog日志

    Phase 4: Slaves恢复

    Phase 4.1:多线程生成延迟最小的Slave节点与其他一个或多个Slave差异Relay log

    Phase 4.2:多线程恢复Slaves节点差异Relay log,然后change master到NEW MASTER节点

    Phase 5:New Master清理Slave信息,并删除掉MHA配置文件中的选主信息防止误操作

  • 相关阅读:
    Windows 8实例教程系列 开篇
    qt 开发发布于 windeploy.exe
    qt qoci 测试验证
    vmware vmx 版本不兼容
    qt oracle
    vc qt dll
    QOCIDriver unable to create environment
    qoci 编译完 放置位置 具体根据情况
    calling 'lastError' with incomplete return type 'QSqlError' qsqlquer
    Hbase 操作工具类
  • 原文地址:https://www.cnblogs.com/wxzhe/p/10088627.html
Copyright © 2011-2022 走看看