  • MHA Online Master Switch Procedure

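      Besides automatic monitoring and failover, MHA also provides an online master switch, typically used for planned work such as hardware upgrades or MySQL database migrations. It switches quickly and blocks writes gracefully without shutting down the original server. The whole switch takes roughly 0.5-2 seconds, which greatly reduces downtime. The online master switch starts only when all of the following conditions are met: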

     1. IO threads on all slaves are running.
     2. SQL threads on all slaves are running.
     3. Seconds_Behind_Master on every slave is less than or equal to --running_updates_limit seconds.
     4. On the master, no update query in the SHOW PROCESSLIST output has been running for more than --running_updates_limit seconds.

    These restrictions exist for safety, and so that the switch to the new master completes as quickly as possible.
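Condition 4 can be checked by hand before starting a switch. The following is a minimal sketch, assuming the process list is dumped in tab-separated form with `mysql -B -N -e 'SHOW FULL PROCESSLIST'` and has the eight MySQL 5.6 columns (Id, User, Host, db, Command, Time, State, Info). The function name `long_running_queries` is hypothetical, and it only looks at rows whose Command is `Query`; MHA's own check may filter differently.

```shell
# Hypothetical helper: print Id, Time and Info for processlist rows whose
# Command is "Query" and whose Time (seconds) exceeds the given limit.
# Input: tab-separated SHOW PROCESSLIST rows on stdin.
long_running_queries() {
    limit="$1"
    awk -F'\t' -v limit="$limit" \
        '$5 == "Query" && ($6 + 0) > (limit + 0) { print $1 "\t" $6 "\t" $8 }'
}

# Usage (assumes credentials are supplied via ~/.my.cnf):
#   mysql -B -N -h192.168.0.50 -e 'SHOW FULL PROCESSLIST' | long_running_queries 10
```

An empty result suggests that no query on the master would violate the limit at that moment.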

    1. Check whether masterha_manager is currently running (it should be stopped before switching)

    [root@DBproxy app2]# masterha_check_status --conf=/data/masterha/app1/app1.cnf
    app1 (pid:6769) is running(0:PING_OK), master:192.168.0.50
    [root@DBproxy app2]#

    2. Check the slave's IO thread, SQL thread, and Seconds_Behind_Master

    [mysql@MyDB02 masterha]$ mysql -uroot -p123456 -h192.168.0.60 -e 'show slave status\G'|grep -E "Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master"
    Warning: Using a password on the command line interface can be insecure.
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes
            Seconds_Behind_Master: 0
          Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
    [mysql@MyDB02 masterha]$
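The three values above can also be checked mechanically rather than by eye. This is a sketch only: `slave_ready` is a hypothetical helper that reads `SHOW SLAVE STATUS\G` output on stdin and compares the lag against a limit passed as its argument. It matches keys exactly, so `Slave_SQL_Running_State` is not confused with `Slave_SQL_Running`.

```shell
# Hypothetical helper: succeed only if both replication threads are running
# and Seconds_Behind_Master is a number no larger than the given limit.
slave_ready() {
    limit="$1"
    awk -v limit="$limit" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)          # strip leading padding
            i = index(line, ":")              # split on the first colon
            if (i == 0) next
            key = substr(line, 1, i - 1)
            val = substr(line, i + 1)
            sub(/^[ \t]+/, "", val)
            sub(/[ \t\r]+$/, "", val)
            v[key] = val
        }
        END {
            ok = (v["Slave_IO_Running"] == "Yes" &&
                  v["Slave_SQL_Running"] == "Yes" &&
                  (v["Seconds_Behind_Master"] ~ /^[0-9]+$/) &&
                  (v["Seconds_Behind_Master"] + 0 <= limit + 0))
            exit (ok ? 0 : 1)
        }'
}

# Usage (assumes credentials are supplied via ~/.my.cnf):
#   mysql -h192.168.0.60 -e 'show slave status\G' | slave_ready 10 && echo "slave ready"
```

A `NULL` lag, which MySQL reports when replication is not running, fails the numeric test and is treated as not ready.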

    3. Perform the online switch

    [root@DBproxy masterha]# masterha_master_switch --conf=/data/masterha/app1/app1.cnf --master_state=alive --new_master_host=192.168.0.60 --orig_master_is_new_slave --running_updates_limit=10000 --interactive=0
    Sat Jul 16 09:11:00 2016 - [info] MHA::MasterRotate version 0.56.
    Sat Jul 16 09:11:00 2016 - [info] Starting online master switch..
    Sat Jul 16 09:11:00 2016 - [info] 
    Sat Jul 16 09:11:00 2016 - [info] * Phase 1: Configuration Check Phase..
    Sat Jul 16 09:11:00 2016 - [info] 
    Sat Jul 16 09:11:00 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Sat Jul 16 09:11:00 2016 - [info] Reading application default configuration from /data/masterha/app1/app1.cnf..
    Sat Jul 16 09:11:00 2016 - [info] Reading server configuration from /data/masterha/app1/app1.cnf..
    Sat Jul 16 09:11:00 2016 - [info] GTID failover mode = 0
    Sat Jul 16 09:11:00 2016 - [info] Current Alive Master: 192.168.0.50(192.168.0.50:3306)
    Sat Jul 16 09:11:00 2016 - [info] Alive Slaves:
    Sat Jul 16 09:11:00 2016 - [info]   192.168.0.60(192.168.0.60:3306)  Version=5.6.29-log (oldest major version between slaves) log-bin:enabled
    Sat Jul 16 09:11:00 2016 - [info]     Replicating from 192.168.0.50(192.168.0.50:3306)
    Sat Jul 16 09:11:00 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
    Sat Jul 16 09:11:00 2016 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
    Sat Jul 16 09:11:00 2016 - [info]  ok.
    Sat Jul 16 09:11:00 2016 - [info] Checking MHA is not monitoring or doing failover..
    Sat Jul 16 09:11:00 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm, ln142] Getting advisory lock failed on the current master. MHA Monitor runs on the current master. Stop MHA Manager/Monitor and try again.
    Sat Jul 16 09:11:00 2016 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR:  at /usr/bin/masterha_master_switch line 53
    [root@DBproxy masterha]#
    
    Stop MHA Manager and try again:
    [root@DBproxy masterha]# masterha_stop  --conf=/data/masterha/app1/app1.cnf
    Stopped app1 successfully.
    [2]-  Exit 1                  nohup masterha_manager --conf=/data/masterha/app1/app1.cnf 2>&1  (wd: /data/masterha/app2)
    (wd now: /data/masterha)
    [root@DBproxy masterha]#
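The advisory-lock error above can be avoided by testing for a running manager before every switch. A minimal sketch, assuming the status line has the same form as the `masterha_check_status` output shown in step 1; `manager_is_running` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if masterha_check_status output (on stdin)
# reports a running manager, e.g. "app1 (pid:6769) is running(0:PING_OK), ...".
manager_is_running() {
    grep -q 'is running('
}

# Usage: stop the manager first if it is still monitoring the master.
#   if masterha_check_status --conf=/data/masterha/app1/app1.cnf | manager_is_running; then
#       masterha_stop --conf=/data/masterha/app1/app1.cnf
#   fi
```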

    4. Perform the online switch again

    [root@DBproxy masterha]# masterha_master_switch --conf=/data/masterha/app1/app1.cnf --master_state=alive --new_master_host=192.168.0.60 --orig_master_is_new_slave --running_updates_limit=10000 --interactive=0
    Sat Jul 16 09:15:03 2016 - [info] MHA::MasterRotate version 0.56.
    Sat Jul 16 09:15:03 2016 - [info] Starting online master switch..
    Sat Jul 16 09:15:03 2016 - [info] 
    Sat Jul 16 09:15:03 2016 - [info] * Phase 1: Configuration Check Phase..
    Sat Jul 16 09:15:03 2016 - [info] 
    Sat Jul 16 09:15:03 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Sat Jul 16 09:15:03 2016 - [info] Reading application default configuration from /data/masterha/app1/app1.cnf..
    Sat Jul 16 09:15:03 2016 - [info] Reading server configuration from /data/masterha/app1/app1.cnf..
    Sat Jul 16 09:15:03 2016 - [info] GTID failover mode = 0
    Sat Jul 16 09:15:03 2016 - [info] Current Alive Master: 192.168.0.50(192.168.0.50:3306)
    Sat Jul 16 09:15:03 2016 - [info] Alive Slaves:
    Sat Jul 16 09:15:03 2016 - [info]   192.168.0.60(192.168.0.60:3306)  Version=5.6.29-log (oldest major version between slaves) log-bin:enabled
    Sat Jul 16 09:15:03 2016 - [info]     Replicating from 192.168.0.50(192.168.0.50:3306)
    Sat Jul 16 09:15:03 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
    Sat Jul 16 09:15:03 2016 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
    Sat Jul 16 09:15:03 2016 - [info]  ok.
    Sat Jul 16 09:15:03 2016 - [info] Checking MHA is not monitoring or doing failover..
    Sat Jul 16 09:15:03 2016 - [info] Checking replication health on 192.168.0.60..
    Sat Jul 16 09:15:03 2016 - [info]  ok.
    Sat Jul 16 09:15:03 2016 - [info] 192.168.0.60 can be new master.
    Sat Jul 16 09:15:03 2016 - [info] 
    From:
    192.168.0.50(192.168.0.50:3306) (current master)
     +--192.168.0.60(192.168.0.60:3306)
    
    To:
    192.168.0.60(192.168.0.60:3306) (new master)
     +--192.168.0.50(192.168.0.50:3306)
    Sat Jul 16 09:15:03 2016 - [info] Checking whether 192.168.0.60(192.168.0.60:3306) is ok for the new master..
    Sat Jul 16 09:15:03 2016 - [info]  ok.
    Sat Jul 16 09:15:03 2016 - [info] 192.168.0.50(192.168.0.50:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
    Sat Jul 16 09:15:03 2016 - [info] 192.168.0.50(192.168.0.50:3306): Resetting slave pointing to the dummy host.
    Sat Jul 16 09:15:03 2016 - [info] ** Phase 1: Configuration Check Phase completed.
    Sat Jul 16 09:15:03 2016 - [info] 
    Sat Jul 16 09:15:03 2016 - [info] * Phase 2: Rejecting updates Phase..
    Sat Jul 16 09:15:03 2016 - [info] 
    Sat Jul 16 09:15:03 2016 - [warning] master_ip_online_change_script is not defined. Skipping disabling writes on the current master.
    Sat Jul 16 09:15:03 2016 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
    Sat Jul 16 09:15:03 2016 - [info] Executing FLUSH TABLES WITH READ LOCK..
    Sat Jul 16 09:15:03 2016 - [info]  ok.
    Sat Jul 16 09:15:03 2016 - [info] Orig master binlog:pos is mysql-bin.000009:40355591.
    Sat Jul 16 09:15:03 2016 - [info]  Waiting to execute all relay logs on 192.168.0.60(192.168.0.60:3306)..
    Sat Jul 16 09:15:03 2016 - [info]  master_pos_wait(mysql-bin.000009:40355591) completed on 192.168.0.60(192.168.0.60:3306). Executed 0 events.
    Sat Jul 16 09:15:03 2016 - [info]   done.
    Sat Jul 16 09:15:03 2016 - [info] Getting new master's binlog name and position..
    Sat Jul 16 09:15:03 2016 - [info]  mysql-bin.000006:120
    Sat Jul 16 09:15:03 2016 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.0.60', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000006', MASTER_LOG_POS=120, MASTER_USER='repl', MASTER_PASSWORD='xxx';
    Sat Jul 16 09:15:03 2016 - [info] 
    Sat Jul 16 09:15:03 2016 - [info] * Switching slaves in parallel..
    Sat Jul 16 09:15:03 2016 - [info] 
    Sat Jul 16 09:15:03 2016 - [info] Unlocking all tables on the orig master:
    Sat Jul 16 09:15:03 2016 - [info] Executing UNLOCK TABLES..
    Sat Jul 16 09:15:03 2016 - [info]  ok.
    Sat Jul 16 09:15:03 2016 - [info] Starting orig master as a new slave..
    Sat Jul 16 09:15:03 2016 - [info]  Resetting slave 192.168.0.50(192.168.0.50:3306) and starting replication from the new master 192.168.0.60(192.168.0.60:3306)..
    Sat Jul 16 09:15:03 2016 - [info]  Executed CHANGE MASTER.
    Sat Jul 16 09:15:14 2016 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln784] Slave could not be started on 192.168.0.50(192.168.0.50:3306)! Check slave status.
    Sat Jul 16 09:15:14 2016 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln862] Starting slave IO/SQL thread on 192.168.0.50(192.168.0.50:3306) failed!
    Sat Jul 16 09:15:14 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm, ln573]  Failed!
    Sat Jul 16 09:15:14 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm, ln602] Switching master to 192.168.0.60(192.168.0.60:3306) done, but switching slaves partially failed.
    [root@DBproxy masterha]# 

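    Judging from the logs on the master and the slave themselves, the likely cause is that the IP addresses and hostnames of the two machines were not mapped to each other. Fix /etc/hosts: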

    /etc/hosts on the master (before the change):
    127.0.0.1 MyDB01
    /etc/hosts on the slave (before the change):
    127.0.0.1 MyDB02
    
    /etc/hosts on both machines after the change:
    [root@MyDB02 ~]# more /etc/hosts
    192.168.0.60  MyDB02
    192.168.0.50  MyDB01
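A quick way to spot the suspected misconfiguration is to test whether a server's hostname resolves to a loopback address in its hosts file. This is only a sketch of that check: `maps_to_loopback` is a hypothetical helper, and it inspects hosts-format text on stdin rather than the full resolver configuration.

```shell
# Hypothetical helper: read hosts-format text on stdin and succeed if the
# given hostname is mapped to a loopback address (127.x.x.x or ::1).
maps_to_loopback() {
    name="$1"
    awk -v name="$name" '
        { sub(/#.*/, "") }                       # drop trailing comments
        ($1 ~ /^127\./ || $1 == "::1") {
            for (i = 2; i <= NF; i++) if ($i == name) found = 1
        }
        END { exit (found ? 0 : 1) }'
}

# Usage, run on each database server:
#   maps_to_loopback "$(hostname)" < /etc/hosts && echo "hostname maps to loopback"
```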

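    Because the previous attempt did not fully succeed, the two machines were left in a dual-master state. They were switched back manually to the original one-master, one-slave topology. Run a replication check before attempting the online switch again: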

    [root@DBproxy app1]# masterha_check_repl --conf=/data/masterha/app1/app1.cnf
    Sat Jul 16 10:24:49 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Sat Jul 16 10:24:49 2016 - [info] Reading application default configuration from /data/masterha/app1/app1.cnf..
    Sat Jul 16 10:24:49 2016 - [info] Reading server configuration from /data/masterha/app1/app1.cnf..
    Sat Jul 16 10:24:49 2016 - [info] MHA::MasterMonitor version 0.56.
    Sat Jul 16 10:24:49 2016 - [info] GTID failover mode = 0
    Sat Jul 16 10:24:49 2016 - [info] Dead Servers:
    Sat Jul 16 10:24:49 2016 - [info] Alive Servers:
    Sat Jul 16 10:24:49 2016 - [info]   192.168.0.50(192.168.0.50:3306)
    Sat Jul 16 10:24:49 2016 - [info]   192.168.0.60(192.168.0.60:3306)
    Sat Jul 16 10:24:49 2016 - [info] Alive Slaves:
    Sat Jul 16 10:24:49 2016 - [info]   192.168.0.60(192.168.0.60:3306)  Version=5.6.29-log (oldest major version between slaves) log-bin:enabled
    Sat Jul 16 10:24:49 2016 - [info]     Replicating from 192.168.0.50(192.168.0.50:3306)
    Sat Jul 16 10:24:49 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
    Sat Jul 16 10:24:49 2016 - [info] Current Alive Master: 192.168.0.50(192.168.0.50:3306)
    Sat Jul 16 10:24:49 2016 - [info] Checking slave configurations..
    Sat Jul 16 10:24:49 2016 - [info]  read_only=1 is not set on slave 192.168.0.60(192.168.0.60:3306).
    Sat Jul 16 10:24:49 2016 - [info] Checking replication filtering settings..
    Sat Jul 16 10:24:49 2016 - [info]  binlog_do_db= , binlog_ignore_db= 
    Sat Jul 16 10:24:49 2016 - [info]  Replication filtering check ok.
    Sat Jul 16 10:24:49 2016 - [info] GTID (with auto-pos) is not supported
    Sat Jul 16 10:24:49 2016 - [info] Starting SSH connection tests..
    Sat Jul 16 10:24:50 2016 - [info] All SSH connection tests passed successfully.
    Sat Jul 16 10:24:50 2016 - [info] Checking MHA Node version..
    Sat Jul 16 10:24:51 2016 - [info]  Version check ok.
    Sat Jul 16 10:24:51 2016 - [info] Checking SSH publickey authentication settings on the current master..
    Sat Jul 16 10:24:51 2016 - [info] HealthCheck: SSH to 192.168.0.50 is reachable.
    Sat Jul 16 10:24:51 2016 - [info] Master MHA Node version is 0.56.
    Sat Jul 16 10:24:51 2016 - [info] Checking recovery script configurations on 192.168.0.50(192.168.0.50:3306)..
    Sat Jul 16 10:24:51 2016 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/3306/binlog --output_file=/data/masterha/app1/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000010 
    Sat Jul 16 10:24:51 2016 - [info]   Connecting to root@192.168.0.50(192.168.0.50:22).. 
      Creating /data/masterha/app1 if not exists..    ok.
      Checking output directory is accessible or not..
       ok.
      Binlog found at /data/mysql/3306/binlog, up to mysql-bin.000010
    Sat Jul 16 10:24:52 2016 - [info] Binlog setting check done.
    Sat Jul 16 10:24:52 2016 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
    Sat Jul 16 10:24:52 2016 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.0.60 --slave_ip=192.168.0.60 --slave_port=3306 --workdir=/data/masterha/app1 --target_version=5.6.29-log --manager_version=0.56 --relay_log_info=/data/mysql/3306/data/relay-log.info  --relay_dir=/data/mysql/3306/data/  --slave_pass=xxx
    Sat Jul 16 10:24:52 2016 - [info]   Connecting to root@192.168.0.60(192.168.0.60:22).. 
      Checking slave recovery environment settings..
        Opening /data/mysql/3306/data/relay-log.info ... ok.
        Relay log found at /data/mysql/3306/binlog, up to relay-bin.000002
        Temporary relay log file is /data/mysql/3306/binlog/relay-bin.000002
        Testing mysql connection and privileges.. done.
        Testing mysqlbinlog output.. done.
        Cleaning up test file(s).. done.
    Sat Jul 16 10:24:53 2016 - [info] Slaves settings check done.
    Sat Jul 16 10:24:53 2016 - [info] 
    192.168.0.50(192.168.0.50:3306) (current master)
     +--192.168.0.60(192.168.0.60:3306)
    
    Sat Jul 16 10:24:53 2016 - [info] Checking replication health on 192.168.0.60..
    Sat Jul 16 10:24:53 2016 - [info]  ok.
    Sat Jul 16 10:24:53 2016 - [warning] master_ip_failover_script is not defined.
    Sat Jul 16 10:24:53 2016 - [warning] shutdown_script is not defined.
    Sat Jul 16 10:24:53 2016 - [info] Got exit code 0 (Not master dead).
    
    MySQL Replication Health is OK.
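The final line of this check can serve as a gate for the switch itself. A minimal sketch, assuming the success message appears exactly as printed above; `repl_health_ok` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if masterha_check_repl output (on stdin)
# contains the success line "MySQL Replication Health is OK."
repl_health_ok() {
    grep -q 'MySQL Replication Health is OK'
}

# Usage: only proceed to masterha_master_switch when the check passes.
#   masterha_check_repl --conf=/data/masterha/app1/app1.cnf 2>&1 | repl_health_ok \
#       || { echo "replication check failed, not switching" >&2; exit 1; }
```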

    5. Perform the switch

    [root@DBproxy app1]# masterha_master_switch --conf=/data/masterha/app1/app1.cnf --master_state=alive --new_master_host=192.168.0.60 --orig_master_is_new_slave --running_updates_limit=10000 --interactive=0
    Sat Jul 16 10:26:59 2016 - [info] MHA::MasterRotate version 0.56.
    Sat Jul 16 10:26:59 2016 - [info] Starting online master switch..
    Sat Jul 16 10:26:59 2016 - [info] 
    Sat Jul 16 10:26:59 2016 - [info] * Phase 1: Configuration Check Phase..
    Sat Jul 16 10:26:59 2016 - [info] 
    Sat Jul 16 10:26:59 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Sat Jul 16 10:26:59 2016 - [info] Reading application default configuration from /data/masterha/app1/app1.cnf..
    Sat Jul 16 10:26:59 2016 - [info] Reading server configuration from /data/masterha/app1/app1.cnf..
    Sat Jul 16 10:26:59 2016 - [info] GTID failover mode = 0
    Sat Jul 16 10:26:59 2016 - [info] Current Alive Master: 192.168.0.50(192.168.0.50:3306)
    Sat Jul 16 10:26:59 2016 - [info] Alive Slaves:
    Sat Jul 16 10:26:59 2016 - [info]   192.168.0.60(192.168.0.60:3306)  Version=5.6.29-log (oldest major version between slaves) log-bin:enabled
    Sat Jul 16 10:26:59 2016 - [info]     Replicating from 192.168.0.50(192.168.0.50:3306)
    Sat Jul 16 10:26:59 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
    Sat Jul 16 10:26:59 2016 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
    Sat Jul 16 10:26:59 2016 - [info]  ok.
    Sat Jul 16 10:26:59 2016 - [info] Checking MHA is not monitoring or doing failover..
    Sat Jul 16 10:26:59 2016 - [info] Checking replication health on 192.168.0.60..
    Sat Jul 16 10:26:59 2016 - [info]  ok.
    Sat Jul 16 10:26:59 2016 - [info] 192.168.0.60 can be new master.
    Sat Jul 16 10:26:59 2016 - [info] 
    From:
    192.168.0.50(192.168.0.50:3306) (current master)
     +--192.168.0.60(192.168.0.60:3306)
    
    To:
    192.168.0.60(192.168.0.60:3306) (new master)
     +--192.168.0.50(192.168.0.50:3306)
    Sat Jul 16 10:26:59 2016 - [info] Checking whether 192.168.0.60(192.168.0.60:3306) is ok for the new master..
    Sat Jul 16 10:26:59 2016 - [info]  ok.
    Sat Jul 16 10:26:59 2016 - [info] 192.168.0.50(192.168.0.50:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
    Sat Jul 16 10:26:59 2016 - [info] 192.168.0.50(192.168.0.50:3306): Resetting slave pointing to the dummy host.
    Sat Jul 16 10:26:59 2016 - [info] ** Phase 1: Configuration Check Phase completed.
    Sat Jul 16 10:26:59 2016 - [info] 
    Sat Jul 16 10:26:59 2016 - [info] * Phase 2: Rejecting updates Phase..
    Sat Jul 16 10:26:59 2016 - [info] 
    Sat Jul 16 10:26:59 2016 - [warning] master_ip_online_change_script is not defined. Skipping disabling writes on the current master.
    Sat Jul 16 10:26:59 2016 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
    Sat Jul 16 10:26:59 2016 - [info] Executing FLUSH TABLES WITH READ LOCK..
    Sat Jul 16 10:26:59 2016 - [info]  ok.
    Sat Jul 16 10:26:59 2016 - [info] Orig master binlog:pos is mysql-bin.000010:120.
    Sat Jul 16 10:26:59 2016 - [info]  Waiting to execute all relay logs on 192.168.0.60(192.168.0.60:3306)..
    Sat Jul 16 10:27:00 2016 - [info]  master_pos_wait(mysql-bin.000010:120) completed on 192.168.0.60(192.168.0.60:3306). Executed 0 events.
    Sat Jul 16 10:27:00 2016 - [info]   done.
    Sat Jul 16 10:27:00 2016 - [info] Getting new master's binlog name and position..
    Sat Jul 16 10:27:00 2016 - [info]  mysql-bin.000008:239
    Sat Jul 16 10:27:00 2016 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.0.60', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000008', MASTER_LOG_POS=239, MASTER_USER='repl', MASTER_PASSWORD='xxx';
    Sat Jul 16 10:27:00 2016 - [info] 
    Sat Jul 16 10:27:00 2016 - [info] * Switching slaves in parallel..
    Sat Jul 16 10:27:00 2016 - [info] 
    Sat Jul 16 10:27:00 2016 - [info] Unlocking all tables on the orig master:
    Sat Jul 16 10:27:00 2016 - [info] Executing UNLOCK TABLES..
    Sat Jul 16 10:27:00 2016 - [info]  ok.
    Sat Jul 16 10:27:00 2016 - [info] Starting orig master as a new slave..
    Sat Jul 16 10:27:00 2016 - [info]  Resetting slave 192.168.0.50(192.168.0.50:3306) and starting replication from the new master 192.168.0.60(192.168.0.60:3306)..
    Sat Jul 16 10:27:00 2016 - [info]  Executed CHANGE MASTER.
    Sat Jul 16 10:27:00 2016 - [info]  Slave started.
    Sat Jul 16 10:27:00 2016 - [info] All new slave servers switched successfully.
    Sat Jul 16 10:27:00 2016 - [info] 
    Sat Jul 16 10:27:00 2016 - [info] * Phase 5: New master cleanup phase..
    Sat Jul 16 10:27:00 2016 - [info] 
    Sat Jul 16 10:27:00 2016 - [info]  192.168.0.60: Resetting slave info succeeded.
    Sat Jul 16 10:27:00 2016 - [info] Switching master to 192.168.0.60(192.168.0.60:3306) completed successfully.
    [root@DBproxy app1]# 
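The three runs in this walkthrough end in three different ways: an early abort, a partial failure, and a clean success. The sketch below classifies a saved switch log by the final messages seen in those runs; `switch_outcome` is a hypothetical helper, and the message strings are taken from the logs in this article (MHA 0.56), so other versions may word them differently.

```shell
# Hypothetical helper: read a masterha_master_switch log on stdin and print
# "success", "partial" (master switched, some slaves not), or "failed".
switch_outcome() {
    awk '
        /completed successfully/            { result = "success" }
        /switching slaves partially failed/ { result = "partial" }
        END { print (result == "" ? "failed" : result) }'
}

# Usage:
#   masterha_master_switch --conf=/data/masterha/app1/app1.cnf ... 2>&1 | tee switch.log
#   switch_outcome < switch.log
```

A "partial" result corresponds to the state in step 4, where the new master accepted writes but the original master had not started replicating from it.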
  • Original article: https://www.cnblogs.com/polestar/p/5737121.html