Preface
I installed MasterHA (MHA) yesterday. Now let's test its master switchover and failover features.
Framework
Hostname | IP | Port | Identity | OS Version | MySQL Version |
zlm2 | 192.168.1.101 | 3306 | master | CentOS 7.0 | 5.7.21 |
zlm3 | 192.168.1.102 | 3306 | slave/mha-manager | CentOS 7.0 | 5.7.21 |
null | 192.168.1.200 | null | vip | null | null |
Procedure
Test 1: Manual master switchover
Check the state of MHA Manager on zlm3.
[root@zlm3 07:35:00 ~]
#masterha_check_ssh --conf=/etc/masterha/app1.conf
Fri Aug 3 07:37:13 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Aug 3 07:37:13 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Fri Aug 3 07:37:13 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
Fri Aug 3 07:37:13 2018 - [info] Starting SSH connection tests..
Fri Aug 3 07:37:13 2018 - [debug]
Fri Aug 3 07:37:13 2018 - [debug] Connecting via SSH from root@192.168.1.101(192.168.1.101:22) to root@192.168.1.102(192.168.1.102:22)..
Fri Aug 3 07:37:13 2018 - [debug] ok.
Fri Aug 3 07:37:14 2018 - [debug]
Fri Aug 3 07:37:13 2018 - [debug] Connecting via SSH from root@192.168.1.102(192.168.1.102:22) to root@192.168.1.101(192.168.1.101:22)..
Fri Aug 3 07:37:13 2018 - [debug] ok.
Fri Aug 3 07:37:14 2018 - [info] All SSH connection tests passed successfully.

[root@zlm3 07:37:14 ~]
#masterha_check_repl --conf=/etc/masterha/app1.conf --global_conf=/etc/masterha/masterha_default.conf
Fri Aug 3 07:37:37 2018 - [info] Reading default configuration from /etc/masterha/masterha_default.conf..
Fri Aug 3 07:37:37 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Fri Aug 3 07:37:37 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
Fri Aug 3 07:37:37 2018 - [info] MHA::MasterMonitor version 0.56.
Fri Aug 3 07:37:38 2018 - [info] GTID failover mode = 1
Fri Aug 3 07:37:38 2018 - [info] Dead Servers:
Fri Aug 3 07:37:38 2018 - [info] Alive Servers:
Fri Aug 3 07:37:38 2018 - [info] 192.168.1.101(192.168.1.101:3306)
Fri Aug 3 07:37:38 2018 - [info] 192.168.1.102(192.168.1.102:3306)
Fri Aug 3 07:37:38 2018 - [info] Alive Slaves:
Fri Aug 3 07:37:38 2018 - [info] 192.168.1.102(192.168.1.102:3306) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Fri Aug 3 07:37:38 2018 - [info] GTID ON
Fri Aug 3 07:37:38 2018 - [info] Replicating from 192.168.1.101(192.168.1.101:3306)
Fri Aug 3 07:37:38 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 3 07:37:38 2018 - [info] Current Alive Master: 192.168.1.101(192.168.1.101:3306)
Fri Aug 3 07:37:38 2018 - [info] Checking slave configurations..
Fri Aug 3 07:37:38 2018 - [info] read_only=1 is not set on slave 192.168.1.102(192.168.1.102:3306).
Fri Aug 3 07:37:38 2018 - [info] Checking replication filtering settings..
Fri Aug 3 07:37:38 2018 - [info] binlog_do_db= , binlog_ignore_db=
Fri Aug 3 07:37:38 2018 - [info] Replication filtering check ok.
Fri Aug 3 07:37:38 2018 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Aug 3 07:37:38 2018 - [info] Checking SSH publickey authentication settings on the current master..
ssh_exchange_identification: Connection closed by remote host
Fri Aug 3 07:37:38 2018 - [warning] HealthCheck: SSH to 192.168.1.101 is NOT reachable.
Fri Aug 3 07:37:38 2018 - [info]
192.168.1.101(192.168.1.101:3306) (current master)
+--192.168.1.102(192.168.1.102:3306)

Fri Aug 3 07:37:38 2018 - [info] Checking replication health on 192.168.1.102..
Fri Aug 3 07:37:38 2018 - [info] ok.
Fri Aug 3 07:37:38 2018 - [info] Checking master_ip_failover_script status:
Fri Aug 3 07:37:38 2018 - [info] /etc/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.1.101 --orig_master_ip=192.168.1.101 --orig_master_port=3306 --orig_master_ssh_port=3306
Fri Aug 3 07:37:38 2018 - [info] OK.
Fri Aug 3 07:37:38 2018 - [warning] shutdown_script is not defined.
Fri Aug 3 07:37:38 2018 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

[root@zlm3 07:40:03 ~]
#Fri Aug 3 07:40:03 2018 - [info] Reading default configuration from /etc/masterha/masterha_default.conf..
Fri Aug 3 07:40:03 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Fri Aug 3 07:40:03 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
ssh_exchange_identification: Connection closed by remote host
^C

[root@zlm3 07:40:11 ~]
#masterha_check_status --conf=/etc/masterha/app1.conf
app1 (pid:5628) is running(0:PING_OK), master:192.168.1.101
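The one-line result of masterha_check_status is easy to consume from a script. The following is a minimal sketch, assuming the output keeps the exact shape shown above ("app1 (pid:5628) is running(0:PING_OK), master:192.168.1.101"). The function name parse_mha_status is hypothetical and not part of MHA.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with MHA): read masterha_check_status
# output on stdin and print "<app> <state> <master>" for a running manager.
# Lines that do not match the "is running(...)" shape print nothing.
parse_mha_status() {
    sed -n 's/^\([^ ]*\) .*is running(\([0-9][0-9]*\):\([A-Z_]*\)), master:\([^ ]*\).*$/\1 \3 \4/p'
}
```

A possible use is `masterha_check_status --conf=/etc/masterha/app1.conf | parse_mha_status`, which for the output above would print the application name, the state PING_OK, and the current master address.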
Switch the master role over to the slave (zlm3) and make the original master a slave of the new master.
[root@zlm3 08:21:27 ~]
#masterha_master_switch --conf=/etc/masterha/app1.conf --global_conf=/etc/masterha/masterha_default.conf --master_state=alive --new_master_host=192.168.1.102 --orig_master_is_new_slave --running_updates_limit=60
Fri Aug 3 08:21:29 2018 - [info] MHA::MasterRotate version 0.56.
Fri Aug 3 08:21:29 2018 - [info] Starting online master switch..
Fri Aug 3 08:21:29 2018 - [info]
Fri Aug 3 08:21:29 2018 - [info] * Phase 1: Configuration Check Phase..
Fri Aug 3 08:21:29 2018 - [info]
Fri Aug 3 08:21:29 2018 - [info] Reading default configuration from /etc/masterha/masterha_default.conf..
Fri Aug 3 08:21:29 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Fri Aug 3 08:21:29 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
Fri Aug 3 08:21:30 2018 - [info] GTID failover mode = 1
Fri Aug 3 08:21:30 2018 - [info] Current Alive Master: 192.168.1.101(192.168.1.101:3306)
Fri Aug 3 08:21:30 2018 - [info] Alive Slaves:
Fri Aug 3 08:21:30 2018 - [info] 192.168.1.102(192.168.1.102:3306) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Fri Aug 3 08:21:30 2018 - [info] GTID ON
Fri Aug 3 08:21:30 2018 - [info] Replicating from 192.168.1.101(192.168.1.101:3306)
Fri Aug 3 08:21:30 2018 - [info] Primary candidate for the new Master (candidate_master is set)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.1.101(192.168.1.101:3306)? (YES/no): yes
Fri Aug 3 08:21:33 2018 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Fri Aug 3 08:21:33 2018 - [info] ok.
Fri Aug 3 08:21:33 2018 - [info] Checking MHA is not monitoring or doing failover..
Fri Aug 3 08:21:33 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm, ln142] Getting advisory lock failed on the current master. MHA Monitor runs on the current master. Stop MHA Manager/Monitor and try again.
Fri Aug 3 08:21:33 2018 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/bin/masterha_master_switch line 53.

//This means MHA Manager must be stopped before doing an online master switchover.

[root@zlm3 08:21:33 ~]
#masterha_stop --conf=/etc/masterha/app1.conf --global_conf=/etc/masterha/masterha_default.conf
Stopped app1 successfully.
[1]+ Exit 1 masterha_manager --conf=/etc/masterha/app1.conf --global_conf=/etc/masterha/masterha_default.conf

[root@zlm3 08:28:07 ~]
#masterha_master_switch --conf=/etc/masterha/app1.conf --global_conf=/etc/masterha/masterha_default.conf --master_state=alive --new_master_host=192.168.1.102 --orig_master_is_new_slave --running_updates_limit=60
Fri Aug 3 08:28:21 2018 - [info] MHA::MasterRotate version 0.56.
Fri Aug 3 08:28:21 2018 - [info] Starting online master switch..
Fri Aug 3 08:28:21 2018 - [info]
Fri Aug 3 08:28:21 2018 - [info] * Phase 1: Configuration Check Phase..
Fri Aug 3 08:28:21 2018 - [info]
Fri Aug 3 08:28:21 2018 - [info] Reading default configuration from /etc/masterha/masterha_default.conf..
Fri Aug 3 08:28:21 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Fri Aug 3 08:28:21 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
Fri Aug 3 08:28:22 2018 - [info] GTID failover mode = 1
Fri Aug 3 08:28:22 2018 - [info] Current Alive Master: 192.168.1.101(192.168.1.101:3306)
Fri Aug 3 08:28:22 2018 - [info] Alive Slaves:
Fri Aug 3 08:28:22 2018 - [info] 192.168.1.102(192.168.1.102:3306) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Fri Aug 3 08:28:22 2018 - [info] GTID ON
Fri Aug 3 08:28:22 2018 - [info] Replicating from 192.168.1.101(192.168.1.101:3306)
Fri Aug 3 08:28:22 2018 - [info] Primary candidate for the new Master (candidate_master is set)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.1.101(192.168.1.101:3306)? (YES/no): yes
Fri Aug 3 08:28:25 2018 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Fri Aug 3 08:28:25 2018 - [info] ok.
Fri Aug 3 08:28:25 2018 - [info] Checking MHA is not monitoring or doing failover..
Fri Aug 3 08:28:25 2018 - [info] Checking replication health on 192.168.1.102..
Fri Aug 3 08:28:25 2018 - [info] ok.
Fri Aug 3 08:28:25 2018 - [info] 192.168.1.102 can be new master.
Fri Aug 3 08:28:25 2018 - [info]
From:
192.168.1.101(192.168.1.101:3306) (current master)
+--192.168.1.102(192.168.1.102:3306)

To:
192.168.1.102(192.168.1.102:3306) (new master)
+--192.168.1.101(192.168.1.101:3306)

Starting master switch from 192.168.1.101(192.168.1.101:3306) to 192.168.1.102(192.168.1.102:3306)? (yes/NO): yes
Fri Aug 3 08:28:31 2018 - [info] Checking whether 192.168.1.102(192.168.1.102:3306) is ok for the new master..
Fri Aug 3 08:28:31 2018 - [info] ok.
Fri Aug 3 08:28:31 2018 - [info] 192.168.1.101(192.168.1.101:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Fri Aug 3 08:28:31 2018 - [info] 192.168.1.101(192.168.1.101:3306): Resetting slave pointing to the dummy host.
Fri Aug 3 08:28:31 2018 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Aug 3 08:28:31 2018 - [info]
Fri Aug 3 08:28:31 2018 - [info] * Phase 2: Rejecting updates Phase..
Fri Aug 3 08:28:31 2018 - [info]
Fri Aug 3 08:28:31 2018 - [info] Executing master ip online change script to disable write on the current master:
Fri Aug 3 08:28:31 2018 - [info] /etc/masterha/master_ip_online_change --command=stop --orig_master_host=192.168.1.101 --orig_master_ip=192.168.1.101 --orig_master_port=3306 --orig_master_user='zlm' --orig_master_password='zlmzlm' --new_master_host=192.168.1.102 --new_master_ip=192.168.1.102 --new_master_port=3306 --new_master_user='zlm' --new_master_password='zlmzlm' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_ssh_port=3306 --new_master_ssh_port=3306 --orig_master_is_new_slave
Unknown option: new_master_ssh_port
Fri Aug 3 08:28:32 2018 116409 Set read_only on the new master.. ok.
Fri Aug 3 08:28:32 2018 125643 drop vip 10.33.101.239..
ssh_exchange_identification: Connection closed by remote host
Fri Aug 3 08:28:32 2018 142948 Waiting all running 1 threads are disconnected.. (max 1500 milliseconds)
{'Time' => '13435','db' => undef,'Id' => '21','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'zlm3:40535'}
Fri Aug 3 08:28:32 2018 646769 Waiting all running 1 threads are disconnected.. (max 1000 milliseconds)
{'Time' => '13435','db' => undef,'Id' => '21','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'zlm3:40535'}
Fri Aug 3 08:28:33 2018 149221 Waiting all running 1 threads are disconnected.. (max 500 milliseconds)
{'Time' => '13436','db' => undef,'Id' => '21','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'zlm3:40535'}
Fri Aug 3 08:28:33 2018 650816 Set read_only=1 on the orig master.. ok.
Fri Aug 3 08:28:33 2018 653323 Waiting all running 1 queries are disconnected.. (max 500 milliseconds)
{'Time' => '13436','db' => undef,'Id' => '21','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'zlm3:40535'}
Fri Aug 3 08:28:34 2018 154965 Killing all application threads..
Fri Aug 3 08:28:34 2018 167919 done.
Fri Aug 3 08:28:34 2018 - [info] ok.
Fri Aug 3 08:28:34 2018 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Fri Aug 3 08:28:34 2018 - [info] Executing FLUSH TABLES WITH READ LOCK..
Fri Aug 3 08:28:34 2018 - [info] ok.
Fri Aug 3 08:28:34 2018 - [info] Orig master binlog:pos is mysql-bin.000050:2361.
Fri Aug 3 08:28:34 2018 - [info] Waiting to execute all relay logs on 192.168.1.102(192.168.1.102:3306)..
Fri Aug 3 08:28:34 2018 - [info] master_pos_wait(mysql-bin.000050:2361) completed on 192.168.1.102(192.168.1.102:3306). Executed 0 events.
Fri Aug 3 08:28:34 2018 - [info] done.
Fri Aug 3 08:28:34 2018 - [info] Getting new master's binlog name and position..
Fri Aug 3 08:28:34 2018 - [info] mysql-bin.000003:2321
Fri Aug 3 08:28:34 2018 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.1.102', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Fri Aug 3 08:28:34 2018 - [info] Executing master ip online change script to allow write on the new master:
Fri Aug 3 08:28:34 2018 - [info] /etc/masterha/master_ip_online_change --command=start --orig_master_host=192.168.1.101 --orig_master_ip=192.168.1.101 --orig_master_port=3306 --orig_master_user='zlm' --orig_master_password='zlmzlm' --new_master_host=192.168.1.102 --new_master_ip=192.168.1.102 --new_master_port=3306 --new_master_user='zlm' --new_master_password='zlmzlm' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_ssh_port=3306 --new_master_ssh_port=3306 --orig_master_is_new_slave
Unknown option: new_master_ssh_port
Fri Aug 3 08:28:34 2018 327146 Set read_only=0 on the new master.
Fri Aug 3 08:28:34 2018 328259Add vip 10.33.101.239 on p3p1..
ssh_exchange_identification: Connection closed by remote host
Fri Aug 3 08:28:34 2018 - [info] ok.
Fri Aug 3 08:28:34 2018 - [info]
Fri Aug 3 08:28:34 2018 - [info] * Switching slaves in parallel..
Fri Aug 3 08:28:34 2018 - [info]
Fri Aug 3 08:28:34 2018 - [info] Unlocking all tables on the orig master:
Fri Aug 3 08:28:34 2018 - [info] Executing UNLOCK TABLES..
Fri Aug 3 08:28:34 2018 - [info] ok.
Fri Aug 3 08:28:34 2018 - [info] Starting orig master as a new slave..
Fri Aug 3 08:28:34 2018 - [info] Resetting slave 192.168.1.101(192.168.1.101:3306) and starting replication from the new master 192.168.1.102(192.168.1.102:3306)..
Fri Aug 3 08:28:34 2018 - [info] Executed CHANGE MASTER.
Fri Aug 3 08:28:35 2018 - [info] Slave started.
Fri Aug 3 08:28:35 2018 - [info] All new slave servers switched successfully.
Fri Aug 3 08:28:35 2018 - [info]
Fri Aug 3 08:28:35 2018 - [info] * Phase 5: New master cleanup phase..
Fri Aug 3 08:28:35 2018 - [info]
Fri Aug 3 08:28:35 2018 - [info] 192.168.1.102: Resetting slave info succeeded.
Fri Aug 3 08:28:35 2018 - [info] Switching master to 192.168.1.102(192.168.1.102:3306) completed successfully.

[root@zlm3 08:28:35 ~]
#
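The first attempt above failed because MHA Manager was still monitoring the master, and the switch only succeeded after masterha_stop. A minimal sketch of a wrapper that enforces that order is shown here. It uses only the commands and options that appear in the transcript. The function names manager_is_running and online_switch and the variables CONF and GLOBAL_CONF are hypothetical, and the "is running(" text match is an assumption based on the masterha_check_status output seen earlier.

```shell
#!/bin/sh
# Paths taken from the transcript; adjust for your own layout.
CONF=/etc/masterha/app1.conf
GLOBAL_CONF=/etc/masterha/masterha_default.conf

# Hypothetical helper: succeeds when masterha_check_status reports a running
# manager. Assumes the "is running(" wording seen in the output above.
manager_is_running() {
    masterha_check_status --conf="$CONF" 2>/dev/null | grep -q 'is running('
}

# Hypothetical wrapper: stop the manager if needed, then start the same
# online switch used in the transcript. $1 is the new master host.
online_switch() {
    new_master="$1"
    if manager_is_running; then
        echo "MHA Manager is running, stopping it before the switchover"
        masterha_stop --conf="$CONF" --global_conf="$GLOBAL_CONF" || return 1
    fi
    masterha_master_switch --conf="$CONF" --global_conf="$GLOBAL_CONF" \
        --master_state=alive --new_master_host="$new_master" \
        --orig_master_is_new_slave --running_updates_limit=60
}
```

Calling `online_switch 192.168.1.102` would then reproduce the successful run above, including its interactive yes/no prompts. The manager is not restarted by this sketch.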
Check the master-slave replication status.
//New master(original slave)
(zlm@192.168.1.102 3306)[(none)]>show master status;
+------------------+----------+--------------+------------------+------------------------------------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                              |
+------------------+----------+--------------+------------------+------------------------------------------------+
| mysql-bin.000003 |     2321 |              |                  | 1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730259 |
+------------------+----------+--------------+------------------+------------------------------------------------+
1 row in set (0.00 sec)

(zlm@192.168.1.102 3306)[(none)]>show slave status\G
Empty set (0.00 sec)

//New slave(original master)
(zlm@192.168.1.101 3306)[(none)]>show master status;
+------------------+----------+--------------+------------------+------------------------------------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                              |
+------------------+----------+--------------+------------------+------------------------------------------------+
| mysql-bin.000050 |     2361 |              |                  | 1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730259 |
+------------------+----------+--------------+------------------+------------------------------------------------+
1 row in set (0.01 sec)

(zlm@192.168.1.101 3306)[(none)]>show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.1.102
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000003
Read_Master_Log_Pos: 2321
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 398
Relay_Master_Log_File: mysql-bin.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 2321
Relay_Log_Space: 591
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1023306
Master_UUID: 842ea497-9551-11e8-83ca-080027de0e0e
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730259
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
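The two fields that matter most in the slave status above are Slave_IO_Running and Slave_SQL_Running. As a rough sketch, the check can be automated by piping the \G output through a small filter. The function name replication_ok is hypothetical, and the sketch assumes the "Key: value" layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: read "SHOW SLAVE STATUS\G" text on stdin and succeed
# only when both replication threads report "Yes". The trailing colon in the
# patterns keeps Slave_SQL_Running_State from being matched by mistake.
replication_ok() {
    awk -F': ' '
        /^ *Slave_IO_Running:/  { io  = $2 }
        /^ *Slave_SQL_Running:/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}
```

For example, `mysql -h 192.168.1.101 -e 'SHOW SLAVE STATUS\G' | replication_ok && echo "replication healthy"` could be run on the new slave, with credentials supplied however your environment requires. An empty result, as on the new master, is reported as not OK.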
Check the log of MasterHA on zlm3.
[root@zlm3 08:28:35 ~]
#cd /var/log/masterha/app1

[root@zlm3 08:29:12 /var/log/masterha/app1]
#cat app1.log
Fri Aug 3 07:39:13 2018 - [info] MHA::MasterMonitor version 0.56.
Fri Aug 3 07:39:14 2018 - [info] GTID failover mode = 1
Fri Aug 3 07:39:14 2018 - [info] Dead Servers:
Fri Aug 3 07:39:14 2018 - [info] Alive Servers:
Fri Aug 3 07:39:14 2018 - [info] 192.168.1.101(192.168.1.101:3306)
Fri Aug 3 07:39:14 2018 - [info] 192.168.1.102(192.168.1.102:3306)
Fri Aug 3 07:39:14 2018 - [info] Alive Slaves:
Fri Aug 3 07:39:14 2018 - [info] 192.168.1.102(192.168.1.102:3306) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Fri Aug 3 07:39:14 2018 - [info] GTID ON
Fri Aug 3 07:39:14 2018 - [info] Replicating from 192.168.1.101(192.168.1.101:3306)
Fri Aug 3 07:39:14 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 3 07:39:14 2018 - [info] Current Alive Master: 192.168.1.101(192.168.1.101:3306)
Fri Aug 3 07:39:14 2018 - [info] Checking slave configurations..
Fri Aug 3 07:39:14 2018 - [info] read_only=1 is not set on slave 192.168.1.102(192.168.1.102:3306).
Fri Aug 3 07:39:14 2018 - [info] Checking replication filtering settings..
Fri Aug 3 07:39:14 2018 - [info] binlog_do_db= , binlog_ignore_db=
Fri Aug 3 07:39:14 2018 - [info] Replication filtering check ok.
Fri Aug 3 07:39:14 2018 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Aug 3 07:39:14 2018 - [info] Checking SSH publickey authentication settings on the current master..
Fri Aug 3 07:39:14 2018 - [warning] HealthCheck: SSH to 192.168.1.101 is NOT reachable.
Fri Aug 3 07:39:14 2018 - [info]
192.168.1.101(192.168.1.101:3306) (current master)
+--192.168.1.102(192.168.1.102:3306)

Fri Aug 3 07:39:14 2018 - [info] Checking master_ip_failover_script status:
Fri Aug 3 07:39:14 2018 - [info] /etc/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.1.101 --orig_master_ip=192.168.1.101 --orig_master_port=3306 --orig_master_ssh_port=3306
Fri Aug 3 07:39:14 2018 - [info] OK.
Fri Aug 3 07:39:14 2018 - [warning] shutdown_script is not defined.
Fri Aug 3 07:39:14 2018 - [info] Set master ping interval 1 seconds.
Fri Aug 3 07:39:14 2018 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Fri Aug 3 07:39:14 2018 - [info] Starting ping health check on 192.168.1.101(192.168.1.101:3306)..
Fri Aug 3 07:39:14 2018 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Fri Aug 3 07:39:27 2018 - [info] Got terminate signal. Exit.
Fri Aug 3 07:40:03 2018 - [info] MHA::MasterMonitor version 0.56.
Fri Aug 3 07:40:04 2018 - [info] GTID failover mode = 1
Fri Aug 3 07:40:04 2018 - [info] Dead Servers:
Fri Aug 3 07:40:04 2018 - [info] Alive Servers:
Fri Aug 3 07:40:04 2018 - [info] 192.168.1.101(192.168.1.101:3306)
Fri Aug 3 07:40:04 2018 - [info] 192.168.1.102(192.168.1.102:3306)
Fri Aug 3 07:40:04 2018 - [info] Alive Slaves:
Fri Aug 3 07:40:04 2018 - [info] 192.168.1.102(192.168.1.102:3306) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Fri Aug 3 07:40:04 2018 - [info] GTID ON
Fri Aug 3 07:40:04 2018 - [info] Replicating from 192.168.1.101(192.168.1.101:3306)
Fri Aug 3 07:40:04 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 3 07:40:04 2018 - [info] Current Alive Master: 192.168.1.101(192.168.1.101:3306)
Fri Aug 3 07:40:04 2018 - [info] Checking slave configurations..
Fri Aug 3 07:40:04 2018 - [info] read_only=1 is not set on slave 192.168.1.102(192.168.1.102:3306).
Fri Aug 3 07:40:04 2018 - [info] Checking replication filtering settings..
Fri Aug 3 07:40:04 2018 - [info] binlog_do_db= , binlog_ignore_db=
Fri Aug 3 07:40:04 2018 - [info] Replication filtering check ok.
Fri Aug 3 07:40:04 2018 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Aug 3 07:40:04 2018 - [info] Checking SSH publickey authentication settings on the current master..
Fri Aug 3 07:40:04 2018 - [warning] HealthCheck: SSH to 192.168.1.101 is NOT reachable.
Fri Aug 3 07:40:04 2018 - [info]
192.168.1.101(192.168.1.101:3306) (current master)
+--192.168.1.102(192.168.1.102:3306)

Fri Aug 3 07:40:04 2018 - [info] Checking master_ip_failover_script status:
Fri Aug 3 07:40:04 2018 - [info] /etc/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.1.101 --orig_master_ip=192.168.1.101 --orig_master_port=3306 --orig_master_ssh_port=3306
Fri Aug 3 07:40:04 2018 - [info] OK.
Fri Aug 3 07:40:04 2018 - [warning] shutdown_script is not defined.
Fri Aug 3 07:40:04 2018 - [info] Set master ping interval 1 seconds.
Fri Aug 3 07:40:04 2018 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Fri Aug 3 07:40:04 2018 - [info] Starting ping health check on 192.168.1.101(192.168.1.101:3306)..
Fri Aug 3 07:40:04 2018 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Fri Aug 3 08:28:07 2018 - [info] Got terminate signal. Exit.
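The log above is mostly [info] lines, so the few [warning] entries are easy to miss. A small filter, sketched here under the assumption that every entry follows the "timestamp - [level] message" shape shown in app1.log, lists each distinct warning or error once. The function name mha_problems is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read an MHA log on stdin, keep only [warning] and
# [error] entries, drop the leading timestamp, and print each distinct
# message once in sorted order.
mha_problems() {
    grep -E ' - \[(warning|error)\]' | sed 's/^.* - \[/[/' | sort -u
}
```

Running `mha_problems < /var/log/masterha/app1/app1.log` on the log above would be expected to surface the SSH reachability, shutdown_script, and secondary_check_script warnings.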
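Test 2: Manual master failover
Run the masterha_master_switch script again, this time on zlm2 with --master_state=dead, to perform a manual failover. The first attempt fails because the current master (zlm3) is still alive, so mysqld on zlm3 is shut down before the second attempt.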
[root@zlm2 10:11:10 ~]
#masterha_master_switch --conf=/etc/masterha/app1.conf --global_conf=/etc/masterha/masterha_default.conf --dead_master_host=192.168.1.102 --master_state=dead --new_master_host=192.168.1.101 --ignore_last_failover
--dead_master_ip=<dead_master_ip> is not set. Using 192.168.1.102.
--dead_master_port=<dead_master_port> is not set. Using 3306.
Fri Aug 3 10:11:50 2018 - [info] Reading default configuration from /etc/masterha/masterha_default.conf..
Fri Aug 3 10:11:50 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Fri Aug 3 10:11:50 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
Fri Aug 3 10:11:50 2018 - [info] MHA::MasterFailover version 0.56.
Fri Aug 3 10:11:50 2018 - [info] Starting master failover.
Fri Aug 3 10:11:50 2018 - [info]
Fri Aug 3 10:11:50 2018 - [info] * Phase 1: Configuration Check Phase..
Fri Aug 3 10:11:50 2018 - [info]
Fri Aug 3 10:11:51 2018 - [info] GTID failover mode = 1
Fri Aug 3 10:11:51 2018 - [info] Dead Servers:
Fri Aug 3 10:11:51 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln187] None of server is dead. Stop failover.
Fri Aug 3 10:11:51 2018 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/bin/masterha_master_switch line 53.

//Stop mysqld of master on zlm3.
[root@zlm3 10:13:56 ~]
#mysqladmin shutdown

[root@zlm3 10:14:04 ~]
#ps aux|grep mysqld
mysql 5368 0.0 19.6 1110812 200292 pts/0 Sl 04:44 0:09 mysqld --defaults-file=/data/mysql/mysql3306/my.cnf
root 8827 0.0 0.0 112640 960 pts/0 R+ 10:14 0:00 grep --color=auto mysqld

[root@zlm3 10:14:08 ~]
#ps aux|grep mysqld
mysql 5368 0.0 19.6 1110812 200292 pts/0 Sl 04:44 0:09 mysqld --defaults-file=/data/mysql/mysql3306/my.cnf
root 8833 0.0 0.0 112640 960 pts/0 R+ 10:14 0:00 grep --color=auto mysqld

[root@zlm3 10:14:12 ~]
#ps aux|grep mysqld
mysql 5368 0.0 19.1 995088 194692 pts/0 Sl 04:44 0:09 mysqld --defaults-file=/data/mysql/mysql3306/my.cnf
root 8839 0.0 0.0 112640 960 pts/0 R+ 10:14 0:00 grep --color=auto mysqld

[root@zlm3 10:14:23 ~]
#ps aux|grep mysqld
root 8854 0.0 0.0 112640 960 pts/0 R+ 10:14 0:00 grep --color=auto mysqld

//Execute the above command again on zlm2.
[root@zlm2 10:15:43 ~]
#masterha_master_switch --conf=/etc/masterha/app1.conf --global_conf=/etc/masterha/masterha_default.conf --dead_master_host=192.168.1.102 --master_state=dead --new_master_host=192.168.1.101 --ignore_last_failover
--dead_master_ip=<dead_master_ip> is not set. Using 192.168.1.102.
--dead_master_port=<dead_master_port> is not set. Using 3306.
Fri Aug 3 10:15:43 2018 - [info] Reading default configuration from /etc/masterha/masterha_default.conf..
Fri Aug 3 10:15:43 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Fri Aug 3 10:15:43 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
Fri Aug 3 10:15:43 2018 - [info] MHA::MasterFailover version 0.56.
Fri Aug 3 10:15:43 2018 - [info] Starting master failover.
Fri Aug 3 10:15:43 2018 - [info]
Fri Aug 3 10:15:43 2018 - [info] * Phase 1: Configuration Check Phase..
Fri Aug 3 10:15:43 2018 - [info]
Fri Aug 3 10:15:44 2018 - [info] GTID failover mode = 1
Fri Aug 3 10:15:44 2018 - [info] Dead Servers:
Fri Aug 3 10:15:44 2018 - [info] 192.168.1.102(192.168.1.102:3306)
Fri Aug 3 10:15:44 2018 - [info] Checking master reachability via MySQL(double check)...
Fri Aug 3 10:15:44 2018 - [info] ok.
Fri Aug 3 10:15:44 2018 - [info] Alive Servers:
Fri Aug 3 10:15:44 2018 - [info] 192.168.1.101(192.168.1.101:3306)
Fri Aug 3 10:15:44 2018 - [info] Alive Slaves:
Fri Aug 3 10:15:44 2018 - [info] 192.168.1.101(192.168.1.101:3306) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Fri Aug 3 10:15:44 2018 - [info] GTID ON
Fri Aug 3 10:15:44 2018 - [info] Replicating from 192.168.1.102(192.168.1.102:3306)
Fri Aug 3 10:15:44 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Master 192.168.1.102(192.168.1.102:3306) is dead. Proceed? (yes/NO): yes
Fri Aug 3 10:15:46 2018 - [info] Starting GTID based failover.
Fri Aug 3 10:15:46 2018 - [info]
Fri Aug 3 10:15:46 2018 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Aug 3 10:15:46 2018 - [info]
Fri Aug 3 10:15:46 2018 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Aug 3 10:15:46 2018 - [info]
ssh: connect to host 192.168.1.102 port 3306: Connection refused
Fri Aug 3 10:15:46 2018 - [warning] HealthCheck: SSH to 192.168.1.102 is NOT reachable.
Fri Aug 3 10:15:46 2018 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Aug 3 10:15:46 2018 - [info] Executing master IP deactivation script:
Fri Aug 3 10:15:46 2018 - [info] /etc/masterha/master_ip_failover --orig_master_host=192.168.1.102 --orig_master_ip=192.168.1.102 --orig_master_port=3306 --command=stop --orig_master_ssh_port=3306
ssh: connect to host 192.168.1.102 port 3306: Connection refused
Fri Aug 3 10:15:48 2018 - [info] done.
Fri Aug 3 10:15:48 2018 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Aug 3 10:15:48 2018 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Aug 3 10:15:48 2018 - [info]
Fri Aug 3 10:15:48 2018 - [info] * Phase 3: Master Recovery Phase..
Fri Aug 3 10:15:48 2018 - [info]
Fri Aug 3 10:15:48 2018 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Aug 3 10:15:48 2018 - [info]
Fri Aug 3 10:15:48 2018 - [info] The latest binary log file/position on all slaves is mysql-bin.000003:2321
Fri Aug 3 10:15:48 2018 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Aug 3 10:15:48 2018 - [info] 192.168.1.101(192.168.1.101:3306) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Fri Aug 3 10:15:48 2018 - [info] GTID ON
Fri Aug 3 10:15:48 2018 - [info] Replicating from 192.168.1.102(192.168.1.102:3306)
Fri Aug 3 10:15:48 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 3 10:15:48 2018 - [info] The oldest binary log file/position on all slaves is mysql-bin.000003:2321
Fri Aug 3 10:15:48 2018 - [info] Oldest slaves:
Fri Aug 3 10:15:48 2018 - [info] 192.168.1.101(192.168.1.101:3306) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Fri Aug 3 10:15:48 2018 - [info] GTID ON
Fri Aug 3 10:15:48 2018 - [info] Replicating from 192.168.1.102(192.168.1.102:3306)
Fri Aug 3 10:15:48 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 3 10:15:48 2018 - [info]
Fri Aug 3 10:15:48 2018 - [info] * Phase 3.3: Determining New Master Phase..
Fri Aug 3 10:15:48 2018 - [info]
Fri Aug 3 10:15:48 2018 - [info] 192.168.1.101 can be new master.
Fri Aug 3 10:15:48 2018 - [info] New master is 192.168.1.101(192.168.1.101:3306)
Fri Aug 3 10:15:48 2018 - [info] Starting master failover..
Fri Aug 3 10:15:48 2018 - [info]
From:
192.168.1.102(192.168.1.102:3306) (current master)
+--192.168.1.101(192.168.1.101:3306)

To:
192.168.1.101(192.168.1.101:3306) (new master)

Starting master switch from 192.168.1.102(192.168.1.102:3306) to 192.168.1.101(192.168.1.101:3306)? (yes/NO): yes
Fri Aug 3 10:15:56 2018 - [info] New master decided manually is 192.168.1.101(192.168.1.101:3306)
Fri Aug 3 10:15:56 2018 - [info]
Fri Aug 3 10:15:56 2018 - [info] * Phase 3.3: New Master Recovery Phase..
Fri Aug 3 10:15:56 2018 - [info]
Fri Aug 3 10:15:56 2018 - [info] Waiting all logs to be applied..
Fri Aug 3 10:15:56 2018 - [info] done.
Fri Aug 3 10:15:56 2018 - [info] Getting new master's binlog name and position..
Fri Aug 3 10:15:56 2018 - [info] mysql-bin.000051:190
Fri Aug 3 10:15:56 2018 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.1.101', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Fri Aug 3 10:15:56 2018 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000051, 190, 1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730259
Fri Aug 3 10:15:56 2018 - [info] Executing master IP activate script:
Fri Aug 3 10:15:56 2018 - [info] /etc/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.1.102 --orig_master_ip=192.168.1.102 --orig_master_port=3306 --new_master_host=192.168.1.101 --new_master_ip=192.168.1.101 --new_master_port=3306 --new_master_user='zlm' --new_master_password='zlmzlm' --orig_master_ssh_port=3306 --new_master_ssh_port=3306
Unknown option: new_master_ssh_port
Set read_only=0 on the new master.
ssh_exchange_identification: Connection closed by remote host
Fri Aug 3 10:15:56 2018 - [info] OK.
Fri Aug 3 10:15:56 2018 - [info] ** Finished master recovery successfully.
Fri Aug 3 10:15:56 2018 - [info] * Phase 3: Master Recovery Phase completed.
Fri Aug 3 10:15:56 2018 - [info]
Fri Aug 3 10:15:56 2018 - [info] * Phase 4: Slaves Recovery Phase..
Fri Aug 3 10:15:56 2018 - [info]
Fri Aug 3 10:15:56 2018 - [info]
Fri Aug 3 10:15:56 2018 - [info] * Phase 4.1: Starting Slaves in parallel..
Fri Aug 3 10:15:56 2018 - [info]
Fri Aug 3 10:15:56 2018 - [info] All new slave servers recovered successfully.
Fri Aug 3 10:15:56 2018 - [info]
Fri Aug 3 10:15:56 2018 - [info] * Phase 5: New master cleanup phase..
Fri Aug 3 10:15:56 2018 - [info]
Fri Aug 3 10:15:56 2018 - [info] Resetting slave info on the new master..
Fri Aug 3 10:15:56 2018 - [info] 192.168.1.101: Resetting slave info succeeded.
Fri Aug 3 10:15:56 2018 - [info] Master failover to 192.168.1.101(192.168.1.101:3306) completed successfully.
Fri Aug 3 10:15:56 2018 - [info]

----- Failover Report -----

app1: MySQL Master failover 192.168.1.102(192.168.1.102:3306) to 192.168.1.101(192.168.1.101:3306) succeeded

Master 192.168.1.102(192.168.1.102:3306) is down!

Check MHA Manager logs at zlm2 for details.

Started manual(interactive) failover.
Invalidated master IP address on 192.168.1.102(192.168.1.102:3306)
Selected 192.168.1.101(192.168.1.101:3306) as a new master.
192.168.1.101(192.168.1.101:3306): OK: Applying all logs succeeded.
192.168.1.101(192.168.1.101:3306): OK: Activated master IP address.
192.168.1.101(192.168.1.101:3306): Resetting slave info succeeded.
Master failover to 192.168.1.101(192.168.1.101:3306) completed successfully.
Check the status of the new master on zlm2.
(zlm@192.168.1.101 3306)[(none)]>show master status;
+------------------+----------+--------------+------------------+------------------------------------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                              |
+------------------+----------+--------------+------------------+------------------------------------------------+
| mysql-bin.000051 |      190 |              |                  | 1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730259 |
+------------------+----------+--------------+------------------+------------------------------------------------+
1 row in set (0.00 sec)

(zlm@192.168.1.101 3306)[(none)]>show slave status\G
Empty set (0.00 sec)
Check the marker file and the log in the MasterHA working directory on zlm2.
[root@zlm2 10:15:56 ~]
#cd /var/log/masterha/app1

[root@zlm2 10:20:04 /var/log/masterha/app1]
#ls -l
total 4
-rw-r--r-- 1 root root 0 Aug 3 10:15 app1.failover.complete
-rw-r--r-- 1 root root 3883 Aug 2 11:29 app1.log

//Passing "--ignore_last_failover" makes MHA ignore an existing "app1.failover.complete" file. Without it, the next failover attempt is terminated with an error.
//This file is created after a failover operation, and only on the original slave that becomes the new master.

[root@zlm2 10:20:05 /var/log/masterha/app1]
#cat app1.log
Thu Aug 2 11:12:03 2018 - [info] MHA::MasterMonitor version 0.56.
Thu Aug 2 11:12:04 2018 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln301] Got MySQL error when connecting 192.168.1.101(192.168.1.101:3306) :1045:Access denied for user 'root'@'zlm2' (using password: NO), but this is not a MySQL crash. Check MySQL server settings.
 at /usr/share/perl5/vendor_perl/MHA/ServerManager.pm line 297.
Thu Aug 2 11:12:04 2018 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln301] Got MySQL error when connecting 192.168.1.102(192.168.1.102:3306) :1045:Access denied for user 'root'@'zlm2' (using password: NO), but this is not a MySQL crash. Check MySQL server settings.
 at /usr/share/perl5/vendor_perl/MHA/ServerManager.pm line 297.
Thu Aug 2 11:12:05 2018 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln309] Got fatal error, stopping operations
Thu Aug 2 11:12:05 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations. at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 326.
Thu Aug 2 11:12:05 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Thu Aug 2 11:12:05 2018 - [info] Got exit code 1 (Not master dead).
Thu Aug 2 11:13:56 2018 - [info] MHA::MasterMonitor version 0.56.
Thu Aug 2 11:13:57 2018 - [info] GTID failover mode = 1
Thu Aug 2 11:13:57 2018 - [info] Dead Servers:
Thu Aug 2 11:13:57 2018 - [info] Alive Servers:
Thu Aug 2 11:13:57 2018 - [info] 192.168.1.101(192.168.1.101:3306)
Thu Aug 2 11:13:57 2018 - [info] 192.168.1.102(192.168.1.102:3306)
Thu Aug 2 11:13:57 2018 - [info] Alive Slaves:
Thu Aug 2 11:13:57 2018 - [info] 192.168.1.102(192.168.1.102:3306) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Thu Aug 2 11:13:57 2018 - [info] GTID ON
Thu Aug 2 11:13:57 2018 - [info] Replicating from 192.168.1.101(192.168.1.101:3306)
Thu Aug 2 11:13:57 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Aug 2 11:13:57 2018 - [info] Current Alive Master: 192.168.1.101(192.168.1.101:3306)
Thu Aug 2 11:13:57 2018 - [info] Checking slave configurations..
Thu Aug 2 11:13:57 2018 - [info] read_only=1 is not set on slave 192.168.1.102(192.168.1.102:3306).
Thu Aug 2 11:13:57 2018 - [info] Checking replication filtering settings..
Thu Aug 2 11:13:57 2018 - [info] binlog_do_db= , binlog_ignore_db=
Thu Aug 2 11:13:57 2018 - [info] Replication filtering check ok.
Thu Aug 2 11:13:57 2018 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Thu Aug 2 11:13:57 2018 - [info] Checking SSH publickey authentication settings on the current master..
Thu Aug 2 11:13:57 2018 - [warning] HealthCheck: SSH to 192.168.1.101 is NOT reachable.
Thu Aug 2 11:13:57 2018 - [info]
192.168.1.101(192.168.1.101:3306) (current master)
 +--192.168.1.102(192.168.1.102:3306)

Thu Aug 2 11:13:57 2018 - [info] Checking master_ip_failover_script status:
Thu Aug 2 11:13:57 2018 - [info] /etc/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.1.101 --orig_master_ip=192.168.1.101 --orig_master_port=3306 --orig_master_ssh_port=3306
Thu Aug 2 11:13:57 2018 - [info] OK.
Thu Aug 2 11:13:57 2018 - [warning] shutdown_script is not defined.
Thu Aug 2 11:13:57 2018 - [info] Set master ping interval 1 seconds.
Thu Aug 2 11:13:57 2018 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Thu Aug 2 11:13:57 2018 - [info] Starting ping health check on 192.168.1.101(192.168.1.101:3306)..
Thu Aug 2 11:13:57 2018 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Thu Aug 2 11:29:51 2018 - [info] Got terminate signal. Exit.
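Since a leftover app1.failover.complete file blocks the next failover unless "--ignore_last_failover" is given, it can help to check for the file before testing again. The following is only a minimal sketch: the function name has_failover_marker and the MHA_WORKDIR variable are made up for illustration, and the default path is simply the working directory used in this setup.

```shell
#!/bin/sh
# Hypothetical helper (not part of MHA): report whether a previous failover
# left its marker file behind.
# $1 = manager working directory, $2 = application name.
has_failover_marker() {
    [ -e "$1/$2.failover.complete" ]
}

# MHA_WORKDIR is an assumed variable name; the default is the directory listed above.
workdir=${MHA_WORKDIR:-/var/log/masterha/app1}

if has_failover_marker "$workdir" app1; then
    echo "marker found: remove it or pass --ignore_last_failover"
else
    echo "no marker: the next failover is not blocked by it"
fi
```

Removing the file by hand and passing "--ignore_last_failover" should have the same effect for a test like this one; which is preferable depends on whether the marker is still wanted as a record of the last failover.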
Test 3: Automatic master failover
Repair the slave replication on zlm3.
[root@zlm3 10:14:24 ~]
#mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.7.21-log MySQL Community Server (GPL)

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

(zlm@192.168.1.102 3306)[(none)]>change master to
    -> master_host='192.168.1.101',
    -> master_port=3306,
    -> master_user='repl',
    -> master_password='repl4slave',
    -> master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.02 sec)

(zlm@192.168.1.102 3306)[(none)]>show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.1.101
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000051
Read_Master_Log_Pos: 190
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 355
Relay_Master_Log_File: mysql-bin.000051
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 190
Relay_Log_Space: 548
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1013306
Master_UUID: 1b7181ee-6eaf-11e8-998e-080027de0e0e
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730259
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
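The two fields that matter most in that output are Slave_IO_Running and Slave_SQL_Running. As a rough sketch, the check can be scripted so that it fails loudly before the manager is started; the function name replication_ok is an assumption for illustration, and the sample text is copied from the output above rather than read from a live server.

```shell
#!/bin/sh
# Hypothetical helper: read "show slave status\G" output on stdin and succeed
# only if both replication threads report "Yes".
replication_ok() {
    awk -F': *' '
        /^[ \t]*Slave_IO_Running:/  { v = $2; gsub(/[ \t\r]+$/, "", v); io = v }
        /^[ \t]*Slave_SQL_Running:/ { v = $2; gsub(/[ \t\r]+$/, "", v); sql = v }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}

# Sample input taken from the output above. On a real slave the input would
# come from something like: mysql -e 'show slave status\G' | replication_ok
sample='Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates'

if printf '%s\n' "$sample" | replication_ok; then
    echo "replication threads are running"
else
    echo "replication is broken"
fi
```

The patterns require a colon directly after the field name, so the Slave_SQL_Running_State line is not mistaken for Slave_SQL_Running.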
Start MasterHA-manager.
[root@zlm3 10:48:00 /var/log/masterha/app1]
#nohup masterha_manager --conf=/etc/masterha/app1.conf --global_conf=/etc/masterha/masterha_default.conf &
[1] 9265
nohup: ignoring input and appending output to ‘nohup.out’

[root@zlm3 10:48:12 /var/log/masterha/app1]
#ls -l
total 24
-rw-r--r-- 1 root root 16370 Aug 3 10:48 app1.log
-rw-r--r-- 1 root root 35 Aug 3 10:48 app1.master_status.health //This file exists only while MasterHA-manager is running. It continuously records the result of the health check against the master.
-rw------- 1 root root 371 Aug 3 10:48 nohup.out

[root@zlm3 10:48:14 /var/log/masterha/app1]
#cat nohup.out
Fri Aug 3 10:48:12 2018 - [info] Reading default configuration from /etc/masterha/masterha_default.conf..
Fri Aug 3 10:48:12 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Fri Aug 3 10:48:12 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
ssh_exchange_identification: Connection closed by remote host

[root@zlm3 10:48:26 /var/log/masterha/app1]
#cat app1.master_status.health
9265 0:PING_OK master:192.168.1.101
[root@zlm3 10:48:31 /var/log/masterha/app1]
#ps aux|grep manager
root 9265 0.6 2.1 299172 21516 pts/1 S 10:48 0:00 perl /usr/bin/masterha_manager --conf=/etc/masterha/app1.conf --global_conf=/etc/masterha/masterha_default.conf
root 9332 0.0 0.0 112640 960 pts/1 R+ 10:48 0:00 grep --color=auto manager
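The health file is a single line, so it is easy to read from a script. The sketch below assumes the layout seen above ("<pid> <code>:<status> master:<ip>"); the function names and the HEALTH_FILE variable are hypothetical, and the format is inferred from this one sample rather than from MHA documentation.

```shell
#!/bin/sh
# Hypothetical helpers for reading app1.master_status.health.
# Assumed layout (from the output above): "<pid> <code>:<status> master:<ip>".

# Print the status token, e.g. PING_OK.
health_status() {
    awk 'NR == 1 { split($2, part, ":"); print part[2] }' "$1"
}

# Print the address of the master that the manager is monitoring.
health_master() {
    awk 'NR == 1 { sub(/^master:/, "", $3); print $3 }' "$1"
}

# HEALTH_FILE is an assumed variable name; the default is the path used in this setup.
health_file=${HEALTH_FILE:-/var/log/masterha/app1/app1.master_status.health}

if [ -r "$health_file" ]; then
    echo "status=$(health_status "$health_file") master=$(health_master "$health_file")"
else
    echo "no health file: the manager is probably not running"
fi
```

MHA also ships a masterha_check_status command that reports the manager state for a given --conf file, which is likely the better choice when the tool is available on the host.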
Kill mysqld on zlm2 to simulate a dead master.
[root@zlm2 10:54:31 /var/log/masterha/app1]
#pkill mysqld

[root@zlm2 10:55:28 /var/log/masterha/app1]
#ps aux|grep mysqld
root 6067 0.0 0.0 112640 960 pts/1 R+ 10:55 0:00 grep --color=auto mysqld
Watch app1.log on zlm3.
[root@zlm3 10:54:59 /var/log/masterha/app1]
#echo ''> app1.log

[root@zlm3 10:55:17 /var/log/masterha/app1]
#tail -f app1.log

Fri Aug 3 10:55:29 2018 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Fri Aug 3 10:55:29 2018 - [info] Executing SSH check script: exit 0
Fri Aug 3 10:55:29 2018 - [warning] HealthCheck: SSH to 192.168.1.101 is NOT reachable.
Fri Aug 3 10:55:30 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.1.101' (111))
Fri Aug 3 10:55:30 2018 - [warning] Connection failed 2 time(s)..
Fri Aug 3 10:55:31 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.1.101' (111))
Fri Aug 3 10:55:31 2018 - [warning] Connection failed 3 time(s)..
Fri Aug 3 10:55:32 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.1.101' (111))
Fri Aug 3 10:55:32 2018 - [warning] Connection failed 4 time(s)..
Fri Aug 3 10:55:32 2018 - [warning] Master is not reachable from health checker!
Fri Aug 3 10:55:32 2018 - [warning] Master 192.168.1.101(192.168.1.101:3306) is not reachable!
Fri Aug 3 10:55:32 2018 - [warning] SSH is NOT reachable.
Fri Aug 3 10:55:32 2018 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha/masterha_default.conf and /etc/masterha/app1.conf again, and trying to connect to all servers to check server status..
Fri Aug 3 10:55:32 2018 - [info] Reading default configuration from /etc/masterha/masterha_default.conf..
Fri Aug 3 10:55:32 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Fri Aug 3 10:55:32 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
Fri Aug 3 10:55:33 2018 - [info] GTID failover mode = 1
Fri Aug 3 10:55:33 2018 - [info] Dead Servers:
Fri Aug 3 10:55:33 2018 - [info] 192.168.1.101(192.168.1.101:3306)
Fri Aug 3 10:55:33 2018 - [info] Alive Servers:
Fri Aug 3 10:55:33 2018 - [info] 192.168.1.102(192.168.1.102:3306)
Fri Aug 3 10:55:33 2018 - [info] Alive Slaves:
Fri Aug 3 10:55:33 2018 - [info] 192.168.1.102(192.168.1.102:3306) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Fri Aug 3 10:55:33 2018 - [info] GTID ON
Fri Aug 3 10:55:33 2018 - [info] Replicating from 192.168.1.101(192.168.1.101:3306)
Fri Aug 3 10:55:33 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 3 10:55:33 2018 - [info] Checking slave configurations..
Fri Aug 3 10:55:33 2018 - [info] read_only=1 is not set on slave 192.168.1.102(192.168.1.102:3306).
Fri Aug 3 10:55:33 2018 - [info] Checking replication filtering settings..
Fri Aug 3 10:55:33 2018 - [info] Replication filtering check ok.
Fri Aug 3 10:55:33 2018 - [info] Master is down!
Fri Aug 3 10:55:33 2018 - [info] Terminating monitoring script.
Fri Aug 3 10:55:33 2018 - [info] Got exit code 20 (Master dead).
Fri Aug 3 10:55:33 2018 - [info] MHA::MasterFailover version 0.56.
Fri Aug 3 10:55:33 2018 - [info] Starting master failover.
Fri Aug 3 10:55:33 2018 - [info]
Fri Aug 3 10:55:33 2018 - [info] * Phase 1: Configuration Check Phase..
Fri Aug 3 10:55:33 2018 - [info]
Fri Aug 3 10:55:34 2018 - [info] GTID failover mode = 1
Fri Aug 3 10:55:34 2018 - [info] Dead Servers:
Fri Aug 3 10:55:34 2018 - [info] 192.168.1.101(192.168.1.101:3306)
Fri Aug 3 10:55:34 2018 - [info] Checking master reachability via MySQL(double check)...
Fri Aug 3 10:55:34 2018 - [info] ok.
Fri Aug 3 10:55:34 2018 - [info] Alive Servers:
Fri Aug 3 10:55:34 2018 - [info] 192.168.1.102(192.168.1.102:3306)
Fri Aug 3 10:55:34 2018 - [info] Alive Slaves:
Fri Aug 3 10:55:34 2018 - [info] 192.168.1.102(192.168.1.102:3306) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Fri Aug 3 10:55:34 2018 - [info] GTID ON
Fri Aug 3 10:55:34 2018 - [info] Replicating from 192.168.1.101(192.168.1.101:3306)
Fri Aug 3 10:55:34 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 3 10:55:34 2018 - [info] Starting GTID based failover.
Fri Aug 3 10:55:34 2018 - [info]
Fri Aug 3 10:55:34 2018 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Aug 3 10:55:34 2018 - [info]
Fri Aug 3 10:55:34 2018 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Aug 3 10:55:34 2018 - [info]
Fri Aug 3 10:55:34 2018 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Aug 3 10:55:34 2018 - [info] Executing master IP deactivation script:
Fri Aug 3 10:55:34 2018 - [info] /etc/masterha/master_ip_failover --orig_master_host=192.168.1.101 --orig_master_ip=192.168.1.101 --orig_master_port=3306 --command=stop --orig_master_ssh_port=3306
ssh: connect to host 192.168.1.101 port 3306: Connection refused
Fri Aug 3 10:55:37 2018 - [info] done.
Fri Aug 3 10:55:37 2018 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Aug 3 10:55:37 2018 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Aug 3 10:55:37 2018 - [info]
Fri Aug 3 10:55:37 2018 - [info] * Phase 3: Master Recovery Phase..
Fri Aug 3 10:55:37 2018 - [info]
Fri Aug 3 10:55:37 2018 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Aug 3 10:55:37 2018 - [info]
Fri Aug 3 10:55:37 2018 - [info] The latest binary log file/position on all slaves is mysql-bin.000051:190
Fri Aug 3 10:55:37 2018 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Aug 3 10:55:37 2018 - [info] 192.168.1.102(192.168.1.102:3306) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Fri Aug 3 10:55:37 2018 - [info] GTID ON
Fri Aug 3 10:55:37 2018 - [info] Replicating from 192.168.1.101(192.168.1.101:3306)
Fri Aug 3 10:55:37 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 3 10:55:37 2018 - [info] The oldest binary log file/position on all slaves is mysql-bin.000051:190
Fri Aug 3 10:55:37 2018 - [info] Oldest slaves:
Fri Aug 3 10:55:37 2018 - [info] 192.168.1.102(192.168.1.102:3306) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Fri Aug 3 10:55:37 2018 - [info] GTID ON
Fri Aug 3 10:55:37 2018 - [info] Replicating from 192.168.1.101(192.168.1.101:3306)
Fri Aug 3 10:55:37 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 3 10:55:37 2018 - [info]
Fri Aug 3 10:55:37 2018 - [info] * Phase 3.3: Determining New Master Phase..
Fri Aug 3 10:55:37 2018 - [info]
Fri Aug 3 10:55:37 2018 - [info] Searching new master from slaves..
Fri Aug 3 10:55:37 2018 - [info] Candidate masters from the configuration file:
Fri Aug 3 10:55:37 2018 - [info] 192.168.1.102(192.168.1.102:3306) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Fri Aug 3 10:55:37 2018 - [info] GTID ON
Fri Aug 3 10:55:37 2018 - [info] Replicating from 192.168.1.101(192.168.1.101:3306)
Fri Aug 3 10:55:37 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 3 10:55:37 2018 - [info] Non-candidate masters:
Fri Aug 3 10:55:37 2018 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Fri Aug 3 10:55:37 2018 - [info] New master is 192.168.1.102(192.168.1.102:3306)
Fri Aug 3 10:55:37 2018 - [info] Starting master failover..
Fri Aug 3 10:55:37 2018 - [info]
From:
192.168.1.101(192.168.1.101:3306) (current master)
 +--192.168.1.102(192.168.1.102:3306)

To:
192.168.1.102(192.168.1.102:3306) (new master)
Fri Aug 3 10:55:37 2018 - [info]
Fri Aug 3 10:55:37 2018 - [info] * Phase 3.3: New Master Recovery Phase..
Fri Aug 3 10:55:37 2018 - [info]
Fri Aug 3 10:55:37 2018 - [info] Waiting all logs to be applied..
Fri Aug 3 10:55:37 2018 - [info] done.
Fri Aug 3 10:55:37 2018 - [info] Getting new master's binlog name and position..
Fri Aug 3 10:55:37 2018 - [info] mysql-bin.000004:190
Fri Aug 3 10:55:37 2018 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.1.102', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Fri Aug 3 10:55:37 2018 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000004, 190, 1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730259
Fri Aug 3 10:55:37 2018 - [info] Executing master IP activate script:
Fri Aug 3 10:55:37 2018 - [info] /etc/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.1.101 --orig_master_ip=192.168.1.101 --orig_master_port=3306 --new_master_host=192.168.1.102 --new_master_ip=192.168.1.102 --new_master_port=3306 --new_master_user='zlm' --new_master_password='zlmzlm' --orig_master_ssh_port=3306 --new_master_ssh_port=3306
Unknown option: new_master_ssh_port
Set read_only=0 on the new master.
ssh_exchange_identification: Connection closed by remote host
Fri Aug 3 10:55:37 2018 - [info] OK.
Fri Aug 3 10:55:37 2018 - [info] ** Finished master recovery successfully.
Fri Aug 3 10:55:37 2018 - [info] * Phase 3: Master Recovery Phase completed.
Fri Aug 3 10:55:37 2018 - [info]
Fri Aug 3 10:55:37 2018 - [info] * Phase 4: Slaves Recovery Phase..
Fri Aug 3 10:55:37 2018 - [info]
Fri Aug 3 10:55:37 2018 - [info]
Fri Aug 3 10:55:37 2018 - [info] * Phase 4.1: Starting Slaves in parallel..
Fri Aug 3 10:55:37 2018 - [info]
Fri Aug 3 10:55:37 2018 - [info] All new slave servers recovered successfully.
Fri Aug 3 10:55:37 2018 - [info]
Fri Aug 3 10:55:37 2018 - [info] * Phase 5: New master cleanup phase..
Fri Aug 3 10:55:37 2018 - [info]
Fri Aug 3 10:55:37 2018 - [info] Resetting slave info on the new master..
Fri Aug 3 10:55:37 2018 - [info] 192.168.1.102: Resetting slave info succeeded.
Fri Aug 3 10:55:37 2018 - [info] Master failover to 192.168.1.102(192.168.1.102:3306) completed successfully.
Fri Aug 3 10:55:37 2018 - [info]

----- Failover Report -----

app1: MySQL Master failover 192.168.1.101(192.168.1.101:3306) to 192.168.1.102(192.168.1.102:3306) succeeded

Master 192.168.1.101(192.168.1.101:3306) is down!

Check MHA Manager logs at zlm3:/var/log/masterha/app1/app1.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.1.101(192.168.1.101:3306)
Selected 192.168.1.102(192.168.1.102:3306) as a new master.
192.168.1.102(192.168.1.102:3306): OK: Applying all logs succeeded.
192.168.1.102(192.168.1.102:3306): OK: Activated master IP address.
192.168.1.102(192.168.1.102:3306): Resetting slave info succeeded.
Master failover to 192.168.1.102(192.168.1.102:3306) completed successfully.

//The failover report above summarizes the automatic master failover. All of the steps were executed successfully.
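Instead of reading the whole log by eye, the outcome can be pulled out of app1.log with a couple of text filters. This is a minimal sketch that relies only on the two message formats seen in the log above ("New master is ..." and "Master failover to ... completed successfully"); the function names and the MHA_LOG variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for summarizing an MHA manager log.

# Succeed if the log (path in $1) records a completed failover.
failover_succeeded() {
    grep -q 'Master failover to .* completed successfully' "$1"
}

# Print host:port of the most recently selected new master, as logged.
new_master() {
    sed -n 's/.*New master is [^(]*(\([^)]*\)).*/\1/p' "$1" | tail -n 1
}

# MHA_LOG is an assumed variable name; the default is the log path used in this setup.
log=${MHA_LOG:-/var/log/masterha/app1/app1.log}

if [ -r "$log" ] && failover_succeeded "$log"; then
    echo "failover completed, new master: $(new_master "$log")"
else
    echo "no completed failover found in $log"
fi
```

Matching on message text is brittle across MHA versions, so a check like this is best treated as a convenience for a test setup such as this one.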