zoukankan      html  css  js  c++  java
  • MHA快速搭建

    很早之前写过MHA的文章,但是常常在技术群看到有同学问MHA搭建的问题,不是权限问题就是配置问题,我在这里就再次一写下配置过程以及快速的搭建。如果想知道更多的细节与原理,请参考:MySQL高可用架构之MHA

    环境:

    1主1从,manager放在从库。

    主库:192.168.0.10

    从库:192.168.0.20

    两台机器的mysql安装完成初始化以后进行复制搭建,首先登录主库(192.168.0.10),查看pos点:

    mysql> show master status;
    +------------------+----------+--------------+------------------+-------------------+
    | File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
    +------------------+----------+--------------+------------------+-------------------+
    | mysql-bin.000001 |      154 |              |                  |                   |
    +------------------+----------+--------------+------------------+-------------------+
    1 row in set (0.00 sec)

    然后在主库(192.168.0.10)
    添加复制账号以及mha用的账号

    mysql> grant replication slave on *.* to 'repl'@'192.168.0.10'  identified by 'repl'; 
    Query OK, 0 rows affected, 1 warning (0.00 sec)
    
    mysql> grant replication slave on *.* to 'repl'@'192.168.0.20'  identified by 'repl'; 
    Query OK, 0 rows affected, 1 warning (0.00 sec)
    
    mysql> grant all on *.* to 'root'@'192.168.0.20'  identified by '123';                     
    Query OK, 0 rows affected, 1 warning (0.00 sec)
    
    mysql> grant all on *.* to 'root'@'192.168.0.10'  identified by '123';  
    Query OK, 0 rows affected, 1 warning (0.01 sec)
    
    mysql> 

    从库(192.168.0.10 )change到mysql-bin.000001,pos点154

     CHANGE MASTER TO  MASTER_HOST='192.168.0.10',MASTER_USER='repl',MASTER_PASSWORD='repl',MASTER_LOG_FILE='mysql-bin.000001',MASTER_LOG_POS=154; 
    mysql> start slave;
    Query OK, 0 rows affected (0.15 sec)
    
    mysql> show slave statusG
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 192.168.0.10
                      Master_User: repl
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: mysql-bin.000001
              Read_Master_Log_Pos: 1344
                   Relay_Log_File: relaylog.000002
                    Relay_Log_Pos: 1510
            Relay_Master_Log_File: mysql-bin.000001
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes
                  Replicate_Do_DB: 
              Replicate_Ignore_DB: 
               Replicate_Do_Table: 
           Replicate_Ignore_Table: 

    复制到这来就搭建完成了,然后配置192.168.0.10和192.168.0.20 ssh互信。(两台机器都执行)

    ssh-keygen -t rsa
    ssh-copy-id -i  /root/.ssh/id_rsa.pub  '-p 22 192.168.0.10'
    ssh-copy-id -i  /root/.ssh/id_rsa.pub  '-p 22 192.168.0.20'

    安装MHA软件,首先安装epel源。(2台机器)

    rpm -ivh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm

    安装依赖包

    yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes -y

    安装MHA软件(两台机器)

    tar xf mha4mysql-node-0.56.tar.gz 
    cd mha4mysql-node-0.56
    perl Makefile.PL
    make && make install
    
    tar xf mha4mysql-manager-0.56.tar.gz
    cd mha4mysql-manager-0.56
    perl Makefile.PL
    make && make install

    在从库(192.168.0.20)创建目录:

    mkdir /data/mha/3306/log
    cd /data/mha/3306/touch mha.cnf

    mha.cnf配置文件内容如下:

    [server default]
    client_bindir=/usr/local/mysql/bin/
    manager_log=/data/mha/3306/log/manager.log
    manager_workdir=/data/mha/3306/log
    master_binlog_dir=/data/mysql/3306/binlog/
    master_ip_failover_script=/usr/local/bin/master_ip_failover
    master_ip_online_change_script= /usr/local/bin/master_ip_online_change
    report_script=/usr/local/bin/send_report
    init_conf_load_script=/usr/local/bin/load_cnf
    remote_workdir=/data/mysql/3306/mysqltmp
    #secondary_check_script= /usr/local/bin/masterha_secondary_check -s 192.168.0.30 -s 192.168.0.40
    user=root
    ping_interval=3
    repl_user=repl
    ssh_port=22
    ssh_user=root
    max_ping_errors=40
    
    [server1]
    hostname=192.168.0.10
    port=3306
    
    [server2]
    candidate_master=1
    check_repl_delay=0
    hostname=192.168.0.20
    port=3306

    编辑文件 /usr/local/bin/load_cnf 里面的密码修改成对应的密码

    #!/usr/bin/perl
    
      print "password=123
    ";
      print "repl_password=repl
    ";

    执行chek命令查看复制是否正常:

    [root@dbserver-yayun-02 3306]# masterha_check_repl --conf=/data/mha/3306/mha.cnf 
    Mon Mar 13 21:36:54 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Mon Mar 13 21:36:54 2017 - [info] Reading application default configuration from /data/mha/3306/mha.cnf..
    Mon Mar 13 21:36:54 2017 - [info] Updating application default configuration from /usr/local/bin/load_cnf..
    Mon Mar 13 21:36:54 2017 - [info] Reading server configuration from /data/mha/3306/mha.cnf..
    Mon Mar 13 21:36:54 2017 - [info] Setting max_ping_errors to 40, ping_interval to 3.
    Mon Mar 13 21:36:54 2017 - [info] MHA::MasterMonitor version 0.56.
    Mon Mar 13 21:36:54 2017 - [info] GTID failover mode = 0
    Mon Mar 13 21:36:54 2017 - [info] Dead Servers:
    Mon Mar 13 21:36:54 2017 - [info] Alive Servers:
    Mon Mar 13 21:36:54 2017 - [info]   192.168.0.10(192.168.0.10:3306)
    Mon Mar 13 21:36:54 2017 - [info]   192.168.0.20(192.168.0.20:3306)
    Mon Mar 13 21:36:54 2017 - [info] Alive Slaves:
    Mon Mar 13 21:36:54 2017 - [info]   192.168.0.20(192.168.0.20:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabled
    Mon Mar 13 21:36:54 2017 - [info]     Replicating from 192.168.0.10(192.168.0.10:3306)
    Mon Mar 13 21:36:54 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
    Mon Mar 13 21:36:54 2017 - [info] Current Alive Master: 192.168.0.10(192.168.0.10:3306)
    Mon Mar 13 21:36:54 2017 - [info] Checking slave configurations..
    Mon Mar 13 21:36:54 2017 - [info]  read_only=1 is not set on slave 192.168.0.20(192.168.0.20:3306).
    Mon Mar 13 21:36:54 2017 - [warning]  relay_log_purge=0 is not set on slave 192.168.0.20(192.168.0.20:3306).
    Mon Mar 13 21:36:54 2017 - [info] Checking replication filtering settings..
    Mon Mar 13 21:36:54 2017 - [info]  binlog_do_db= , binlog_ignore_db= 
    Mon Mar 13 21:36:54 2017 - [info]  Replication filtering check ok.
    Mon Mar 13 21:36:54 2017 - [info] GTID (with auto-pos) is not supported
    Mon Mar 13 21:36:54 2017 - [info] Starting SSH connection tests..
    Mon Mar 13 21:36:55 2017 - [info] All SSH connection tests passed successfully.
    Mon Mar 13 21:36:55 2017 - [info] Checking MHA Node version..
    Mon Mar 13 21:36:55 2017 - [info]  Version check ok.
    Mon Mar 13 21:36:55 2017 - [info] Checking SSH publickey authentication settings on the current master..
    Mon Mar 13 21:36:55 2017 - [info] HealthCheck: SSH to 192.168.0.10 is reachable.
    Mon Mar 13 21:36:56 2017 - [info] Master MHA Node version is 0.56.
    Mon Mar 13 21:36:56 2017 - [info] Checking recovery script configurations on 192.168.0.10(192.168.0.10:3306)..
    Mon Mar 13 21:36:56 2017 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/3306/binlog/ --output_file=/data/mysql/3306/mysqltmp/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000001 
    Mon Mar 13 21:36:56 2017 - [info]   Connecting to root@192.168.0.10(192.168.0.10:22).. 
      Creating /data/mysql/3306/mysqltmp if not exists..    ok.
      Checking output directory is accessible or not..
       ok.
      Binlog found at /data/mysql/3306/binlog/, up to mysql-bin.000001
    Mon Mar 13 21:36:56 2017 - [info] Binlog setting check done.
    Mon Mar 13 21:36:56 2017 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
    Mon Mar 13 21:36:56 2017 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.0.20 --slave_ip=192.168.0.20 --slave_port=3306 --workdir=/data/mysql/3306/mysqltmp --target_version=5.7.17-log --manager_version=0.56 --client_bindir=/usr/local/mysql/bin/ --relay_dir=/data/mysql/3306/relaylog --current_relay_log=relaylog.000002  --slave_pass=xxx
    Mon Mar 13 21:36:56 2017 - [info]   Connecting to root@192.168.0.20(192.168.0.20:22).. 
      Checking slave recovery environment settings..
        Relay log found at /data/mysql/3306/relaylog, up to relaylog.000002
        Temporary relay log file is /data/mysql/3306/relaylog/relaylog.000002
        Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
     done.
        Testing mysqlbinlog output.. done.
        Cleaning up test file(s).. done.
    Mon Mar 13 21:36:56 2017 - [info] Slaves settings check done.
    Mon Mar 13 21:36:56 2017 - [info] 
    192.168.0.10(192.168.0.10:3306) (current master)
     +--192.168.0.20(192.168.0.20:3306)
    
    Mon Mar 13 21:36:56 2017 - [info] Checking replication health on 192.168.0.20..
    Mon Mar 13 21:36:56 2017 - [info]  ok.
    Mon Mar 13 21:36:56 2017 - [info] Checking master_ip_failover_script status:
    Mon Mar 13 21:36:56 2017 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.0.10 --orig_master_ip=192.168.0.10 --orig_master_port=3306 
    
    
    IN SCRIPT TEST====/sbin/ifconfig eth1:1 down==/sbin/ifconfig eth1:1 192.168.0.88/16===
    
    Checking the Status of the script.. OK 
    Mon Mar 13 21:36:56 2017 - [info]  OK.
    Mon Mar 13 21:36:56 2017 - [warning] shutdown_script is not defined.
    Mon Mar 13 21:36:56 2017 - [info] Got exit code 0 (Not master dead).
    
    MySQL Replication Health is OK.
    View Code

    主库(192.168.0.10)执行命令,启动vip:

    /sbin/ifconfig eth1:1 192.168.0.88/16

    在线切换,把主库切到192.168.0.20

    masterha_master_switch --master_state=alive --conf=/data/mha/3306/mha.cnf --new_master_host=192.168.0.20 --new_master_port=3306 --orig_master_is_new_slave

    输出如下:

    Mon Mar 13 22:15:08 2017 - [info] MHA::MasterRotate version 0.56.
    Mon Mar 13 22:15:08 2017 - [info] Starting online master switch..
    Mon Mar 13 22:15:08 2017 - [info] 
    Mon Mar 13 22:15:08 2017 - [info] * Phase 1: Configuration Check Phase..
    Mon Mar 13 22:15:08 2017 - [info] 
    Mon Mar 13 22:15:08 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Mon Mar 13 22:15:08 2017 - [info] Reading application default configuration from /data/mha/3306/mha.cnf..
    Mon Mar 13 22:15:08 2017 - [info] Updating application default configuration from /usr/local/bin/load_cnf..
    Mon Mar 13 22:15:08 2017 - [info] Reading server configuration from /data/mha/3306/mha.cnf..
    Mon Mar 13 22:15:08 2017 - [info] Setting max_ping_errors to 40, ping_interval to 3.
    Mon Mar 13 22:15:08 2017 - [info] GTID failover mode = 0
    Mon Mar 13 22:15:08 2017 - [info] Current Alive Master: 192.168.0.10(192.168.0.10:3306)
    Mon Mar 13 22:15:08 2017 - [info] Alive Slaves:
    Mon Mar 13 22:15:08 2017 - [info]   192.168.0.20(192.168.0.20:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabled
    Mon Mar 13 22:15:08 2017 - [info]     Replicating from 192.168.0.10(192.168.0.10:3306)
    Mon Mar 13 22:15:08 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
    
    It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.0.10(192.168.0.10:3306)? (YES/no): yes
    Mon Mar 13 22:15:09 2017 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
    Mon Mar 13 22:15:09 2017 - [info]  ok.
    Mon Mar 13 22:15:09 2017 - [info] Checking MHA is not monitoring or doing failover..
    Mon Mar 13 22:15:09 2017 - [info] Checking replication health on 192.168.0.20..
    Mon Mar 13 22:15:09 2017 - [info]  ok.
    Mon Mar 13 22:15:09 2017 - [info] 192.168.0.20 can be new master.
    Mon Mar 13 22:15:09 2017 - [info] 
    From:
    192.168.0.10(192.168.0.10:3306) (current master)
     +--192.168.0.20(192.168.0.20:3306)
    
    To:
    192.168.0.20(192.168.0.20:3306) (new master)
     +--192.168.0.10(192.168.0.10:3306)
    
    Starting master switch from 192.168.0.10(192.168.0.10:3306) to 192.168.0.20(192.168.0.20:3306)? (yes/NO): yes
    Mon Mar 13 22:15:10 2017 - [info] Checking whether 192.168.0.20(192.168.0.20:3306) is ok for the new master..
    Mon Mar 13 22:15:10 2017 - [info]  ok.
    Mon Mar 13 22:15:10 2017 - [info] 192.168.0.10(192.168.0.10:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
    Mon Mar 13 22:15:11 2017 - [info] 192.168.0.10(192.168.0.10:3306): Resetting slave pointing to the dummy host.
    Mon Mar 13 22:15:11 2017 - [info] ** Phase 1: Configuration Check Phase completed.
    Mon Mar 13 22:15:11 2017 - [info] 
    Mon Mar 13 22:15:11 2017 - [info] * Phase 2: Rejecting updates Phase..
    Mon Mar 13 22:15:11 2017 - [info] 
    Mon Mar 13 22:15:11 2017 - [info] Executing master ip online change script to disable write on the current master:
    Mon Mar 13 22:15:11 2017 - [info]   /usr/local/bin/master_ip_online_change --command=stop --orig_master_host=192.168.0.10 --orig_master_ip=192.168.0.10 --orig_master_port=3306 --orig_master_user='root' --orig_master_password='123' --new_master_host=192.168.0.20 --new_master_ip=192.168.0.20 --new_master_port=3306 --new_master_user='root' --new_master_password='123' --orig_master_ssh_user=root --new_master_ssh_user=root   --orig_master_is_new_slave
    Mon Mar 13 22:15:11 2017 173907 Set read_only on the new master.. ok.
    Mon Mar 13 22:15:11 2017 177320 Drpping app user on the orig master..
    Mon Mar 13 22:15:11 2017 177844 Set read_only=1 on the orig master.. ok.
    Mon Mar 13 22:15:11 2017 179299 Killing all application threads..
    Mon Mar 13 22:15:11 2017 179318 done.
    Mon Mar 13 22:15:11 2017 - [info]  ok.
    Mon Mar 13 22:15:11 2017 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
    Mon Mar 13 22:15:11 2017 - [info] Executing FLUSH TABLES WITH READ LOCK..
    Mon Mar 13 22:15:11 2017 - [info]  ok.
    Mon Mar 13 22:15:11 2017 - [info] Orig master binlog:pos is mysql-bin.000001:2265.
    Mon Mar 13 22:15:11 2017 - [info]  Waiting to execute all relay logs on 192.168.0.20(192.168.0.20:3306)..
    Mon Mar 13 22:15:11 2017 - [info]  master_pos_wait(mysql-bin.000001:2265) completed on 192.168.0.20(192.168.0.20:3306). Executed 0 events.
    Mon Mar 13 22:15:11 2017 - [info]   done.
    Mon Mar 13 22:15:11 2017 - [info] Getting new master's binlog name and position..
    Mon Mar 13 22:15:11 2017 - [info]  mysql-bin.000001:2265
    Mon Mar 13 22:15:11 2017 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.0.20', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=2265, MASTER_USER='repl', MASTER_PASSWORD='xxx';
    Mon Mar 13 22:15:11 2017 - [info] Executing master ip online change script to allow write on the new master:
    Mon Mar 13 22:15:11 2017 - [info]   /usr/local/bin/master_ip_online_change --command=start --orig_master_host=192.168.0.10 --orig_master_ip=192.168.0.10 --orig_master_port=3306 --orig_master_user='root' --orig_master_password='123' --new_master_host=192.168.0.20 --new_master_ip=192.168.0.20 --new_master_port=3306 --new_master_user='root' --new_master_password='123' --orig_master_ssh_user=root --new_master_ssh_user=root   --orig_master_is_new_slave
    Mon Mar 13 22:15:11 2017 493612 Set read_only=0 on the new master.
    Mon Mar 13 22:15:11 2017 493874 Creating app user on the new master..
    Mon Mar 13 22:15:32 2017 - [info]  ok.
    Mon Mar 13 22:15:32 2017 - [info] 
    Mon Mar 13 22:15:32 2017 - [info] * Switching slaves in parallel..
    Mon Mar 13 22:15:32 2017 - [info] 
    Mon Mar 13 22:15:32 2017 - [info] Unlocking all tables on the orig master:
    Mon Mar 13 22:15:32 2017 - [info] Executing UNLOCK TABLES..
    Mon Mar 13 22:15:32 2017 - [info]  ok.
    Mon Mar 13 22:15:32 2017 - [info] Starting orig master as a new slave..
    Mon Mar 13 22:15:32 2017 - [info]  Resetting slave 192.168.0.10(192.168.0.10:3306) and starting replication from the new master 192.168.0.20(192.168.0.20:3306)..
    Mon Mar 13 22:15:32 2017 - [info]  Executed CHANGE MASTER.
    Mon Mar 13 22:15:32 2017 - [info]  Slave started.
    Mon Mar 13 22:15:32 2017 - [info] All new slave servers switched successfully.
    Mon Mar 13 22:15:32 2017 - [info] 
    Mon Mar 13 22:15:32 2017 - [info] * Phase 5: New master cleanup phase..
    Mon Mar 13 22:15:32 2017 - [info] 
    Mon Mar 13 22:15:32 2017 - [info]  192.168.0.20: Resetting slave info succeeded.
    Mon Mar 13 22:15:32 2017 - [info] Switching master to 192.168.0.20(192.168.0.20:3306) completed successfully.
    View Code

    关于配置文件中的参数:
    max_ping_errors=40,这个是修改了源码,增加了检测次数的定义,默认是3次,太容易误切换。

    启动管理进程:

    /usr/bin/nohup /usr/local/bin/masterha_manager --conf=/data/mha/3306/mha.cnf --ignore_last_failover > /data/mha/3306/log/manager.log 2>&1 &

    在主库(192.168.0.10)封掉ip,可以看到日志输出。

    iptables -I INPUT -s 192.168.0.20 -j DROP
    Mon Mar 13 22:22:50 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.0.10' (4))
    Mon Mar 13 22:22:50 2017 - [warning] Connection failed 3 time(s)..
    Mon Mar 13 22:22:53 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.0.10' (4))
    Mon Mar 13 22:22:53 2017 - [warning] Connection failed 4 time(s)..
    Mon Mar 13 22:22:56 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.0.10' (4))
    Mon Mar 13 22:22:56 2017 - [warning] Connection failed 5 time(s)..
    Mon Mar 13 22:22:59 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.0.10' (4))
    Mon Mar 13 22:22:59 2017 - [warning] Connection failed 6 time(s)..
    Mon Mar 13 22:23:02 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.0.10' (4))
    Mon Mar 13 22:23:02 2017 - [warning] Connection failed 7 time(s)..
    Mon Mar 13 22:23:05 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.0.10' (4))
    Mon Mar 13 22:23:05 2017 - [warning] Connection failed 8 time(s)..
    Mon Mar 13 22:23:08 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.0.10' (4))
    Mon Mar 13 22:23:08 2017 - [warning] Connection failed 9 time(s)..
    Mon Mar 13 22:23:11 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.0.10' (4))
    Mon Mar 13 22:23:11 2017 - [warning] Connection failed 10 time(s)..
    Mon Mar 13 22:23:14 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.0.10' (4))
    Mon Mar 13 22:23:14 2017 - [warning] Connection failed 11 time(s)..
    Mon Mar 13 22:23:17 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.0.10' (4))
    View Code


    修改以后的软件包下载地址:链接:http://pan.baidu.com/s/1c3qg70 密码:f9u2

  • 相关阅读:
    jQuery插件实践之轮播练习(二)
    jQuery插件实践之轮播练习(一)
    AngularJS+Node.js+socket.io 开发在线聊天室
    Ubuntu上部署Ghost博客
    综合架构的简述
    进程
    路由配置
    计算机专用英语词汇1695个词汇表
    Linux打包压缩解压工具
    磁盘知识体系结构
  • 原文地址:https://www.cnblogs.com/gomysql/p/6547797.html
Copyright © 2011-2022 走看看