  • MySQL MHA troubleshooting notes

    On the manager host, the monitoring service would not start.

    [root@manager ~]# managerStart
    [1] 1472
    [root@manager ~]# managerStatus
    app1 is stopped(2:NOT_RUNNING).
    [1]+  Exit 1                  nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1
    # Note: managerStart and managerStatus are aliases I defined for the commands that start the service and check its status.
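
The alias definitions are not shown in the post. A minimal sketch of what they might look like, written as shell functions rather than aliases: `managerStart` reuses the `masterha_manager` command line from the job output above, while the body of `managerStatus` is an assumption that it wraps `masterha_check_status`. The variable names `MHA_CONF` and `MHA_LOG` are made up for this sketch.

```shell
# Hypothetical definitions for the two helpers used in this post.
MHA_CONF=/etc/masterha/app1.cnf
MHA_LOG=/var/log/masterha/app1/manager.log

managerStart() {
    # Same command line as in the job output above, run in the background.
    nohup masterha_manager --conf="$MHA_CONF" \
        --remove_dead_master_conf --ignore_last_failover \
        < /dev/null > "$MHA_LOG" 2>&1 &
}

managerStatus() {
    # Assumption: the status helper wraps masterha_check_status.
    masterha_check_status --conf="$MHA_CONF"
}
```

Placed in the manager's `~/.bashrc`, these would give the two commands used throughout the session.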

    # Checking the log turned up this message:

    Sun Mar 11 14:18:58 2018 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln781] Multi-master configuration is detected, 
    but two or more masters are either writable (read-only is not set) or dead! Check configurations for details. Master configurations are as below: Master 10.0.0.50(10.0.0.50:3306) Master 10.0.0.60(10.0.0.60:3306), replicating from 10.0.0.50(10.0.0.50:3306)

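    Roughly, this means that two servers are acting as master and both are writable. Only one host should accept writes at any given time; otherwise the data can become inconsistent, which would be a disastrous failure!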

    On 10.0.0.60, set MySQL to read-only:

    mysql -e 'set global read_only=1'

    That was not the end of it. The monitor still would not start, although the log now showed a different error:

    Sun Mar 11 14:44:29 2018 - [info] Multi-master configuration is detected. Current primary(writable) master is 10.0.0.50(10.0.0.50:3306)
    Sun Mar 11 14:44:29 2018 - [info] Master configurations are as below: 
    Master 10.0.0.50(10.0.0.50:3306)
    Master 10.0.0.60(10.0.0.60:3306), replicating from 10.0.0.50(10.0.0.50:3306), read-only
    
    Sun Mar 11 14:44:29 2018 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln726] Slave 10.0.0.70(10.0.0.70:3306) replicates from 10.0.0.60:3306, but real master is 10.0.0.50(10.0.0.50:3306)!
    Sun Mar 11 14:44:29 2018 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations.  at /usr/local/share/perl5/MHA/MasterMonitor.pm line 326
    Sun Mar 11 14:44:29 2018 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
    Sun Mar 11 14:44:29 2018 - [info] Got exit code 1 (Not master dead).

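    Thinking it over: why were there two masters? Earlier I had simulated a master crash, after which the VIP floated to 60 and 60 was promoted to master. Host 70, originally a slave of 50, had become a slave of 60. That is what led to two masters showing up.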
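
The chain described above can be confirmed by asking each replica which master it follows. A minimal sketch, assuming the `mysql` client can log in to each host without extra options; the helper name `master_host_of` is hypothetical. It extracts the `Master_Host` field from `SHOW SLAVE STATUS\G` output:

```shell
# Hypothetical helper: print the Master_Host field from
# SHOW SLAVE STATUS\G text supplied on stdin.
master_host_of() {
    awk -F': ' '/^[[:space:]]*Master_Host:/ { print $2 }'
}

# Usage against the hosts in this post (assumes client login works as-is):
# for h in 10.0.0.60 10.0.0.70; do
#     printf '%s replicates from: ' "$h"
#     mysql -h "$h" -e 'SHOW SLAVE STATUS\G' | master_host_of
# done
```

In the broken state described here, 10.0.0.70 would be expected to report 10.0.0.60, matching the `replicates from 10.0.0.60:3306` error in the log.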

    To test this guess, I forced 70 to follow 50: I ran CHANGE MASTER TO with 50 as the master host, using the binlog file and position from 50 as well.
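
The exact statement is not shown in the post. A minimal sketch of how it could be assembled, assuming classic file/position replication; the function name `build_change_master` is hypothetical, and the binlog file and position in the usage line are placeholders that must be replaced with the real coordinates taken from 10.0.0.50. Options that are not given, such as `MASTER_USER` and `MASTER_PASSWORD`, keep their current values on the replica.

```shell
# Hypothetical helper: print the SQL that repoints a replica at a master.
# Arguments: master host, binlog file, binlog position.
build_change_master() {
    printf "STOP SLAVE; CHANGE MASTER TO MASTER_HOST='%s', MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s; START SLAVE;\n" \
        "$1" "$2" "$3"
}

# Example with placeholder coordinates (NOT values from the post):
# build_change_master 10.0.0.50 mysql-bin.000001 4 | mysql -h 10.0.0.70
```

Printing the statement first, rather than running it directly, leaves room to review the coordinates before they are applied.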

    ( ̄▽ ̄)" Haha, guessed right... that made me happy for a moment.

    [root@manager ~]# managerStatus
    app1 monitoring program is now on initialization phase(10:INITIALIZING_MONITOR). Wait for a while and try checking again.
    [root@manager ~]# managerStatus
    app1 (pid:1520) is running(0:PING_OK), master:10.0.0.50
    Sun Mar 11 15:02:01 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Sun Mar 11 15:02:01 2018 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
    Sun Mar 11 15:02:01 2018 - [info] Reading server configuration from /etc/masterha/app1.cnf..
    Sun Mar 11 15:02:01 2018 - [info] MHA::MasterMonitor version 0.56.
    Sun Mar 11 15:02:01 2018 - [info] GTID failover mode = 0
    Sun Mar 11 15:02:01 2018 - [info] Dead Servers:
    Sun Mar 11 15:02:01 2018 - [info] Alive Servers:
    Sun Mar 11 15:02:01 2018 - [info]   10.0.0.50(10.0.0.50:3306)
    Sun Mar 11 15:02:01 2018 - [info]   10.0.0.60(10.0.0.60:3306)
    Sun Mar 11 15:02:01 2018 - [info]   10.0.0.70(10.0.0.70:3306)
    Sun Mar 11 15:02:01 2018 - [info] Alive Slaves:
    Sun Mar 11 15:02:01 2018 - [info]   10.0.0.60(10.0.0.60:3306)  Version=5.6.16-log (oldest major version between slaves) log-bin:enabled
    Sun Mar 11 15:02:01 2018 - [info]     Replicating from 10.0.0.50(10.0.0.50:3306)
    Sun Mar 11 15:02:01 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
    Sun Mar 11 15:02:01 2018 - [info]   10.0.0.70(10.0.0.70:3306)  Version=5.6.16 (oldest major version between slaves) log-bin:disabled
    Sun Mar 11 15:02:01 2018 - [info]     Replicating from 10.0.0.50(10.0.0.50:3306)
    Sun Mar 11 15:02:01 2018 - [info] Current Alive Master: 10.0.0.50(10.0.0.50:3306)
    Sun Mar 11 15:02:01 2018 - [info] Checking slave configurations..
    Sun Mar 11 15:02:01 2018 - [warning]  relay_log_purge=0 is not set on slave 10.0.0.60(10.0.0.60:3306).
    Sun Mar 11 15:02:01 2018 - [warning]  relay_log_purge=0 is not set on slave 10.0.0.70(10.0.0.70:3306).
    Sun Mar 11 15:02:01 2018 - [warning]  log-bin is not set on slave 10.0.0.70(10.0.0.70:3306). This host cannot be a master.
    Sun Mar 11 15:02:01 2018 - [info] Checking replication filtering settings..
    Sun Mar 11 15:02:01 2018 - [info]  binlog_do_db= , binlog_ignore_db= 
    Sun Mar 11 15:02:01 2018 - [info]  Replication filtering check ok.
    Sun Mar 11 15:02:01 2018 - [info] GTID (with auto-pos) is not supported
    Sun Mar 11 15:02:01 2018 - [info] Starting SSH connection tests..
    Sun Mar 11 15:02:02 2018 - [info] All SSH connection tests passed successfully.
    Sun Mar 11 15:02:02 2018 - [info] Checking MHA Node version..
    Sun Mar 11 15:02:03 2018 - [info]  Version check ok.
    Sun Mar 11 15:02:03 2018 - [info] Checking SSH publickey authentication settings on the current master..
    Sun Mar 11 15:02:04 2018 - [info] HealthCheck: SSH to 10.0.0.50 is reachable.
    Sun Mar 11 15:02:04 2018 - [info] Master MHA Node version is 0.56.
    Sun Mar 11 15:02:04 2018 - [info] Checking recovery script configurations on 10.0.0.50(10.0.0.50:3306)..
    Sun Mar 11 15:02:04 2018 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/mysql/data --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000002 
    Sun Mar 11 15:02:04 2018 - [info]   Connecting to root@10.0.0.50(10.0.0.50:22).. 
      Creating /tmp if not exists..    ok.
      Checking output directory is accessible or not..
       ok.
      Binlog found at /mysql/data, up to mysql-bin.000002
    Sun Mar 11 15:02:04 2018 - [info] Binlog setting check done.
    Sun Mar 11 15:02:04 2018 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
    Sun Mar 11 15:02:04 2018 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=10.0.0.60 --slave_ip=10.0.0.60 --slave_port=3306 --workdir=/tmp --target_version=5.6.16-log --manager_version=0.56 --relay_log_info=/mysql/data/relay-log.info  --relay_dir=/mysql/data/  --slave_pass=xxx
    Sun Mar 11 15:02:04 2018 - [info]   Connecting to root@10.0.0.60(10.0.0.60:22).. 
      Checking slave recovery environment settings..
        Opening /mysql/data/relay-log.info ... ok.
        Relay log found at /mysql/data, up to cadicate-master-relay-bin.000005
        Temporary relay log file is /mysql/data/cadicate-master-relay-bin.000005
        Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
     done.
        Testing mysqlbinlog output.. done.
        Cleaning up test file(s).. done.
    Sun Mar 11 15:02:05 2018 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=10.0.0.70 --slave_ip=10.0.0.70 --slave_port=3306 --workdir=/tmp --target_version=5.6.16 --manager_version=0.56 --relay_log_info=/mysql/data/relay-log.info  --relay_dir=/mysql/data/  --slave_pass=xxx
    Sun Mar 11 15:02:05 2018 - [info]   Connecting to root@10.0.0.70(10.0.0.70:22).. 
      Checking slave recovery environment settings..
        Opening /mysql/data/relay-log.info ... ok.
        Relay log found at /mysql/data, up to slave-relay-bin.000002
        Temporary relay log file is /mysql/data/slave-relay-bin.000002
        Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
     done.
        Testing mysqlbinlog output.. done.
        Cleaning up test file(s).. done.
    Sun Mar 11 15:02:05 2018 - [info] Slaves settings check done.
    Sun Mar 11 15:02:05 2018 - [info] 
    10.0.0.50(10.0.0.50:3306) (current master)
     +--10.0.0.60(10.0.0.60:3306)
     +--10.0.0.70(10.0.0.70:3306)
    
    Sun Mar 11 15:02:05 2018 - [info] Checking master_ip_failover_script status:
    Sun Mar 11 15:02:05 2018 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.0.0.50 --orig_master_ip=10.0.0.50 --orig_master_port=3306 
    
    
    IN SCRIPT TEST====/etc/init.d/keepalived stop==/etc/init.d/keepalived start===
    
    Checking the Status of the script.. OK 
    Sun Mar 11 15:02:05 2018 - [info]  OK.
    Sun Mar 11 15:02:05 2018 - [warning] shutdown_script is not defined.
    Sun Mar 11 15:02:05 2018 - [info] Set master ping interval 1 seconds.
    Sun Mar 11 15:02:05 2018 - [info] Set secondary check script: /usr/local/bin/masterha_secondary_check -s server03 -s server02
    Sun Mar 11 15:02:05 2018 - [info] Starting ping health check on 10.0.0.50(10.0.0.50:3306)..
    Sun Mar 11 15:02:05 2018 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

    Read the logs, read the logs, read the logs. Important things are worth saying three times!
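
One way to act on that advice is to filter the manager log down to the lines that matter. A minimal sketch, assuming the log line format shown in this post (`<timestamp> - [level] ...`); the function name `mha_problems` is hypothetical:

```shell
# Hypothetical helper: keep only [error] and [warning] lines
# from MHA manager log text supplied on stdin.
mha_problems() {
    grep -E -- '- \[(error|warning)\]'
}

# Usage with the log path from this post:
# mha_problems < /var/log/masterha/app1/manager.log
```

Both failures in this post were named outright in `[error]` lines, so a filter like this would have surfaced them directly.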

  • Original article: https://www.cnblogs.com/benjamin77/p/8544286.html