  • mysql-MHA troubleshooting notes

    The monitoring service on the manager host would not start:

    [root@manager ~]# managerStart
    [1] 1472
    [root@manager ~]# managerStatus
    app1 is stopped(2:NOT_RUNNING).
    [1]+  Exit 1                  nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1
    # Note: managerStart and managerStatus are aliases I defined for the start and status commands.

    # Checking the log turned up this message:
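
    The alias definitions themselves are not shown in this post. A plausible sketch, assuming managerStart wraps the masterha_manager command visible in the job line above and managerStatus wraps MHA's masterha_check_status tool (the trailing & is inferred from the job number printed at startup):

```shell
# Assumed definitions; the original aliases are not shown in the post.
# Start the MHA manager in the background, writing its output to manager.log.
alias managerStart='nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &'
# Report whether the manager for app1 is running.
alias managerStatus='masterha_check_status --conf=/etc/masterha/app1.cnf'
```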

    Sun Mar 11 14:18:58 2018 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln781] Multi-master configuration is detected, 
    but two or more masters are either writable (read-only is not set) or dead! Check configurations for details. Master configurations are as below: Master 10.0.0.50(10.0.0.50:3306) Master 10.0.0.60(10.0.0.60:3306), replicating from 10.0.0.50(10.0.0.50:3306)

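    Roughly, this means that two servers are configured as masters and both are writable. Only one host should accept writes at any given time; otherwise the data can diverge, which would be a disastrous failure.

    Enable read-only mode on the MySQL instance on 10.0.0.60: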

    mysql -e 'set global read_only=1'
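
    To confirm the setting took effect, the value can be read back. A minimal sketch, assuming the mysql client logs in without extra options as in the command above; show_read_only is a hypothetical helper name. Note that SET GLOBAL does not survive a server restart, so a permanent setting would also need read_only in my.cnf.

```shell
# Hypothetical helper: print the global read_only value (1 = read-only, 0 = writable).
# -N drops the column header and -B prints plain tab-separated output.
show_read_only() {
    mysql -N -B -e 'SELECT @@global.read_only'
}
```

    Running show_read_only on 10.0.0.60 after the SET should print 1.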

    Even after setting this, the monitor still would not start, and the log still showed errors:

    Sun Mar 11 14:44:29 2018 - [info] Multi-master configuration is detected. Current primary(writable) master is 10.0.0.50(10.0.0.50:3306)
    Sun Mar 11 14:44:29 2018 - [info] Master configurations are as below: 
    Master 10.0.0.50(10.0.0.50:3306)
    Master 10.0.0.60(10.0.0.60:3306), replicating from 10.0.0.50(10.0.0.50:3306), read-only
    
    Sun Mar 11 14:44:29 2018 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln726] Slave 10.0.0.70(10.0.0.70:3306) replicates from 10.0.0.60:3306, but real master is 10.0.0.50(10.0.0.50:3306)!
    Sun Mar 11 14:44:29 2018 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations.  at /usr/local/share/perl5/MHA/MasterMonitor.pm line 326
    Sun Mar 11 14:44:29 2018 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
    Sun Mar 11 14:44:29 2018 - [info] Got exit code 1 (Not master dead).

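    Why were there two masters? Earlier I had simulated a crash of the master. The VIP floated over to 10.0.0.60 and 60 was promoted to master. 70, which had been a slave of 50, became a slave of 60. That left the cluster with two masters.

    To test this guess, I forced 70 to follow 50 again: I ran CHANGE MASTER TO on 70 with 50 as the master host, using 50's binlog file and position.

    ( ̄▽ ̄)" Ha, guessed right. That made my day.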
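
    One way to see which master a slave is actually following is to read Master_Host from SHOW SLAVE STATUS. A small sketch, assuming it runs on the slave itself with a mysql client that logs in without extra options; slave_master_host is a hypothetical helper name:

```shell
# Hypothetical helper: print the master host this slave currently replicates from.
# SHOW SLAVE STATUS\G prints one "Name: value" pair per line; keep only Master_Host.
slave_master_host() {
    mysql -e 'SHOW SLAVE STATUS\G' | awk -F': ' '/^ *Master_Host:/ {print $2}'
}
```

    In the situation described here, running it on 10.0.0.70 would be expected to print 10.0.0.60, matching the error in the log.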
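
    The exact statement is not shown in the post. The step can be sketched as follows, assuming the binlog file and position are taken from SHOW MASTER STATUS on 10.0.0.50 and that 70 keeps its existing replication credentials (options left out of CHANGE MASTER TO retain their values); repoint_sql is a hypothetical helper name:

```shell
# Hypothetical helper: build the statements that repoint a slave at a different master.
# Arguments: master host, binlog file name, binlog position.
# Replication user and password are not specified, so the slave keeps its current ones.
repoint_sql() {
    printf "STOP SLAVE; CHANGE MASTER TO MASTER_HOST='%s', MASTER_LOG_FILE='%s', MASTER_LOG_POS=%d; START SLAVE;\n" "$1" "$2" "$3"
}
```

    On 10.0.0.70 the output could then be piped into the client, for example repoint_sql 10.0.0.50 "$file" "$pos" | mysql, where $file and $pos hold the values read from 50.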

    [root@manager ~]# managerStatus
    app1 monitoring program is now on initialization phase(10:INITIALIZING_MONITOR). Wait for a while and try checking again.
    [root@manager ~]# managerStatus
    app1 (pid:1520) is running(0:PING_OK), master:10.0.0.50
    # The manager log now shows a clean startup:
    Sun Mar 11 15:02:01 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Sun Mar 11 15:02:01 2018 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
    Sun Mar 11 15:02:01 2018 - [info] Reading server configuration from /etc/masterha/app1.cnf..
    Sun Mar 11 15:02:01 2018 - [info] MHA::MasterMonitor version 0.56.
    Sun Mar 11 15:02:01 2018 - [info] GTID failover mode = 0
    Sun Mar 11 15:02:01 2018 - [info] Dead Servers:
    Sun Mar 11 15:02:01 2018 - [info] Alive Servers:
    Sun Mar 11 15:02:01 2018 - [info]   10.0.0.50(10.0.0.50:3306)
    Sun Mar 11 15:02:01 2018 - [info]   10.0.0.60(10.0.0.60:3306)
    Sun Mar 11 15:02:01 2018 - [info]   10.0.0.70(10.0.0.70:3306)
    Sun Mar 11 15:02:01 2018 - [info] Alive Slaves:
    Sun Mar 11 15:02:01 2018 - [info]   10.0.0.60(10.0.0.60:3306)  Version=5.6.16-log (oldest major version between slaves) log-bin:enabled
    Sun Mar 11 15:02:01 2018 - [info]     Replicating from 10.0.0.50(10.0.0.50:3306)
    Sun Mar 11 15:02:01 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
    Sun Mar 11 15:02:01 2018 - [info]   10.0.0.70(10.0.0.70:3306)  Version=5.6.16 (oldest major version between slaves) log-bin:disabled
    Sun Mar 11 15:02:01 2018 - [info]     Replicating from 10.0.0.50(10.0.0.50:3306)
    Sun Mar 11 15:02:01 2018 - [info] Current Alive Master: 10.0.0.50(10.0.0.50:3306)
    Sun Mar 11 15:02:01 2018 - [info] Checking slave configurations..
    Sun Mar 11 15:02:01 2018 - [warning]  relay_log_purge=0 is not set on slave 10.0.0.60(10.0.0.60:3306).
    Sun Mar 11 15:02:01 2018 - [warning]  relay_log_purge=0 is not set on slave 10.0.0.70(10.0.0.70:3306).
    Sun Mar 11 15:02:01 2018 - [warning]  log-bin is not set on slave 10.0.0.70(10.0.0.70:3306). This host cannot be a master.
    Sun Mar 11 15:02:01 2018 - [info] Checking replication filtering settings..
    Sun Mar 11 15:02:01 2018 - [info]  binlog_do_db= , binlog_ignore_db= 
    Sun Mar 11 15:02:01 2018 - [info]  Replication filtering check ok.
    Sun Mar 11 15:02:01 2018 - [info] GTID (with auto-pos) is not supported
    Sun Mar 11 15:02:01 2018 - [info] Starting SSH connection tests..
    Sun Mar 11 15:02:02 2018 - [info] All SSH connection tests passed successfully.
    Sun Mar 11 15:02:02 2018 - [info] Checking MHA Node version..
    Sun Mar 11 15:02:03 2018 - [info]  Version check ok.
    Sun Mar 11 15:02:03 2018 - [info] Checking SSH publickey authentication settings on the current master..
    Sun Mar 11 15:02:04 2018 - [info] HealthCheck: SSH to 10.0.0.50 is reachable.
    Sun Mar 11 15:02:04 2018 - [info] Master MHA Node version is 0.56.
    Sun Mar 11 15:02:04 2018 - [info] Checking recovery script configurations on 10.0.0.50(10.0.0.50:3306)..
    Sun Mar 11 15:02:04 2018 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/mysql/data --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000002 
    Sun Mar 11 15:02:04 2018 - [info]   Connecting to root@10.0.0.50(10.0.0.50:22).. 
      Creating /tmp if not exists..    ok.
      Checking output directory is accessible or not..
       ok.
      Binlog found at /mysql/data, up to mysql-bin.000002
    Sun Mar 11 15:02:04 2018 - [info] Binlog setting check done.
    Sun Mar 11 15:02:04 2018 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
    Sun Mar 11 15:02:04 2018 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=10.0.0.60 --slave_ip=10.0.0.60 --slave_port=3306 --workdir=/tmp --target_version=5.6.16-log --manager_version=0.56 --relay_log_info=/mysql/data/relay-log.info  --relay_dir=/mysql/data/  --slave_pass=xxx
    Sun Mar 11 15:02:04 2018 - [info]   Connecting to root@10.0.0.60(10.0.0.60:22).. 
      Checking slave recovery environment settings..
        Opening /mysql/data/relay-log.info ... ok.
        Relay log found at /mysql/data, up to cadicate-master-relay-bin.000005
        Temporary relay log file is /mysql/data/cadicate-master-relay-bin.000005
        Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
     done.
        Testing mysqlbinlog output.. done.
        Cleaning up test file(s).. done.
    Sun Mar 11 15:02:05 2018 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=10.0.0.70 --slave_ip=10.0.0.70 --slave_port=3306 --workdir=/tmp --target_version=5.6.16 --manager_version=0.56 --relay_log_info=/mysql/data/relay-log.info  --relay_dir=/mysql/data/  --slave_pass=xxx
    Sun Mar 11 15:02:05 2018 - [info]   Connecting to root@10.0.0.70(10.0.0.70:22).. 
      Checking slave recovery environment settings..
        Opening /mysql/data/relay-log.info ... ok.
        Relay log found at /mysql/data, up to slave-relay-bin.000002
        Temporary relay log file is /mysql/data/slave-relay-bin.000002
        Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
     done.
        Testing mysqlbinlog output.. done.
        Cleaning up test file(s).. done.
    Sun Mar 11 15:02:05 2018 - [info] Slaves settings check done.
    Sun Mar 11 15:02:05 2018 - [info] 
    10.0.0.50(10.0.0.50:3306) (current master)
     +--10.0.0.60(10.0.0.60:3306)
     +--10.0.0.70(10.0.0.70:3306)
    
    Sun Mar 11 15:02:05 2018 - [info] Checking master_ip_failover_script status:
    Sun Mar 11 15:02:05 2018 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.0.0.50 --orig_master_ip=10.0.0.50 --orig_master_port=3306 
    
    
    IN SCRIPT TEST====/etc/init.d/keepalived stop==/etc/init.d/keepalived start===
    
    Checking the Status of the script.. OK 
    Sun Mar 11 15:02:05 2018 - [info]  OK.
    Sun Mar 11 15:02:05 2018 - [warning] shutdown_script is not defined.
    Sun Mar 11 15:02:05 2018 - [info] Set master ping interval 1 seconds.
    Sun Mar 11 15:02:05 2018 - [info] Set secondary check script: /usr/local/bin/masterha_secondary_check -s server03 -s server02
    Sun Mar 11 15:02:05 2018 - [info] Starting ping health check on 10.0.0.50(10.0.0.50:3306)..
    Sun Mar 11 15:02:05 2018 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

    Read the logs, read the logs, read the logs. Important things are worth saying three times!
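
    When scanning a long manager log like the one above, it can help to filter out the [info] lines first. A minimal sketch, assuming the log path from the start command in this post as the default; mha_problems is a hypothetical helper name:

```shell
# Hypothetical helper: print only the [error] and [warning] lines of an MHA manager log.
# The optional first argument is the log file; the default is the path used in this post.
mha_problems() {
    grep -E -e '- \[(error|warning)\]' "${1:-/var/log/masterha/app1/manager.log}"
}
```

    On the successful startup log this would surface the relay_log_purge, log-bin and shutdown_script warnings, which are easy to miss among the [info] lines.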

  • Original article: https://www.cnblogs.com/benjamin77/p/8544286.html