zoukankan      html  css  js  c++  java
  • Mysql 主主复制失败恢复【转】

    Mysql 主主复制失败



    故障描述

    原因描述因为机柜PDU老化, 导致整个机柜掉电.
    故障时间 20160923-10:09
    发现时间 20160929-13:56

    架构信息

    TomcatMemcacheKeepaliveMysql主主复制

    节点信息

    序号节点名称IP地址报错信息
    1 aipprd1 10.66.1.52 Got fatal error 1236 from master when reading data from binary log: ‘binlog truncated in the middle of event; consider out of disk space on master; the first event ‘mysql-bin.000084’ at 91941417, the last event read from ‘/aip/mysql/data/log/mysql-bin.000084’ at 91941783, the last byte read from ‘/aip/mysql/data/log/mysql-bin.000084’ at 91942912.’
    2 aipprd2 10.66.1.51 Got fatal error 1236 from master when reading data from binary log: ‘binlog truncated in the middle of event; consider out of disk space on master; the first event ‘mysql-bin.000082’ at 6369026, the last event read from ‘/aip/mysql/data/log/mysql-bin.000082’ at 6369026, the last byte read from ‘/aip/mysql/data/log/mysql-bin.000082’ at 6369280.’

    故障分析

    • 由于Zabbix的Mysql监控脚本的缘故, 没有触发事件, 所以直至20160929检查Zabbix日志的时候, 才发现该故障, 当时KeepaliveVIPaipprd1上, 这个节点上, 数据库是对外服务的;
    • 那么首先以aipprd1为主, 先将aipprd2的从环境同步起来;
    • aipprd2从环境同步完成后, 再将aipprd1的从环境同步起来.

    同步AIPPRD2的从环境

    检查aipprd2的Slave状态 
    根据代码区的描述, Last_Errno的错误代码为1062, 需要手动修改Position.

    [root@aipprd2 ~]# mysql -uzabbixmoniter -ppassw0rd -hlocalhost -e "show slave statusG;"
    *************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: 10.66.1.52
                  Master_User: root
                  Master_Port: 3388
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000082
          Read_Master_Log_Pos: 6369026
               Relay_Log_File: mysql-relay-bin.000257
                Relay_Log_Pos: 8999
        Relay_Master_Log_File: mysql-bin.000082
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
    Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1062
                   Last_Error: Error 'Duplicate entry '93FF91EF92866D23E80E4A57D55ED538-n1.tomcat604' for key 'PRIMARY'' on query. Default database: 'aipprd'. Query: 'INSERT INTO eahttpsession (     sessionid, username, account,      createtime, loginip,userid,explorer,userDomain,computerName,computerUserName)   VALUES ('93FF91EF92866D23E80E4A57D55ED538-n1.tomcat604', '李花', 'XS003_4200',      '2016-09-23 12:10:33', , '10752920','MSIE 7.0','','','')'
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 6367510
              Relay_Log_Space: 11099
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
    Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 1236
                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'binlog truncated in the middle of event; consider out of disk space on master; the first event 'mysql-bin.000082' at 6369026, the last event read from '/aip/mysql/data/log/mysql-bin.000082' at 6369026, the last byte read from '/aip/mysql/data/log/mysql-bin.000082' at 6369280.'
               Last_SQL_Errno: 1062
               Last_SQL_Error: Error 'Duplicate entry '93FF91EF92866D23E80E4A57D55ED538-n1.tomcat604' for key 'PRIMARY'' on query. Default database: 'aipprd'. Query: 'INSERT INTO eahttpsession (     sessionid, username, account,      createtime, loginip,userid,explorer,userDomain,computerName,computerUserName)   VALUES ('93FF91EF92866D23E80E4A57D55ED538-n1.tomcat604', '李花', 'XS003_4200',      '2016-09-23 12:10:33', , '10752920','MSIE 7.0','','','')'
    Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1

    aipprd2上按照Last_IO_Error中的PosfilePos修改. 
    按照报错给出的提示PosfilePos修改后, 报错依旧.

    mysql> slave stop;
    mysql> CHANGE MASTER TO master_host='10.66.1.52', master_port=3388, master_user='root',     master_password='passw0rd', master_log_file='mysql-bin.000082', master_log_pos=6369026;
    mysql> slave start;

    aipprd2检查Mysql的日志 
    Mysql的日志中记录了Crash开始的时间, 并给出了建议,Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000082' position 6367510

    160923 12:35:22 [Note] Starting crash recovery...
    160923 12:35:22 [Note] Crash recovery finished.
    160923 12:36:27 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000082' position 6367510
    
    
    

    aipprd2上按照日志中建议的PosfilePos修改. 
    按照日志中建议的PosfilePos修改后, 报错依旧.

    mysql> slave stop;
    mysql> CHANGE MASTER TO master_host='10.66.1.52', master_port=3388, master_user='root', master_password='passw0rd', master_log_file='mysql-bin.000082', master_log_pos=6367510;
    mysql> slave start;

    aipprd1上检查posfile
    首先检查show slave statusG;中给出的pos, 发现日志中根本不存在.

    [root@aipprd1 log]# mysqlbinlog --no-defaults --start-position=6369026 mysql-bin.000083 
    /*!40019 SET @@session.max_insert_delayed_threads=0*/;
    /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
    DELIMITER /*!*/;
    
    # at 4
    
    
    #160923 12:38:59 server id 1 end_log_pos 107 Start: binlog v 4, server v 5.5.24-log created 160923 12:38:59 at startup
    
    ROLLBACK/*!*/;
    BINLOG '
    47HkVw8BAAAAZwAAAGsAAAAAAAQANS41LjI0LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAAAAAAAAADjseRXEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
    '/*!*/;
    ERROR: Error in Log_event::read_log_event(): 'read error', data_len: 912208946, event_type: 71
    ERROR: Could not read entry at offset 6369026: Error in log format or read error.
    DELIMITER
    
    # End of log file
    
    ROLLBACK /* added by mysqlbinlog */;
    /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

    此后检查Mysql日志中建议的pos, 发现posfile中是存在此记录的, 而此posfile的最后一个pos6368660, 而show slave statusG;pos6369026, 显然不存在日志文件中.

    [root@aipprd1 log]# mysqlbinlog --no-defaults  --start-position=6367510 mysql-bin.000082 
    /*!40019 SET @@session.max_insert_delayed_threads=0*/;
    /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
    DELIMITER /*!*/;
    
    # at 4
    
    
    #160923 10:39:15 server id 1  end_log_pos 107  Start: binlog v 4, server v 5.5.24-log created 160923 10:39:15 at startup
    
    
    # Warning: this binlog is either in use or was not closed properly.
    
    ROLLBACK/*!*/;
    BINLOG '
    05XkVw8BAAAAZwAAAGsAAAABAAQANS41LjI0LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAAAAAAAAADTleRXEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
    '/*!*/;
        # at 6367510
        # at 6367585
        # at 6367963
        # at 6367990
        # at 6368073
        # at 6368163
        # at 6368275
        # at 6368358
        # at 6368558
        # at 6368585
        # at 6368660
        # End of log file
    ROLLBACK /* added by mysqlbinlog */;
    /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

    最后在mysql-bin.000083中检查show slave statusG;提示pos:6369026, 也不存在.

    [root@aipprd1 log]# mysqlbinlog --no-defaults  --start-position=6369026 mysql-bin.000083 

    aipprd2上重新发起修改posfilepos操作 
    检查aipprd1上的日志, 既然mysql-bin.000082日志末不存在pos:6369026, 且mysql-bin.000083为下一个日志, 那么重新发起修改posfilepos的操作.

    [root@aipprd1 log]# ll
    total 526908
    -rw-rw---- 1 mysql mysql 104858346 Sep 19 23:04 mysql-bin.000079
    -rw-rw---- 1 mysql mysql 104859112 Sep 22 13:01 mysql-bin.000080
    -rw-rw---- 1 mysql mysql  42558345 Sep 23 10:23 mysql-bin.000081
    -rw-rw---- 1 mysql mysql   6369280 Sep 23 12:13 mysql-bin.000082
    -rw-rw---- 1 mysql mysql 104857953 Sep 26 15:02 mysql-bin.000083
    -rw-rw---- 1 mysql mysql  70844542 Sep 27 17:42 mysql-bin.000084
    -rw-rw---- 1 mysql mysql 104858026 Sep 29 15:48 mysql-bin.000085
    -rw-rw---- 1 mysql mysql    297019 Sep 29 15:54 mysql-bin.000086
    -rw-rw---- 1 mysql mysql       296 Sep 29 15:48 mysql-bin.log.index
    -rw-rw---- 1 mysql mysql       107 Sep 27 18:05 mysql-relay-bin.000182
    -rw-rw---- 1 mysql mysql        43 Sep 27 18:05 mysql-relay-bin.index

    重新修改posfilemysql-bin.000083pos0, 启动Slave后, 现在同步正常.

    mysql> slave stop;
    mysql> CHANGE MASTER TO master_host='10.66.1.52', master_port=3388, master_user='root', master_password='passw0rd', master_log_file='mysql-bin.000083', master_log_pos=0;
    mysql> slave start;
    mysql> show slave statusG;
    *************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.66.1.52
                  Master_User: root
                  Master_Port: 3388
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000086
          Read_Master_Log_Pos: 18585
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 8119286
        Relay_Master_Log_File: mysql-bin.000083
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
    Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 8119140
              Relay_Log_Space: 280580804
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 521370
    Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
    Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1
    1 row in set (0.00 sec)
    • 在同步的过程中, 发现有几个Last_SQL_Error: Error ‘Duplicate entry 1026SQL Error, 这个是因为重复主键导致Slave停止工作, 执行以下操作解决(如果有多条重复的主键, 需要执行多次):

      mysql> slave stop;
      mysql> set GLOBAL SQL_SLAVE_SKIP_COUNTER=1;
      mysql> slave start;
    • 还有另一种办法就是修改mysql配置文件/etc/my.cnf[mysqld]下加一行slave_skip_errors = 1062 ,保存后重启mysql,mysql slave可以正常同步了.


    同步AIPPRD1的从环境

    • 检查aipprd1的Slave状态

      mysql> show slave statusG;
      *************************** 1. row ***************************
                 Slave_IO_State: 
                    Master_Host: 10.66.1.51
                    Master_User: root
                    Master_Port: 3388
                  Connect_Retry: 60
                Master_Log_File: mysql-bin.000084
            Read_Master_Log_Pos: 91941783
                 Relay_Log_File: mysql-relay-bin.000002
                  Relay_Log_Pos: 296
          Relay_Master_Log_File: mysql-bin.000084
               Slave_IO_Running: No
              Slave_SQL_Running: Yes
                Replicate_Do_DB: 
            Replicate_Ignore_DB: 
             Replicate_Do_Table: 
         Replicate_Ignore_Table: 
        Replicate_Wild_Do_Table: 
      Replicate_Wild_Ignore_Table: 
                     Last_Errno: 0
                     Last_Error: 
                   Skip_Counter: 0
            Exec_Master_Log_Pos: 91941783
                Relay_Log_Space: 452
                Until_Condition: None
                 Until_Log_File: 
                  Until_Log_Pos: 0
             Master_SSL_Allowed: No
             Master_SSL_CA_File: 
             Master_SSL_CA_Path: 
                Master_SSL_Cert: 
              Master_SSL_Cipher: 
                 Master_SSL_Key: 
          Seconds_Behind_Master: NULL
      Master_SSL_Verify_Server_Cert: No
                  Last_IO_Errno: 1236
                  Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'binlog truncated in the middle of event; consider out of disk space on master; the first event 'mysql-bin.000084' at 91941417, the last event read from '/aip/mysql/data/log/mysql-bin.000084' at 91941783, the last byte read from '/aip/mysql/data/log/mysql-bin.000084' at 91942912.'
                 Last_SQL_Errno: 0
                 Last_SQL_Error: 
      Replicate_Ignore_Server_Ids: 
               Master_Server_Id: 2
      1 row in set (0.00 sec)
    • aipprd2上检查日志文件 
      aipprd1上检查show slave statusG;后, 提示需要修改posfile为mysql-bin.000084pos91942912, 因为在aipprd2同步完成后, 实际同步的数据是从aipprd1过来的, 这些数据在aipprd1上本身就存在的.

      [root@aipprd2 log]# ll
      total 653832
      -rw-rw---- 1 mysql mysql 104857774 Sep 21 13:12 mysql-bin.000083
      -rw-rw---- 1 mysql mysql  91942912 Sep 23 10:21 mysql-bin.000084
      -rw-rw---- 1 mysql mysql   6295552 Sep 23 12:10 mysql-bin.000085
      -rw-rw---- 1 mysql mysql      8250 Sep 27 17:39 mysql-bin.000086
      -rw-rw---- 1 mysql mysql 104858014 Sep 29 17:20 mysql-bin.000087
      -rw-rw---- 1 mysql mysql 104857971 Sep 29 19:18 mysql-bin.000088
      -rw-rw---- 1 mysql mysql  55744926 Sep 29 20:23 mysql-bin.000089
      -rw-rw---- 1 mysql mysql       259 Sep 29 19:18 mysql-bin.log.index
      -rw-rw---- 1 mysql mysql  70844718 Sep 29 15:35 mysql-relay-bin.000005
      -rw-rw---- 1 mysql mysql 104857729 Sep 29 15:44 mysql-relay-bin.000006
      -rw-rw---- 1 mysql mysql       708 Sep 29 15:44 mysql-relay-bin.000007
      -rw-rw---- 1 mysql mysql       199 Sep 29 15:44 mysql-relay-bin.000008
      -rw-rw---- 1 mysql mysql  25188420 Sep 30 08:40 mysql-relay-bin.000009
      -rw-rw---- 1 mysql mysql       215 Sep 29 18:42 mysql-relay-bin.index
    • aipprd2上检查show master status;后, 记录Posfilemysql-bin.000089, 既然aipprd1的数据为最新的, 且aipprd2已经从aipprd1后同步完成了(通过检查show slave statusG;中的Seconds_Behind_Master:, 如果此项值很小, 应该是同步完成了.), 那么两边的数据应该差不多的.

      mysql> show master status;
      +------------------+-----------+--------------+------------------+
      | File             | Position  | Binlog_Do_DB | Binlog_Ignore_DB |
      +------------------+-----------+--------------+------------------+
      | mysql-bin.000089 | 104341982 |              |                  |
      +------------------+-----------+--------------+------------------+
      1 row in set (0.00 sec)
    • aipprd1上发起修改posfilepos操作 
      所以在此用posfilemysql-bin.000089pos0来修改, 启动Slave后, 开始同步.

      mysql> slave stop;
      Query OK, 0 rows affected (0.11 sec)
      mysql> CHANGE MASTER TO master_host='10.66.1.51', master_port=3388, master_user='root', master_password='passw0rd', master_log_file='mysql-bin.000089', master_log_pos=0;
      Query OK, 0 rows affected (0.06 sec)
      mysql> slave start;
      Query OK, 0 rows affected (0.00 sec)
      mysql> show slave statusG;
      *************************** 1. row ***************************
                  Slave_IO_State: Waiting for master to send event
                    Master_Host: 10.66.1.51
                    Master_User: root
                    Master_Port: 3388
                  Connect_Retry: 60
                Master_Log_File: mysql-bin.000089
            Read_Master_Log_Pos: 55744926
                 Relay_Log_File: mysql-relay-bin.000002
                  Relay_Log_Pos: 253
          Relay_Master_Log_File: mysql-bin.000089
               Slave_IO_Running: Yes
              Slave_SQL_Running: Yes
                Replicate_Do_DB: 
            Replicate_Ignore_DB: 
             Replicate_Do_Table: 
         Replicate_Ignore_Table: 
        Replicate_Wild_Do_Table: 
      Replicate_Wild_Ignore_Table: 
                     Last_Errno: 0
                     Last_Error: 
                   Skip_Counter: 0
            Exec_Master_Log_Pos: 55744926
                Relay_Log_Space: 409
                Until_Condition: None
                 Until_Log_File: 
                  Until_Log_Pos: 0
             Master_SSL_Allowed: No
             Master_SSL_CA_File: 
             Master_SSL_CA_Path: 
                Master_SSL_Cert: 
              Master_SSL_Cipher: 
                 Master_SSL_Key: 
          Seconds_Behind_Master: 0
      Master_SSL_Verify_Server_Cert: No
                  Last_IO_Errno: 0
                  Last_IO_Error: 
                 Last_SQL_Errno: 0
                 Last_SQL_Error: 
      Replicate_Ignore_Server_Ids: 
               Master_Server_Id: 2
      1 row in set (0.00 sec)

    转自

    Mysql 主主复制失败 - bluetom520的博客 - CSDN博客
    http://blog.csdn.net/bluetom520/article/details/54893183

  • 相关阅读:
    软件测试总结
    接口测试总结与分享
    Android自动化测试框架
    Jmeter系列- Jmeter 分布式测试
    python+requests接口自动化测试实战
    测试十年的前辈工作心得与经验分享
    一次压测实战的复盘
    (纯技术干货)完整的框架搭建过程 实战 Python+unittest+requests 接口自动化测试
    Android自动化测试框架必用工具
    第八周作业
  • 原文地址:https://www.cnblogs.com/paul8339/p/7615220.html
Copyright © 2011-2022 走看看