zoukankan      html  css  js  c++  java
  • MySQL并行复制的一个坑

    早上巡检数据库,发现一个延迟从库的sql_thread中断了。

    Last_SQL_Errno: 1755
    Last_SQL_Error: Cannot execute the current event group in the parallel mode. Encountered event Gtid, relay-log name ./oracle-relay-bin.000093, position 152912092 which prevents execution of this event group in parallel mode. Reason: The master event is logically timestamped incorrectly..

    检查performance_schema下的replication_applier_status_by_worker表,除了GTID之外也没有更具体的信息:

    "root@localhost:mysql3308.sock  [(none)]>select * from performance_schema.replication_applier_status_by_worker;
    +--------------+-----------+-----------+---------------+------------------------------------------------+-------------------+--------------------+----------------------+
    | CHANNEL_NAME | WORKER_ID | THREAD_ID | SERVICE_STATE | LAST_SEEN_TRANSACTION                          | LAST_ERROR_NUMBER | LAST_ERROR_MESSAGE | LAST_ERROR_TIMESTAMP |
    +--------------+-----------+-----------+---------------+------------------------------------------------+-------------------+--------------------+----------------------+
    |              |         1 |      NULL | OFF           | 0b961fcc-41c2-11e7-84fd-286ed488c7da:156369774 |                 0 |                    | 0000-00-00 00:00:00  |
    |              |         2 |      NULL | OFF           |                                                |                 0 |                    | 0000-00-00 00:00:00  |
    |              |         3 |      NULL | OFF           |                                                |                 0 |                    | 0000-00-00 00:00:00  |
    |              |         4 |      NULL | OFF           |                                                |                 0 |                    | 0000-00-00 00:00:00  |
    |              |         5 |      NULL | OFF           |                                                |                 0 |                    | 0000-00-00 00:00:00  |
    |              |         6 |      NULL | OFF           |                                                |                 0 |                    | 0000-00-00 00:00:00  |
    |              |         7 |      NULL | OFF           |                                                |                 0 |                    | 0000-00-00 00:00:00  |
    |              |         8 |      NULL | OFF           |                                                |                 0 |                    | 0000-00-00 00:00:00  |
    +--------------+-----------+-----------+---------------+------------------------------------------------+-------------------+--------------------+----------------------+
    

    既然relay_log的位置信息都有了,那就去日志里看看吧:

    解析Binlog文件:

    mysqlbinlog -v --base64-output=decode-rows oracle-relay-bin.000093 >1.sql
    

    找到152912092位置点附近的日志:

    检查了一下数据库中这个表ID为14816035的数据确实是不存在的。

    另外除了这条日志,其它日志的last_committed和sequence_number都为0,last_committed表示事务提交的时候,上次事务提交的编号。last_committed和sequence_number代表的就是所谓的LOGICAL_CLOCK。

    猜测如果手动把这条数据插入延迟从库,并且使用注入一个空事务跳过这个GTID的方法重启sql_thread,相信这个错误也能被解决。

    但既然带了LOGICAL_CLOCK的事务就会出错,跳过事务的方法很难保证以后不会出错。

    注意到这条日志的last_committed是一个异常大的值,且错误信息中有提到The master event is logically timestamped incorrectly。我怀疑是不是并行配置的问题。

    从库配置:

    "root@localhost:mysql3308.sock  [(none)]>show variables like '%para%';
    +------------------------+---------------+
    | Variable_name          | Value         |
    +------------------------+---------------+
    | slave_parallel_type    | LOGICAL_CLOCK |
    | slave_parallel_workers | 8             |
    +------------------------+---------------+
    

     再检查主库配置:

    (root@localhost:mysql.sock) [(none)]>show variables like '%para%';
    +------------------------+-------+
    | Variable_name          | Value |
    +------------------------+-------+
    | slave_parallel_workers | 0     |
    +------------------------+-------+

     发现主库根本就没有slave_parallel_type这项配置。想起来主库是mysql5.6了。

    (root@localhost:mysql.sock) [(none)]>select version();
    +------------+
    | version()  |
    +------------+
    | 5.6.35-log |
    +------------+
    

     那么问题基本上就知道了,主库5.6只支持基于DATABASE的并行复制,而5.7的从库配置成LOGICAL_CLOCK导致了异常。

    明白了问题所在,那就好解决了,把从库的slave_parallel_type改为DATABASE,再起sql_thread问题应该就解决了:

    "root@localhost:mysql3308.sock  [none]>set global slave_parallel_type='DATABASE';
    Query OK, 0 rows affected (0.00 sec)
    
    "root@localhost:mysql3308.sock  [none]>show global variables like '%slave_parallel_type%';
    +---------------------+----------+
    | Variable_name       | Value    |
    +---------------------+----------+
    | slave_parallel_type | DATABASE |
    +---------------------+----------+
    1 row in set (0.00 sec)
    
    "root@localhost:mysql3308.sock  [none]>show slave statusG
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: master
                      Master_User: rep
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: binlog.000104
              Read_Master_Log_Pos: 160115307
                   Relay_Log_File: oracle-relay-bin.000093
                    Relay_Log_Pos: 152912092
            Relay_Master_Log_File: binlog.000100
                 Slave_IO_Running: Yes
                Slave_SQL_Running: No
                  Replicate_Do_DB: 
              Replicate_Ignore_DB: 
               Replicate_Do_Table: 
           Replicate_Ignore_Table: 
          Replicate_Wild_Do_Table: 
      Replicate_Wild_Ignore_Table: 
                       Last_Errno: 1755
                       Last_Error: Cannot execute the current event group in the parallel mode. Encountered event Gtid, relay-log name ./oracle-relay-bin.000093, position 152912092 which prevents execution of this event group in parallel mode. Reason: The master event is logically timestamped incorrectly..
                     Skip_Counter: 0
              Exec_Master_Log_Pos: 152911925
                  Relay_Log_Space: 4455094667
                  Until_Condition: None
                   Until_Log_File: 
                    Until_Log_Pos: 0
               Master_SSL_Allowed: No
               Master_SSL_CA_File: 
               Master_SSL_CA_Path: 
                  Master_SSL_Cert: 
                Master_SSL_Cipher: 
                   Master_SSL_Key: 
            Seconds_Behind_Master: NULL
    Master_SSL_Verify_Server_Cert: No
                    Last_IO_Errno: 0
                    Last_IO_Error: 
                   Last_SQL_Errno: 1755
                   Last_SQL_Error: Cannot execute the current event group in the parallel mode. Encountered event Gtid, relay-log name ./oracle-relay-bin.000093, position 152912092 which prevents execution of this event group in parallel mode. Reason: The master event is logically timestamped incorrectly..
      Replicate_Ignore_Server_Ids: 
                 Master_Server_Id: 50
                      Master_UUID: 0b961fcc-41c2-11e7-84fd-286ed488c7da
                 Master_Info_File: mysql.slave_master_info
                        SQL_Delay: 3600
              SQL_Remaining_Delay: NULL
          Slave_SQL_Running_State: 
               Master_Retry_Count: 86400
                      Master_Bind: 
          Last_IO_Error_Timestamp: 
         Last_SQL_Error_Timestamp: 180716 18:02:56
                   Master_SSL_Crl: 
               Master_SSL_Crlpath: 
               Retrieved_Gtid_Set: 0b961fcc-41c2-11e7-84fd-286ed488c7da:111060115-163843604
                Executed_Gtid_Set: 0b961fcc-41c2-11e7-84fd-286ed488c7da:1-156369774
                    Auto_Position: 1
             Replicate_Rewrite_DB: 
                     Channel_Name: 
               Master_TLS_Version: 
    1 row in set (0.00 sec)
    
    "root@localhost:mysql3308.sock  [none]>stop slave sql_thread;
    Query OK, 0 rows affected, 1 warning (0.00 sec)
    
    "root@localhost:mysql3308.sock  [none]>start slave sql_thread;
    Query OK, 0 rows affected (0.01 sec)
    
    "root@localhost:mysql3308.sock  [none]>show slave statusG
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: master
                      Master_User: rep
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: binlog.000104
              Read_Master_Log_Pos: 160161836
                   Relay_Log_File: oracle-relay-bin.000093
                    Relay_Log_Pos: 169205552
            Relay_Master_Log_File: binlog.000100
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes
                  Replicate_Do_DB: 
              Replicate_Ignore_DB: 
               Replicate_Do_Table: 
           Replicate_Ignore_Table: 
          Replicate_Wild_Do_Table: 
      Replicate_Wild_Ignore_Table: 
                       Last_Errno: 0
                       Last_Error: 
                     Skip_Counter: 0
              Exec_Master_Log_Pos: 169205385
                  Relay_Log_Space: 4455141196
                  Until_Condition: None
                   Until_Log_File: 
                    Until_Log_Pos: 0
               Master_SSL_Allowed: No
               Master_SSL_CA_File: 
               Master_SSL_CA_Path: 
                  Master_SSL_Cert: 
                Master_SSL_Cipher: 
                   Master_SSL_Key: 
            Seconds_Behind_Master: 5351
    Master_SSL_Verify_Server_Cert: No
                    Last_IO_Errno: 0
                    Last_IO_Error: 
                   Last_SQL_Errno: 0
                   Last_SQL_Error: 
      Replicate_Ignore_Server_Ids: 
                 Master_Server_Id: 50
                      Master_UUID: 0b961fcc-41c2-11e7-84fd-286ed488c7da
                 Master_Info_File: mysql.slave_master_info
                        SQL_Delay: 3600
              SQL_Remaining_Delay: NULL
          Slave_SQL_Running_State: Waiting for Slave Worker to release partition
               Master_Retry_Count: 86400
                      Master_Bind: 
          Last_IO_Error_Timestamp: 
         Last_SQL_Error_Timestamp: 
                   Master_SSL_Crl: 
               Master_SSL_Crlpath: 
               Retrieved_Gtid_Set: 0b961fcc-41c2-11e7-84fd-286ed488c7da:111060115-163843692
                Executed_Gtid_Set: 0b961fcc-41c2-11e7-84fd-286ed488c7da:1-156400100
                    Auto_Position: 1
             Replicate_Rewrite_DB: 
                     Channel_Name: 
               Master_TLS_Version: 
    1 row in set (0.00 sec)
    

    打完收工。

    转载请注明出处。

    本文地址:https://www.cnblogs.com/ajiangg/p/9399892.html

  • 相关阅读:
    AT5200 [AGC038C] LCMs 莫比乌斯反演
    P5445 [APIO2019]路灯 树套树
    CF617E XOR and Favorite Number 莫队
    P5404 [CTS2019]重复 KMP自动机
    P4364 [九省联考2018]IIIDX 线段树上二分
    P3749 [六省联考2017]寿司餐厅 最大权闭合子图
    CF906E Reverses PAM+border
    AGC 补题记录
    做题记录
    题解 loj #3524. 「IOI2021」钥匙
  • 原文地址:https://www.cnblogs.com/ajiangg/p/9399892.html
Copyright © 2011-2022 走看看