zoukankan      html  css  js  c++  java
  • 通过performance_schema获取造成死锁的事务语句(转)

    数据库日常维护中我们经常遇到死锁的问题,由于无法获取造成死锁的事务内执行过的语句,对我们死锁的分析造成很大的困难。但是在MySQL 5.7中我们可以利用performance_schema来获取这些语句,为我们解决死锁问题提供了一个新的思路,下面我们将解释这种方法的使用。

    一、开启相关统计的方法

    如果打开performance_schema选项来收集执行过的语句和事务会有性能损失,一般建议需要的时候开启,然后在线关闭掉。

    (1) 在my.cnf中设置打开、关闭performance_schema选项随数据库启动
    #设置setup_instruments表收集相关statement event
    performance-schema-instrument='statement/%=ON'  
    #开启events_statements_current表存储当前连接线程执行的最后一条statement event信息               
    performance-schema-consumer-events-statements-current=ON   
    #开启events-statements-history表默认存储每个线程最近10条statement event信息 
    performance-schema-consumer-events-statements-history=ON        
    #开启events-statements-history-long表默认存储最近10000条语句event信息
    performance-schema-consumer-events-statements-history-long=ON   
    
    performance-schema-consumer-statements-digest=ON       
             
    #设置setup_instruments表收集transaction event
    performance-schema-instrument='transaction=ON'    
    #开启events_transactions_current表存储当前连接线程执行的transaction event信息              
    performance-schema-consumer-events-transactions-current=ON      
    #开启events_transactions_history表默认存储每个线程最近10条transaction event信息
    performance-schema-consumer-events-transactions-history=ON   
    #开启events_statements_history_long表默认存储最近10000条语句event信息。   
    performance-schema-consumer-events-transactions-history-long=ON 
    
    
    (2) 在my.cnf中设置关闭performance_schema选项语句和事务收集
    performance-schema-instrument='statement/%=OFF'              
    performance-schema-consumer-events-statements-current=OFF
    performance-schema-consumer-events-statements-history=OFF
    performance-schema-consumer-events-statements-history-long=OFF
    performance-schema-consumer-statements-digest=OFF
    
    performance-schema-instrument='transaction=OFF'
    performance-schema-consumer-events-transactions-current=OFF
    performance-schema-consumer-events-transactions-history=OFF
    performance-schema-consumer-events-transactions-history-long=OFF
    
    
    (3) 在线打开performance_schema选项收集执行过的语句和事务
    update setup_consumers set ENABLED='yes' where name like 'events_statements%';     
    update setup_consumers set ENABLED='yes' where name like 'events_transactions%';   
    update setup_consumers set ENABLED='yes' where name like 'statements-digest';      
    update setup_instruments set ENABLED='yes',TIMED='yes' where name ='transaction';  
    
    (4) 在线关闭performance_schema选项收集执行过的语句和事务
    update setup_consumers set ENABLED='no' where name like 'events_statements%';     
    update setup_consumers set ENABLED='no' where name like 'events_transactions%';   
    update setup_consumers set ENABLED='no' where name like 'statements-digest';      
    update setup_instruments set ENABLED='no',TIMED='no' where name ='transaction';   
    
    (5) 相关参数
    • performance_schema_events_statements_history_size:定义events_statements_history表中每个线程最大行数,静态参数,修改需要重启实例。
    • performance_schema_events_statements_history_long_size:定义events_statements_history_long表最大行数,静态参数,修改需要重启实例。
    • performance_schema_events_transactions_history_size:定义events_transactions_history表每个线程最大行数,静态参数,修改需要重启实例。
    • performance_schema_events_transactions_history_long_size:定义events_transactions_history_long表最大行数,静态参数,修改需要重启实例。

    二、根据死锁信息来获取造成死锁的语句

    下面是一个典型的死锁日志:

    *** (1) TRANSACTION:
    
    TRANSACTION 897551, ACTIVE 48 sec starting index read
    mysql tables in use 1, locked 1
    LOCK WAIT 5 lock struct(s), heap size 1136, 3 row lock(s)
    MySQL thread id 4, OS thread handle 140029509830400, query id 123 localhost root 
    statistics
    select * from t1 where id=1 for update
    2019-07-09T10:29:40.710270+08:00 3 [Note] InnoDB: *** (1) WAITING FOR THIS LOCK TO 
    BE GRANTED:
    
    RECORD LOCKS space id 591 page no 3 n bits 80 index PRIMARY of table `txc_test`.`t1` 
    trx id 897551 lock_mode X locks rec but not gap waiting
    ......
    
    2019-07-09T10:29:40.711199+08:00 3 [Note] InnoDB: *** (2) TRANSACTION:
    
    TRANSACTION 897550, ACTIVE 62 sec starting index read
    mysql tables in use 1, locked 1
    5 lock struct(s), heap size 1136, 3 row lock(s)
    MySQL thread id 3, OS thread handle 140029510035200, query id 124 localhost root 
    statistics
    select * from t2 where id=1 for update
    2019-07-09T10:29:40.711364+08:00 3 [Note] InnoDB: *** (2) HOLDS THE LOCK(S):
    
    RECORD LOCKS space id 591 page no 3 n bits 80 index PRIMARY of table `txc_test`.`t1` 
    trx id 897550 lock_mode X locks rec but not gap
    ......
    2019-07-09T10:29:40.712249+08:00 3 [Note] InnoDB: *** (2) WAITING FOR THIS LOCK TO
    BE GRANTED:
    
    RECORD LOCKS space id 592 page no 3 n bits 80 index PRIMARY of table `txc_test`.`t2`
    trx id 897550 lock_mode X locks rec but not gap waiting
    ......
    2019-07-09T10:29:40.713153+08:00 3 [Note] InnoDB: *** WE ROLL BACK TRANSACTION (2)
    
    
    (1)通过events_statements_history_long来查询死锁语句
    • 事务1

    在死锁日志中我们可以看到导致死锁的最后一个语句是‘select * from t1 where id=1 for update’,那么我们就可以使用它来查询events_statements_history_long表。

    mysql> select THREAD_ID,EVENT_ID,END_EVENT_ID,EVENT_NAME,SOURCE,SQL_TEXT,DIGEST,
    CURRENT_SCHEMA,NESTING_EVENT_ID,NESTING_EVENT_TYPE, MESSAGE_TEXT,timer_start 
    from events_statements_history_long where sql_text like 
    'select * from t1 where id=1 for update' G
    *************************** 1. row ***************************
             THREAD_ID: 37       
              EVENT_ID: 98
          END_EVENT_ID: 106
            EVENT_NAME: statement/sql/select
                SOURCE: socket_connection.cc:101
              SQL_TEXT: select * from t1 where id=1 for update
                DIGEST: be26f0b8bee2ac2bb34e9c651d655e7c
        CURRENT_SCHEMA: txc_test
      NESTING_EVENT_ID: 96
    NESTING_EVENT_TYPE: TRANSACTION
          MESSAGE_TEXT: NULL
           timer_start: 266799699801000
    *************************** 2. row ***************************
             THREAD_ID: 38
              EVENT_ID: 35
          END_EVENT_ID: 37
            EVENT_NAME: statement/sql/select
                SOURCE: socket_connection.cc:101
              SQL_TEXT: select * from t1 where id=1 for update
                DIGEST: be26f0b8bee2ac2bb34e9c651d655e7c
        CURRENT_SCHEMA: txc_test
      NESTING_EVENT_ID: 19
    NESTING_EVENT_TYPE: TRANSACTION
          MESSAGE_TEXT: NULL
    
    
    • 事务2

    在死锁日志中我们可以看到导致死锁的最后一个语句是‘select * from t2 where id=1 for update’,那么我们就可以使用它来查询events_statements_history_long表。

     mysql> select THREAD_ID,EVENT_ID,END_EVENT_ID,EVENT_NAME,SOURCE,SQL_TEXT,DIGEST,
    CURRENT_SCHEMA,NESTING_EVENT_ID,NESTING_EVENT_TYPE, MESSAGE_TEXT 
    from events_statements_history_long where sql_text like 
    'select * from t2 where id=1 for update' G
    *************************** 1. row ***************************
    (event关联的线程id)
             THREAD_ID: 38                                          
              EVENT_ID: 21                                       
          END_EVENT_ID: 29                                          
            EVENT_NAME: statement/sql/select                      
                SOURCE: socket_connection.cc:101                  
    (执行的sql语句信息)
              SQL_TEXT: select * from t2 where id=1 for update      
                DIGEST: 315b4a6a8f7424bc7591256a8937a213 
        CURRENT_SCHEMA: txc_test                                
    (父语句event id)  
      NESTING_EVENT_ID: 19                                                                                
    NESTING_EVENT_TYPE: TRANSACTION
          MESSAGE_TEXT: NULL
    (event开始时间(从实例启动到现在的时间差),单位皮秒)
           timer_start: 280428752972000                             
    *************************** 2. row ***************************
    (event关联的线程id)
             THREAD_ID: 37                                          
              EVENT_ID: 112
          END_EVENT_ID: 114
            EVENT_NAME: statement/sql/select
                SOURCE: socket_connection.cc:101
    (执行的sql语句信息)
              SQL_TEXT: select * from t2 where id=1 for update      
                DIGEST: 315b4a6a8f7424bc7591256a8937a213
        CURRENT_SCHEMA: txc_test
      NESTING_EVENT_ID: 96
    NESTING_EVENT_TYPE: TRANSACTION
          MESSAGE_TEXT: Deadlock found when trying to get lock; try restarting transaction
    (event开始时间(从实例启动到现在的时间差),单位皮秒)
           timer_start: 328978250011000                             
                      
    
    

    虽然我们根据死锁日志进行了语句查询,但是我们发现没法确认哪一个是事务1,哪一个是事务2。不过我们可以通过MESSAGE_TEXT中的信息‘Deadlock found when trying to get lock; try restarting transaction’,确认THREAD_ID:37为事务2,THREAD_ID: 38为事务1的线程,因为我们的死锁日志显示事务2被回滚掉了。需要注意这里的THREAD_ID是performance_schema内部使用的,和我们平时show processlist访问的PROCESSLIST_ID不一样。它是一个performance_schema内部的计数器源码如下:

    PFS_thread* create_thread(PFS_thread_class *klass, const void *identity,
                              ulonglong processlist_id){
      PFS_thread *pfs;
      pfs_dirty_state dirty_state;
      pfs= global_thread_container.allocate(& dirty_state);  if (pfs != NULL)
      {
        pfs->m_thread_internal_id=
          PFS_atomic::add_u64(&thread_internal_id_counter.m_u64, 1);
        pfs->m_parent_thread_internal_id= 0;
        pfs->m_processlist_id= static_cast<ulong>(processlist_id);
    
    
    (2)提取信息
    • 事务1

    最后一条语句的END_EVENT_ID=37
    最后一条语句的NESTING_EVENT_ID=19
    THREAD_ID=38

    因此我们可以通过如下语句得出经历的所有语句如下:

     mysql> select THREAD_ID,EVENT_ID,END_EVENT_ID,EVENT_NAME,SOURCE,SQL_TEXT,DIGEST,
    CURRENT_SCHEMA,NESTING_EVENT_ID,NESTING_EVENT_TYPE, MESSAGE_TEXT,timer_start 
    from events_statements_history_long where thread_id=38 and END_EVENT_ID>=19 and 
    END_EVENT_ID <=37 G
    
    *************************** 1. row ***************************
             THREAD_ID: 38                                           
              EVENT_ID: 18                                            
          END_EVENT_ID: 19                                                                                        
            EVENT_NAME: statement/sql/begin
                SOURCE: socket_connection.cc:101
              SQL_TEXT: begin                                        
                DIGEST: f57daa74a09445d1e1c496f28fe6d906
        CURRENT_SCHEMA: txc_test
      NESTING_EVENT_ID: NULL
    NESTING_EVENT_TYPE: NULL                                       
          MESSAGE_TEXT: NULL
           timer_start: 274749880798000                          
    *************************** 2. row ***************************
             THREAD_ID: 38
              EVENT_ID: 21
          END_EVENT_ID: 29
            EVENT_NAME: statement/sql/select
                SOURCE: socket_connection.cc:101
              SQL_TEXT: select * from t2 where id=1 for update
                DIGEST: 315b4a6a8f7424bc7591256a8937a213
        CURRENT_SCHEMA: txc_test
      NESTING_EVENT_ID: 19
    NESTING_EVENT_TYPE: TRANSACTION
          MESSAGE_TEXT: NULL
           timer_start: 280428752972000
    *************************** 3. row ***************************
             THREAD_ID: 38
              EVENT_ID: 31
          END_EVENT_ID: 33
            EVENT_NAME: statement/sql/select
                SOURCE: socket_connection.cc:101
              SQL_TEXT: select * from t1 where id=3 for update
                DIGEST: be26f0b8bee2ac2bb34e9c651d655e7c
        CURRENT_SCHEMA: txc_test
      NESTING_EVENT_ID: 19
    NESTING_EVENT_TYPE: TRANSACTION
          MESSAGE_TEXT: NULL
           timer_start: 296208754204000
    *************************** 4. row ***************************
             THREAD_ID: 38
              EVENT_ID: 35
          END_EVENT_ID: 37
            EVENT_NAME: statement/sql/select
                SOURCE: socket_connection.cc:101
              SQL_TEXT: select * from t1 where id=1 for update
                DIGEST: be26f0b8bee2ac2bb34e9c651d655e7c
        CURRENT_SCHEMA: txc_test
      NESTING_EVENT_ID: 19
    NESTING_EVENT_TYPE: TRANSACTION
          MESSAGE_TEXT: NULL
           timer_start: 319574947823000
    4 rows in set (0.00 sec)
    
    • 事务2

    最后一条语句的END_EVENT_ID=114
    最后一条语句的NESTING_EVENT_ID=96
    THREAD_ID=37

    因此我们可以通过如下语句得出经历的所有语句如下:

    mysql> select THREAD_ID,EVENT_ID,END_EVENT_ID,EVENT_NAME,SOURCE,SQL_TEXT,DIGEST,
    CURRENT_SCHEMA,NESTING_EVENT_ID,NESTING_EVENT_TYPE,MESSAGE_TEXT,timer_start 
    from events_statements_history_long where thread_id = 37 and END_EVENT_ID >= 96 
    and END_EVENT_ID <= 114G
    *************************** 1. row ***************************
             THREAD_ID: 37
              EVENT_ID: 95
          END_EVENT_ID: 96
            EVENT_NAME: statement/sql/begin
                SOURCE: socket_connection.cc:101
              SQL_TEXT: begin
                DIGEST: f57daa74a09445d1e1c496f28fe6d906
        CURRENT_SCHEMA: txc_test
      NESTING_EVENT_ID: NULL
    NESTING_EVENT_TYPE: NULL
          MESSAGE_TEXT: NULL
           timer_start: 251219441701000
    *************************** 2. row ***************************
             THREAD_ID: 37
              EVENT_ID: 98
          END_EVENT_ID: 106
            EVENT_NAME: statement/sql/select
                SOURCE: socket_connection.cc:101
              SQL_TEXT: select * from t1 where id=1 for update
                DIGEST: be26f0b8bee2ac2bb34e9c651d655e7c
        CURRENT_SCHEMA: txc_test
      NESTING_EVENT_ID: 96
    NESTING_EVENT_TYPE: TRANSACTION
          MESSAGE_TEXT: NULL
           timer_start: 266799699801000
    *************************** 3. row ***************************
             THREAD_ID: 37
              EVENT_ID: 108
          END_EVENT_ID: 110
            EVENT_NAME: statement/sql/select
                SOURCE: socket_connection.cc:101
              SQL_TEXT: select * from t2 where id=6 for update
                DIGEST: 315b4a6a8f7424bc7591256a8937a213
        CURRENT_SCHEMA: txc_test
      NESTING_EVENT_ID: 96
    NESTING_EVENT_TYPE: TRANSACTION
          MESSAGE_TEXT: NULL
           timer_start: 308095739676000
    *************************** 4. row ***************************
             THREAD_ID: 37
              EVENT_ID: 112
          END_EVENT_ID: 114
            EVENT_NAME: statement/sql/select
                SOURCE: socket_connection.cc:101
              SQL_TEXT: select * from t2 where id=1 for update
                DIGEST: 315b4a6a8f7424bc7591256a8937a213
        CURRENT_SCHEMA: txc_test
      NESTING_EVENT_ID: 96
    NESTING_EVENT_TYPE: TRANSACTION
          MESSAGE_TEXT: Deadlock found when trying to get lock; try restarting transaction
           timer_start: 328978250011000
    4 rows in set (0.01 sec)
    
    
    
    (3)猜测死锁形成过程

    根据上面两个事务执行的sql大致可以推断出死锁的sql语句,如果两个事务里面执行的sql很多那可能就需要花更多的时间来找出造成死锁的语句:

    TX1TX2
      select * from t1 where id=1 for update;
    select * from t2 where id=1 for update;  
    select * from t1 where id=3 for update;  
      select * from t2 where id=6 for update;
    select * from t1 where id=1 for update;  
      select * from t2 where id=1 for update;

    三、总结

    • 通过以上的查询基本可以获取造成死锁的事务内执行的语句,由于线上业务量大可能造成events_statements_history_long查询不到需要的语句(默认存储10000条),需要及时监控发现死锁。
    • 打开performance_schema的选项,有性能损失。
    • 如果线上实例是每个database对应一个独立用户,可以通过设置收集指定用户执行的所有event。
      1).关闭收集所有用户的event
      update setup_actors set ENABLED='NO',HISTORY='NO';
      2).插入需要收集event的指定用户(例如我只想收集txc用户下的所有event,参考如下)
      insert into setup_actors select '%','txc','%','YES','YES';
      3).select * from setup_actors;
             +------+------+------+---------+---------+
             | HOST | USER | ROLE | ENABLED | HISTORY |
             +------+------+------+---------+---------+
             | %    | %    | %    | NO      | NO      |
             | %    | txc  | %    | YES     | YES     |
             +------+------+------+---------+---------+
    
     


    作者:重庆八怪
    链接:https://www.jianshu.com/p/268889c997e8
    来源:简书
    著作权归作者所有。商业转载请联系作者获得授权,非商业转载请注明出处。
  • 相关阅读:
    BZOJ 1014 火星人prefix
    BZOJ 1013 球形空间产生器
    BZOJ 1012 最大数
    BZOJ 1011 遥远的行星
    BZOJ 1010 玩具装箱
    BZOJ 1009 GT考试
    BZOJ 1008 越狱
    BZOJ 1007 水平可见直线
    BZOJ 1006 神奇的国度
    Luogu 1450 [HAOI2008]硬币购物
  • 原文地址:https://www.cnblogs.com/zping/p/14378247.html
Copyright © 2011-2022 走看看