zoukankan      html  css  js  c++  java
  • MySQL死锁案例分析

    最近项目中某个模块稳定复现MySQL死锁问题,本文记录死锁的发生原因以及解决办法。

    1. 预备知识

    1.1 表锁和行锁

    • 表锁

    表锁是MySQL中最基本的锁策略,并且是开销最小的策略。表锁会锁定整张数据表,用户的写操作(插入/删除/更新)前,都需要获取写锁(写锁会相互阻塞);没有写锁时,读取用户才能获取读锁(读锁不会相互阻塞)。

    • 行锁(仅限定于InnoDB)

    行级锁可以最大程度的支持并发处理(同时也带来了最大的锁开销)。行级锁只在存储引擎实现,而MySQL服务器层没有实现。服务器层完全不了解存储引擎中的具体实现。

    1.2 行锁简介

    行锁的模式有:读/写意向锁(IS/IX锁),读锁(S锁),写锁(X锁)以及自增锁(AI)。

    行锁根据场景的不同又可以进一步细分,依次为Next-Key Lock,Gap Lock间隙锁,Record Lock记录锁和插入意向GAP锁。不同的锁锁定的位置是不同的,比如说记录锁只锁住对应的记录,而间隙锁锁住记录和记录之间的间隔,Next-Key Lock锁住记录和记录之前的间隙。不同类型锁的锁定范围大致如下图所示。

    avatar

    此外,锁对应的死锁日志信息标记如下所示:

    • 记录锁(LOCK_REC_NOT_GAP): lock_mode X locks rec but not gap
    • 间隙锁(LOCK_GAP): lock_mode X locks gap before rec
    • Next-key锁(LOCK_ORNIDARY): lock_mode X
    • 插入意向锁(LOCK_INSERT_INTENTION): lock_mode X locks gap before rec insert intention

    1.3 行锁加锁示例

    InnoDB是聚簇索引,也就是B+树的叶子节点存储了主键索引以及数据行;InnoDB的二级索引的叶子节点存储的则是主键值,所以通过二级索引查询数据时,需要根据查询到的主键去聚簇索引中再次进行查询。

    update user set age = 10 where id = 49;
    update user set age = 10 where name = 'Tom';
    

    (1)第一条SQL使用主键进行查询,则只需要在id=49主键上加上写锁(X锁);
    (2)第二条SQL使用二级索引查询,首先在name='Tom'上加写锁,然后根据获取的主键索引查询,在id=49主键上添加写锁。

    具体如下图所示:
    avatar

    以上是基于单条数据讨论,针对多条数据:

    update user set age = 10 where id > 49;
    

    执行步骤:

    (1)MySQL Server根据where条件读取满足条件的第一条记录,InnoDB引擎返回行记录并加锁;
    (2)MySQL Server发起更新行记录的update请求,更新此记录;
    (3)反复循环(1)(2)步骤,直到所有满足条件的记录均被修改。

    具体如下图所示:
    avatar

    2.准备工作

    2.1 创建数据表并初始化

    create table dead_lock_test
    (
        id int auto_increment
            primary key,
        v1 int not null,
        v2 int not null
    );
    
    insert into  dead_lock_test (v1,v2) value (1,1);
    insert into  dead_lock_test (v1,v2) value (2,2);
    insert into  dead_lock_test (v1,v2) value (3,3);
    

    需要注意,数据表中仅存在主键索引。此外,默认数据库引擎为InnoDB,事务隔离级别为RR(可重复读,相对于RC解决了幻读)。

    2.2 开启锁监控

    使用如下语句,开启MySQL锁监控:

    # 开启
    set GLOBAL innodb_status_output=ON;
    set GLOBAL innodb_status_output_locks=ON;
    
    # 关闭
    set GLOBAL innodb_status_output_locks=OFF;
    

    3.场景复现

    开启两个数据库连接,分别执行如下SQL语句:

    # session1
    start transaction ;
    insert into  dead_lock_test (v1,v2) value (4,4);
    delete from dead_lock_test where v1 = 4 and v2 = 4;
    commit;
    
    # session2
    start transaction;
    insert into  dead_lock_test (v1,v2) value (5,5);
    delete from dead_lock_test where v1 = 5 and v2 = 5;
    commit;
    

    不要问事务里就两条SQL,插入后删除走回滚就可以了之类的问题(我也不知道为什么这么写的)。

    事务执行步骤如下表所示:

    session1 session2 stage 备注
    start transaction; start transaction;
    insert into dead_lock_test (v1,v2) value (4,4); do nothing 执行成功
    do nothing insert into dead_lock_test (v1,v2) value (5,5); stage1 执行成功
    delete from dead_lock_test where v1 = 4 and v2 = 4; do nothing stage2 session1执行结果阻塞
    do nothing delete from dead_lock_test where v1 = 5 and v2 = 5; stage3 session2执行结果报deadlock

    3.1 stage1

    执行show engine innodb status;节选事务信息如下所示:

    ------------
    TRANSACTIONS
    ------------
    Trx id counter 91328
    Purge done for trx's n:o < 91327 undo n:o < 0 state: running but idle
    History list length 19
    LIST OF TRANSACTIONS FOR EACH SESSION:
    ---TRANSACTION 91327, ACTIVE 37 sec
    1 lock struct(s), heap size 1136, 0 row lock(s), undo log entries 1
    MySQL thread id 24, OS thread handle 15668, query id 3147 localhost 127.0.0.1 root
    TABLE LOCK table `igw_proxy_rule_management`.`dead_lock_test` trx id 91327 lock mode IX
    ---TRANSACTION 91322, ACTIVE 44 sec
    1 lock struct(s), heap size 1136, 0 row lock(s), undo log entries 1
    MySQL thread id 23, OS thread handle 22788, query id 3103 localhost 127.0.0.1 root
    TABLE LOCK table `igw_proxy_rule_management`.`dead_lock_test` trx id 91322 lock mode IX
    

    输出内容中节选当前事务信息,当前存在两个运行中事务,trx id分别为91322以及91327。

    • TABLE LOCK table igw_proxy_rule_management.dead_lock_test trx id 91322 lock mode IX: dead_lock_test表上添加IX锁。

    91322事务对应session1,91327事务对应session2.

    3.2 stage2

    执行delete from dead_lock_test where v1 = 4 and v2 = 4;后可发现,当前事务被阻塞。
    执行show engine innodb status;节选事务信息如下所示:

    因为输出内容较多,直接在输出内容中添加注释进行解析。

    ------------
    TRANSACTIONS
    ------------
    Trx id counter 91332
    Purge done for trx's n:o < 91332 undo n:o < 0 state: running but idle
    History list length 21
    LIST OF TRANSACTIONS FOR EACH SESSION:
    ---TRANSACTION 91327, ACTIVE 58 sec
    * 2 lock strcut(s): 事务91327中锁链表长度为2(每个链表节点表示该事务持有的一个锁结构,包括表锁/记录锁等),当前事务包含表锁(IX)以及一个行锁(记录锁);
    * 1 row lock(s):当前事务持有的行锁个数;
    * undo log entries 1:当前事务的undo log个数
    2 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 1
    MySQL thread id 24, OS thread handle 15668, query id 3147 localhost 127.0.0.1 root
    * TABLE LOCK:当前事务持有的表锁(IX)
    TABLE LOCK table `igw_proxy_rule_management`.`dead_lock_test` trx id 91327 lock mode IX
    * RECORD LOCKS:当前事务持有的行锁(lock_mode X locks rec but not gap)
    * space id 92: dead_lock_test所在空间编号
    * page no 4: 当前记录所在页码
    RECORD LOCKS space id 92 page no 4 n bits 72 index PRIMARY of table `igw_proxy_rule_management`.`dead_lock_test` trx id 91327 lock_mode X locks rec but not gap
    * 行锁信息: heap no=6
    Record lock, heap no 6 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
    
     0: len 4; hex 80000005; asc     ;; * hex 80000005:当前加锁的记录id=5
     1: len 6; hex 0000000164bf; asc     d ;;   * hex 0000000164bf: 事务ID;
     2: len 7; hex 81000000b20110; asc        ;;    * hex 81000000b20110: 回滚指针;
     3: len 4; hex 80000005; asc     ;; * hex 80000005: v1字段对应数值;
     4: len 4; hex 80000005; asc     ;; * hex 80000005:v2字段对应数值;
    
    ---TRANSACTION 91322, ACTIVE 65 sec fetching rows
    * tables in use 1: 有1个表正在被使用;
    * locked 1: 有一个表锁
    mysql tables in use 1, locked 1
    * LOCK WAIT:事务91322处于锁等待状态;其他字段解释详见上问
    LOCK WAIT 5 lock struct(s), heap size 1136, 6 row lock(s), undo log entries 2
    MySQL thread id 23, OS thread handle 22788, query id 3199 localhost 127.0.0.1 root updating
    * 事务91322当前执行SQL语句
    /* ApplicationName=DataGrip 2021.1.1 */ delete from dead_lock_test where v1 = 4 and v2 = 4
    * 事务91322等待的锁信息
    ------- TRX HAS BEEN WAITING 8 SEC FOR THIS LOCK TO BE GRANTED:
    RECORD LOCKS space id 92 page no 4 n bits 72 index PRIMARY of table `igw_proxy_rule_management`.`dead_lock_test` trx id 91322 lock_mode X waiting
    * 事务91322等待的记录锁(锁对应记录主键为5,被事务91327持有)
    Record lock, heap no 6 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
     0: len 4; hex 80000005; asc     ;;
     1: len 6; hex 0000000164bf; asc     d ;;
     2: len 7; hex 81000000b20110; asc        ;;
     3: len 4; hex 80000005; asc     ;;
     4: len 4; hex 80000005; asc     ;;
    
    ------------------
    * 以下展示事务91322所持有的锁以及尝试获取的锁,首先是表意向锁(IX锁)
    TABLE LOCK table `igw_proxy_rule_management`.`dead_lock_test` trx id 91322 lock mode IX
    RECORD LOCKS space id 92 page no 4 n bits 72 index PRIMARY of table `igw_proxy_rule_management`.`dead_lock_test` trx id 91322 lock_mode X
    * 记录锁(锁对应记录主键为1)
    Record lock, heap no 2 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
     0: len 4; hex 80000001; asc     ;;
     1: len 6; hex 0000000164b3; asc     d ;;
     2: len 7; hex 81000000ad0110; asc        ;;
     3: len 4; hex 80000001; asc     ;;
     4: len 4; hex 80000001; asc     ;;
    
    * 记录锁(锁对应记录主键为2)
    Record lock, heap no 3 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
     0: len 4; hex 80000002; asc     ;;
     1: len 6; hex 0000000164b4; asc     d ;;
     2: len 7; hex 82000000ad0110; asc        ;;
     3: len 4; hex 80000002; asc     ;;
     4: len 4; hex 80000002; asc     ;;
    
    * 记录锁(锁对应记录主键为3)
    Record lock, heap no 4 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
     0: len 4; hex 80000003; asc     ;;
     1: len 6; hex 0000000164b9; asc     d ;;
     2: len 7; hex 81000000b00110; asc        ;;
     3: len 4; hex 80000003; asc     ;;
     4: len 4; hex 80000003; asc     ;;
    
    * 记录锁:锁定记录(添加记录时创建的锁)
    RECORD LOCKS space id 92 page no 4 n bits 72 index PRIMARY of table `igw_proxy_rule_management`.`dead_lock_test` trx id 91322 lock_mode X locks rec but not gap
    Record lock, heap no 5 PHYSICAL RECORD: n_fields 5; compact format; info bits 32
     0: len 4; hex 80000004; asc     ;;
     1: len 6; hex 0000000164ba; asc     d ;;
     2: len 7; hex 020000011a03cb; asc        ;;
     3: len 4; hex 80000004; asc     ;;
     4: len 4; hex 80000004; asc     ;;
    
    * 间隙锁:锁定记录(删除记录时创建的锁,在RR模式下生效,主要解决幻读)
    * 需要注意,InnoDB的删除记录不是物理删除,而是标记删除(等待后续记录覆盖),因此可理解删除类似于更新操作
    RECORD LOCKS space id 92 page no 4 n bits 72 index PRIMARY of table `igw_proxy_rule_management`.`dead_lock_test` trx id 91322 lock_mode X locks gap before rec
    Record lock, heap no 5 PHYSICAL RECORD: n_fields 5; compact format; info bits 32
     0: len 4; hex 80000004; asc     ;;
     1: len 6; hex 0000000164ba; asc     d ;;
     2: len 7; hex 020000011a03cb; asc        ;;
     3: len 4; hex 80000004; asc     ;;
     4: len 4; hex 80000004; asc     ;;
    
    * 事务91322尝试获取的锁(被事务91327持有)
    RECORD LOCKS space id 92 page no 4 n bits 72 index PRIMARY of table `igw_proxy_rule_management`.`dead_lock_test` trx id 91322 lock_mode X waiting
    Record lock, heap no 6 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
     0: len 4; hex 80000005; asc     ;;
     1: len 6; hex 0000000164bf; asc     d ;;
     2: len 7; hex 81000000b20110; asc        ;;
     3: len 4; hex 80000005; asc     ;;
     4: len 4; hex 80000005; asc     ;;
    

    (1)由以上注释可知,事务91322在尝试删除时,会对表中所有记录添加记录锁。
    这是因为当前删除记录条件为v1 = 4 and v2 = 4,在v1与v2字段上,并未建立相应的索引。
    因为无法通过索引确定主键,导致MySQL Server会先尝试锁定当前dead_lock_test表中所有记录添加记录锁(可以设置参数进行优化,根据where条件逐渐解除不满足条件记录上的记录锁)。

    (2)事务91322尝试对dead_lock_test表中所有记录添加锁,发现记录(id=5)已经被事务91327添加记录锁,导致事务91322只能等待事务91327放弃记录锁。

    3.3 stage3

    执行delete from dead_lock_test where v1 = 5 and v2 = 5;后即可发现终端输出:

    [2021-05-13 15:33:29] [40001][1213] Deadlock found when trying to get lock; try restarting transaction
    

    执行show engine innodb status;节选死锁信息如下所示:

    因为内容较多,不再列出解释,详见输出信息中文注释部分

    ------------------------
    LATEST DETECTED DEADLOCK
    ------------------------
    2021-05-13 17:27:09 0xca4
    *** (1) TRANSACTION:
    * 事务91322持有锁情况,在stage2已经详细解释,此处不再赘述
    TRANSACTION 91322, ACTIVE 78 sec fetching rows
    mysql tables in use 1, locked 1
    LOCK WAIT 5 lock struct(s), heap size 1136, 6 row lock(s), undo log entries 2
    MySQL thread id 23, OS thread handle 22788, query id 3199 localhost 127.0.0.1 root updating
    /* ApplicationName=DataGrip 2021.1.1 */ delete from dead_lock_test where v1 = 4 and v2 = 4
    
    *** (1) HOLDS THE LOCK(S):
    RECORD LOCKS space id 92 page no 4 n bits 72 index PRIMARY of table `igw_proxy_rule_management`.`dead_lock_test` trx id 91322 lock_mode X
    Record lock, heap no 2 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
     0: len 4; hex 80000001; asc     ;;
     1: len 6; hex 0000000164b3; asc     d ;;
     2: len 7; hex 81000000ad0110; asc        ;;
     3: len 4; hex 80000001; asc     ;;
     4: len 4; hex 80000001; asc     ;;
    
    Record lock, heap no 3 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
     0: len 4; hex 80000002; asc     ;;
     1: len 6; hex 0000000164b4; asc     d ;;
     2: len 7; hex 82000000ad0110; asc        ;;
     3: len 4; hex 80000002; asc     ;;
     4: len 4; hex 80000002; asc     ;;
    
    Record lock, heap no 4 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
     0: len 4; hex 80000003; asc     ;;
     1: len 6; hex 0000000164b9; asc     d ;;
     2: len 7; hex 81000000b00110; asc        ;;
     3: len 4; hex 80000003; asc     ;;
     4: len 4; hex 80000003; asc     ;;
    
    
    *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
    RECORD LOCKS space id 92 page no 4 n bits 72 index PRIMARY of table `igw_proxy_rule_management`.`dead_lock_test` trx id 91322 lock_mode X waiting
    Record lock, heap no 6 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
     0: len 4; hex 80000005; asc     ;;
     1: len 6; hex 0000000164bf; asc     d ;;
     2: len 7; hex 81000000b20110; asc        ;;
     3: len 4; hex 80000005; asc     ;;
     4: len 4; hex 80000005; asc     ;;
    
    
    *** (2) TRANSACTION:
    TRANSACTION 91327, ACTIVE 71 sec starting index read
    mysql tables in use 1, locked 1
    LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
    MySQL thread id 24, OS thread handle 15668, query id 3237 localhost 127.0.0.1 root updating
    /* ApplicationName=DataGrip 2021.1.1 */ delete from dead_lock_test where v1 = 5 and v2 = 5
    
    *** (2) HOLDS THE LOCK(S):
    * 事务91327持有记录(id=5)的记录锁,此锁正在被事务91322等待持有
    RECORD LOCKS space id 92 page no 4 n bits 72 index PRIMARY of table `igw_proxy_rule_management`.`dead_lock_test` trx id 91327 lock_mode X locks rec but not gap
    Record lock, heap no 6 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
     0: len 4; hex 80000005; asc     ;;
     1: len 6; hex 0000000164bf; asc     d ;;
     2: len 7; hex 81000000b20110; asc        ;;
     3: len 4; hex 80000005; asc     ;;
     4: len 4; hex 80000005; asc     ;;
    
    
    *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
    * 事务91327等待持有记录锁,锁信息见后续解释
    RECORD LOCKS space id 92 page no 4 n bits 72 index PRIMARY of table `igw_proxy_rule_management`.`dead_lock_test` trx id 91327 lock_mode X waiting
    * 事务91327等待持有记录(id=1)的记录锁(delete无法走索引查询,因此会尝试对所有表记录进行加锁,但是事务91322持有id=1/2/3/4的记录锁,死锁条件构成)
    Record lock, heap no 2 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
     0: len 4; hex 80000001; asc     ;;
     1: len 6; hex 0000000164b3; asc     d ;;
     2: len 7; hex 81000000ad0110; asc        ;;
     3: len 4; hex 80000001; asc     ;;
     4: len 4; hex 80000001; asc     ;;
    
    *** WE ROLL BACK TRANSACTION (2)
    

    由上可知:
    (1)事务91322执行删除操作时,尝试获取表中所有记录的记录锁,其中记录(id=5)的锁被事务91327持有;
    (2)事务91327执行删除操作时,尝试获取表中所有记录的记录锁,发现记录(id=1/2/3/4)的锁被事务91322持有;
    (3)至此,事务91322与事务91327构成互相等待,死锁形成。

    解决方案

    4.1 添加索引

    由上述的分析可知,删除时因为where条件无法利用索引,导致MySQL会尝试对表中所有记录加记录锁,产生死锁。
    我们仅需在v1以及v2字段上建立联合索引,缩小记录冲突范围。

    create index dead_lock_test_v2_v1_index on dead_lock_test (v1, v2);
    

    此处没有设置唯一索引,如果多个事务根据索引查询,锁定的记录存在重叠,也容易复现死锁现象。
    不过当前业务侧的数据插入,可保证在短暂时间范围内,不存在重叠记录,且表中存在一些重复数据,因此不使用唯一索引。

    4.2 最终

    表中添加索引。事务中添加后再删除,通过回滚实现。

    参考文章

    把MySQL中的各种锁及其原理都画出来
    MySQL锁系列(二)之 锁解读

  • 相关阅读:
    小程序 订阅消息功能实现 wx.requestSubscribeMessage
    Azure API Management (2) API Managment 保护内网服务,且通过JWT验证
    Azure Firewall (1) Azure虚拟桌面结合Azure防火墙设置访问白名单
    一文详解脏读、不可重复读、幻读
    SpringBoot引入第三方jar的Bean的三种方式
    单例模式,真不简单
    一文详解MySQL的锁机制
    高并发下秒杀商品,必须知道的9个细节
    MySQL记录锁、间隙锁、临键锁小案例演示
    Git五个常见问题及解决方法
  • 原文地址:https://www.cnblogs.com/jason1990/p/14767280.html
Copyright © 2011-2022 走看看