1.1问题现象
DBA朋友遇到的问题,LINUX Redhat 6.9,2节点RAC,生产环境报ORA 死锁错误,检查发现存在TM锁,属于外键无索引问题,影响业务。 通过外键列创建索引解决问题。
本篇文档进行模拟测试,并在此熟悉外键无索引的情况。
1.2死锁报错模拟
SQL> create table dept1 as select * from dept; SQL> create table emp1 as select * from emp; SQL> alter table dept1 add constraint d_pk primary key(DEPTNO); SQL> alter table emp1 add constraint e_pk primary key(EMPNO); SQL> alter table emp1 add constraint for_key foreign key(DEPTNO) references dept1(DEPTNO); 模拟 session1 执行删除emp1 表deptno 10的数据 session2 执行删除emp1 表deptno 20的数据 session1 执行删除dept1 表deptno 10的数据 session2 执行删除dept2 表deptno 10的数据 session1 执行删除emp1 表deptno 10的数据 SQL> delete emp1 where deptno=10; 3 rows deleted. SQL> select sid,type,id1,id2,lmode,request from gv$lock where type in('TX','TM'); SID TY ID1 ID2 LMODE REQUEST ---------- -- ---------- ---------- ---------- ---------- 312 TM 90545 0 3 0 312 TM 90546 0 3 0 312 TX 589847 9382 6 0 SQL> select object_id,object_name from dba_objects where object_id in(90546,90545); OBJECT_ID OBJECT_NAME ---------- ------------------------------ 90545 DEPT1 90546 EMP1 session2 执行删除emp1 表deptno 20的数据 SQL> delete emp1 where deptno=20; 5 rows deleted. session1 执行删除dept1 表deptno 10的数据 SQL> delete dept1 where deptno=10; hang SQL> select sid,type,id1,id2,lmode,request from gv$lock where type in('TX','TM'); SID TY ID1 ID2 LMODE REQUEST ---------- -- ---------- ---------- ---------- ---------- 330 TM 90545 0 3 0 330 TM 90546 0 3 0 312 TM 90545 0 3 0 312 TM 90546 0 3 5 330 TX 131096 9439 6 0 312 TX 589847 9382 6 0 6 rows selected. 会话1,在子表存在会话2持有的TM3号锁的情况下,会话1执行删除父表主键的记录,需要申请子表的TM5号锁,无法获取到,会话hang住了! session2 执行删除dept2 表deptno 10的数据 SQL> delete dept1 where deptno=20; 会话hang 并且会话一报错! delete dept1 where deptno=10 * ERROR at line 1: ORA-00060: deadlock detected while waiting for resource 反过来想,如果会话1,删除子表的记录,在删除父表的记录,会话2,删除父表的记录,在删除子表的记录无法操作,因为删除父表的记录就hang住了。 因此测试流程是符合业务逻辑的预期,及两个会话,一个在删除部门1,另一个在删除部门2,正常情况下不存在阻塞! 外键无索引才会产生这种独特的现象!
1.3观察死锁TRACE
Mon Oct 19 07:01:51 2020 ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/tt/tt/trace/tt_ora_3910.trc. *** 2020-10-19 07:01:51.036 DEADLOCK DETECTED ( ORA-00060 ) [Transaction Deadlock] The following deadlock is not an ORACLE error. It is a deadlock due to user error in the design of an application or from issuing incorrect ad-hoc SQL. The following information may aid in determining the deadlock: Deadlock graph: ---------Blocker(s)-------- ---------Waiter(s)--------- Resource Name process session holds waits process session holds waits TM-000161b2-00000000 19 312 SX SSX 21 330 SX SSX TM-000161b2-00000000 21 330 SX SSX 19 312 SX SSX session 312: DID 0001-0013-0000000F session 330: DID 0001-0015-0000000B session 330: DID 0001-0015-0000000B session 312: DID 0001-0013-0000000F Rows waited on: Session 312: no row Session 330: no row ----- Information for the OTHER waiting sessions ----- Session 330: sid: 330 ser: 53 audsid: 734385 user: 83/SCOTT flags: (0x45) USR/- flags_idl: (0x1) BSY/-/-/-/-/- flags2: (0x40009) -/-/INC pid: 21 O/S info: user: oracle, term: UNKNOWN, ospid: 3996 image: oracle@test (TNS V1-V3) client details: O/S info: user: oracle, term: pts/2, ospid: 3989 machine: test program: sqlplus@test (TNS V1-V3) application name: SQL*Plus, hash value=3669949024 current SQL: delete dept1 where deptno=20
······
Current Wait Stack:
0: waiting for 'enq: TM - contention'
name|mode=0x544d0005, object #=0x161b2, table/partition=0x0
wait_id=371 seq_num=372 snap_id=1
wait times: snap=1 min 39 sec, exc=1 min 39 sec, total=1 min 39 sec
wait times: max=infinite, heur=1 min 39 sec
wait counts: calls=33 os=33
in_wait=1 iflags=0x15a0
There is at least one session blocking this session.
Dumping 1 direct blocker(s):
inst: 1, sid: 330, ser: 53
Dumping final blocker:
inst: 1, sid: 330, ser: 53
可以发现,会话当前存在TM等待阻塞现象!!!
----------------------------------------
SO: 0xf0854ab8, type: 54, owner: 0xf0a85c70, flag: INIT/-/-/0x00 if: 0x1 c: 0x1
proc=0xf59eff48, name=DML lock, file=kta.h LINE:2261, pg=0
DML LOCK: tab=90546 flg=11 chi=0
his[0]: mod=3 spn=13039
(enqueue) TM-000161B2-00000000 DID: 0001-0013-0000000F
lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 res_flag: 0x6
mode: SX, req: SSX, lock_flag: 0x0, lock: 0xf0854b18, res: 0xf3a89480
own: 0xf33d4e60, sess: 0xf33d4e60, proc: 0xf59eff48, prv: 0xf3a894b0
----------------------------------------
SO: 0xf0854958, type: 54, owner: 0xf0a85c70, flag: INIT/-/-/0x00 if: 0x1 c: 0x1
proc=0xf59eff48, name=DML lock, file=kta.h LINE:2261, pg=0
DML LOCK: tab=90545 flg=11 chi=0
his[0]: mod=3 spn=13039
(enqueue) TM-000161B1-00000000 DID: 0001-0013-0000000F
lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 res_flag: 0x6
mode: SX, lock_flag: 0x0, lock: 0xf08549b8, res: 0xf3a88be8
own: 0xf33d4e60, sess: 0xf33d4e60, proc: 0xf59eff48, prv: 0xf3a88bf8
----------------------------------------
SQL> select to_number('161B1','xxxxxx') as "161B1",to_number('161B2','xxxxx') as "161B2" from dual;
161B1 161B2
---------- ----------
90545 90546
SQL> select object_id,object_name from dba_objects where object_id in(90546,90545);
OBJECT_ID OBJECT_NAME ---------- ------------------------------ 90545 DEPT1 90546 EMP1
这两者什么关系? 主外键关系。
1.4子表外键创建索引后,观察测试业务执行的session1,session2是否正常
会话一,rollback; SQL> roll; 此时会话2,持有子表父表的TM多少级别的锁?
SQL> select sid,type,id1,id2,lmode,request from gv$lock where type in('TX','TM');
SID TY ID1 ID2 LMODE REQUEST
---------- -- ---------- ---------- ---------- ----------
330 TM 90545 0 3 0
330 TM 90546 0 3 0
330 TX 131096 9439 6 0
也就是说,子表没有TM3号锁的情况下,会话对子表删除记录,对父表删除记录,是正常的申请子表、父表的TM3号锁的!!!但是如果子表存在其它会话TM3号锁,父表的删除会话会申请子表的TM5号锁,从而阻塞后续正常的业务!
会话1尝试,再次删除父表的记录,hang
SQL> delete dept1 where deptno=10; 满足前面说的情况。
会话1尝试,再次删除子表的记录,deptno=10 正常情况下,与会话2 删除deptno 20不冲突!
SQL> delete emp1 where deptno=10;
3 rows deleted. 可以删除! 进一步理解,外键无索引在什么情况下会出现TM等待现象了吧。
===对子表创建索引,会话1 rollback;
SQL> create index e_ind on emp1(deptno);
create index e_ind on emp1(deptno)
*
ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired
?创建索引时,会申请什么锁呢,本次测试顺带验证下。
SQL> alter session set ddl_lock_timeout=600;
SQL> create index e_ind on emp1(deptno);
hang
SQL> select sid,type,id1,id2,lmode,request from gv$lock where type in('TX','TM');
SID TY ID1 ID2 LMODE REQUEST
---------- -- ---------- ---------- ---------- ----------
320 TM 90546 0 0 4
330 TM 90545 0 3 0
330 TM 90546 0 3 0
330 TX 131096 9439 6 0
这个时候再次测试会话1,执行删除deptno 10能成功?
SQL> delete emp1 where deptno=10; 会话hang
索引不加Online会对表申请TM4号锁,因此创建索引老DBA都建议加Online。
测试环境会话1,2 rollback;
创建索引成功后。再次测试!最初流程。
模拟
session1 执行删除emp1 表deptno 10的数据
session2 执行删除emp1 表deptno 20的数据
session1 执行删除dept1 表deptno 10的数据
session2 执行删除dept2 表deptno 10的数据
session1 执行删除emp1 表deptno 10的数据
SQL> delete emp1 where deptno=10;
3 rows deleted.
SQL> select sid,type,id1,id2,lmode,request from gv$lock where type in('TX','TM');
SID TY ID1 ID2 LMODE REQUEST
---------- -- ---------- ---------- ---------- ----------
312 TM 90545 0 3 0
312 TM 90546 0 3 0
312 TX 655387 8935 6 0
SQL> select object_id,object_name from dba_objects where object_id in(90546,90545);
OBJECT_ID OBJECT_NAME
---------- ------------------------------
90545 DEPT1
90546 EMP1
session2 执行删除emp1 表deptno 20的数据
SQL> delete emp1 where deptno=20;
5 rows deleted.
session1 执行删除dept1 表deptno 10的数据
SQL> delete dept1 where deptno=10;
1 row deleted.
SQL> select sid,type,id1,id2,lmode,request from gv$lock where type in('TX','TM');
SID TY ID1 ID2 LMODE REQUEST
---------- -- ---------- ---------- ---------- ----------
330 TM 90545 0 3 0
330 TM 90546 0 3 0
312 TM 90545 0 3 0
312 TM 90546 0 3 0
330 TX 524312 9604 6 0
312 TX 655387 8935 6 0
6 rows selected.
会话1,在子表存在会话2持有的TM3号锁的情况下,会话1执行删除父表主键的记录,需要申请子表的TM3号锁! 只要数据行不冲突,TM3号锁可以兼容,一个对象同时可以允许多个TM3。
session2 执行删除dept2 表deptno 10的数据
SQL> set autotrace on
SQL> delete dept1 where deptno=20;
1 row deleted.
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 1 | 13 | 0 (0)| 00:00:01 |
| 1 | DELETE | DEPT1 | | | | |
|* 2 | INDEX UNIQUE SCAN| D_PK | 1 | 13 | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("DEPTNO"=20)
这里面看不到任何关于子表外键索引的信息,但是实质上Oracle内部还是通过子表的外键列索引,快速定位到了子表的行记录。