The database keeps reporting this error:
Tue May 6 13:44:47 2008
SMON: about to recover undo segment 119
ORACLE Instance topcs2 (pid = 11) - Error 1591 encountered while recovering transaction (119, 18) on object 2309045.
Errors in file /u2/oracle/ora92/rdbms/log/topcs2_ora_2899.trc:
ORA-01591: lock held by in-doubt distributed transaction 108.28.46269
SMON: mark undo segment 119 as needs recovery
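I asked Xiao Sun whether this interface was a Tuxedo one, and he said it was a Tuxedo 6.5 application. ORA-1591 caused by a misbehaving XA application is a fairly common problem, so I immediately checked whether rollback segment 119 held a dead transaction: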
SELECT KTUXEUSN,KTUXESLT,KTUXESQN,
KTUXESTA Status,KTUXECFL Flags
FROM x$ktuxe
WHERE ktuxesta!='INACTIVE'
AND ktuxeusn=119;
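The query returned no rows. Then I noticed that the in-doubt distributed transaction was actually 108.28.46269, so I checked rollback segment 108 for a dead transaction instead: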
SELECT KTUXEUSN, KTUXESLT, KTUXESQN,
KTUXESTA Status, KTUXECFL Flags
FROM x$ktuxe
WHERE ktuxesta!='INACTIVE'
AND ktuxeusn=108;
KTUXEUSN KTUXESLT KTUXESQN STATUS FLAGS
---------- ---------- ---------- ---------------- ------------------------
108 28 46269 PREPARED SCO|COL|REV|DEAD
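As the output above shows, the three parts of the local transaction ID in the ORA-01591 message line up with KTUXEUSN, KTUXESLT and KTUXESQN, so the rollback segment to inspect comes from the error text, not from the "undo segment" number SMON prints. A minimal Python sketch of that mapping follows. The helper names `parse_ora_01591` and `ktuxe_predicate` are hypothetical, and the second one only builds a string; it does not connect to a database.

```python
import re

# Pattern for the ORA-01591 message text as it appears in the alert log above.
# \s* after the colon also accepts the "ORA-01591:lock held" spelling.
ORA_01591 = re.compile(
    r"ORA-01591:\s*lock held by in-doubt distributed transaction\s+"
    r"(\d+)\.(\d+)\.(\d+)"
)


def parse_ora_01591(line):
    """Return (usn, slot, sqn) from an ORA-01591 message, or None if absent."""
    match = ORA_01591.search(line)
    if match is None:
        return None
    usn, slot, sqn = (int(part) for part in match.groups())
    return usn, slot, sqn


def ktuxe_predicate(usn, slot, sqn):
    """Build a WHERE-clause fragment for x$ktuxe (hypothetical helper)."""
    return f"ktuxeusn={usn} AND ktuxeslt={slot} AND ktuxesqn={sqn}"


if __name__ == "__main__":
    msg = "ORA-01591: lock held by in-doubt distributed transaction 108.28.46269"
    ids = parse_ora_01591(msg)
    print(ids)
    print(ktuxe_predicate(*ids))
```

Filtering on all three columns, rather than on ktuxeusn alone, narrows the x$ktuxe result to the single transaction named in the error.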
There was indeed a dead transaction in PREPARED state. So I checked the pending_trans$ table right away:
SELECT local_tran_id, global_tran_fmt, global_oracle_id,
global_foreign_id, state, status, heuristic_dflt,
session_vector, reco_vector,
global_commit#
FROM PENDING_TRANS$;
LOCAL_TRAN_ID  GLOBAL_TRAN_FMT  GLOBAL_ORACLE_ID             GLOBAL_FOREIGN_ID  STATE            STATUS  HEURISTIC_DFLT  SESSION_VECTOR  RECO_VECTOR  GLOBAL_COMMIT#
-------------  ---------------  ---------------------------  -----------------  ---------------  ------  --------------  --------------  -----------  --------------
70.6.108873    306206           TOPCS.c7dd20c6.70.6.108873                      forced rollback  P                       00000001        00000001     1486865870
96.42.84009    306206           TOPCS.c7dd20c6.96.42.84009                      forced rollback  P                       00000001        00000001     1286932454
85.10.101067   306206           TOPCS.c7dd20c6.85.10.101067                     forced rollback  P                       00000001        00000001     1487167659
9.35.29156     306206           TOPCS.c7dd20c6.9.35.29156                       collecting       P                       00000001        00000001     1672793351
80.8.132177    306206           TOPCS.c7dd20c6.80.8.132177                      forced rollback  P                       00000001        00000001     1679427495
18.32.162778   306206           TOPCS.c7dd20c6.18.32.162778                     forced commit    P                       00000001        00000001
64.21.136442   306206           TOPCS.c7dd20c6.64.21.136442                     forced commit    P                       00000001        00000001
73.11.124822   306206           TOPCS.c7dd20c6.73.11.124822                     forced rollback  P                       00000001        00000001     1731227073
63.29.148558   306206           TOPCS.c7dd20c6.63.29.148558                     forced commit    P                       00000001        00000001
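Transaction 108.28.46269 is not in the table. It looks as if the data dictionary tables are inconsistent, which is why this distributed transaction cannot be rolled back automatically. First, let's see which object the transaction is locking: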
Select NAME from OBJ$ WHERE OBJ#=2309045;
NAME
---------------------------------------
TPTB_IDX
This is an index whose base table is T_TPTB. Let's run a full table scan on that table and see what happens:
SELECT /*+ full(a) */ count(*) FROM T_TPTB a;
ORA-01591: lock held by in-doubt distributed transaction 73.11.124822
The table is locked by another distributed transaction, 73.11.124822. Check rollback segment 73:
SELECT KTUXEUSN, KTUXESLT, KTUXESQN,
KTUXESTA Status, KTUXECFL Flags
FROM x$ktuxe
WHERE ktuxesta!='INACTIVE'
AND ktuxeusn=73;
KTUXEUSN KTUXESLT KTUXESQN STATUS FLAGS
---------- ---------- ---------- ---------------- ------------------------
73 11 124822 PREPARED SCO|COL|REV|DEAD
Running ROLLBACK FORCE on 73.11.124822 raised an error:
SQL>ROLLBACK FORCE '73.11.124822';
ERROR at line 1:
ORA-02058: no prepared transaction found with ID 73.11.124822
In pending_trans$ the state of this transaction is currently not PREPARED, so its state must first be set to prepared in pending_trans$:
UPDATE pending_trans$ SET STATE='prepared',
STATUS='P'
WHERE local_tran_id='73.11.124822';
COMMIT;
After running this statement, I ran ROLLBACK FORCE '73.11.124822' again:
SQL>ROLLBACK FORCE '73.11.124822';
ROLLBACK FORCE '73.11.124822'
*
ERROR at line 1:
ORA-01591: lock held by in-doubt distributed transaction 108.28.46269
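It appears this transaction is blocked by the one we found first. One of the two transactions locks the table and the other locks the index, so the only option left is to see whether a forced commit works: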
SQL>UPDATE pending_trans$ SET STATE='prepared',
STATUS='P'
where local_tran_id='73.11.124822';
SQL>commit;
SQL>commit force '73.11.124822';
With this command the transaction was force-committed successfully. Next comes the transaction in rollback segment 108. Because it has no row in pending_trans$, first purge it:
SQL>exec dbms_transaction.purge_lost_db_entry('108.28.46269')
Then insert the relevant rows by hand:
SQL>alter system disable distributed recovery;
SQL> insert into pending_trans$ (
LOCAL_TRAN_ID,
GLOBAL_TRAN_FMT,
GLOBAL_ORACLE_ID,
STATE,
STATUS,
SESSION_VECTOR,
RECO_VECTOR,
TYPE#,
FAIL_TIME,
RECO_TIME)
values('108.28.46269',
306206,
'XXXXXXX.12345.1.2.3',
'prepared','P',
hextoraw( '00000001' ),
hextoraw( '00000000' ),
0, sysdate, sysdate );
SQL>insert into pending_sessions$
values( '108.28.46269',
1, hextoraw('05004F003A1500000104'),
'C', 0, 30258592, '',
146
);
SQL>Commit;
Then force the commit again:
SQL>commit force '108.28.46269';
Commit complete.
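After the commit completed, Xiao Sun checked the application again and found that the problem was solved.
Subject: Supplementary notes on ORA-1591
Liusha's ORA-1591 problem from a few days ago was worked out in a QQ chat, so it may be hard to follow for readers without ORA-1591 experience. This post explains the problem in more detail, so that the earlier case can be read as a worked example of ORA-1591.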
ORA-01591: "lock held by in-doubt distributed transaction %s"
Cause: Trying to access resource that is locked by a dead
two-phase commit transaction that is in prepared state.
Action: DBA should query the pending_trans$ and related tables,
and attempt to repair network connection(s) to
coordinator and commit point. If timely repair is not
possible, DBA should contact DBA at commit point if
known or end user for correct outcome, or use heuristic
default if given to issue a heuristic commit or abort
command to finalize the local portion of the
distributed transaction.
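In general, ORA-1591 is caused by a failed distributed transaction. There are many possible causes, such as network problems or bugs in the XA resource manager. Once a distributed transaction fails, if one of its local transactions is still active, the data touched by that transaction stays locked (for both reads and writes), and any access to that data raises ORA-1591. Normally ORA-1591 clears by itself: RECO periodically checks DBA_2PC_PENDING, finds the transactions that need to be rolled back, and recovers them automatically. There is a catch, though. Because of the distributed transaction timeout and the RECO processing cycle, automatic recovery usually takes more than 1 minute and can take as long as 5 to 10 minutes, which may have a serious impact on a production system. To release the locks faster, the transaction can be resolved manually with ROLLBACK FORCE or COMMIT FORCE.
Sometimes the recovery of the distributed transaction itself fails and leaves the data dictionary inconsistent. The distributed transaction then cannot be released in the normal way and needs manual intervention.
Analysis method:
1. Check the state of the distributed transaction: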
SELECT LOCAL_TRAN_ID, GLOBAL_TRAN_ID, STATE, MIXED, HOST, COMMIT#
FROM DBA_2PC_PENDING
WHERE LOCAL_TRAN_ID = '<local transaction ID from the error>';
2. Check the other nodes involved in the distributed transaction:
SELECT LOCAL_TRAN_ID, IN_OUT, DATABASE, INTERFACE
FROM DBA_2PC_NEIGHBORS;
3. Check the local rollback segment:
SELECT KTUXEUSN, KTUXESLT, KTUXESQN, /* Transaction ID */
KTUXESTA Status,
KTUXECFL Flags
FROM x$ktuxe
WHERE ktuxesta!='INACTIVE'
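AND ktuxeusn = <rollback segment number>;
The most common case is that DBA_2PC_PENDING holds a row in PREPARED state and the rollback segment also shows an active transaction in PREPARED state that carries the DEAD flag. In this case, run COMMIT FORCE or ROLLBACK FORCE with the transaction ID.
In another case DBA_2PC_PENDING also has a row, but its state differs from the state in the rollback segment. Here, manually update the state in sys.pending_trans$ and then run ROLLBACK FORCE or COMMIT FORCE.
If no row is found in DBA_2PC_PENDING, check sys.pending_trans$, sys.pending_sessions$ and sys.pending_sub_sessions$. If the rows have been lost, insert them by hand and then run ROLLBACK FORCE or COMMIT FORCE
(see the post about Liusha's problem):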
insert into pending_trans$ (
LOCAL_TRAN_ID,
GLOBAL_TRAN_FMT,
GLOBAL_ORACLE_ID,
STATE,
STATUS,
SESSION_VECTOR,
RECO_VECTOR,
TYPE#,
FAIL_TIME,
RECO_TIME)
values('transid',
306206, /* fixed value */
'XXXXXXX.12345.1.2.3', /* arbitrary value */
'prepared','P', /* means PREPARED state */
hextoraw( '00000001' ), /* constant. */
hextoraw( '00000000' ), /* */
0, sysdate, sysdate );
insert into pending_sessions$
values( '<transaction ID>',
1, hextoraw('05004F003A1500000104'),
'C', 0, 30258592, '',
146
);
commit;
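4. After ROLLBACK FORCE/COMMIT FORCE, if the row in DBA_2PC_PENDING has not been cleared, or DBA_2PC_PENDING has a row but the rollback segment no longer has an entry, use:
alter session set "_smu_debug_mode" = 4; -- required on 9i when AUM is in use, otherwise the next step fails
commit;
exec dbms_transaction.purge_lost_db_entry( '<transaction ID>' )
Real cases are far more complicated than this; refer to 401302.1.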
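The four cases in the analysis above can be condensed into a small decision table. The Python sketch below is only an illustration of that reasoning: the function name `suggest_action` and its two inputs are hypothetical, it returns a description and never touches the database, and the "nothing to resolve" branch is an assumption added for completeness, not something the posts state.

```python
def suggest_action(pending_state, undo_prepared):
    """Map two observations to the next step described in the text.

    pending_state: STATE of the row in DBA_2PC_PENDING, or None if no row.
    undo_prepared: True if x$ktuxe shows a PREPARED transaction flagged DEAD.
    """
    if pending_state is None:
        if undo_prepared:
            # Dictionary rows lost: recreate them, then force the outcome.
            return "insert pending_trans$/pending_sessions$ rows, then FORCE"
        # Assumption: with neither a dictionary row nor an undo entry,
        # there is no in-doubt transaction left to handle.
        return "nothing to resolve"
    if not undo_prepared:
        # Row left behind although the undo entry is gone (step 4).
        return "purge_lost_db_entry"
    if pending_state.lower() == "prepared":
        # The common case: both sides agree on PREPARED.
        return "COMMIT FORCE or ROLLBACK FORCE"
    # States disagree: fix pending_trans$ first, then force the outcome.
    return "update pending_trans$ to prepared, then FORCE"


if __name__ == "__main__":
    # 73.11.124822 in the first post: 'forced rollback' row, PREPARED undo.
    print(suggest_action("forced rollback", True))
    # 108.28.46269 in the first post: no dictionary row, PREPARED undo.
    print(suggest_action(None, True))
```

Whether to force a commit or a rollback remains a judgment call in every branch, because it depends on what the coordinator decided for the global transaction.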