11.2.0.3 下尝试使用11g health monitor新特性时出现了ORA-00604、ORA-01427, 查询MOS发现 (Bug 12385172: ORA-01427 WHEN EXECUTING DBMS_HM.RUN_CHECK),当 DB中存在case when then的function index时会触发该BUG:
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE 11.2.0.3.0 Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> select * from global_name;
GLOBAL_NAME
--------------------------------------------------------------------------------
www.oracledatabase12g.com
SQL> exec dbms_hm.run_check('Dictionary Integrity Check','check-2');
BEGIN dbms_hm.run_check('Dictionary Integrity Check','check-2'); END;
*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-01427: single-row subquery returns more than one row
ORA-06512: at "SYS.DBMS_HM", line 191
ORA-06512: at line 1
可以通过以下脚本找出 DB中case when then类型的函数索引:
-- Determine DDL statements (note: this will take a while to return results!)
set long 100000
exec dbms_metadata.set_transform_param(dbms_metadata.session_transform,'PRETTY',true);
exec dbms_metadata.set_transform_param(dbms_metadata.session_transform,'TABLESPACE',false);
exec dbms_metadata.set_transform_param(dbms_metadata.session_transform,'SEGMENT_ATTRIBUTES',false);
exec dbms_metadata.set_transform_param(dbms_metadata.session_transform,'STORAGE',false);
-- Checking the DDL statement
col DDL form a100 word_wrapped
select dbms_metadata.get_ddl(RTRIM(UPPER(object_type)),
RTRIM(UPPER(object_name)),
RTRIM(UPPER(owner))) DDL
from DBA_OBJECTS
where object_type='INDEX'
and object_id
in (select x from (select obj# x, obj#||','||intcol#, count(obj#||','||intcol#)
from ICOLDEP$
group by obj#, obj#||','||intcol# having count(*) > 1)
);
对于安装了APEX 组件或者在DBCA创建数据库时选择了General Purpose从Seed中clone数据库而非Custom Database的DB ,都会创建有"APEX_030200"."WWV_FLOW_WORKSHEETS_UNQ_IDX"、"APEX_030200"."WWV_FLOW_WS_UNQ_ALIAS_IDX"、"APEX_030200"."WWV_FLOW_WORKSHEET_RPTS_UK" 三个函数索引。
如果没有实际使用APEX组件的话,我们可以直接DROP掉APEX_030200:
SQL> drop user "APEX_030200" cascade;
User dropped.
SQL> set long 100000
SQL>
SQL> exec dbms_metadata.set_transform_param(dbms_metadata.session_transform,'PRETTY',true);
PL/SQL procedure successfully completed.
SQL> exec dbms_metadata.set_transform_param(dbms_metadata.session_transform,'TABLESPACE',false);
PL/SQL procedure successfully completed.
SQL> exec dbms_metadata.set_transform_param(dbms_metadata.session_transform,'SEGMENT_ATTRIBUTES',false);
PL/SQL procedure successfully completed.
SQL> exec dbms_metadata.set_transform_param(dbms_metadata.session_transform,'STORAGE',false);
PL/SQL procedure successfully completed.
SQL>
SQL> -- Checking the DDL statement
SQL> col DDL form a100 word_wrapped
SQL> select dbms_metadata.get_ddl(RTRIM(UPPER(object_type)),
2 RTRIM(UPPER(object_name)),
3 RTRIM(UPPER(owner))) DDL
4 from DBA_OBJECTS
5 where object_type='INDEX'
6 and object_id
7 in (select x from (select obj# x, obj#||','||intcol#, count(obj#||','||intcol#)
8 from ICOLDEP$
9 group by obj#, obj#||','||intcol# having count(*) > 1)
10 );
no rows selected
再次尝试测试health check dictionary 发现问题仍存在:
SQL> exec dbms_hm.run_check('Dictionary Integrity Check','check-mac3');
BEGIN dbms_hm.run_check('Dictionary Integrity Check','check-mac3'); END;
*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-01427: single-row subquery returns more than one row
ORA-06512: at "SYS.DBMS_HM", line 191
ORA-06512: at line 1
到这一步决定自己来诊断这个ORA-01427错误的根源, 因为是递归SQL层出现故障,所以这里我们可以用到ERRORSTACK来深入了解问题:
SQL> oradebug setmypid;
Statement processed.
SQL> oradebug event 1427 trace name errorstack level 4;
Statement processed.
/* 以上我们设置当触发1427错误事件时TRACE level 4的错误堆栈ERRORSTACK */
SQL> exec dbms_hm.run_check('Dictionary Integrity Check','check-mac4');
BEGIN dbms_hm.run_check('Dictionary Integrity Check','check-mac4'); END;
*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-01427: single-row subquery returns more than one row
ORA-06512: at "SYS.DBMS_HM", line 191
ORA-06512: at line 1
/* 触发ORA-01427 错误 将生成相关TRACE 信息*/
SQL> oradebug tracefile_name
/s01/orabase/diag/rdbms/vprod/VPROD1/trace/VPROD1_ora_7781.trc
来进一步观察生成的TRACE文件:
*** 2012-04-30 09:20:55.438
dbms_hm: (In run_check)
Begin dbkhicd_run_check
dbkh_run_check_internal: BEGIN; check_namep=Dictionary Integrity Check, run_namep=check-mac4
dbkh_run_check_internal: BEGIN; timeout=0
dbkh_run_check_internal: AFTER RUN CREATE; run_id=1281
*** 2012-04-30 09:20:55.603
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=4, mask=0x0)
----- Error Stack Dump -----
ORA-01427: single-row subquery returns more than one row
----- Current SQL Statement for this session (sql_id=gxjzd1s7m8xfj) -----
select 52, rowid, 'ind$.obj#'
from IND$
where obj# < 0
union all
select 57, rowid, 'ind$.type#'
from IND$
where type# not between 1 and 9
union all
select 58, rowid, 'ind$.pctfree$'
from IND$
where pctfree$ not between 0 and 99
union all
select 59, rowid, 'ind$.analyzetime <= SYSDATE'
from IND$
where analyzetime > SYSDATE
union all
select 51, rowid, 'ind$.obj# pk'
from IND$
where obj# is null
union all
select 51, rowid, 'ind$.obj# pk'
from IND$
where 1 > (select obj# from IND$ group by obj# having count(*) > 1)
union all
select 53, rowid, 'ind$.dataobj# range'
from IND$
where 1 >
(select dataobj# from IND$ group by dataobj# having count(*) > 1)
union all
select 54, rowid, 'ind$.ts# fk'
from IND$
where (ts#) in (select ts#
from IND$
where (ts#) not in (select ts# from ts$)
and ts# != 2147483647)
union all
select 55, rowid, 'ind$.ts,file,block fk'
from IND$
where (ts#, file#, block#) in (select ts#, file#, block#
from IND$
where (ts#, file#, block#) not in
(select ts#, file#, block# from seg$)
and file# != 0
and block# != 0)
union all
select 56, rowid, 'ind$.obj# fk_obj$'
from IND$
where (obj#) in
(select obj# from IND$ where (obj#) not in (select obj# from obj$))
----- PL/SQL Stack -----
----- PL/SQL Call Stack -----
object line object
handle number name
0xb1269160 191 package body SYS.DBMS_HM
0xb1d9f600 1 anonymous block
实际触发ORA-01427的是一条较长的递归SQL语句,该SQL由多个部分UNION ALL组合而成负责检测IND$基表是否存在逻辑不一致, 实际检测可以发现真真存在问题的是 这一段SQL:
select 53, rowid, 'ind$.dataobj# range'
from IND$
where 1 >
(select dataobj# from IND$ group by dataobj# having count(*) > 1)
ERROR at line 4:
ORA-01427: single-row subquery returns more than one row
SQL> select dataobj# from IND$ group by dataobj# having count(*) > 1;
DATAOBJ#
----------
75601
75599
75594
75605
IND$ 基表上居然存在多条dataobj#重复的记录,我们来看看是哪些对象:
select /*+ first_rows */
owner, object_name, data_object_id
from dba_objects
where data_object_id in
(select dataobj# from IND$ group by dataobj# having count(*) > 1)
order by 3 ;
OWNER OBJECT_NAME DATA_OBJECT_ID
------------------------------ ------------------------------ --------------
SYS SYS_C0010990 75594
OE WHS_LOCATION_IX 75594
OE ORD_CUSTOMER_IX 75599
SYS SYS_IOT_TOP_75598 75599
SYS SYS_IOT_TOP_75600 75601
OE CUST_ACCOUNT_MANAGER_IX 75601
OE PROD_SUPPLIER_IX 75605
SYS SYS_IOT_TOP_75603 75605
8 rows selected.
OE这个Sample Schema下的多个索引居然和SYS用户的一些索引的DATA_OBJECT_ID重号; 我们不可能去改动SYS下的对象,而OE这个Schema则无关紧要,删除这些OE下的问题索引:
SQL> drop index oe.WHS_LOCATION_IX;
Index dropped.
SQL> drop index oe.ORD_CUSTOMER_IX;
Index dropped.
SQL> drop index oe.CUST_ACCOUNT_MANAGER_IX;
Index dropped.
SQL> drop index oe.PROD_SUPPLIER_IX;
Index dropped.
SQL> select dataobj# from IND$ group by dataobj# having count(*) > 1;
DATAOBJ#
----------
再次测试后成功执行Dictionary Integrity Check
SQL> exec dbms_hm.run_check('Dictionary Integrity Check','check-mac5');
PL/SQL procedure successfully completed.
SQL> set pause on;
SQL> spool dic_check
SQL> SET LONG 100000
SQL> SET LONGCHUNKSIZE 1000
SQL> SET PAGESIZE 100
SQL> SET LINESIZE 512
SQL> SELECT DBMS_HM.GET_RUN_REPORT('CHECK-MAC5') FROM DUAL;
DBMS_HM.GET_RUN_REPORT('CHECK-MAC5')
-----------------------------------------------------
Basic Run Information
Run Name : check-mac5
Run Id : 1301
Check Name : Dictionary Integrity Check
Mode : MANUAL
Status : COMPLETED
Start Time : 2012-04-30 09:33:28.540140 -04:00
End Time : 2012-04-30 09:33:32.303679 -04:00
Error Encountered : 0
Source Incident Id : 0
Number of Incidents Created : 0
Input Paramters for the Run
TABLE_NAME=ALL_CORE_TABLES
CHECK_MASK=ALL
Run Findings And Recommendations
Finding
Finding Name : Dictionary Inconsistency
Finding ID : 1302
Type : FAILURE
Status : OPEN
Priority : CRITICAL
Message : SQL dictionary health check: syn$.owner fk 95 on object SYN$
failed
Message : Damaged rowid is AAAABEAABAAANWgAB7 - description: Synonymn
APEX is referenced
Finding
Finding Name : Dictionary Inconsistency
Finding ID : 1305
Type : FAILURE
Status : OPEN
Priority : CRITICAL
Message : SQL dictionary health check: syn$.owner fk 95 on object SYN$
failed
Message : Damaged rowid is AAAABEAABAAANWhAAu - description: Synonymn
APEXWS is referenced
Finding
Finding Name : Dictionary Inconsistency
Finding ID : 1308
Type : FAILURE
Status : OPEN
Priority : CRITICAL
Message : SQL dictionary health check: syn$.owner fk 95 on object SYN$
failed
Message : Damaged rowid is AAAABEAABAAANWgACO - description: Synonymn
APEX_ACTIVITY_LOG is referenced
Finding
Finding Name : Dictionary Inconsistency
Finding ID : 1311
Type : FAILURE
Status : OPEN
Priority : CRITICAL
Message : SQL dictionary health check: syn$.owner fk 95 on object SYN$
failed
Message : Damaged rowid is AAAABEAABAAANWgABl - description: Synonymn
APEX_ADMIN is referenced
Finding
Finding Name : Dictionary Inconsistency
Finding ID : 1314
Type : FAILURE
Status : OPEN
Priority : CRITICAL
Message : SQL dictionary health check: syn$.owner fk 95 on object SYN$
failed
Message : Damaged rowid is AAAABEAABAAANWgACB - description: Synonymn
APEX_APPLICATION is referenced
这个case希望大家能了解的是对于ORA-00604这类递归SQL层的错误,报错信息本身给出的诊断信息是不完整的,需要我们通过一些工具来深入了解实际引发错误的是哪一条SQL语句,这些Recusive SQL出错的主要原因往往是BUG、或者数据字典存在不一致。如何在脱离MOS和SR帮助的情况下,安全地WorkAround绕过这个错误。