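Application staff reported that an 11.2.0.1 database running on Linux was returning incorrect result sets from queries that use UNION ALL. Let's analyze one of the affected statements in detail: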
SELECT MTL_SECONDARY_INVENTORIES.SECONDARY_INVENTORY_NAME,
MTL_SECONDARY_INVENTORIES.ORGANIZATION_ID,
MTL_SECONDARY_INVENTORIES.DESCRIPTION,
MTL_SECONDARY_INVENTORIES.AVAILABILITY_TYPE,
MTL_SECONDARY_INVENTORIES.MATERIAL_ACCOUNT,
MTL_SECONDARY_INVENTORIES.MATERIAL_OVERHEAD_ACCOUNT,
MTL_SECONDARY_INVENTORIES.RESOURCE_ACCOUNT,
MTL_SECONDARY_INVENTORIES.OVERHEAD_ACCOUNT,
MTL_SECONDARY_INVENTORIES.OUTSIDE_PROCESSING_ACCOUNT,
MTL_SECONDARY_INVENTORIES.ASSET_INVENTORY,
MTL_SECONDARY_INVENTORIES.EXPENSE_ACCOUNT,
MTL_SECONDARY_INVENTORIES.ENCUMBRANCE_ACCOUNT,
MTL_SECONDARY_INVENTORIES.ATTRIBUTE3,
MTL_SECONDARY_INVENTORIES.ATTRIBUTE5,
WORKFLOW_START_TIMES.WORKFLOW_START_TIME
FROM REPEMEAERP.MTL_SECONDARY_INVENTORIES,
REPEMEAERP.WORKFLOW_START_TIMES
WHERE MTL_SECONDARY_INVENTORIES.DW_UPDATE_DT >
TO_DATE('01/01/1900 00:00:00', 'MM/DD/YYYY HH24:MI:SS')
AND MTL_SECONDARY_INVENTORIES.DW_UPDATE_DT <=
WORKFLOW_START_TIMES.WORKFLOW_START_TIME
AND WORKFLOW_START_TIMES.WORKFLOW_NAME =
LTRIM(RTRIM('w_int_FreqBatch_EMEA'))
/* The above is QUERY A */
UNION ALL
/* The following is QUERY B */
SELECT DISTINCT 'WORKORDERS',
MTL_SECONDARY_INVENTORIES.ORGANIZATION_ID,
'WORK ORDERS WITH WIP AS CATEGORY VALUE',
1,
0,
0,
0,
0,
0,
1,
0,
0,
'MOI',
'0',
WORKFLOW_START_TIMES.WORKFLOW_START_TIME
FROM REPEMEAERP.MTL_SECONDARY_INVENTORIES, EIMMAINT.WORKFLOW_START_TIMES
WHERE MTL_SECONDARY_INVENTORIES.DW_UPDATE_DT >
TO_DATE('01/01/1900 00:00:00', 'MM/DD/YYYY HH24:MI:SS')
AND MTL_SECONDARY_INVENTORIES.DW_UPDATE_DT <=
WORKFLOW_START_TIMES.WORKFLOW_START_TIME
AND WORKFLOW_START_TIMES.WORKFLOW_NAME =
LTRIM(RTRIM('w_int_FreqBatch_EMEA'))
/
138 rows selected.
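In the query above, QUERY A (the SELECT before UNION ALL) returns 69 rows when run on its own and QUERY B returns 15 rows, yet the UNION ALL returns 138 rows instead of the expected 84 (69 + 15). This system was only recently migrated from 10g to 11gR2, and the same application never showed this problem on 10g, so a reasonable guess is that some feature newly introduced in 11g has a bug that can produce wrong results.
That gives us a general direction, but it does not yet pin down the root cause. Let's look at the execution plan of this SQL: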
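The expectation behind "84" is plain arithmetic: UNION ALL neither removes nor adds rows, so the combined row count must equal the sum of the two branch counts. Below is a minimal sketch of that sanity check. It uses Python's sqlite3 module as a stand-in for Oracle; the table `inv`, its data, and the helper `union_all_row_counts` are hypothetical and exist only to illustrate the check.

```python
import sqlite3


def union_all_row_counts(conn, query_a, query_b):
    """Return (rows in A, rows in B, rows in A UNION ALL B)."""
    def count(sql):
        # Wrap the query in an inline view and count its rows.
        return conn.execute(f"SELECT COUNT(*) FROM ({sql})").fetchone()[0]

    return count(query_a), count(query_b), count(f"{query_a} UNION ALL {query_b}")


# Hypothetical toy data, not the real MTL_SECONDARY_INVENTORIES table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inv (name TEXT, org_id INTEGER);
    INSERT INTO inv VALUES ('S1', 1), ('S2', 1), ('S3', 2);
""")

a, b, combined = union_all_row_counts(
    conn,
    "SELECT name, org_id FROM inv",                   # plays the role of QUERY A
    "SELECT DISTINCT 'WORKORDERS', org_id FROM inv",  # plays the role of QUERY B
)

# On a correct engine the combined count is always a + b.
print(a, b, combined, combined == a + b)
```

Running the same three counts against the real statement is a quick way to confirm a wrong result before digging into the plan.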
-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7 | 2443 | 52 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 7 | 2443 | 52 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL | WORKFLOW_START_TIMES | 1 | 29 | 48 (0)| 00:00:01 |
| 3 | VIEW | VW_JF_SET$9BAED2EA | 1 | 320 | 4 (0)| 00:00:01 |
| 4 | UNION ALL PUSHED PREDICATE | | | | | |
|* 5 | FILTER | | | | | |
| 6 | TABLE ACCESS BY INDEX ROWID| MTL_SECONDARY_INVENTORIES | 3 | 336 | 2 (0)| 00:00:01 |
|* 7 | INDEX RANGE SCAN | IDX_MTL_SECONDARY_INVENTORIES | 1 | | 1 (0)| 00:00:01 |
|* 8 | FILTER | | | | | |
| 9 | TABLE ACCESS BY INDEX ROWID| MTL_SECONDARY_INVENTORIES | 3 | 36 | 2 (0)| 00:00:01 |
|* 10 | INDEX RANGE SCAN | IDX_MTL_SECONDARY_INVENTORIES | 1 | | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("WORKFLOW_START_TIMES"."WORKFLOW_NAME"='w_int_FreqBatch_EMEA')
5 - filter(TO_DATE(' 1900-01-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss')<"WORKFLOW_START_TIMES"."WORKFLOW_START_TIME")
7 - access("MTL_SECONDARY_INVENTORIES"."DW_UPDATE_DT">TO_DATE(' 1900-01-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "MTL_SECONDARY_INVENTORIES"."DW_UPDATE_DT"<="WORKFLOW_START_TIMES"."WORKFLOW_START_TIME")
8 - filter(TO_DATE(' 1900-01-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss')<"WORKFLOW_START_TIMES"."WORKFLOW_START_TIME")
10 - access("MTL_SECONDARY_INVENTORIES"."DW_UPDATE_DT">TO_DATE(' 1900-01-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "MTL_SECONDARY_INVENTORIES"."DW_UPDATE_DT"<="WORKFLOW_START_TIMES"."WORKFLOW_START_TIME")
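You may have spotted two unfamiliar terms in the plan above: UNION ALL PUSHED PREDICATE and VW_JF_SET$. What are they?
Let's start with JF. JF stands for join factorization. If you have studied discrete mathematics or database theory, this cost-based transformation, newly introduced in 11.2.0.1, will not feel foreign. Expressed as a formula, it looks roughly like this:
Let YYA, YYB and YYC be three related data objects, or three related result sets;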
(YYA JOIN YYB) UNION [ALL] (YYA JOIN YYC)
can be transformed into:
YYA JOIN (YYB UNION [ALL] YYC)
This way YYA only needs to be read once, and one JOIN is saved as well. Sounds pretty good, doesn't it?
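The identity above can be checked on a small data set. The sketch below is an illustration only: it uses Python's sqlite3 module rather than Oracle, and hypothetical three-row tables with a `val` column instead of the id1/id2/id3 test tables. It runs the UNION ALL form and the hand-factorized form and compares them as multisets, since UNION ALL keeps duplicates.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yya (id1 INTEGER, val TEXT);
    CREATE TABLE yyb (id1 INTEGER, val TEXT);
    CREATE TABLE yyc (id1 INTEGER, val TEXT);
    INSERT INTO yya VALUES (1, 'a1'), (2, 'a2'), (2, 'a2-dup');
    INSERT INTO yyb VALUES (1, 'b1'), (2, 'b2');
    INSERT INTO yyc VALUES (2, 'c2'), (3, 'c3');
""")

# (YYA JOIN YYB) UNION ALL (YYA JOIN YYC): YYA is read and joined twice.
original = """
    SELECT yya.id1, yya.val, yyb.val FROM yya JOIN yyb ON yya.id1 = yyb.id1
    UNION ALL
    SELECT yya.id1, yya.val, yyc.val FROM yya JOIN yyc ON yya.id1 = yyc.id1
"""

# YYA JOIN (YYB UNION ALL YYC): YYA is read once and joined once.
factorized = """
    SELECT yya.id1, yya.val, v.val
    FROM yya
    JOIN (SELECT id1, val FROM yyb
          UNION ALL
          SELECT id1, val FROM yyc) v ON yya.id1 = v.id1
"""

# Compare as multisets: row order may differ, duplicates must match.
original_rows = Counter(conn.execute(original).fetchall())
factorized_rows = Counter(conn.execute(factorized).fetchall())
print(original_rows == factorized_rows)
```

The inline view `v` plays the role that the VW_JF_SET$ view plays in an Oracle plan; a transformation bug would show up as a difference between the two multisets.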
Here is a very simple example of Oracle using join factorization:
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
PL/SQL Release 11.2.0.1.0 - Production
CORE 11.2.0.1.0 Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production
SQL> drop table yya;
drop table yya
*
ERROR at line 1:
ORA-00942: table or view does not exist
SQL> drop table yyb;
drop table yyb
*
ERROR at line 1:
ORA-00942: table or view does not exist
SQL> create table yya as select rownum id1,rownum id2,rownum id3 from dual connect by level<=20000;
Table created.
SQL> create table yyb as select rownum id1,rownum id2,rownum id3 from dual connect by level<=20000;
Table created.
SQL> explain plan for
2 select * from yya ,yyb where yya.id1=yyb.id1
3 union all
4 select * from yya, yyb where yya.id1=yyb.id1;
Explained.
SQL> set linesize 100 pagesize 1400;
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 744914999
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 40000 | 2500K| 49 (3)| 00:00:01 |
|* 1 | HASH JOIN | | 40000 | 2500K| 49 (3)| 00:00:01 |
| 2 | TABLE ACCESS FULL | YYA | 20000 | 234K| 16 (0)| 00:00:01 |
| 3 | VIEW | VW_JF_SET$6E3F6682 | 40000 | 2031K| 32 (0)| 00:00:01 |
| 4 | UNION-ALL | | | | | |
| 5 | TABLE ACCESS FULL| YYB | 20000 | 761K| 16 (0)| 00:00:01 |
| 6 | TABLE ACCESS FULL| YYB | 20000 | 761K| 16 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("YYA"."ID1"="ITEM_1")
/* VW_JF_SET$6E3F6682 appears in the plan: Oracle chose join factorization, and the total cost of the plan is 49 */
SQL> alter session set "_optimizer_join_factorization"=false;
Session altered.
/* The hidden parameter _optimizer_join_factorization controls whether the optimizer may choose join factorization; the statement above disables it */
SQL> explain plan for
2 select * from yya join yyb on yya.id1=yyb.id1
3 union all
4 select * from yya join yyb on yya.id1=yyb.id1;
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 3439541885
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 40000 | 1992K| 66 (52)| 00:00:01 |
| 1 | UNION-ALL | | | | | |
|* 2 | HASH JOIN | | 20000 | 996K| 33 (4)| 00:00:01 |
| 3 | TABLE ACCESS FULL| YYA | 20000 | 234K| 16 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| YYB | 20000 | 761K| 16 (0)| 00:00:01 |
|* 5 | HASH JOIN | | 20000 | 996K| 33 (4)| 00:00:01 |
| 6 | TABLE ACCESS FULL| YYA | 20000 | 234K| 16 (0)| 00:00:01 |
| 7 | TABLE ACCESS FULL| YYB | 20000 | 761K| 16 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("YYA"."ID1"="YYB"."ID1")
5 - access("YYA"."ID1"="YYB"."ID1")
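/* With join factorization disabled, Oracle falls back to the conventional "brute-force" approach and the cost rises to 66 */
/* The next test is the interesting one */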
SQL> alter session set "_optimizer_join_factorization"=true;
Session altered.
SQL> create table yyc as select * from yyb;
Table created.
SQL> explain plan for
2 select * from yya,yyc where yya.id1=yyc.id1
3 union all
4 select * from yya,yyb where yya.id1=yyb.id1;
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 4240055274
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 40000 | 1992K| 66 (52)| 00:00:01 |
| 1 | UNION-ALL | | | | | |
|* 2 | HASH JOIN | | 20000 | 996K| 33 (4)| 00:00:01 |
| 3 | TABLE ACCESS FULL| YYA | 20000 | 234K| 16 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| YYC | 20000 | 761K| 16 (0)| 00:00:01 |
|* 5 | HASH JOIN | | 20000 | 996K| 33 (4)| 00:00:01 |
| 6 | TABLE ACCESS FULL| YYA | 20000 | 234K| 16 (0)| 00:00:01 |
| 7 | TABLE ACCESS FULL| YYB | 20000 | 761K| 16 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("YYA"."ID1"="YYC"."ID1")
5 - access("YYA"."ID1"="YYB"."ID1")
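/* Confusing: what reason does Oracle have for not using join factorization here? It looks like the practical applicability of join factorization is still open to question for now. */
/* Can the 10053 event explain this? */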
SQL> alter system flush shared_pool;
System altered.
SQL> oradebug setmypid;
Statement processed.
SQL> oradebug event 10053 trace name context forever,level 1;
Statement processed.
SQL> explain plan for
2 select * from yya join yyb on yya.id1=yyb.id1
3 union all
4 select * from yya join yyc on yya.id1=yyc.id1;
Explained.
SQL> oradebug event 10053 trace name context off;
Statement processed.
SQL> oradebug tracefile_name;
/home/maclean/app/maclean/diag/rdbms/prod/PROD/trace/PROD_ora_7907.trc
view /home/maclean/app/maclean/diag/rdbms/prod/PROD/trace/PROD_ora_7907.trc
***********************************
Cost-Based Join Factorization
***********************************
Join-Factorization on query block SET$1 (#1)
JF: Using search type: exhaustive
JF: Generate basic transformation units
Validating JF unit: (branch: {2, 3} table: {YYA, YYA})
rejected: join predicates do not match
JF: Generate transformation units from basic units
JF: No state generated.
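/* The optimizer decided that the join predicates do not qualify for join factorization, so the JF proposal was rejected. A mystery! */
Join factorization is a great new technique, no question about it, but new techniques are often horrible too (a word I have been using a lot lately). Is this newcomer the cause of our problem? Searching MOS for the keyword "join factorization" turns up Bug 9504322, filed in March of this year (2010). Quote: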
Hdr: 9504322 11.2.0.1 RDBMS 11.2.0.1 QRY OPTIMIZER PRODID-5 PORTID-226
Abstract: WRONG RESULTS WITH UNION_ALL AND INLINE VIEWS
*** 03/24/10 05:38 am ***
PROBLEM:
--------
Wrong results on 11.2 for queries of type:
SELECT * FROM
(
SELECT ... FROM view, table WHERE ...
UNION ALL
SELECT ... FROM view, table WHERE NOT ...
);
DIAGNOSTIC ANALYSIS:
--------------------
Problem seen between 10.2.0.4 and 11.2.0.1.
If we remove the use of inline view the correct results are returned.
WORKAROUND:
-----------
N/A
RELATED BUGS:
-------------
REPRODUCIBILITY:
----------------
It is reproducing on generic 11.2.0.1
Well, that looks like a lead. Still, the only real proof is a test:
SQL> alter session set "_optimizer_join_factorization"=false;
Session altered.
SELECT MTL_SECONDARY_INVENTORIES.SECONDARY_INVENTORY_NAME,
MTL_SECONDARY_INVENTORIES.ORGANIZATION_ID,
MTL_SECONDARY_INVENTORIES.DESCRIPTION,
MTL_SECONDARY_INVENTORIES.AVAILABILITY_TYPE,
MTL_SECONDARY_INVENTORIES.MATERIAL_ACCOUNT,
MTL_SECONDARY_INVENTORIES.MATERIAL_OVERHEAD_ACCOUNT,
MTL_SECONDARY_INVENTORIES.RESOURCE_ACCOUNT,
MTL_SECONDARY_INVENTORIES.OVERHEAD_ACCOUNT,
MTL_SECONDARY_INVENTORIES.OUTSIDE_PROCESSING_ACCOUNT,
MTL_SECONDARY_INVENTORIES.ASSET_INVENTORY,
MTL_SECONDARY_INVENTORIES.EXPENSE_ACCOUNT,
MTL_SECONDARY_INVENTORIES.ENCUMBRANCE_ACCOUNT,
MTL_SECONDARY_INVENTORIES.ATTRIBUTE3,
MTL_SECONDARY_INVENTORIES.ATTRIBUTE5,
WORKFLOW_START_TIMES.WORKFLOW_START_TIME
FROM REPEMEAERP.MTL_SECONDARY_INVENTORIES,
REPEMEAERP.WORKFLOW_START_TIMES
WHERE MTL_SECONDARY_INVENTORIES.DW_UPDATE_DT >
TO_DATE('01/01/1900 00:00:00', 'MM/DD/YYYY HH24:MI:SS')
AND MTL_SECONDARY_INVENTORIES.DW_UPDATE_DT <=
WORKFLOW_START_TIMES.WORKFLOW_START_TIME
AND WORKFLOW_START_TIMES.WORKFLOW_NAME =
LTRIM(RTRIM('w_int_FreqBatch_EMEA'))
/* The above is QUERY A */
UNION ALL
/* The following is QUERY B */
SELECT DISTINCT 'WORKORDERS',
MTL_SECONDARY_INVENTORIES.ORGANIZATION_ID,
'WORK ORDERS WITH WIP AS CATEGORY VALUE',
1,
0,
0,
0,
0,
0,
1,
0,
0,
'MOI',
'0',
WORKFLOW_START_TIMES.WORKFLOW_START_TIME
FROM REPEMEAERP.MTL_SECONDARY_INVENTORIES, EIMMAINT.WORKFLOW_START_TIMES
WHERE MTL_SECONDARY_INVENTORIES.DW_UPDATE_DT >
TO_DATE('01/01/1900 00:00:00', 'MM/DD/YYYY HH24:MI:SS')
AND MTL_SECONDARY_INVENTORIES.DW_UPDATE_DT <=
WORKFLOW_START_TIMES.WORKFLOW_START_TIME
AND WORKFLOW_START_TIMES.WORKFLOW_NAME =
LTRIM(RTRIM('w_int_FreqBatch_EMEA'))
/
138 rows selected.
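The result contradicts our guess: join factorization is not the culprit. Having hit a dead end, we go back to the starting point.
That leaves UNION ALL PUSHED PREDICATE as the prime suspect. What is PUSH PREDICATE? I call it predicate pushing. It has been around since at least 10g, and it has had plenty of problems all along. What kind of operation is it exactly? Let's look at the following example: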
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
PL/SQL Release 11.2.0.1.0 - Production
CORE 11.2.0.1.0 Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production
SQL> create table youyus (t1 int,t2 varchar2(20));
Table created.
SQL> alter table youyus add primary key(t1);
Table altered.
SQL> explain plan for
2 select *
3 from youyus
4 union all
5 select * from youyus;
Explained.
/* This subquery will be reused in the statements that follow */
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 1959159425
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 50 | 4 (50)| 00:00:01 |
| 1 | UNION-ALL | | | | | |
| 2 | TABLE ACCESS FULL| YOUYUS | 1 | 25 | 2 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| YOUYUS | 1 | 25 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------
/* On its own, the subquery's "original" execution plan is very simple */
SQL> explain plan for
2 select v2.t1, v2.t2
3 from (select t1 from youyus where rownum=1) v1,
4 (select *
5 from youyus
6 union all
7 select * from youyus) v2
8 where v1.t1 = v2.t1;
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 2456530141
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 27 | 1 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 27 | 1 (0)| 00:00:01 |
| 2 | VIEW | | 1 | 13 | 1 (0)| 00:00:01 |
|* 3 | COUNT STOPKEY | | | | | |
| 4 | INDEX FULL SCAN | SYS_C0010819 | 1 | 13 | 1 (0)| 00:00:01 |
| 5 | VIEW | | 1 | 14 | 0 (0)| 00:00:01 |
| 6 | UNION ALL PUSHED PREDICATE | | | | | |
| 7 | TABLE ACCESS BY INDEX ROWID| YOUYUS | 1 | 25 | 0 (0)| 00:00:01 |
|* 8 | INDEX UNIQUE SCAN | SYS_C0010819 | 1 | | 0 (0)| 00:00:01 |
| 9 | TABLE ACCESS BY INDEX ROWID| YOUYUS | 1 | 25 | 0 (0)| 00:00:01 |
|* 10 | INDEX UNIQUE SCAN | SYS_C0010819 | 1 | | 0 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(ROWNUM=1)
8 - access("YOUYUS"."T1"="V1"."T1")
10 - access("YOUYUS"."T1"="V1"."T1")
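/* PUSHED PREDICATE pushes the join predicate into each branch of the UNION ALL subquery; the benefit is that full table scans can be avoided in favor of index access */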
SQL> set linesize 100 pagesize 1400;
SQL>
SQL> explain plan for
2 select /*+ no_push_pred(v2) */ v2.t1, v2.t2
3 from (select t1 from youyus where rownum=1) v1,
4 (select *
5 from youyus
6 union all
7 select * from youyus) v2
8 where v1.t1 = v2.t1;
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 2769827061
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 38 | 6 (17)| 00:00:01 |
|* 1 | HASH JOIN | | 1 | 38 | 6 (17)| 00:00:01 |
| 2 | VIEW | | 1 | 13 | 1 (0)| 00:00:01 |
|* 3 | COUNT STOPKEY | | | | | |
| 4 | INDEX FULL SCAN | SYS_C0010819 | 1 | 13 | 1 (0)| 00:00:01 |
| 5 | VIEW | | 2 | 50 | 4 (0)| 00:00:01 |
| 6 | UNION-ALL | | | | | |
| 7 | TABLE ACCESS FULL| YOUYUS | 1 | 25 | 2 (0)| 00:00:01 |
| 8 | TABLE ACCESS FULL| YOUYUS | 1 | 25 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("V1"."T1"="V2"."T1")
3 - filter(ROWNUM=1)
/* The no_push_pred hint makes Oracle give up PUSHED PREDICATE; with a regular UNION-ALL operation the subquery plan reverts to full table scans and the overall plan cost goes up */
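The two plans above must return the same rows; only the access path differs. Below is a minimal sketch of that equivalence, using Python's sqlite3 module as a stand-in for Oracle, with hypothetical sample rows in `youyus` and `ORDER BY t1 LIMIT 1` substituted for `rownum=1`. The first query joins against the whole UNION ALL view. The loop mimics the NESTED LOOPS plus pushed-predicate plan by probing each branch with the driving row's key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE youyus (t1 INTEGER PRIMARY KEY, t2 TEXT);
    INSERT INTO youyus VALUES (1, 'one'), (2, 'two'), (3, 'three');
""")

# Without pushing: build the whole UNION ALL view, then join it to v1.
not_pushed = conn.execute("""
    SELECT v2.t1, v2.t2
    FROM (SELECT t1 FROM youyus ORDER BY t1 LIMIT 1) v1,
         (SELECT * FROM youyus
          UNION ALL
          SELECT * FROM youyus) v2
    WHERE v1.t1 = v2.t1
""").fetchall()

# With pushing: for each driving row, the join key becomes a predicate
# inside every UNION ALL branch, so each branch can use the primary key.
driving_rows = conn.execute(
    "SELECT t1 FROM youyus ORDER BY t1 LIMIT 1"
).fetchall()
pushed = []
for (key,) in driving_rows:
    pushed += conn.execute(
        """
        SELECT t1, t2 FROM youyus WHERE t1 = ?
        UNION ALL
        SELECT t1, t2 FROM youyus WHERE t1 = ?
        """,
        (key, key),
    ).fetchall()

print(sorted(not_pushed) == sorted(pushed))
```

Comparing the hinted and unhinted row sets in the same way on the real statement is one way to test whether the pushed predicate is what changes the result.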