今天处理了这样一问题,where条件中存在函数fun(date)<to_date('9999-01-01','YYYY-MM-DD')这样的无实际意义谓词,导致CBO计算基数时cardinality远小于实际情况,导致优化器认为2个源数据集的基数都不大,从而选择了HASH JOIN Right SEMI+SORT ORDER BY的执行计划,但是由于实际基数远大于computed 计算值所以变成了大的数据集做HASH JOIN并全数据排序,而实际该SQL只要求返回几十行数据而已,使用NESTED LOOP SEMI JOIN可以立即返回排序的前20行数据。
这里就需要解释带函数的谓词时CBO如何计算基数,我们通过下面的例子来说明:
create or replace function check_date( RDATE in date) return date is
begin
IF rdate< to_date('2099-01-01','YYYY-MM-DD') then return rdate; ELSIF rdate >=to_date('2099-01-01','YYYY-MM-DD') then
return to_date('2000-01-01');
end if;
end check_date;
/
SQL> select check_date (sysdate) from dual;
CHECK_DAT
---------
06-DEC-12
drop table tab1;
SQL> create table tab1 tablespace users as select * from dba_objects where rownum create view vtab1 as select object_id as id , object_name as name, object_type as type , check_date(created) cdata from tab1;
View created.
SQL> select count(distinct cdata) from vtab1;
COUNT(DISTINCTCDATA)
--------------------
130
SQL> exec dbms_stats.gather_table_stats('','TAB1', method_opt=>'FOR ALL COLUMNS SIZE 254');
PL/SQL procedure successfully completed.
因为我们指定收集了直方图所以若直接以"created"为条件查询时可以获得较好的计算基数
SQL> select count(*) from tab1 where created >= to_date('0001-10-10','YYYY-MM-DD');
COUNT(*)
----------
10000
Execution Plan
----------------------------------------------------------
Plan hash value: 1117438016
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 40 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
|* 2 | TABLE ACCESS FULL| TAB1 | 10000 | 80000 | 40 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("CREATED">=TO_DATE(' 0001-10-10 00:00:00', 'syyyy-mm-dd
hh24:mi:ss'))
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
133 consistent gets
0 physical reads
0 redo size
526 bytes sent via SQL*Net to client
523 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
在以上查询中>= to_date('0001-10-10','YYYY-MM-DD'); 这样的过滤条件实际无意义,在直接使用 "created"列作为谓词的情况下,CBO可以获得很好的基数10000。
SQL> select * from TABLE(dbms_xplan.display_cursor(NULL,NULL,'ALLSTATS LAST'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 6zy2k9dy4cv73, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from vtab1 where cdata
>= to_date('0001-10-10','YYYY-MM-DD')
Plan hash value: 1117438016
-------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.25 | 154 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.25 | 154 |
|* 2 | TABLE ACCESS FULL| TAB1 | 1 | 500 | 10000 |00:00:00.31 | 154 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("CHECK_DATE"("CREATED")>=TO_DATE(' 0001-10-10 00:00:00',
'syyyy-mm-dd hh24:mi:ss'))
21 rows selected.
通过gather_plan_statistics HINT,我们得到E-Rows 即CBO评估的基数,和 A-Rows实际的基数,可以看到这里E-Rows=500, 即在谓词左边存在使用内部函数或隐身装换的情况下,CBO无法通过现有统计信息的DISTINCT、DENSITY和HISTOGRAM获得较好的Cardinality,其基数总是统计信息中表的总行数/20,如上例中的 10000/20=500。
这就会引入不少的麻烦,因为开发人员有时候为了方便会在视图字段中嵌入自定义的函数,之后若在查询中使用该字段作为谓词条件,则可能导致CBO为相应表计算的基数偏少,是本身应当成本非常高的执行计划的COST变低,而容易被优化器选择。
对于上述问题可选的常见方案是若有这样问题的SQL较少则考虑加HINT或者SQL PROFILE,若较多还是需要考虑减少这种谓词左边有函数的现象。
implicit data_type conversion functions in Filter Predicates. Review Execution Plans.
If Filter Predicatesinclude unexpected INTERNAL_FUNCTION to perform an implicit data_type conversion,
be sure it is not preventing a column from being used as an Access Predicate.