zoukankan      html  css  js  c++  java
  • SQL调优:带函数的谓词导致CBO Cardinality计算误差

    今天处理了这样一问题,where条件中存在函数fun(date)<to_date('9999-01-01','YYYY-MM-DD')这样的无实际意义谓词,导致CBO计算基数时cardinality远小于实际情况,导致优化器认为2个源数据集的基数都不大,从而选择了HASH JOIN Right SEMI+SORT ORDER BY的执行计划,但是由于实际基数远大于computed 计算值所以变成了大的数据集做HASH JOIN并全数据排序,而实际该SQL只要求返回几十行数据而已,使用NESTED LOOP SEMI JOIN可以立即返回排序的前20行数据。 这里就需要解释带函数的谓词时CBO如何计算基数,我们通过下面的例子来说明:    
    create or replace function check_date( RDATE in date)  return date is 
    begin
    IF rdate< to_date('2099-01-01','YYYY-MM-DD') then   return rdate;   ELSIF  rdate >=to_date('2099-01-01','YYYY-MM-DD') then 
     return to_date('2000-01-01');
     end if;
     end check_date;
     /
    
     SQL> select check_date (sysdate) from dual;
    
    CHECK_DAT
    ---------
    06-DEC-12
    
    drop table tab1;
    
    SQL> create table tab1 tablespace users as select * from dba_objects where rownum create view vtab1 as select object_id as id , object_name as name, object_type as type , check_date(created) cdata from tab1;
    
    View created.
    
    SQL> select count(distinct cdata) from vtab1;
    
    COUNT(DISTINCTCDATA)
    --------------------
                     130
    
    SQL> exec dbms_stats.gather_table_stats('','TAB1', method_opt=>'FOR ALL COLUMNS SIZE 254');
    
    PL/SQL procedure successfully completed.
        因为我们指定收集了直方图所以若直接以"created"为条件查询时可以获得较好的计算基数
       SQL> select count(*) from tab1 where  created  >= to_date('0001-10-10','YYYY-MM-DD');
    
      COUNT(*)
    ----------
         10000
    
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 1117438016
    
    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |     1 |     8 |    40   (0)| 00:00:01 |
    |   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
    |*  2 |   TABLE ACCESS FULL| TAB1 | 10000 | 80000 |    40   (0)| 00:00:01 |
    ---------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - filter("CREATED">=TO_DATE(' 0001-10-10 00:00:00', 'syyyy-mm-dd
                  hh24:mi:ss'))
    
    Statistics
    ----------------------------------------------------------
              1  recursive calls
              0  db block gets
            133  consistent gets
              0  physical reads
              0  redo size
            526  bytes sent via SQL*Net to client
            523  bytes received via SQL*Net from client
              2  SQL*Net roundtrips to/from client
              0  sorts (memory)
              0  sorts (disk)
              1  rows processed
        在以上查询中>= to_date('0001-10-10','YYYY-MM-DD'); 这样的过滤条件实际无意义,在直接使用 "created"列作为谓词的情况下,CBO可以获得很好的基数10000。    
    SQL> select * from TABLE(dbms_xplan.display_cursor(NULL,NULL,'ALLSTATS LAST'));
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    SQL_ID  6zy2k9dy4cv73, child number 0
    -------------------------------------
    select /*+ gather_plan_statistics */ count(*) from vtab1 where  cdata
    >= to_date('0001-10-10','YYYY-MM-DD')
    
    Plan hash value: 1117438016
    
    -------------------------------------------------------------------------------------
    | Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
    -------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.25 |     154 |
    |   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.25 |     154 |
    |*  2 |   TABLE ACCESS FULL| TAB1 |      1 |    500 |  10000 |00:00:00.31 |     154 |
    -------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - filter("CHECK_DATE"("CREATED")>=TO_DATE(' 0001-10-10 00:00:00',
                  'syyyy-mm-dd hh24:mi:ss'))
    
    21 rows selected.
        通过gather_plan_statistics HINT,我们得到E-Rows 即CBO评估的基数,和 A-Rows实际的基数,可以看到这里E-Rows=500, 即在谓词左边存在使用内部函数或隐身装换的情况下,CBO无法通过现有统计信息的DISTINCT、DENSITY和HISTOGRAM获得较好的Cardinality,其基数总是统计信息中表的总行数/20,如上例中的 10000/20=500。 这就会引入不少的麻烦,因为开发人员有时候为了方便会在视图字段中嵌入自定义的函数,之后若在查询中使用该字段作为谓词条件,则可能导致CBO为相应表计算的基数偏少,是本身应当成本非常高的执行计划的COST变低,而容易被优化器选择。 对于上述问题可选的常见方案是若有这样问题的SQL较少则考虑加HINT或者SQL PROFILE,若较多还是需要考虑减少这种谓词左边有函数的现象。 implicit data_type conversion functions in Filter Predicates. Review Execution Plans. If Filter Predicatesinclude unexpected INTERNAL_FUNCTION to perform an implicit data_type conversion, be sure it is not preventing a column from being used as an Access Predicate.
  • 相关阅读:
    bzoj 2138: stone
    LOJ #6062. 「2017 山东一轮集训 Day2」Pair
    bzoj 5341: [Ctsc2018]暴力写挂
    UOJ #356. 【JOI2017春季合宿】Port Facility
    UOJ #357. 【JOI2017春季合宿】Sparklers
    UOJ #349. 【WC2018】即时战略
    bzoj 3600: 没有人的算术
    Codeforces 960G. Bandit Blues
    codeforces524E
    codeforces193B
  • 原文地址:https://www.cnblogs.com/macleanoracle/p/2968127.html
Copyright © 2011-2022 走看看