zoukankan      html  css  js  c++  java
  • 【12c】带你了解Oracle 12c数据库扩展的统计信息(Extended Statistics)的搜集

    在Oracle数据库中,优化器欲产生最优的执行计划,需要依赖于相关对象的统计信息的搜集。数据库可以自动的搜集执行计划,也可以使用程序包DBMS_STATS进行手动搜集,本篇对常规统计信息的搜集不做说明,重点演示下关于扩展的统计信息(Extended Statistics)的的搜集。

    1 扩展的统计信息介绍

    在执行SQL时,where条件后的单个列的统计信息很容易去决定对应列的谓词的选择性,但是,如果where条件后包含同一张表的多个列作为谓词时,那么单个列的统计信息就不能显示出列之间的关系了,这时,优化器根据单列的统计信息得到的执行计划就可能不是最优的了。这个问题可以通过列组(column group)来解决,通过对列组进行统计信息的搜集,来反映一组列的相互关系,从而使优化器选择最优的执行计划,列组的统计信息称为扩展的统计信息。Oracle中,扩展的统计信息包含:

    • 列组统计信息: 当一张表的多个列同时出现在一条SQL语句时,这种类型的扩展统计信息可以改进基数估计
    • 表达式统计信息:当对谓词使用表达式时,这种类型的统计信息可以改进优化器评估。

    如图:展示了sh.customers表中两个列cust_state_province和contry_id列的统计信息,以及这两个列组成的列组的统计信息,列组的名称是由系统产生。

    2 使用常规统计信息

    1)实验环境

    SQL> select * from v$version;
    
    
    
    BANNER CON_ID
    
    -------------------------------------------------------------------------------- ----------
    
    Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production 0
    
    PL/SQL Release 12.1.0.2.0 - Production 0
    
    CORE 12.1.0.2.0 Production 0
    
    TNS for Linux: Version 12.1.0.2.0 - Production 0
    
    NLSRTL Version 12.1.0.2.0 - Production 0
    

    2)构建测试数据

    SQL> create table sh.customers_new as select * from sh.customers;
    
    
    
    Table created.
    
    SQL> exec dbms_stats.gather_table_stats('SH','CUSTOMERS_NEW',method_opt => 'for all columns size 1');
    
    
    
    PL/SQL procedure successfully completed.

    3)查询列统计信息

    SQL> select column_name,num_distinct,histogram
    
    from dba_tab_col_statistics
    
    where owner='SH' and table_name='CUSTOMERS_NEW' and column_name in('CUST_STATE_PROVINCE','COUNTRY_ID');
    
    
    
    COLUMN_NAME NUM_DISTINCT HISTOGRAM
    
    ------------------------------ ------------ ---------------
    
    CUST_STATE_PROVINCE 145 NONE
    
    COUNTRY_ID 19 NONE

    3)查询表的总数和满足cust_state_province='CA'的数据量

    SQL> select count(1) from sh.customers_new;
    
    
    
    COUNT(1)
    
    ----------
    
    55500
    
    
    
    SQL> select count(1) from sh.customers_new where cust_state_province='CA';
    
    
    
    COUNT(1)
    
    ----------
    
    3341

    4)查看单列对应的执行计划

    SQL> explain plan for
    
    2 select * from sh.customers_new where cust_state_province='CA';
    
    
    
    Explained.
    
    SQL> select * from table(dbms_xplan.display);
    
    
    
    PLAN_TABLE_OUTPUT
    
    -------------------------------------------------------------------------------------------------------------------------------------
    
    Plan hash value: 3410015392
    
    
    
    -----------------------------------------------------------------------------------
    
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    
    -----------------------------------------------------------------------------------
    
    | 0 | SELECT STATEMENT | | 383 | 72387 | 423 (1)| 00:00:01 |
    
    |* 1 | TABLE ACCESS FULL| CUSTOMERS_NEW | 383 | 72387 | 423 (1)| 00:00:01 |
    
    -----------------------------------------------------------------------------------
    
    
    
    Predicate Information (identified by operation id):
    
    ---------------------------------------------------
    
    
    
    PLAN_TABLE_OUTPUT
    
    -------------------------------------------------------------------------------------------------------------------------------------
    
    
    
    1 - filter("CUST_STATE_PROVINCE"='CA')
    
    
    
    13 rows selected.

    看下面的计算,和优化器预估的行数相匹配:

    SQL> select 55500/145 from dual;

    55500/145

    ----------

    382.758621

    5)查看组合条件对应的执行计划

    SQL> explain plan for
    
    2 select * from sh.customers_new where cust_state_province='CA' and country_id='52790';
    
    
    
    Explained.
    
    
    
    SQL> select * from table(dbms_xplan.display);
    
    
    
    PLAN_TABLE_OUTPUT
    
    -------------------------------------------------------------------------------------------------------------------------------------
    
    Plan hash value: 3410015392
    
    
    
    -----------------------------------------------------------------------------------
    
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    
    -----------------------------------------------------------------------------------
    
    | 0 | SELECT STATEMENT | | 20 | 3780 | 423 (1)| 00:00:01 |
    
    |* 1 | TABLE ACCESS FULL| CUSTOMERS_NEW | 20 | 3780 | 423 (1)| 00:00:01 |
    
    -----------------------------------------------------------------------------------
    
    
    
    Predicate Information (identified by operation id):
    
    ---------------------------------------------------
    
    
    
    PLAN_TABLE_OUTPUT
    
    -------------------------------------------------------------------------------------------------------------------------------------
    
    
    
    1 - filter("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52790)
    
    
    
    13 rows selected.

    结果显示,优化器预估的行数为20行,和下面的计算相同,而实际的行数却为3341行,为何会这样呢?

    SQL> select 55500/145/19 from dual;

    55500/145/19

    ------------

    20.1451906

    首先,优化器不知道这两列的关系,不知道CA在国家52790,从而大大低估了返回的数据行;

    3 使用列组统计信息

    通过上面的实验,可以得知,如果列之间有一定的相互关系,且where条件出现组合条件时,优化器不能准确的评估返回的行数,那么对于这种情况,可以对组合列进行统计信息的搜集。

    1)搜集组合列统计信息

    begin
      dbms_stats.gather_table_stats('SH',
                                    'CUSTOMERS_NEW',
                                    method_opt => 'for all columns size 1,for columns (cust_state_province,country_id) size skewonly');
    end;
     /
    
    
    
    PL/SQL procedure successfully completed.

    2)查看组合条件对应的执行计划

    explain plan for
      select * from sh.customers_new where cust_state_province='CA' and country_id='52790';
    
    Explained.
    
    SQL> select * from table(dbms_xplan.display);
    
    PLAN_TABLE_OUTPUT
    -------------------------------------------------------------------------------------------------------------------------------------
    Plan hash value: 3410015392
    
    -----------------------------------------------------------------------------------
    | Id  | Operation	  | Name	  | Rows  | Bytes | Cost (%CPU)| Time	  |
    -----------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT  |		  |  3341 |   655K|   423   (1)| 00:00:01 |
    |*  1 |  TABLE ACCESS FULL| CUSTOMERS_NEW |  3341 |   655K|   423   (1)| 00:00:01 |
    -----------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
    PLAN_TABLE_OUTPUT
    -------------------------------------------------------------------------------------------------------------------------------------
    
       1 - filter("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52790)
    
    13 rows selected.

    可以看出预估的行数为3341,而实际的行数为3341,和真实结果一样。

    SQL> select count(1) from sh.customers_new where cust_state_province='CA' and country_id='52790';

    COUNT(1)

    ----------

    3341

    3)删除扩展统计信息

    begin
    
    dbms_stats.drop_extended_stats('sh',
    
    'CUSTOMERS_NEW',
    
    '(cust_state_province, country_id)');
    
    end;
    
    /
    
    
    
    PL/SQL procedure successfully completed.

    以上,就是对扩展统计信息中列组统计信息的演示。

  • 相关阅读:
    滑动窗口法学习
    209. Minimum Size Subarray Sum
    485. Max Consecutive Ones
    27. Remove Element
    167. Two Sum II
    561. Array Partition I
    344. Reverse String
    14. 最长公共前缀
    layui上传文件时出现 请求上传接口出错
    Linux-5.13将初步支持苹果M1 Soc
  • 原文地址:https://www.cnblogs.com/alen-liu-sz/p/12975555.html
Copyright © 2011-2022 走看看