zoukankan      html  css  js  c++  java
  • 索引的Clustering Factor

    Clustering Factor:集群因子(聚簇因子)用来描述一个表中的列是否是规则排序的

    Clustering Factor的算法如下:我们知道可以通过dbms_rowid.rowid_block_number(rowid)找到记录对应的block号。索引中记录了rowid,因此oracle就可以根据索引中的rowid来判断记录是否是在同一个block中。举个例子,比如说索引中有a,b,c,d,e五个记录,首先比较a,b是否在同一个block,如果不在同一个block那么Clustering Factor +1,然后继续比较b,c同理,如果b,c不在同一个block,那么Clustering Factor+1,这样一直进行下去,直到比较了所有的记录。

        根据算法我们就可以知道clustering factor的值介于block数和表行数之间。如果clustering factor接近block数,说明表的存储和索引存储排序接近,也就是说表中的记录很有序,这样在做index range scan 的时候能,读取少量的data block就能得到我们想要的数据,代价比较小。如果clustering factor接近表记录数,说明表的存储和索引排序差异很大,在做index range scan的时候,会额外读取多个block,因为表记录分散,代价较高。

    下面来看一个例子:

    SQL> desc test;
     名称                                                                                                      是否为空? 类型
     ----------------------------------------------------------------------------------------------------------------- -------- ----------------

     OWNER                                                                                                              VARCHAR2(30)
     OBJECT_NAME                                                                                                        VARCHAR2(128)
     SUBOBJECT_NAME                                                                                                     VARCHAR2(30)
     OBJECT_ID                                                                                                 NOT NULL NUMBER
     DATA_OBJECT_ID                                                                                                     NUMBER
     OBJECT_TYPE                                                                                                        VARCHAR2(19)
     CREATED                                                                                                            DATE
     LAST_DDL_TIME                                                                                                      DATE
     TIMESTAMP                                                                                                          VARCHAR2(19)
     STATUS                                                                                                             VARCHAR2(7)
     TEMPORARY                                                                                                          VARCHAR2(1)
     GENERATED                                                                                                          VARCHAR2(1)
     SECONDARY                                                                                                          VARCHAR2(1)

    SQL> select index_name,clustering_factor from user_indexes where table_name='TEST';

    INDEX_NAME                     CLUSTERING_FACTOR
    ------------------------------ -----------------
    NAMETYPE                                   25007
    OBJID                                        730

    SQL> select index_name,column_name,column_position from user_ind_columns;

    INDEX_NAME                     COLUMN_NAME          COLUMN_POSITION
    ------------------------------ -------------------- ---------------
    OBJID                          OBJECT_ID                          1
    NAMETYPE                       OBJECT_TYPE                        2
    NAMETYPE                       OBJECT_NAME                        1

    SQL> select count(distinct dbms_rowid.rowid_block_number(rowid)) from test;

    COUNT(DISTINCTDBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID))
    ---------------------------------------------------
                                                    684

    SQL> select count(*) from test;

      COUNT(*)
    ----------
         49947

    可以看到test表一共有49949行,存储在684个block,索引OBJID建立在object_id列上,它的值与块数接近,说明表中的object_id很有可能是排过序的,我们查一下前10行验证下

    SQL> select count(distinct dbms_rowid.rowid_block_number(rowid)) from test where object_id<11;

    COUNT(DISTINCTDBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID))
    ---------------------------------------------------
                                                      1

    我们看到从object_id=1到object_id=10都在同一个block,说明object_id这列经过了适当排序,那么我们在对object_id这一列进行index range scan的时候就可以少读很多block,就能把结果检索出来。

    索引NAMETYPE建立在object_name,object_type上,并且以object_name列为主导列,它的clustering factor接近行数,说明object_name没有经过适当排序,是分散的,在进行index range scan的时候 会读取较多的block.

    现在我将OBJID索引反序

    SQL> alter index objid rebuild reverse;

    索引已更改。

    SQL> select index_name,clustering_factor from user_indexes;

    INDEX_NAME                     CLUSTERING_FACTOR
    ------------------------------ -----------------
    SYS_IL0000052108C00036$$
    OBJID                                      49945
    NAMETYPE                                   25007

    由于反序了索引,此时clustering factor必然很高,因为表中objid是有序的,而索引是反序的

    这里也得到了一点提示,如果索引发生了热点块(cache buffers chains)竞争,可以建立一个reverse索引,不过带来的却是index range scan的花销,需要权衡利弊,看看是否有其他解决方案。

  • 相关阅读:
    MVC模式在Java Web应用程序中的实例分析
    设计模式
    实现xxxxxxx系统六大质量属性战术
    阅读了《大型网站技术架构:核心原理与案例分析》,分析XX系统如何增加相应的功能,提高系统的可用性和易用性
    淘宝系统质量属性分析
    《架构漫谈》读后感
    软件架构师架构设计过程是什么?
    《架构之美》阅读笔记06
    《架构之美》阅读笔记05
    《架构之美》阅读笔记04
  • 原文地址:https://www.cnblogs.com/hehe520/p/6330653.html
Copyright © 2011-2022 走看看