zoukankan      html  css  js  c++  java
  • 聚簇索引对数据插入的影响

    聚簇索引对数据插入的影响

    背景

    开发人员反馈系统执行某存储过程特别慢,经排查是由于存储过程执行过程中需要向新建的任务表插入大量数据,该任务表的主键是聚簇索引造成的。聚簇索引为什么会导致插入慢呢?聚簇索引会对数据插入造成多大影响呢?

    原理

    • 在非聚簇索引中,物理数据的存储顺序与索引不同,索引的最低级别包含指向数据页上的行的指针。

    这里写图片描述

    • 在聚簇索引中,物理数据的存储顺序与索引相同,索引的最低级别包含实际的数据页。
      聚簇索引导致数据记录必须按照键大小顺序存储,插入和删除须进行移动数据记录,导致额外的磁盘IO。

    这里写图片描述

    测试

    一、基本环境信息

    • 查看操作系统版本
        [root@npfydev01 home]# lsb_release -a
        LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
        Distributor ID: CentOS
        Description: CentOS release 6.4 (Final)
        Release: 6.4
        Codename: Final
    • 查看磁盘信息
        [root@npfydev01 home]# cat /proc/scsi/scsi
        Attached devices:
        Host: scsi0 Channel: 02 Id: 00 Lun: 00
        Vendor: IBM Model: ServeRAID M5110 Rev: 3.24
        Type: Direct-Access ANSI SCSI revision: 05
        Host: scsi1 Channel: 00 Id: 00 Lun: 00
        Vendor: IBM SATA Model: DEVICE 81Y3674 Rev: IB01
        Type: CD-ROM ANSI SCSI revision: 05
    • 查看磁盘读写速度
        [root@npfydev01 home]# time dd if=/dev/zero of=/home/4kb.1GBFILE bs=4k count=262144
        262144+0 records in
        262144+0 records out
        1073741824 bytes (1.1 GB) copied, 1.58541 s, 677 MB/s
        real 0m1.589s
        user 0m0.050s
        sys 0m1.533s
    • 查看数据库版本
        1> select @@version
        2> go
        --------------------------------------------------------------------------------------
        Adaptive Server Enterprise/15.7/EBF 21708 SMP SP110 /P/x86_64/Enterprise Linux/ase157sp11x/3546/64-bit/FBO/Fri Nov 8 05:39:38 2013
        (1 row affected)

    二、数据准备

    • 建立聚簇索引表(sybase主键默认为聚簇索引)
        USE DB_TASK
        GO
        CREATE TABLE T_TASKITEM_CI (
        C_BH char(32) primary key,
        C_BH_TASK char(32) null,
        C_BH_AJ varchar(32) null,
        N_AJBS numeric(15,0) null,
        C_AJLB varchar(6) null,
        N_JBFY int null,
        N_ZT int null,
        C_AH varchar(75) null
        )
        go
    • 建立非聚簇索引表
        USE DB_TASK
        go
        CREATE TABLE T_TASKITEM_NCI (
        C_BH char(32) NOT NULL,
        C_BH_TASK char(32) null,
        C_BH_AJ varchar(32) null,
        N_AJBS numeric(15,0) null,
        C_AJLB varchar(6) null,
        N_JBFY int null,
        N_ZT int null,
        C_AH varchar(75) null
        )
        go
        CREATE UNIQUE INDEX PK_TASKITEM ON DB_TASK.dbo.T_TASKITEM_NCI (C_BH)
        go
    • 构造数据
      构造一张同构的数据表T_TASKITEM_CC,使用如下SQL向该张表构造50W左右数据。
        SELECT newid ()
        , a.C_BH
        , 1 AS N_ZT
        , a.N_AJBS
        , a.N_JBFY
        , '5813b6d7ce8847d68b34daa956776659' AS C_BH_TASK
        , (CASE WHEN (a.N_YWLX = 20100) THEN '0201' WHEN (a.N_YWLX = 20200) THEN '0202' WHEN (a.N_YWLX = 20304) THEN '0207' WHEN (a.N_YWLX = 20501) THEN '0210' WHEN (a.N_YWLX = 20801) THEN '0224' WHEN (a.N_YWLX = 20601) THEN '0214' WHEN (a.N_YWLX = 20603) THEN '0216' WHEN (a.N_YWLX = 20602) THEN '0215' END) AS C_AJLB
        , a.C_AH
        FROM YWST..T_XS_AJ a

    数据量为 501132 条

    三、插入对比

    • 非聚簇索引表
        1> insert into T_TASKITEM_NCI SELECT newid(),C_BH_TASK,C_BH_AJ,N_AJBS,C_AJLB,N_JBFY,N_ZT,C_AH FROM T_TASKITEM_CC
        2> GO
        Parse and Compile Time 0.
        Adaptive Server cpu time: 0 ms.
        Parse and Compile Time 0.
        Adaptive Server cpu time: 0 ms.
        Table: T_TASKITEM_NCI scan count 0, logical reads: (regular=2025588 apf=0 total=2025588), physical reads: (regular=0 apf=0 total=0), apf IOs used=0
        Table: T_TASKITEM_CC scan count 1, logical reads: (regular=10957 apf=27 total=10984), physical reads: (regular=0 apf=0 total=0), apf IOs used=0
        Total writes for this command: 3538
        Execution Time 97.
        Adaptive Server cpu time: 9688 ms. Adaptive Server elapsed time: 13381 ms.
        (501132 rows affected)
    • 聚簇索引表
          1> insert into T_TASKITEM_CI SELECT newid(),C_BH_TASK,C_BH_AJ,N_AJBS,C_AJLB,N_JBFY,N_ZT,C_AH FROM T_TASKITEM_CC
          2> GO
          Parse and Compile Time 0.
          Adaptive Server cpu time: 0 ms.
          Parse and Compile Time 0.
          Adaptive Server cpu time: 0 ms.
          Table: T_TASKITEM_CI scan count 0, logical reads: (regular=6422447 apf=0 total=6422447), physical reads: (regular=0 apf=0 total=0), apf IOs used=0
          Table: T_TASKITEM_CC scan count 1, logical reads: (regular=10957 apf=27 total=10984), physical reads: (regular=0 apf=0 total=0), apf IOs used=0
          Total writes for this command: 11945
          Execution Time 176.
          Adaptive Server cpu time: 17350 ms. Adaptive Server elapsed time: 28206 ms.
          (501132 rows affected)
    类别聚簇索引非聚簇索引
    写入 11945 3538
    读入 6422447 2025588
    执行时间 28206 ms 13381 ms

    结论:插入同样的数据量,非聚簇索引表比聚簇索引表时间上快一倍,IO减小2/3。

    四、删除对比

    • 构造删除数据

    按照索引字段C_BH排序,获取物理位置为于100行、200行…5000行的C_BH,将要删除的编号分别存储在T_DELETE_CI_BH 和 T_DELETE_NCI_BH表中。

        select C_BH,N_ORDER = identity(10) INTO T_ALL_CI_BH FROM T_TASKITEM_CI ORDER BY C_BH asc
        SELECT C_BH,N_ORDER INTO T_DELETE_CI_BH FROM T_ALL_CI_BH WHERE N_ORDER%100 = 0
        select C_BH,N_ORDER = identity(10) INTO T_ALL_NCI_BH FROM T_TASKITEM_NCI ORDER BY C_BH asc
        SELECT C_BH,N_ORDER INTO T_DELETE_NCI_BH FROM T_ALL_NCI_BH WHERE N_ORDER%100 = 0
    • 聚簇索引表执行删除
        1> DELETE FROM T_TASKITEM_CI where C_BH IN (SELECT C_BH FROM T_DELETE_CI_BH)
        2> go
        Parse and Compile Time 0.
        Adaptive Server cpu time: 0 ms.
        Parse and Compile Time 0.
        Adaptive Server cpu time: 0 ms.
        Parse and Compile Time 0.
        Adaptive Server cpu time: 0 ms.
        Table: T_TASKITEM_CI scan count 0, logical reads: (regular=20004 apf=0 total=20004), physical reads: (regular=0 apf=0 total=0), apf IOs used=0
        Table: T_DELETE_CI_BH scan count 1, logical reads: (regular=31 apf=0 total=31), physical reads: (regular=0 apf=0 total=0), apf IOs used=0
        Table: T_TASKITEM_CI scan count 5001, logical reads: (regular=15070 apf=0 total=15070), physical reads: (regular=0 apf=0 total=0), apf IOs used=0
        Total writes for this command: 241
        Execution Time 1.
        Adaptive Server cpu time: 128 ms. Adaptive Server elapsed time: 379 ms.
        (5001 rows affected)
    • 非聚簇索引表执行删除
        1> DELETE FROM T_TASKITEM_NCI where C_BH IN (SELECT C_BH FROM T_DELETE_NCI_BH)
        2> go
        Parse and Compile Time 0.
        Adaptive Server cpu time: 0 ms.
        Parse and Compile Time 0.
        Adaptive Server cpu time: 0 ms.
        Parse and Compile Time 0.
        Adaptive Server cpu time: 0 ms.
        Table: T_TASKITEM_NCI scan count 0, logical reads: (regular=20004 apf=0 total=20004), physical reads: (regular=0 apf=0 total=0), apf IOs used=0
        Table: T_DELETE_NCI_BH scan count 1, logical reads: (regular=31 apf=0 total=31), physical reads: (regular=0 apf=0 total=0), apf IOs used=0
        Table: T_TASKITEM_NCI scan count 5001, logical reads: (regular=15070 apf=0 total=15070), physical reads: (regular=0 apf=0 total=0), apf IOs used=0
        Total writes for this command: 242
        Execution Time 1.
        Adaptive Server cpu time: 128 ms. Adaptive Server elapsed time: 403 ms.
        (5001 rows affected)

    结论:按照索引字段删除,聚簇索引和非聚簇索引IO和效率一样。

    排查聚簇索引

    聚簇索引表插入无序主键(GUID/UUID)数据时会造成额外的磁盘IO和时间消耗,采用无序主键(GUID/UUID)的项目设计上是禁止使用聚簇索引,那么如何排查项目中非法使用聚簇索引的表呢?使用dba团队出品的sp_dba_citable存储过程检索
    核心代码:

        use sybsystemprocs
        GO
        if object_id('sp_dba_citable') is not null
        drop procedure sp_dba_citable
        GO
        create procedure sp_dba_citable
        AS
        --查看聚簇索引表
        --add by wangzhen 2017-07-17
        begin
        declare @temp_sql varchar(500)
        declare @sql varchar(1000)
        declare @dbname varchar(100)
        declare dbname_cursor cursor for select name from master..sysdatabases
        create table #objectinfo (
        dbname varchar(100),
        objid int,
        tablename varchar(300),
        indexid int,
        indexname varchar(300),
        keycnt int,
        indextype varchar(100)
        )
        set @temp_sql = 'insert into #objectinfo '
            + 'select ''@dbname#'' , '
            + '  obj.id , '
            + '  obj.name , '
            + '  ind.indid , '
            + '  ind.name , '
            + '  ind.keycnt , '
            + '  ''culster index'' '
            +' from @dbname#..sysindexes ind left join @dbname#..sysobjects obj on ind.id = obj.id '
            +' where (ind.status2 & 512 = 512 or ind.indid = 1) and obj.type = ''U'' '
        open dbname_cursor
        while @@sqlstatus =0
        BEGIN
          FETCH dbname_cursor into @dbname
          set @sql = str_replace(@temp_sql,'@dbname#',@dbname)
          EXECUTE(@sql)
        END
        close dbname_cursor
        select
            t.dbname as "库名",
            t.objid as "对象ID",
            t.tablename as "表名",
            t.indexname as "索引名"
        from #objectinfo t where t.dbname not in ('master','tempdb','sybsecurity','sybsystemdb','sybsystemprocs') group by t.dbname,t.objid,t.tablename,t.indexname,t.keycnt,t.indextype order by t.dbname asc,t.tablename asc
        end
        go

    总结

    在聚簇索引中,物理数据的存储顺序与索引相同,索引的最低级别包含实际的数据页,在无序字段上(GUID/UUID)上使用聚簇索引插入大量数据会比非聚簇索引时间慢一倍,IO高三倍。其实,NP在设计之初已经规定业务表中不能定义物理主键(含聚簇索引),而应该定义逻辑主键(唯一约束+索引+不为空)。对于采用无序主键(GUID/UUID)的项目可以使用sp_dba_citable排查聚簇索引表!

  • 相关阅读:
    Python成长笔记
    Python成长笔记
    Python成长笔记
    Python成长笔记
    Python成长笔记
    Python成长笔记
    Python成长笔记
    Python成长笔记
    Python成长笔记
    解决Jenkins生成测试报告的问题
  • 原文地址:https://www.cnblogs.com/wangzhen3798/p/7286508.html
Copyright © 2011-2022 走看看