  • Data Compression: Automated Evaluation

    Building on the previous post, 数据压缩简要, I wanted to automate the evaluation of data compression, which led to this post.

    The whitepaper recommends page compression for large tables and indexes that meet the following conditions:

    • Scan operations account for 75% or more of all operations on the table or index
    • Update operations account for 20% or less of all operations on the table or index

    Note that these are the whitepaper's conclusions and recommendations; treat them as a reference only, as one consideration among the best practices.
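    As a minimal standalone sketch (not part of the original script), these two ratios can be read directly from sys.dm_db_index_operational_stats, using the same operational counters the full script below relies on:

    -- Per-index share of scans and updates among all leaf-level operations.
    -- NULLIF avoids division by zero for indexes with no recorded activity.
    select
        object_name(s.object_id) as table_name,
        s.index_id,
        s.range_scan_count * 100.0
            / nullif(s.leaf_delete_count + s.leaf_insert_count + s.leaf_update_count
                   + s.range_scan_count + s.singleton_lookup_count + s.leaf_page_merge_count, 0) as scan_pct,
        s.leaf_update_count * 100.0
            / nullif(s.leaf_delete_count + s.leaf_insert_count + s.leaf_update_count
                   + s.range_scan_count + s.singleton_lookup_count + s.leaf_page_merge_count, 0) as update_pct
    from sys.dm_db_index_operational_stats(db_id(), null, null, null) s;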

    The original author of this script is Louis Li, but his script has a few limitations, so I made the following changes on top of it:

    • The helper tables are temp tables instead of user tables
    • Only partitions with more than 1,000 pages are analyzed
    • The analysis covers all tables and indexes, not just heaps and clustered indexes
    • The granularity of the decision is the partition level
    • Statistics on the space used by each partition are added
    • The generated statements include options that improve rebuild performance: MAXDOP=8, SORT_IN_TEMPDB=ON (see the sketch after this list)
    • The filter condition is changed. Originally only partitions with scans above 75% were analyzed, so append-only, log-style tables (S ≈ 0%, U ≈ 0%) were filtered out. The filter is now (S > 75% OR U < 20%).
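    For illustration, here is the kind of statement the script generates for a nonpartitioned index (the index and table names are hypothetical):

    ALTER INDEX [IX_Orders_CustomerId] ON [dbo].[Orders]
        REBUILD WITH (DATA_COMPRESSION = PAGE, MAXDOP = 8, SORT_IN_TEMPDB = ON);

    SORT_IN_TEMPDB = ON moves the rebuild's sort work into tempdb and MAXDOP = 8 caps the degree of parallelism; both options affect only how the rebuild runs, not its result.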

    The script below finds the objects that meet the following conditions and generates the corresponding compression statements.

    1. Scan all indexes in the current database and find those that meet both of the following conditions:

    • The index occupies more than 1,000 pages
    • SELECT operations account for more than 75% of all operations on the index, or UPDATE operations account for less than 20%

    Note that the granularity here is the partition, so if a table or index is partitioned, the decision is made at the partition level.

    2. For each index found in the previous step, estimate the space (as a percentage) that page and row compression would each save.
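    This estimation is done with sp_estimate_data_compression_savings, called once per index, partition, and compression type. A single such call looks like this ('dbo' and 'Orders' are placeholder names):

    -- Estimate PAGE compression savings for index 2, partition 1 of dbo.Orders.
    EXEC sp_estimate_data_compression_savings
        @schema_name = 'dbo',
        @object_name = 'Orders',
        @index_id = 2,
        @partition_number = 1,
        @data_compression = 'PAGE';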

    3. Compare the row and page compression figures and make a recommendation. If an index has no UPDATE activity, or page compression saves at least 10% more space than row compression, page compression is recommended; all other indexes get row compression. (For example, an index with update_pct = 5, a 20% row saving, and a 35% page saving is recommended for page compression.)

    4. The output comes in two parts: the first lists the indexes recommended for compression; the second gives the recommended compression type for each and the corresponding script.

    --Collect all index stats
    if object_id('tempdb..#index_estimates') is not null
        drop table #index_estimates
    go

    create table #index_estimates
    (
        database_name sysname not null,
        [schema_name] sysname not null,
        table_name sysname not null,
        index_id int not null,
        partition_number int not null,
        update_pct decimal(5,2) not null,
        select_pct decimal(5,2) not null,
        used_size_kb int not null,
        constraint pk_index_estimates primary key (database_name,[schema_name],table_name,index_id,partition_number)
    );
    go
    
    insert into #index_estimates
    select
        db_name() as database_name,
        schema_name(t.schema_id) as [schema_name],
        t.name,
        i.index_id,
        p.partition_number,
        i.leaf_update_count * 100.0 / (i.leaf_delete_count + i.leaf_insert_count + i.leaf_update_count + i.range_scan_count + i.singleton_lookup_count + i.leaf_page_merge_count) as UpdatePct,
        i.range_scan_count * 100.0 / (i.leaf_delete_count + i.leaf_insert_count + i.leaf_update_count + i.range_scan_count + i.singleton_lookup_count + i.leaf_page_merge_count) as SelectPct,
        p.used_page_count * 8 as used_size_kb
    from
        sys.dm_db_index_operational_stats(db_id(),null,null,null) i
        inner join sys.tables t on i.object_id = t.object_id
        inner join sys.dm_db_partition_stats p
            on i.object_id = p.object_id and i.index_id = p.index_id and i.partition_number = p.partition_number
    where
        i.leaf_delete_count + i.leaf_insert_count + i.leaf_update_count + i.range_scan_count + i.singleton_lookup_count + i.leaf_page_merge_count > 0
        and p.used_page_count >= 1000  -- only consider partitions of at least 1000 pages
        --and i.index_id < 2  -- only consider heap and clustered index
        and
        (
            -- S > 75% or U < 20%; multiply by 1.0 so bigint division is not truncated to 0
            i.range_scan_count * 1.0 / (i.leaf_delete_count + i.leaf_insert_count + i.leaf_update_count + i.range_scan_count + i.singleton_lookup_count + i.leaf_page_merge_count) > .75
            or
            i.leaf_update_count * 1.0 / (i.leaf_delete_count + i.leaf_insert_count + i.leaf_update_count + i.range_scan_count + i.singleton_lookup_count + i.leaf_page_merge_count) < .2
        )
    order by
        t.name,
        i.index_id
    go
    
    --show data compression candidates
    select * from #index_estimates;

    --Prepare 2 intermediate tables for row compression and page compression estimates
    if OBJECT_ID('tempdb..#page_compression_estimates') is not null
        drop table #page_compression_estimates;
    go

    create table #page_compression_estimates
    (
        [object_name] sysname not null,
        [schema_name] sysname not null,
        index_id int not null,
        partition_number int not null,
        [size_with_current_compression_setting(KB)] bigint not null,
        [size_with_requested_compression_setting(KB)] bigint not null,
        [sample_size_with_current_compression_setting(KB)] bigint not null,
        [sample_size_with_requested_compression_setting(KB)] bigint not null,
        constraint pk_page_compression_estimates primary key ([object_name],[schema_name],index_id,partition_number)
    );
    go

    if OBJECT_ID('tempdb..#row_compression_estimates') is not null
        drop table #row_compression_estimates;
    go

    create table #row_compression_estimates
    (
        [object_name] sysname not null,
        [schema_name] sysname not null,
        index_id int not null,
        partition_number int not null,
        [size_with_current_compression_setting(KB)] bigint not null,
        [size_with_requested_compression_setting(KB)] bigint not null,
        [sample_size_with_current_compression_setting(KB)] bigint not null,
        [sample_size_with_requested_compression_setting(KB)] bigint not null,
        constraint pk_row_compression_estimates primary key ([object_name],[schema_name],index_id,partition_number)
    );
    go
    
    --Use cursor and dynamic sql to get estimates (took 9:18 on my laptop)
    declare @script_template nvarchar(max) = 'insert ###compression_mode##_compression_estimates exec sp_estimate_data_compression_savings ''##schema_name##'',''##table_name##'',##index_id##,##partition_number##,''##compression_mode##''';
    declare @executable_script nvarchar(max);
    declare @schema sysname, @table sysname, @index_id smallint, @partition_number smallint, @compression_mode nvarchar(20);

    declare cur cursor fast_forward for
    select
        i.[schema_name],
        i.[table_name],
        i.index_id,
        i.partition_number,
        em.estimate_mode
    from
        #index_estimates i cross join (values('row'),('page')) AS em(estimate_mode)
    group by
        i.[schema_name],
        i.[table_name],
        em.estimate_mode,
        i.index_id,
        i.partition_number;

    open cur;
    fetch next from cur into @schema, @table, @index_id, @partition_number, @compression_mode;
    while (@@FETCH_STATUS=0)
    begin
        set @executable_script = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@script_template,'##schema_name##',@schema),'##table_name##',@table),'##compression_mode##',@compression_mode),'##index_id##',@index_id),'##partition_number##',@partition_number);
        print @executable_script;
        exec(@executable_script);
        fetch next from cur into @schema, @table, @index_id, @partition_number, @compression_mode;
    end
    close cur;
    deallocate cur;
    
    --Show estimates and proposed data compression
    with all_estimates as (
    select
        '[' + i.schema_name + '].[' + i.table_name + ']' as table_name,
        case
            when i.index_id > 0 then '[' + idx.name + ']'
            else null
        end as index_name,
        i.partition_number,
        i.select_pct,
        i.update_pct,
        case
            when r.[size_with_current_compression_setting(KB)] > 0 then
                100 - r.[size_with_requested_compression_setting(KB)] * 100.0 / r.[size_with_current_compression_setting(KB)]
            else
                0.0
        end as row_compression_saving_pct,
        case
            when p.[size_with_current_compression_setting(KB)] > 0 then
                100 - p.[size_with_requested_compression_setting(KB)] * 100.0 / p.[size_with_current_compression_setting(KB)]
            else
                0.0
        end as page_compression_saving_pct,
        (case when ps.name is null then 0 else 1 end) as is_partitioned
    from
        #index_estimates i
        inner join #row_compression_estimates r on i.schema_name = r.schema_name and i.table_name = r.object_name and i.index_id = r.index_id
        inner join #page_compression_estimates p on i.schema_name = p.schema_name and i.table_name = p.object_name and i.index_id = p.index_id
        inner join sys.indexes idx on i.index_id = idx.index_id and object_name(idx.object_id) = i.table_name
        left join sys.partition_schemes ps on idx.data_space_id = ps.data_space_id
    ),
    recommend_compression as (
    select
        table_name,
        index_name,
        select_pct,
        update_pct,
        row_compression_saving_pct,
        page_compression_saving_pct,
        partition_number,
        is_partitioned,
        case
            when update_pct = 0 then 'Page'
            when update_pct >= 20 then 'Row'
            when update_pct > 0 and update_pct < 20 and page_compression_saving_pct - row_compression_saving_pct < 10 then 'Row'
            else 'Page'
        end as recommended_data_compression
    from
        all_estimates
    where
        row_compression_saving_pct > 0
        and page_compression_saving_pct > 0
    )
    select
        table_name,
        index_name,
        select_pct,
        update_pct,
        cast(row_compression_saving_pct as decimal(5,2)) as row_compression_saving_pct,
        cast(page_compression_saving_pct as decimal(5,2)) as page_compression_saving_pct,
        recommended_data_compression,
        case
            when index_name is null and is_partitioned = 0 then
                'ALTER TABLE ' + table_name + ' REBUILD WITH (data_compression = ' + recommended_data_compression + ',MAXDOP=8)'
            when index_name is null and is_partitioned = 1 then
                'ALTER TABLE ' + table_name + ' REBUILD PARTITION=' + CAST(partition_number AS VARCHAR(2)) + ' WITH (data_compression = ' + recommended_data_compression + ',MAXDOP=8)'
            when index_name is not null and is_partitioned = 0 then
                'ALTER INDEX ' + index_name + ' ON ' + table_name + ' REBUILD WITH (data_compression = ' + recommended_data_compression + ',MAXDOP=8,SORT_IN_TEMPDB=ON)'
            when index_name is not null and is_partitioned = 1 then
                'ALTER INDEX ' + index_name + ' ON ' + table_name + ' REBUILD PARTITION=' + CAST(partition_number AS VARCHAR(2)) + ' WITH (data_compression = ' + recommended_data_compression + ',MAXDOP=8,SORT_IN_TEMPDB=ON)'
        end collate database_default as [statement]
    from
        recommend_compression
    order by
        table_name

    --Clean up
    drop table #index_estimates;
    drop table #page_compression_estimates;
    drop table #row_compression_estimates;
    GO

    Note:

    How long the analysis takes depends on the number of objects to analyze and the amount of data. You may also find that the results differ somewhat from the estimates shown in SSMS under Storage > Manage Compression. Both approaches use sp_estimate_data_compression_savings, but SSMS does not specify the @index_id parameter, so it estimates the total of all objects in the table or partition, which is quite inaccurate for tables with multiple indexes.
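    In other words, SSMS issues something like the following call, with @index_id left NULL (dbo.Orders is the same hypothetical table as above):

    -- SSMS-style call: no specific index, so the estimate covers all of the
    -- table's indexes rather than one index at a time.
    EXEC sp_estimate_data_compression_savings
        @schema_name = 'dbo',
        @object_name = 'Orders',
        @index_id = NULL,
        @partition_number = NULL,
        @data_compression = 'PAGE';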

    Summary:

    1. I have used this script in many production environments and it has behaved well everywhere. But if you use it, evaluate the generated recommendations carefully before applying them.

    2. Data compression also interacts with replication, AlwaysOn, columnstore, and so on, but that is another story.

    3. Data compression does not compress off-row LOB data. To compress that, you can only compress on the application side, or use FILESTREAM on a compressed volume. SQL Server 2016 adds the COMPRESS/DECOMPRESS functions for compressing individual values, but they are not meant to solve the off-row LOB compression problem.
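    A minimal sketch of those two functions (they use GZIP internally, and DECOMPRESS returns varbinary(max), so the result must be cast back to the original type):

    -- Compress a single value and round-trip it back.
    DECLARE @original nvarchar(max) =
        REPLICATE(CAST(N'off-row LOB payload ' AS nvarchar(max)), 1000);
    DECLARE @compressed varbinary(max) = COMPRESS(@original);

    SELECT
        DATALENGTH(@original)   AS original_bytes,
        DATALENGTH(@compressed) AS compressed_bytes,
        CAST(DECOMPRESS(@compressed) AS nvarchar(max)) AS round_tripped;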
