zoukankan      html  css  js  c++  java
  • BIT类型是否适合建索引

     
    Now, the question is, do you really WANT an index on a BIT column? We're going to run some experiments, but in general, it is highly unlikely that you will get much use out of such an index. The exception is when the data is heavily weighted towards, say, 1 (e.g. 95-99% of the table), and you are searching for 0. Or vice-versa. 
     
    What I'd like to test is the effect on execution time and the execution plan if you run different queries against large tables with a BIT column whose values are evenly distributed, or weighted heavily (in this case 97% - 3%), and in both cases, compare clustered to nonclustered to no index. 
     
    So, let's create six different tables in their own database: 
     

    CREATE DATABASE Splunge 
    GO 
     
    USE Splunge 
    GO 
     
    -- 50/50, no index 
    CREATE TABLE dbo.Test1 

        myBit BIT NOT NULL 

     
    -- 50/50, nonclustered index 
    CREATE TABLE dbo.Test2 

        myBit BIT NOT NULL 

     
    CREATE INDEX bitIndex ON dbo.Test2(myBit) 
     
    -- 50/50, clustered index 
    CREATE TABLE dbo.Test3 

        myBit BIT NOT NULL 

     
    CREATE CLUSTERED INDEX bitIndex ON dbo.Test3(myBit) 
     
    -- 97/3, no index 
    CREATE TABLE dbo.Test4 

        myBit BIT NOT NULL 

     
    -- 97/3, nonclustered index 
    CREATE TABLE dbo.Test5 

        myBit BIT NOT NULL 

     
    CREATE INDEX bitIndex ON dbo.Test5(myBit) 
     
    -- 97/3, clustered index 
    CREATE TABLE dbo.Test6 

        myBit BIT NOT NULL 

     
    CREATE CLUSTERED INDEX bitIndex ON dbo.Test6(myBit)

     
    And let's populate each with 100,000 rows, the first three will have 50,000 of each value (0,1), and the second three will have 97,000 0's and 3,000 1's. 
     

    DECLARE 
        @i INT, 
        @ff BIT, -- 50/50 flag 
        @nf BIT -- 97/3 flag 
     
    SELECT 
        @i = 1, 
        @ff = 0, 
        @nf = 0 
     
    WHILE @i <= 100000 
    BEGIN 
        IF @i > 50000 
            SET @ff = 1 
     
        IF @i > 97000 
            SET @nf = 1 
     
        INSERT dbo.Test1(myBit) SELECT @ff 
        INSERT dbo.Test2(myBit) SELECT @ff 
        INSERT dbo.Test3(myBit) SELECT @ff 
     
        INSERT dbo.Test4(myBit) SELECT @nf 
        INSERT dbo.Test5(myBit) SELECT @nf 
        INSERT dbo.Test6(myBit) SELECT @nf 
     
        SET @i = @i + 1 
    END

     
    On my system, this took roughly seven minutes. Your mileage may vary. 
     
    So now that we have the data in there, let's run the following sets of queries. 
     

    SELECT COUNT(*) FROM dbo.Test1 
    SELECT COUNT(*) FROM dbo.Test2 
    SELECT COUNT(*) FROM dbo.Test3 
    SELECT COUNT(*) FROM dbo.Test4 
    SELECT COUNT(*) FROM dbo.Test5 
    SELECT COUNT(*) FROM dbo.Test6
    SELECT COUNT(*) FROM dbo.Test1 WHERE MyBit=0 
    SELECT COUNT(*) FROM dbo.Test1 WHERE MyBit=1 
    SELECT COUNT(*) FROM dbo.Test2 WHERE MyBit=0 
    SELECT COUNT(*) FROM dbo.Test2 WHERE MyBit=1 
    SELECT COUNT(*) FROM dbo.Test3 WHERE MyBit=0 
    SELECT COUNT(*) FROM dbo.Test3 WHERE MyBit=1 
    SELECT COUNT(*) FROM dbo.Test4 WHERE MyBit=0 
    SELECT COUNT(*) FROM dbo.Test4 WHERE MyBit=1 
    SELECT COUNT(*) FROM dbo.Test5 WHERE MyBit=0 
    SELECT COUNT(*) FROM dbo.Test5 WHERE MyBit=1 
    SELECT COUNT(*) FROM dbo.Test6 WHERE MyBit=0 
    SELECT COUNT(*) FROM dbo.Test6 WHERE MyBit=1

     
    If you observe the execution plan and statistics, you will see that those with the table scan (Test1 and Test4) require the least amount of reads and percentage of work. 
     
    However, these results change ever so slightly if you are performing grouping and aggregates in the same query: 
     

    SELECT MyBit, COUNT(*) FROM dbo.Test1 GROUP BY MyBit 
    SELECT MyBit, COUNT(*) FROM dbo.Test2 GROUP BY MyBit 
    SELECT MyBit, COUNT(*) FROM dbo.Test3 GROUP BY MyBit 
    SELECT MyBit, COUNT(*) FROM dbo.Test4 GROUP BY MyBit 
    SELECT MyBit, COUNT(*) FROM dbo.Test5 GROUP BY MyBit 
    SELECT MyBit, COUNT(*) FROM dbo.Test6 GROUP BY MyBit

     
    Here we see that the clustered index has a slight edge in query cost, but slightly higher I/O cost: 
     

     
    And in SQL Server 2005, the clustered index scan has an even greater advantage: 
     

     
     
    So, the answer to this one is yes, you can create a clustered index on a BIT column. 
     
    However, as for whether you SHOULD—as with so many other choices—it depends. 
     
    Jeff Gray correctly points out that the optimizer will do a little bit better if you explicitly tell it that you are dealing with a BIT (since the engine assumes INT), e.g.: 
     

    WHERE MyBit = CONVERT(BIT, 0) 
    --or 
    WHERE MyBit = CAST(0 AS BIT)

     
    Though that can come with a trade-off as well; namely, remembering to explicitly convert values on every statement. Whereas, if you use a CHAR(1) constrained to 'T'/'F', for example, no explicit conversion is necessary for the proper index to be used. 


    Now clean up, because you probably don't want this data to hang around: 
     

    DROP TABLE 
        dbo.Test1, dbo.Test2, dbo.Test3, 
        dbo.Test4, dbo.Test5, dbo.Test6

     
    Or, just: 
     

    DROP DATABASE Splunge
  • 相关阅读:
    Linux下Fortran多文件编译
    java用poi实现对word读取和修改操作
    SQL DATEDIFF语法及时间函数 Sql 查询当天、本周、本月记录
    深入Java集合学习系列:LinkedHashSet的实现原理
    Log4Net日志
    程序员创业如何才能成功?
    Asp.net 数据采集基类(远程抓取,分解,保存,匹配)
    response.setContentType()的String参数及对应类型
    深入Java集合学习系列:LinkedHashMap的实现原理
    深入Java集合学习系列:HashSet的实现原理
  • 原文地址:https://www.cnblogs.com/qanholas/p/2439542.html
Copyright © 2011-2022 走看看