zoukankan      html  css  js  c++  java
  • InnoDB表统计效率 count(主键) < count(1) ≈ count(*)

    MySQL count知多少

     

    统计一个表的数据量是经常遇到的需求,但是不同的表设计及不同的写法,统计性能差别会有较大的差异,下面就简单通过实验进行测试(大家测试的时候注意缓存的情况,否则影响测试结果)。

    1、 准备工作

    为了后续测试工作的进行,先准备几张用于测试的表及数据,为了使测试数据具有参考意义,建议测试表的数据量大一点,以免查询时间太小,因此,可以继续使用之前常用的连续数生成大法,如下:

    复制代码
    /* 创建连续数表 */
    CREATE TABLE nums(id INT primary key);
    
    /* 生成连续数的存储过程,优化过后的 */
    DELIMITER $$
    CREATE  PROCEDURE `sp_createNum`(cnt INT )
    BEGIN
        DECLARE i INT  DEFAULT 1;
        TRUNCATE TABLE nums;
        INSERT INTO nums SELECT i;
        WHILE i < cnt DO
          BEGIN
            INSERT INTO nums SELECT id + i FROM nums WHERE id + i<=cnt;
            SET i = i*2;
          END;
        END WHILE;
    END$$
    
    DELIMITER ;
    复制代码

    生成数据,本次准备生成1kw条记录

    /* 调用存储过程 */
    mysql> call sp_createNum(10000000);
    Query OK, 1611392 rows affected (32.07 sec)

    如果逐条循环,那时间相当长,大家可以自行测试,参考链接 效率提升16800倍的连续整数生成方法

    1.1 创建innodb表

    生成3张表innodb表,如下:

    nums_1表只有字符串主键字段

    复制代码
    /*  生成只有一个字符串类型字段主键的表nums_1 */
    mysql> create table  nums_1 (p1 varchar(32) primary key ) engine=innodb;
    Query OK, 0 rows affected (0.01 sec)
    
    /*  导入数据,将id通过md5函数转换为字符串 */
    mysql> insert into  nums_1 select md5(id) from nums;
    Query OK, 10000000 rows affected (1 min 12.63 sec)
    Records: 10000000  Duplicates: 0  Warnings: 0
    复制代码

    nums_2表有5个字段 ,其中主键为字符串类型字段的p1,其他字段为整型的id,非空的c1,可为空的c2,可为空的c3。

    其中c1,c2字段内容完全一致,差别是字段约束不一样(c1不可为空,c2可为空),c3与c1,c2的差别在于c1中aa开头的值在c3中为null,其他内容一样。

    复制代码
    /* 创建表nums_2 */
    mysql> create table nums_2(p1 varchar(32) primary key ,id int ,c1 varchar(10) not null, c2 varchar(10),c3 varchar(10)) engine=innodb;
    Query OK, 0 rows affected (1.03 sec)
    
    /*导入数据 */
    mysql> insert into  nums_2(id,p1,c1,c2,c3) select id,md5(id),left(md5(id),10),left(md5(id),10),if(,left(md5(id),10) like 'aa%',null,,left(md5(id),10)) from nums;
    Query OK, 10000000 rows affected (5 min 6.68 sec)
    Records: 10000000  Duplicates: 0  Warnings: 0
    复制代码

    nums_3表的内容与nums_2完全一样,区别在于主键字段不一样,c3表为整型的id

    复制代码
    /*  创建表nums_3 */
    mysql> create table nums_3(p1 varchar(32) ,id int primary key  ,c1 varchar(10) not null, c2 varchar(10),c3 varchar(10)) engine=innodb;
    Query OK, 0 rows affected (0.01 sec)
    
    /* 因为内容完全一致,直接从nums_2 中导入 */
    mysql> insert into nums_3 select  * from nums_2;
    Query OK, 10000000 rows affected (3 min 18.81 sec)
    Records: 10000000  Duplicates: 0  Warnings: 0
    复制代码

    1.2 创建MyISAM引擎表

    再创建一张MyISAM的表,表结构及内容均与nums_2也一致,只是引擎为MyISAM。

    复制代码
    /* 创建MyISAM引擎的nums_4表*/
    mysql> create table nums_4(p1 varchar(32) not null  primary key ,id int  ,c1 varchar(10) not null, c2 varchar(10),c3 varchar(10)) engine=MyISAM;
    Query OK, 0 rows affected (0.00 sec)
    
    /* 直接从nums_2表导入数据 */
    mysql> insert into nums_4 select  * from nums_2;
    Query OK, 10000000 rows affected (3 min 16.78 sec)
    Records: 10000000  Duplicates: 0  Warnings: 0
    复制代码

    2、 查询一张表数据量的方法

    查询一张表的数据量有如下几种:

    查询大致数据量,可以查统计信息,2.1中会介绍具体方法

    精确查找数据量,则可以通过count(主键字段),count(*), count(1) [这里的1可以替换为任意常量]

    2.1  非精确查询

    如果只是查一张表大致有多少数据,尤其是很大的表 只是查询其表属于什么量级的(百万、千万还是上亿条),可以直接查询统计信息,查询方式有如下几种:

    查询索引信息,其中Cardinality 为大致数据量(查看主键PRIMARY行的值,如果为多列的复合主键,则查看最后一列的Cardinality 值)

    复制代码
    mysql> show index from nums_2;
    +--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
    | Table  | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
    +--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
    | nums_2 |          0 | PRIMARY  |            1 | p1          | A         |     9936693 |     NULL | NULL   |      | BTREE      |         |               |
    +--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
    1 row in set (0.00 sec)
    复制代码

    查看表状态,其中Rows为大致数据量

    复制代码
    mysql> show table status like  'nums_2';
    +--------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+
    | Name   | Engine | Version | Row_format | Rows    | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time | Check_time | Collation       | Checksum | Create_options | Comment |
    +--------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+
    | nums_2 | InnoDB |      10 | Dynamic    | 9936693 |            111 |  1105182720 |               0 |   2250178560 |   4194304 |           NULL | 2020-04-04 19:31:34 | NULL        | NULL       | utf8_general_ci |     NULL |                |         |
    +--------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+
    1 row in set (0.00 sec)
    复制代码

    直接查看STATISTICS或TABLES表,内容与查看索引信息或表状态类似,其中TABLE_ROWS的内容为大致的数据量

    复制代码
    mysql> select   * from  information_schema.tables where table_schema='testdb' and table_name like  'nums_2';
    +---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
    | TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | TABLE_TYPE | ENGINE | VERSION | ROW_FORMAT | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | AUTO_INCREMENT | CREATE_TIME         | UPDATE_TIME | CHECK_TIME | TABLE_COLLATION | CHECKSUM | CREATE_OPTIONS | TABLE_COMMENT |
    +---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
    | def           | testdb       | nums_2     | BASE TABLE | InnoDB |      10 | Dynamic    |    9936693 |            111 |  1105182720 |               0 |   2250178560 |   4194304 |           NULL | 2020-04-04 19:31:34 | NULL        | NULL       | utf8_general_ci |     NULL |                |               |
    +---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
    1 row in set (0.00 sec)
    复制代码

    注意:

    • innodb引起的表通过以上3种方式均可查询对应表的大致数据量,且结果相同,因为均是取自相同的统计信息
    • MyISAM表的结果是精确值(表数据量,不包含其他字段)
    复制代码
    mysql> select   * from  information_schema.tables where table_schema='testdb' and table_name like  'nums_4';
    +---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------------+----------+----------------+---------------+
    | TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | TABLE_TYPE | ENGINE | VERSION | ROW_FORMAT | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | AUTO_INCREMENT | CREATE_TIME         | UPDATE_TIME         | CHECK_TIME          | TABLE_COLLATION | CHECKSUM | CREATE_OPTIONS | TABLE_COMMENT |
    +---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------------+----------+----------------+---------------+
    | def           | testdb       | nums_4     | BASE TABLE | MyISAM |      10 | Dynamic    |   10000000 |             75 |   759686336 | 281474976710655 |    854995968 |         0 |           NULL | 2020-04-04 19:20:23 | 2020-04-04 19:21:45 | 2020-04-04 19:23:45 | utf8_general_ci |     NULL |                |               |
    +---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------------+----------+----------------+---------------+
    1 row in set (0.00 sec)
    复制代码

    2.2  精确查找

    因为2.1中innodb的表查询的结果都是统计值,非准备值,实际工作中大多数情况下需要统计精确值,那么查询精确值的方法有如下几种,且所有引擎的表都适用。

    count(主键)

    复制代码
    mysql> select count(p1) from nums_2;
    +-----------+
    | count(p1) |
    +-----------+
    |  10000000 |
    +-----------+
    1 row in set (1.60 sec)
    复制代码

    count(1)

    其中的1可以是任意常量,例如 count(2),count('a‘)等

    复制代码
    mysql> select count(1) from nums_2;
    +----------+
    | count(1) |
    +----------+
    | 10000000 |
    +----------+
    1 row in set (1.45 sec)
    复制代码

    count(*) 

    复制代码
    mysql> select count(*) from nums_2;
    +----------+
    | count(*) |
    +----------+
    | 10000000 |
    +----------+
    1 row in set (1.52 sec)
    复制代码

    3、 count的性能对比

    对比 count(主键) count(1)  count(*)   count(非空字段) count(可为空字段)  性能对比

    3.1  MyISAM引擎表

    3.1.1 查询整张表数据量

    如果想精确查询一张MyISAM表的数据量,使用 count(主键) count(1)  count(*) 效率均一致,直接查出准确结果,耗时几乎为0s

    复制代码
    mysql> select count(p1) from nums_4;
    +-----------+
    | count(p1) |
    +-----------+
    |  10000000 |
    +-----------+
    1 row in set (0.00 sec)
    
    mysql> select count(1) from nums_4;
    +----------+
    | count(1) |
    +----------+
    | 10000000 |
    +----------+
    1 row in set (0.00 sec)
    
    mysql> select count(*) from nums_4;
    +----------+
    | count(*) |
    +----------+
    | 10000000 |
    +----------+
    1 row in set (0.00 sec)
    复制代码

    执行计划也均一致,可以看出没有通过主键或其他索引扫描的方式统计

    复制代码
    mysql> explain select count(*) from nums_4;
    +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
    | id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                        |
    +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
    |  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Select tables optimized away |
    +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
    1 row in set, 1 warning (0.00 sec)
    
    mysql> explain select count(p1) from nums_4;
    +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
    | id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                        |
    +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
    |  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Select tables optimized away |
    +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
    1 row in set, 1 warning (0.00 sec)
    
    mysql> explain select count(1) from nums_4;
    +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
    | id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                        |
    +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
    |  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Select tables optimized away |
    +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
    1 row in set, 1 warning (0.00 sec)
    复制代码

    小结:

    MyISAM的方法查整表数据量效率情况为 count(主键)=  count(1) = count(*)

    3.1.2 查询部分数据

    查询部分数据的时候则无法直接从统计信息获取,因此耗时情况大致如下:

    复制代码
    mysql> select count(p1) from nums_4 where  p1 like 'aa%';
    +-----------+
    | count(p1) |
    +-----------+
    |     39208 |
    +-----------+
    1 row in set (0.14 sec)
    
    mysql> select count(1) from nums_4 where  p1 like 'aa%';
    +----------+
    | count(1) |
    +----------+
    |    39208 |
    +----------+
    1 row in set (0.13 sec)
    
    mysql> select count(*) from nums_4 where p1 like 'aa%';
    +----------+
    | count(*) |
    +----------+
    | 39208 |
    +----------+
    1 row in set (0.13 sec)
    复制代码

    执行计划其实均一样:

    复制代码
    mysql> explain select count(1) from nums_4 where  p1 like 'aa%';
    +----+-------------+--------+------------+-------+---------------+---------+---------+------+-------+----------+--------------------------+
    | id | select_type | table  | partitions | type  | possible_keys | key     | key_len | ref  | rows  | filtered | Extra                    |
    +----+-------------+--------+------------+-------+---------------+---------+---------+------+-------+----------+--------------------------+
    |  1 | SIMPLE      | nums_4 | NULL       | range | PRIMARY       | PRIMARY | 98      | NULL | 42603 |   100.00 | Using where; Using index |
    +----+-------------+--------+------------+-------+---------------+---------+---------+------+-------+----------+--------------------------+
    1 row in set, 1 warning (0.00 sec)
    复制代码

    小结:  MyISAM引擎表统计部分数据的时候直接得出数据量,也许扫描数据进行统计,几种写法效率相近。

    3.2   innodb引擎表

    innodb引擎因为要支持MVCC,因此不能整表数据量持久化保存,每次查询均需遍历统计,但是不同的写法,查询效率是有差别的,后面将进行不同维度进行对比。

    3.2.1  不同写法的性能对比

    通过 count(主键),count(1) , count(*) 对比查询效率

    复制代码
    mysql> select count(p1) from nums_2  ;
    +-----------+
    | count(p1) |
    +-----------+
    |  10000000 |
    +-----------+
    1 row in set (1.68 sec)
    
    mysql> select count(1) from nums_2  ;
    +----------+
    | count(1) |
    +----------+
    | 10000000 |
    +----------+
    1 row in set (1.37 sec)
    
    mysql> select count(*) from nums_2  ;
    +----------+
    | count(*) |
    +----------+
    | 10000000 |
    +----------+
    1 row in set (1.38 sec)
    复制代码

    简单的对比发现,查询性能结果为 count(主键) < count(1) ≈ count(*)

    但是查看执行计划都是如下情况

    复制代码
    mysql> explain select count(p1) from nums_2;
    +----+-------------+--------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
    | id | select_type | table  | partitions | type  | possible_keys | key     | key_len | ref  | rows    | filtered | Extra       |
    +----+-------------+--------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
    |  1 | SIMPLE      | nums_2 | NULL       | index | NULL          | PRIMARY | 98      | NULL | 9936693 |   100.00 | Using index |
    +----+-------------+--------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
    1 row in set, 1 warning (0.00 sec
    复制代码

    但是查询效率不一样,原因在于统计的方式不一样,如下:

    • count(主键):innodb引擎根据对应的索引遍历整张表,把每一行的主键值都取出来,返回给 server 层。server 层拿到主键字段后,判断是不为空的(此处其实可以优化),就按行累加。
    • count(1):也是遍历整张表,因为每行的结果都是1(非空),所以可以直接计数,无需判断是否为空。
    • count(*):  innodb引擎做了优化处理的,此种方式和count(1)类似,直接按行累计统计

    3.2.2  主键字段类型不同性能对比

    nums_2与nums_3内容相同,区别在于num_3的主键字段是整型的id字段,现在对比主键字段不同时查询性能的差别,

    复制代码
    mysql> select /* SQL_NO_CACHE */count(1) from nums_2;
    +----------+
    | count(1) |
    +----------+
    | 10000000 |
    +----------+
    1 row in set (2.02 sec)
    
    mysql> select /* SQL_NO_CACHE */count(1) from nums_3;
    +----------+
    | count(1) |
    +----------+
    | 10000000 |
    +----------+
    1 row in set (1.69 sec)
    复制代码

    测试发现,相同内容数据的表表主键不同,性能存在差异,且,查询时主键(索引)字段类型小的时候查询效率更好。

    注:如果nums_2的id字段上添加索引后,会发现查询会走id的索引,原因在于主键索引(聚集索引)的类型是varchar(32),而id是int,索引的大小不一样,走整型的索引IO开销会少。

    因此,建议MySQL的主键使用自增id作为主键(优势不仅在数据统计上,有机会在讲解)。

    3.2.3  表大小不同的对比

    准备工作中的nums_1 与nums_3差别在于主键都是整型的id 但是nums_3的字段更多,也就是说表更大,查询效率对比如下:

    复制代码
    mysql> select /* SQL_NO_CACHE */count(1) from nums_1;
    +----------+
    | count(1) |
    +----------+
    | 10000000 |
    +----------+
    1 row in set (1.61 sec)
    
    mysql> select /* SQL_NO_CACHE */count(1) from nums_3;
    +----------+
    | count(1) |
    +----------+
    | 10000000 |
    +----------+
    1 row in set (1.67 sec)
    复制代码

    查询时间仅供参考,取决于机器性能。

    由此可见表大小不同,查询效率也不同,表越小查询效率越高。

    3.2.4  count(普通字段)

    因为nums_3表的c2字段允许为空,但是内容均不为空,c3字段允许为空,但是存在内容为空的情况。现在将nums_3表的c2,c3字段分别统计,查看结果(先添加索引,提高查询性能)

    复制代码
    mysql> select  count(c2) from  nums_3 ;
    +-----------+
    | count(c2) |
    +-----------+
    |  10000000 |
    +-----------+
    1 row in set (1.69 sec)
    
    mysql> select  count(c3) from  nums_3 ;
    +-----------+
    | count(c3) |
    +-----------+
    |   9960792 |
    +-----------+
    1 row in set (1.73 sec)
    复制代码

    因为c3字段有存在null的值,索引 统计c3行数的时候会忽略null值的行。

    4、 总结

    以上通过对比MyISAM引擎及InnoDB引擎表通过不同写法的统计效率进行对比,可以得到如下结论:

    • MyISAM表统计整表行数可以直接取出,效率最高,但是MyISAM表不支持事务
    • InnoDB表统计效率 count(主键) < count(1) ≈ count(*)
    • MySQL建议设置自增字段类型的主键
    • 表大小越小,查询统计效率越高
  • 相关阅读:
    Windows核心编程笔记:同步设备I/O与异步设备I/O 200404
    最顶层窗口
    Windows环境变量
    如何使用Beyond Compare比较两个文件夹的差异
    c++生成缩略图
    MD5加密算法
    c++读写注册表
    通过WMI获取网卡MAC地址、硬盘序列号、主板序列号、CPU ID、BIOS序列号
    Hive中HSQL中left semi join和INNER JOIN、LEFT JOIN、RIGHT JOIN、FULL JOIN区别
    postgreSQL格式化时间的函数详解
  • 原文地址:https://www.cnblogs.com/Leo_wl/p/12660603.html
Copyright © 2011-2022 走看看