zoukankan      html  css  js  c++  java
  • MySQL实验 内连接优化order by+limit 以及添加索引再次改进

    MySQL实验 内连接优化order by+limit 以及添加索引再次改进

    在进行子查询优化双参数limit时我萌生了测试更加符合实际生产需要的ORDER BY + LIMIT的想法,或许我们也可以对ORDER BY + LIMIT 也进行适当优化

     

    实验准备

    使用MySQL官方的大数据库employees进行实验,导入该示例库见此

    准备使用其中的employees表,先查看一下表结构和表内的记录数量

    mysql> desc employees;
    +------------+---------------+------+-----+---------+-------+
    | Field      | Type          | Null | Key | Default | Extra |
    +------------+---------------+------+-----+---------+-------+
    | emp_no     | int(11)       | NO   | PRI | NULL    |       |
    | birth_date | date          | NO   |     | NULL    |       |
    | first_name | varchar(14)   | NO   |     | NULL    |       |
    | last_name  | varchar(16)   | NO   |     | NULL    |       |
    | gender     | enum('M','F') | NO   |     | NULL    |       |
    | hire_date  | date          | NO   |     | NULL    |       |
    +------------+---------------+------+-----+---------+-------+
    6 rows in set (0.00 sec)
    
    mysql> select count(*) from employeed;
    ERROR 1146 (42S02): Table 'employees.employeed' doesn't exist
    mysql> select count(*) from employees;
    +----------+
    | count(*) |
    +----------+
    |   300024 |
    +----------+
    1 row in set (0.05 sec)
    

    我们可以看到,只有主键emp_no有索引

     

    实验过程

    MySQL5.7官网对Explain各项参数的解释

    官网对ORDER BY机制的详解

    explain参数5.7版本推荐参考博客

    老版本explain推荐参考博客(即新版本默认explain extended)

    关于explain参数的拓展链接

    MySQL explain key值的解释

     

    使用未优化order by + limit

    mysql> select * from employees order by birth_date limit 200000,10;
    +--------+------------+------------+------------+--------+------------+
    | emp_no | birth_date | first_name | last_name  | gender | hire_date  |
    +--------+------------+------------+------------+--------+------------+
    | 498507 | 1960-09-24 | Perla      | Delgrange  | M      | 1989-12-08 |
    | 494212 | 1960-09-25 | Susuma     | Baranowski | M      | 1989-05-15 |
    | 496888 | 1960-09-25 | Rosalyn    | Rebaine    | M      | 1985-11-27 |
    | 497766 | 1960-09-25 | Matt       | Atrawala   | F      | 1987-02-11 |
    | 481404 | 1960-09-25 | Sanjeeva   | Eterovic   | F      | 1986-06-05 |
    | 483269 | 1960-09-25 | Mitchel    | Pramanik   | F      | 1997-07-23 |
    | 483270 | 1960-09-25 | Geoff      | Gulik      | F      | 1993-11-25 |
    |  59683 | 1960-09-25 | Supot      | Millington | F      | 1991-06-03 |
    | 101264 | 1960-09-25 | Mansur     | Atchley    | F      | 1990-05-22 |
    |  92453 | 1960-09-25 | Khalid     | Trystram   | M      | 1993-11-10 |
    +--------+------------+------------+------------+--------+------------+
    10 rows in set (0.20 sec)
    
    mysql> explain select * from employees order by birth_date limit 200000,10;
    +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+----------------+
    | id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra          |
    +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+----------------+
    |  1 | SIMPLE      | employees | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 299468 |   100.00 | Using filesort |
    +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+----------------+
    1 row in set, 1 warning (0.00 sec)
    

    我们可以看到,未优化时使用的是全表扫描,花费0.2s

     

    内连接优化

    优化思路我们可以利用主键emp_no的索引树,在索引树上将符合order by birth_date limit 200000,10的元组(即,行)的主键找出来,再用内连接返回10行emp_no的所有信息。

    (内连接只返回表中与连接条件相匹配的行,也就是说,select emp_no from employees order by birth_date limit 200000,10只会返回10个emp_no,那么内连接后,结果集中也只有10个emp_no对应的所有信息)

    (另外这里的内连接时使用了emp_no,即,子查询中也有"覆盖索引"减少磁盘I/O的功劳)

    mysql> select * from employees inner join (select emp_no from employees order by birth_date limit 200000,10) as temp_table using (emp_no);
    +--------+------------+------------+-----------+--------+------------+
    | emp_no | birth_date | first_name | last_name | gender | hire_date  |
    +--------+------------+------------+-----------+--------+------------+
    | 427365 | 1960-09-24 | Yuping     | Sethi     | M      | 1990-06-21 |
    | 424219 | 1960-09-25 | Woody      | Bernini   | M      | 1989-03-10 |
    | 469218 | 1960-09-25 | George     | Plotkin   | M      | 1992-02-19 |
    | 404121 | 1960-09-25 | Domenico   | Birnbaum  | M      | 1993-08-01 |
    | 404266 | 1960-09-25 | Quingbo    | Jervis    | F      | 1985-03-15 |
    | 409133 | 1960-09-25 | Nitsan     | Kleiser   | F      | 1985-05-18 |
    | 409558 | 1960-09-25 | Shunichi   | Hofting   | F      | 1992-07-06 |
    | 412045 | 1960-09-25 | Kristin    | Bolotov   | F      | 1985-06-28 |
    | 481404 | 1960-09-25 | Sanjeeva   | Eterovic  | F      | 1986-06-05 |
    | 483269 | 1960-09-25 | Mitchel    | Pramanik  | F      | 1997-07-23 |
    +--------+------------+------------+-----------+--------+------------+
    10 rows in set (0.10 sec)
    
    mysql> explain select * from employees inner join (select emp_no from employees order by birth_date limit 100000,10) as table_temp using (emp_no);
    +----+-------------+------------+------------+--------+---------------+---------+---------+-------------------+--------+----------+----------------+
    | id | select_type | table      | partitions | type   | possible_keys | key     | key_len | ref               | rows   | filtered | Extra          |
    +----+-------------+------------+------------+--------+---------------+---------+---------+-------------------+--------+----------+----------------+
    |  1 | PRIMARY     | <derived2> | NULL       | ALL    | NULL          | NULL    | NULL    | NULL              | 100010 |   100.00 | NULL           |
    |  1 | PRIMARY     | employees  | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | table_temp.emp_no |      1 |   100.00 | NULL           |
    |  2 | DERIVED     | employees  | NULL       | ALL    | NULL          | NULL    | NULL    | NULL              | 299468 |   100.00 | Using filesort |
    +----+-------------+------------+------------+--------+---------------+---------+---------+-------------------+--------+----------+----------------+
    3 rows in set, 1 warning (0.00 sec)
    

    可见效率提高了一倍,在explain中

    • 第三行的select_type为DERIVED,是指这行是包含在from子句中的查询,我们可以看到,子句查询也没有使用索引

    • <derived2>是指,第一行的查询说明表示当前查询依赖 id=N 的查询,此处N=2,那我们先看第二行:

      第二行type为eq_ref是指primary key 或 unique key 索引被连接(join)使用,,对于每个索引键的关联查询,返回匹配唯一行数据(有且只有1个)。在这里就是说在子查询查询到emp_no后,子查询中产生的临时表与employees表进行连接。

    • (对于这里的explain的解释只包含了对explain各项参数的解释,但似乎没有办法直接验证优化思路,还望各位看官前辈指点)

     

    为排序字段加上索引

    既然我们在内连接中是通过排序字段birth_date后对emp_no进行查询,那么我们或许能再为排序字段加上索引以再次提高效率。

    mysql> alter table employees add index birthdate_index (birth_date);
    Query OK, 0 rows affected (0.75 sec)
    Records: 0  Duplicates: 0  Warnings: 0
    
    mysql> desc employees;
    +------------+---------------+------+-----+---------+-------+
    | Field      | Type          | Null | Key | Default | Extra |
    +------------+---------------+------+-----+---------+-------+
    | emp_no     | int(11)       | NO   | PRI | NULL    |       |
    | birth_date | date          | NO   | MUL | NULL    |       |
    | first_name | varchar(14)   | NO   |     | NULL    |       |
    | last_name  | varchar(16)   | NO   |     | NULL    |       |
    | gender     | enum('M','F') | NO   |     | NULL    |       |
    | hire_date  | date          | NO   |     | NULL    |       |
    +------------+---------------+------+-----+---------+-------+
    6 rows in set (0.00 sec)
    

    然后我们再次执行未优化和通过内连接优化的两条查询语句。

    mysql> select * from employees order by birth_date limit 200000,10;
    +--------+------------+------------+------------+--------+------------+
    | emp_no | birth_date | first_name | last_name  | gender | hire_date  |
    +--------+------------+------------+------------+--------+------------+
    | 498507 | 1960-09-24 | Perla      | Delgrange  | M      | 1989-12-08 |
    | 494212 | 1960-09-25 | Susuma     | Baranowski | M      | 1989-05-15 |
    | 496888 | 1960-09-25 | Rosalyn    | Rebaine    | M      | 1985-11-27 |
    | 497766 | 1960-09-25 | Matt       | Atrawala   | F      | 1987-02-11 |
    | 481404 | 1960-09-25 | Sanjeeva   | Eterovic   | F      | 1986-06-05 |
    | 483269 | 1960-09-25 | Mitchel    | Pramanik   | F      | 1997-07-23 |
    | 483270 | 1960-09-25 | Geoff      | Gulik      | F      | 1993-11-25 |
    |  59683 | 1960-09-25 | Supot      | Millington | F      | 1991-06-03 |
    | 101264 | 1960-09-25 | Mansur     | Atchley    | F      | 1990-05-22 |
    |  92453 | 1960-09-25 | Khalid     | Trystram   | M      | 1993-11-10 |
    +--------+------------+------------+------------+--------+------------+
    10 rows in set (0.20 sec)
    
    mysql> select * from employees inner join (select emp_no from employees order by birth_date limit 200000,10) as temp_table using (emp_no);
    +--------+------------+------------+------------+--------+------------+
    | emp_no | birth_date | first_name | last_name  | gender | hire_date  |
    +--------+------------+------------+------------+--------+------------+
    | 498507 | 1960-09-24 | Perla      | Delgrange  | M      | 1989-12-08 |
    |  23102 | 1960-09-25 | Hsiangchu  | Harbusch   | M      | 1986-03-14 |
    |  29961 | 1960-09-25 | Susumu     | Munoz      | F      | 1989-12-31 |
    |  32061 | 1960-09-25 | Dipankar   | Buescher   | M      | 1992-10-24 |
    |  36216 | 1960-09-25 | Xianlong   | Rassart    | F      | 1987-09-05 |
    |  37058 | 1960-09-25 | Khue       | Osgood     | M      | 1991-11-04 |
    |  38365 | 1960-09-25 | Sariel     | Ramsak     | M      | 1993-02-26 |
    |  39901 | 1960-09-25 | Jianhui    | Ushiama    | M      | 1985-12-03 |
    |  59683 | 1960-09-25 | Supot      | Millington | F      | 1991-06-03 |
    |  63784 | 1960-09-25 | Rosita     | Zyda       | M      | 1988-08-12 |
    +--------+------------+------------+------------+--------+------------+
    10 rows in set (0.03 sec)
    

    我们可以看到,普通查询语句并没有得到效率上的提升,但是内连接的查询效率得到了很大的提升,花费时间从原来的0.1s缩减为0.03秒,也就是说,再次优化后的内连接差不多可以应对百万(甚至千万级,因为实际生产中所使用的硬件设施肯定会远远好与我现在的基础班ECS)级别的数据了。

     

    对于加上 birthdate_index索引后普通查询效率未提升的说明:

    因为我们查询的是select *,即使emp_no和birth_date上有索引,在查询其他列信息的时候,我们依然需要回表。因此即使加上索引后,我们的普通查询依然使用的是全表扫描。

     

    小结

    经过试验证明,内连接对于order by+双参数limit有一定效果,在合适的内连接子查询下,增加相应的索引,能够使性能进一步提升。从0.2到0.1在到0.03,当缩减一个数量级时,那都是很大的突破。(完结撒花~)

     

    最后的补充

    • EXPLAIN不会告诉你关于触发器、存储过程的信息或用户自定义函数对查询的影响情况

    • EXPLAIN不考虑各种Cache

    • EXPLAIN不能显示MySQL在执行查询时所作的优化工作

    • 部分统计信息是估算的,并非精确值

    • EXPALIN只能解释SELECT操作,其他操作要重写为SELECT后查看执行计划

     

  • 相关阅读:
    POJ 1141 括号匹配 DP
    881. Boats to Save People
    870. Advantage Shuffle
    874. Walking Robot Simulation
    文件操作
    861. Score After Flipping Matrix
    860. Lemonade Change
    842. Split Array into Fibonacci Sequence
    765. Couples Holding Hands
    763. Partition Labels
  • 原文地址:https://www.cnblogs.com/G-Aurora/p/13255053.html
Copyright © 2011-2022 走看看