zoukankan      html  css  js  c++  java
  • 含有IN的子查询

    本文主要探讨MySQL5.5和MYSQL5.6对包含in子查询的优化情况。

    • MySQL5.5

    首先来看一下,在MySQL5.5及以下版本是如何处理的,比如下面这条语句:

    select name from search_hash where info_hash in (select info_hash from search_filelist);

    这种语句我们见的很多,熟悉mysql的人都会建议对其改写,但为什么呢?先看它的执行计划:

    mysql> explain extended select name from search_hash where info_hash in (select info_hash from search_filelist);
    +----+--------------------+-----------------+-----------------+---------------+---------+---------+------+--------+----------+-------------+
    | id | select_type        | table           | type            | possible_keys | key     | key_len | ref  | rows   | filtered | Extra       |
    +----+--------------------+-----------------+-----------------+---------------+---------+---------+------+--------+----------+-------------+
    |  1 | PRIMARY            | search_hash     | ALL             | NULL          | NULL    | NULL    | NULL | 604435 |   100.00 | Using where |
    |  2 | DEPENDENT SUBQUERY | search_filelist | unique_subquery | PRIMARY       | PRIMARY | 122     | func |      1 |   100.00 | Using index |
    +----+--------------------+-----------------+-----------------+---------------+---------+---------+------+--------+----------+-------------+

    很奇怪吧,第二行显示查询的类型是相关子查询。这怎么就扯上相关子查询呢?再看一下优化器的执行情况:

    mysql> show warnings;
    +-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Level | Code | Message                                                                                                                                                                                                                                 |
    +-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Note  | 1003 | select `zsky`.`search_hash`.`name` AS `name` from `zsky`.`search_hash` where <in_optimizer>(`zsky`.`search_hash`.`info_hash`,<exists>(<primary_index_lookup>(<cache>(`zsky`.`search_hash`.`info_hash`) in search_filelist on PRIMARY))) |
    +-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

    MYSQL优化器会把这个查询改写为下面的相关子查询:
    select name from search_hash where exists (select 1 from search_filelist where search_hash.info_hash = search_filelist.info_hash);
    不信看下面,两者的执行计划极其相似。

    mysql> explain select name from search_hash where exists (select 1 from search_filelist where search_hash.info_hash = search_filelist.info_hash);
    +----+--------------------+-----------------+--------+---------------+---------+---------+----------------------------+--------+-------------+
    | id | select_type        | table           | type   | possible_keys | key     | key_len | ref                        | rows   | Extra       |
    +----+--------------------+-----------------+--------+---------------+---------+---------+----------------------------+--------+-------------+
    |  1 | PRIMARY            | search_hash     | ALL    | NULL          | NULL    | NULL    | NULL                       | 604435 | Using where |
    |  2 | DEPENDENT SUBQUERY | search_filelist | eq_ref | PRIMARY       | PRIMARY | 122     | zsky.search_hash.info_hash |      1 | Using index |
    +----+--------------------+-----------------+--------+---------------+---------+---------+----------------------------+--------+-------------+

    改写后的查询会对search_hash表做全表扫描,然后用每一个info_hash值再去 search_filelist表查,如果外面这个表很大的话,就会很慢。比如例子中的表有60万行,也就是子查询要执行60万次。一般别人会建议改写为表连接,比如下面这样:
    select name from search_hash,search_filelist where search_hash.info_hash = search_filelist.info_hash;

    mysql> explain select name from search_hash,search_filelist where search_hash.info_hash = search_filelist.info_hash ;
    +----+-------------+-----------------+-------+---------------+-----------+---------+--------------------------------+--------+-------------+
    | id | select_type | table           | type  | possible_keys | key       | key_len | ref                            | rows   | Extra       |
    +----+-------------+-----------------+-------+---------------+-----------+---------+--------------------------------+--------+-------------+
    |  1 | SIMPLE      | search_filelist | index | PRIMARY       | PRIMARY   | 122     | NULL                           | 144657 | Using index |
    |  1 | SIMPLE      | search_hash     | ref   | info_hash     | info_hash | 123     | zsky.search_filelist.info_hash |      1 | Using where |
    +----+-------------+-----------------+-------+---------------+-----------+---------+--------------------------------+--------+-------------+

    这个时候是扫描整个search_filelist表,然后对每一个info_hash值去search_hash表中查。但为什么改写为表连接后就快些呢?
    当然左表能够使用索引是一部分原因。此外需要注意到的是,优化器自动会选择记录较少的search_filelist作为左表。虽然都需要扫描整张表,但后者需要扫描的数据量少很多,执行的次数也少。一般情况下,in里面的子查询返回的结果集都会是主查询的子集,或者记录少于主查询,那么改写后的查询需要扫描的数量也就更少。

     

    • MySQL5.6

    而在MySQL5.6中,优化器会自动将in 子查询改写为表连接。也就是说在MYSQL5.6及以上版本已经不需要手工优化包含in的子查询。下面是在MYSQL5.6中的的执行计划:

    mysql> explain select name from search_hash where info_hash in (select info_hash from search_filelist);
    +----+-------------+-----------------+-------+---------------+-----------+---------+--------------------------------+--------+-------------+
    | id | select_type | table           | type  | possible_keys | key       | key_len | ref                            | rows   | Extra       |
    +----+-------------+-----------------+-------+---------------+-----------+---------+--------------------------------+--------+-------------+
    |  1 | SIMPLE      | search_filelist | index | PRIMARY       | PRIMARY   | 122     | NULL                           | 110474 | Using index |
    |  1 | SIMPLE      | search_hash     | ref   | info_hash     | info_hash | 123     | zsky.search_filelist.info_hash |      1 | NULL        |
    +----+-------------+-----------------+-------+---------------+-----------+---------+--------------------------------+--------+-------------+
    转载请注明出处:http://www.cnblogs.com/ayard/
  • 相关阅读:
    17.正则表达式
    16.os模块-shutil模块-tarfile压缩模块
    15.序列化模块-时间模块-zip压缩模块
    第一章 单变量线性回归
    如何跑通MonoRTM模型的官方例子
    PHP命名规范
    js中要声明变量吗?
    php抓取网站图片源码
    InnoDB和MyISAM区别总结
    php分页代码。
  • 原文地址:https://www.cnblogs.com/ayard/p/9317507.html
Copyright © 2011-2022 走看看