zoukankan      html  css  js  c++  java
  • MySQL 如何执行关联查询

    本文同时发表在https://github.com/zhangyachen/zhangyachen.github.io/issues/51

    当前mysql执行的策略很简单:mysql对任何关联都执行嵌套循环操作,即mysql先在一个表中循环取出单条数据,然后再嵌套循环到下一个表中寻打匹配的行,依次下去,直到描述到所表表中匹配的行为止。然后根据各个表匹配的行,返回查询中需要的各个列。mysql会尝试在最后一个关联表中打到所有匹配的行,如果最后一个关联表无法找到更多的行以后,mysql返回到上一层次关联表,看是否能够找到更多的匹配记录,依此类推迭代执行。

    按照这样的方式查找第一条表记录,再嵌套查询下一个关联表,然后回溯到上一个表,在mysql中是通过嵌套循环的方式来实现的--正如其名‘嵌套循环关联’。请看下面的例子中的简单查询:

    select tbl1.col1,tbl2.col2  
    from tbl1 inner join tbl2 using(col3) 
    where tbl1.col1 in(5,6);
    

    假设mysql按照查询中的表顺序进行关联操作,我们则可以用下面的伪代码表示mysql将如何完成这个查询:

    outer_iter = iterator_over tbl1 where col1 in(3,4)
    outer_row = outer_iter.next
        while outer_row
        inner_iter = iterator over tbl2 where col3=outer_row.col3
        inner_row = inner_iter.next
        while inner_row
         output[outer_row.col1,inner_row.col2]
            inner_row = inner_iter.next
         end
         out_row = outer_iter.next
    end
    

    上面的执行计划对于单表查询和多表关联查询都适用,如果是一个单表查询,那么只需要完成上面的外层的基本操作。对于外连接和上面的执行过程任然适用。例如我们将上面的查询修改如下:

    SELECT tbl1.col1 ,tbl2.col2 FROM tbl1 left outer join tbl2 using (col3) WHERE tbl1.col1 in (3,4)
    

    对应的伪代码:

    outer_iter = iterator over tbl1 where col1 in(3,4)
    outer row = outer_iter.next
    while outer_row
        inner_iter = iterator over tbl2 where col3 = outer_row.col3
        inner_row = inner_iter.next
        if inner row   -> 手动加粗
            while inner_row
                out_put [outer_row.col1,inner_row.col2]
                inner_row = inner_iter.next
            end
        else         -> 手动加粗
            out_put[outer_row.col1,NULL]         -> 手动加粗
        end
        outer_row = outer_iter.next
    end
    

    从上面两个例子也可以看出,对于主表来说,是先进行主表的where条件筛选,再进行表联接,而不是先进行整表联接再进行where条件的筛选。

    举个例子:

    数据表结构:

    mysql> create table a(
        -> id int unsigned not null primary key
        -> );
    
    mysql> create table b like a;
    

    表中数据:

    mysql> select * from a;
    +----+
    | id |
    +----+
    |  1 |
    |  2 |
    |  3 |
    |  4 |
    |  5 |
    +----+
    5 rows in set (0.00 sec)
    
    mysql> select * from b;
    +----+
    | id |
    +----+
    |  4 |
    |  5 |
    |  6 |
    |  7 |
    +----+
    4 rows in set (0.00 sec)
    

    explain查询:

    mysql> explain select a.id as aid,b.id as bid from a left join b using(id) where a.id>3;
    +----+-------------+-------+--------+---------------+---------+---------+----------+------+--------------------------+
    | id | select_type | table | type   | possible_keys | key     | key_len | ref      | rows | Extra                    |
    +----+-------------+-------+--------+---------------+---------+---------+----------+------+--------------------------+
    |  1 | SIMPLE      | a     | range  | PRIMARY       | PRIMARY | 4       | NULL     |    3 | Using where; Using index |
    |  1 | SIMPLE      | b     | eq_ref | PRIMARY       | PRIMARY | 4       | com.a.id |    1 | Using index              |
    +----+-------------+-------+--------+---------------+---------+---------+----------+------+--------------------------+
    2 rows in set (0.00 sec)
    

    可以看出,首先在a表上进行范围查询,筛选出a.id>3的数据,然后在进行"嵌套查询"。

    注意,on后面的筛选条件主要是针对的是关联表,而对于主表筛选并不适用,比如:

    mysql>  select a.id as aid,b.id as bid from a left join b on a.id=b.id and a.id>3;
    +-----+------+
    | aid | bid  |
    +-----+------+
    |   1 | NULL |
    |   2 | NULL |
    |   3 | NULL |
    |   4 |    4 |
    |   5 |    5 |
    +-----+------+
    5 rows in set (0.00 sec)
    
    mysql> explain select a.id as aid,b.id as bid from a left join b on a.id=b.id and a.id>3;
    +----+-------------+-------+--------+---------------+---------+---------+----------+------+-------------+
    | id | select_type | table | type   | possible_keys | key     | key_len | ref      | rows | Extra       |
    +----+-------------+-------+--------+---------------+---------+---------+----------+------+-------------+
    |  1 | SIMPLE      | a     | index  | NULL          | PRIMARY | 4       | NULL     |    5 | Using index |
    |  1 | SIMPLE      | b     | eq_ref | PRIMARY       | PRIMARY | 4       | com.a.id |    1 | Using index |
    +----+-------------+-------+--------+---------------+---------+---------+----------+------+-------------+
    2 rows in set (0.00 sec)
    

    我们发现a表的id<=3的数据并未被筛选走,explain的结果是a表进行了index类型的查询,即主键索引的全部扫描。

    如果在on的筛选条件是针对b表的呢,情况会怎么样?
    下面的例子数据表结构和数据变了,我们只关注查询的结果区别:

    mysql> select * from a left join b on a.data=b.data and b.id<=20;
    +----+------+------+------+
    | id | data | id   | data |
    +----+------+------+------+
    |  1 |    1 | NULL | NULL |
    |  2 |    2 | NULL | NULL |
    |  3 |    3 | NULL | NULL |
    |  4 |    4 |    1 |    4 |
    |  4 |    4 |    6 |    4 |
    |  4 |    4 |   11 |    4 |
    |  4 |    4 |   16 |    4 |
    |  5 |    5 |    2 |    5 |
    |  5 |    5 |    7 |    5 |
    |  5 |    5 |   12 |    5 |
    |  5 |    5 |   17 |    5 |
    +----+------+------+------+
    11 rows in set (0.00 sec)
    
    mysql> select * from a left join b on a.data=b.data where b.id<=20;
    +----+------+------+------+
    | id | data | id   | data |
    +----+------+------+------+
    |  4 |    4 |    1 |    4 |
    |  5 |    5 |    2 |    5 |
    |  4 |    4 |    6 |    4 |
    |  5 |    5 |    7 |    5 |
    |  4 |    4 |   11 |    4 |
    |  5 |    5 |   12 |    5 |
    |  4 |    4 |   16 |    4 |
    |  5 |    5 |   17 |    5 |
    +----+------+------+------+
    8 rows in set (0.00 sec)
    

    由此,我们可以根据伪码来分析两者的区别:

    outer_iter = iterator over a
    outer row = outer_iter.next
    while outer_row
        inner_iter = iterator over b where data = outer_row.date where id<=20
        inner_row = inner_iter.next
        if inner row   
            while inner_row
                out_put [outer_row,inner_row]
                inner_row = inner_iter.next
            end
        else         
            out_put[outer_row,NULL]         
        end
        outer_row = outer_iter.next
    end
    
    outer_iter = iterator over a
    outer row = outer_iter.next
    while outer_row
        inner_iter = iterator over b where data = outer_row.date  ->手动加粗
        inner_row = inner_iter.next
        if inner row   
            while inner_row
                out_put [outer_row,inner_row]
                inner_row = inner_iter.next
            end
        else         
            out_put[outer_row,NULL]         
        end
        outer_row = outer_iter.next
    end
    left join的结果集中 where b.id<=20  ->手动加粗
    

    参考资料:《高性能MySQL》

  • 相关阅读:
    东驴西磨
    深入理解计算机系统 第二章信息的表示和处理(笔记小结)
    Amdahl 阿姆达尔定律(系统优化)
    想你的夜——葱油饼
    TypeScript and AWS Lambda practicing road map
    "Silicon Valley-like" companies think of engineers as value generators
    original intentions
    Canopy is hiring Postgres SQL Engineer
    在多线程中 feign 调用,服务提供方拿不到 request 的错误
    CF1542D
  • 原文地址:https://www.cnblogs.com/zhangyachen/p/8035704.html
Copyright © 2011-2022 走看看