zoukankan      html  css  js  c++  java
  • oracle 表连接

    一. nested loop 原理

    nested loop 连接(循环嵌套连接)指的是两个表连接时, 通过两层嵌套循环来进行依次的匹配, 最后得到返回结果集的表连接方法.

    假如下面的 sql 语句中表 T1 和 T2 的连接方式是循环嵌套连接, T1 是驱动表
    select *
    from T1, T2
    where T1.id = T2.id and T1.name = 'David';
    那么将上述 sql 语句翻译为伪码应该如下所示:

    1 for each row in (select * from T1 where name = 'David') loop
    2 for (select * from T2 where T2.id = outer.id) loop
    3 If match then pass the row on to the next step
    4 If no match then discard the row
    5 end loop
    6 end loop

    具体来说, 如果上述 sql 语句执行循环嵌套连接的话, 那么实际的执行过程应该如下所示:

    (1) 首先 oracle 会根据一定的规则(根据统计信息的成本计算或者 hint 强制)决定哪个表是驱动表, 哪个表是被驱动表 (假设 T1 是驱动表)
    (2) 查询驱动表 "select * from T1 where name = 'David'" 然后得到驱动结果集 Q1
    (3) 遍历驱动结果集 Q1 以及被驱动表 T2, 从驱动结果集 Q1 中取出一条记录, 接着遍历 T2 并按照连接条件 T2.id = T1.id 去判断 T2 中是否存在匹配的记录,
    如果能够匹配则保留, 不能匹配则忽略此行, 然后再从 Q1 中取出下一条记录, 接着遍历 T2 进行匹配, 如此下去直到取完 Q1 中的所有记录

    具体来说, 如果上述 sql 语句执行循环嵌套连接的话, 那么实际的执行过程应该如下所示:
    (1) 首先 oracle 会根据一定的规则(根据统计信息的成本计算或者 hint 强制)决定哪个表是驱动表, 哪个表是被驱动表 (假设 T1 是驱动表)
    (2) 查询驱动表 "select * from T1 where name = 'David'" 然后得到驱动结果集 Q1
    (3) 遍历驱动结果集 Q1 以及被驱动表 T2, 从驱动结果集 Q1 中取出一条记录, 接着遍历 T2 并按照连接条件 T2.id = T1.id 去判断 T2 中是否存在匹配的记录,
    如果能够匹配则保留, 不能匹配则忽略此行, 然后再从 Q1 中取出下一条记录, 接着遍历 T2 进行匹配, 如此下去直到取完 Q1 中的所有记录

    二. nested loop 特性

    嵌套循环连接有以下特性:

    (1) 通常 sql 语句中驱动表只访问一次, 被驱动表访问多次
    (2) 不必等待处理完成所有行前可以先返回部分已经处理完成的数据
    (3) 在限制条件以及连接条件列上建立索引, 能够提高执行效率
    (4) 支持所有类型的连接 (等值连接, 非等值连接, like 等)

    构造试验数据

    SQL> CREATE TABLE t1 (
    2 id NUMBER NOT NULL,
    3 n NUMBER,
    4 pad VARCHAR2(4000),
    5 CONSTRAINT t1_pk PRIMARY KEY(id)
    6 );
    Table created.
    
    SQL> CREATE TABLE t2 (
    2 id NUMBER NOT NULL,
    3 t1_id NUMBER NOT NULL,
    4 n NUMBER,
    5 pad VARCHAR2(4000),
    6 CONSTRAINT t2_pk PRIMARY KEY(id),
    7 CONSTRAINT t2_t1_fk FOREIGN KEY (t1_id) REFERENCES t1
    8 ); 
    Table created.
    
    SQL> CREATE TABLE t3 (
    2 id NUMBER NOT NULL,
    3 t2_id NUMBER NOT NULL,
    4 n NUMBER,
    5 pad VARCHAR2(4000),
    6 CONSTRAINT t3_pk PRIMARY KEY(id),
    7 CONSTRAINT t3_t2_fk FOREIGN KEY (t2_id) REFERENCES t2
    8 ); 
    Table created.
    
    SQL> CREATE TABLE t4 (
    2 id NUMBER NOT NULL,
    3 t3_id NUMBER NOT NULL,
    4 n NUMBER,
    5 pad VARCHAR2(4000),
    6 CONSTRAINT t4_pk PRIMARY KEY(id),
    7 CONSTRAINT t4_t3_fk FOREIGN KEY (t3_id) REFERENCES t3
    8 ); 
    Table created.
    
    SQL> execute dbms_random.seed(0) 
    PL/SQL procedure successfully completed.
    
    
    SQL> INSERT INTO t1 SELECT rownum, rownum, dbms_random.string('a',50) FROM dual CONNECT BY level <= 10 ORDER BY dbms_random.random;
    10 rows created.
    
    SQL> INSERT INTO t2 SELECT 100+rownum, t1.id, 100+rownum, t1.pad FROM t1, t1 dummy ORDER BY dbms_random.random; 
    100 rows created.
    
    SQL> INSERT INTO t3 SELECT 1000+rownum, t2.id, 1000+rownum, t2.pad FROM t2, t1 dummy ORDER BY dbms_random.random; 
    1000 rows created.
    
    SQL> INSERT INTO t4 SELECT 10000+rownum, t3.id, 10000+rownum, t3.pad FROM t3, t1 dummy ORDER BY dbms_random.random; 
    10000 rows created.
    
    SQL> COMMIT; 
    Commit complete.

    使用 hint 让 sql 语句通过 nested loop 连接, 并且指定 t3 为驱动表

     1 SQL> select /*+ leading(t3) use_nl(t4) */ * from t3, t4
     2 2 where t3.id = t4.t3_id and t3.n = 1100;
     3 
     4 10 rows selected.
     5 
     6 SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
     7 
     8 PLAN_TABLE_OUTPUT
     9 ---------------------------------------------------------------------------------------------
    10 SQL_ID 89hnfwqakjghg, child number 0
    11 -------------------------------------
    12 select /*+ leading(t3) use_nl(t4) */ * from t3, t4 where t3.id = t4.t3_id and t3.n = 1100
    13 
    14 Plan hash value: 1907878852
    15 
    16 -------------------------------------------------------------------------------------
    17 | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
    18 -------------------------------------------------------------------------------------
    19 | 0 | SELECT STATEMENT | | 1 | | 10 |00:00:00.01 | 121 |
    20 | 1 | NESTED LOOPS | | 1 | 10 | 10 |00:00:00.01 | 121 |
    21 |* 2 | TABLE ACCESS FULL| T3 | 1 | 1 | 1 |00:00:00.01 | 16 |
    22 |* 3 | TABLE ACCESS FULL| T4 | 1 | 10 | 10 |00:00:00.01 | 105 |
    23 -------------------------------------------------------------------------------------
    24 
    25 Predicate Information (identified by operation id):
    26 ---------------------------------------------------
    27 
    28 2 - filter("T3"."N"=1100)
    29 3 - filter("T3"."ID"="T4"."T3_ID")

    在执行计划中我们可以看到驱动表 T3 访问一次, 因为驱动表上有谓词条件 t3.n = 1100, 通过执行谓词条件后驱动结果集的记录数为 1, 所以 T4 也只访问一次(starts 列)

    使用 hint 让 sql 语句通过 nested loop 连接, 并且指定 t4 为驱动表

     1 SQL> select /*+ leading(t4) use_nl(t3) full(t4) full(t3) */ * from t3, t4 where t3.id = t4.t3_id and t3.n = 1100;
     2 
     3 SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
     4 
     5 PLAN_TABLE_OUTPUT
     6 ----------------------------------------------------------------------------------------------------------
     7 SQL_ID 0yxm1muqwrfq2, child number 0
     8 -------------------------------------
     9 select /*+ leading(t4) use_nl(t3) full(t4) full(t3) */ * from t3, t4
    10 where t3.id = t4.t3_id and t3.n = 1100
    11 
    12 Plan hash value: 3886808168
    13 
    14 -------------------------------------------------------------------------------------
    15 | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
    16 -------------------------------------------------------------------------------------
    17 | 0 | SELECT STATEMENT | | 1 | | 10 |00:00:00.25 | 150K|
    18 | 1 | NESTED LOOPS | | 1 | 10 | 10 |00:00:00.25 | 150K|
    19 | 2 | TABLE ACCESS FULL| T4 | 1 | 10000 | 10000 |00:00:00.01 | 105 |
    20 |* 3 | TABLE ACCESS FULL| T3 | 10000 | 1 | 10 |00:00:00.21 | 150K|
    21 -------------------------------------------------------------------------------------
    22 
    23 Predicate Information (identified by operation id):
    24 ---------------------------------------------------
    25 
    26 3 - filter(("T3"."N"=1100 AND "T3"."ID"="T4"."T3_ID"))

    在执行计划中我们可以看到驱动表 T4 访问一次, 因为驱动表上 T4 结果集的记录数为 10000, 所以 T4 访问了 10000 次, buffers 和 A-time(实际执行时间) 都比较高.

    三. nested loop 优化

    在 nested loop 被驱动表上的连接列上 (T4 表的 t3_id 列) 建立索引

     1 SQL> CREATE INDEX t4_t3_id ON t4(t3_id);
     2 
     3 Index created.
     4 
     5 SQL> select /*+ leading(t3) use_nl(t4) */ * from t3, t4 where t3.id = t4.t3_id and t3.n = 1100;
     6 
     7 10 rows selected.
     8 
     9 SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
    10 
    11 PLAN_TABLE_OUTPUT
    12 ------------------------------------------------------------------------------------------------------------------------------------
    13 SQL_ID 89hnfwqakjghg, child number 0
    14 -------------------------------------
    15 select /*+ leading(t3) use_nl(t4) */ * from t3, t4 where t3.id = t4.t3_id and t3.n = 1100
    16 
    17 Plan hash value: 2039660043
    18 
    19 ------------------------------------------------------------------------------------------------------------
    20 | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
    21 ------------------------------------------------------------------------------------------------------------
    22 | 0 | SELECT STATEMENT | | 1 | | 10 |00:00:00.01 | 29 | 1 |
    23 | 1 | NESTED LOOPS | | 1 | | 10 |00:00:00.01 | 29 | 1 |
    24 | 2 | NESTED LOOPS | | 1 | 10 | 10 |00:00:00.01 | 19 | 1 |
    25 |* 3 | TABLE ACCESS FULL | T3 | 1 | 1 | 1 |00:00:00.01 | 16 | 0 |
    26 |* 4 | INDEX RANGE SCAN | T4_T3_ID | 1 | 10 | 10 |00:00:00.01 | 3 | 1 |
    27 | 5 | TABLE ACCESS BY INDEX ROWID| T4 | 10 | 10 | 10 |00:00:00.01 | 10 | 0 |
    28 ------------------------------------------------------------------------------------------------------------
    29 Predicate Information (identified by operation id):
    30 ---------------------------------------------------
    31 3 - filter("T3"."N"=1100)
    32 4 - access("T3"."ID"="T4"."T3_ID")

     在执行计划中可以看到在被驱动表上的连接列上加上索引后, buffer 从 121 下降到了 29

    在驱动表的谓词条件列上 (T3 表的 n 列) 加上索引

    SQL> create index t3_n on t3(n);
    
    Index created.
    
    SQL> select /*+ leading(t3) use_nl(t4) */ * from t3, t4 where t3.id = t4.t3_id and t3.n = 1100;
    
    10 rows selected.
    
    SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
    
    PLAN_TABLE_OUTPUT
    -------------------------------------------------------------------------------------------------------------------------------------
    SQL_ID 89hnfwqakjghg, child number 0
    -------------------------------------
    select /*+ leading(t3) use_nl(t4) */ * from t3, t4 where t3.id = t4.t3_id and t3.n = 1100
    
    Plan hash value: 2304842513
    
    -------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
    -------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | | 10 |00:00:00.01 | 17 | 1 |
    | 1 | NESTED LOOPS | | 1 | | 10 |00:00:00.01 | 17 | 1 |
    | 2 | NESTED LOOPS | | 1 | 10 | 10 |00:00:00.01 | 7 | 1 |
    | 3 | TABLE ACCESS BY INDEX ROWID| T3 | 1 | 1 | 1 |00:00:00.01 | 4 | 1 |
    |* 4 | INDEX RANGE SCAN | T3_N | 1 | 1 | 1 |00:00:00.01 | 3 | 1 |
    |* 5 | INDEX RANGE SCAN | T4_T3_ID | 1 | 10 | 10 |00:00:00.01 | 3 | 0 |
    | 6 | TABLE ACCESS BY INDEX ROWID | T4 | 10 | 10 | 10 |00:00:00.01 | 10 | 0 |
    -------------------------------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
    4 - access("T3"."N"=1100)
    5 - access("T3"."ID"="T4"."T3_ID")
    

      在执行计划中可以看到在驱动表上的谓词条件列上加上索引后, buffer 从 29 继续下降到了 17

    四. 小结

    由此可见, 在 sql 调优时如果遇到表的连接方式是 nested loop:

    首先,要确保结果集小的表为驱动表,结果集多的表为被驱动表。这不意味着记录多的表不能作为驱动表, 只要通过谓词条件过滤后得到的结果集比较小,也可以作为驱动表。

    其次,在驱动表的谓词条件列以及被驱动表的连接列上加上索引,能够显著的提高执行性能。

    最后,如果要查询的列都在索引中,避免回表查询列信息时,又将进一步提高执行性能。

    https://blog.csdn.net/dataminer_2007/article/details/41826915

  • 相关阅读:
    KubeCon 2020 演讲集锦|《阿里巴巴云原生技术与实践 13 讲》开放下载
    突围数字化转型,让特步同比增长24.8%的全渠道中台
    阿里云飞天大数据产品价值解读——《一站式高质量搜索开放搜索》
    高德AR驾车导航解决方案
    我在阿里写代码学会的六件事
    送外卖也要“黑科技”?阿里移动感知技术应用揭秘
    阿里云机器学习怎么玩?这本新手入门指南揭秘了!
    用户自定义类型03 零基础入门学习Delphi33
    用户自定义类型03 零基础入门学习Delphi33
    用户自定义类型01 零基础入门学习Delphi31
  • 原文地址:https://www.cnblogs.com/elontian/p/9483072.html
Copyright © 2011-2022 走看看