zoukankan      html  css  js  c++  java
  • 实验:小表驱动大表 速度比较

    SQL性能优化的侧重点之一就是用小表驱动大表,今天我做了一次实验。

    硬件环境:T440p

    软件环境:Win10

    实验数据库:Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production

    实验表:bigtable含id,name,score,createtime,有百万数据;smalltable含id,name,createtime三个字段,有十万数据。

    这些表是如何建表插值的请参考:https://www.cnblogs.com/xiandedanteng/p/12151536.html

    需求:先从小表中查出姓名以张开头的姓名,再与大表进行半连接。这样是为了模拟《SQL优化核心思想》(罗炳森黄超等著)P187的内容。否则用select * from bigtable where name like '张%'就好了。

    目的:对比常规方案和优化后的小表驱动大表方案的耗时;

    常规方案:

    select * from bigtable where name in (select name from smalltable where name like '张%')

    小表驱动大表方案:

    select /*+ leading(big@small) use_nl(big@small,big */ * from bigtable big 

    where name in (select /*+ qb_name(small) */ name from smalltable where name like '张%')

    比较结果:

    # 常规方案耗时 小表驱动大表方案耗时
    1 00: 00: 04.81 00: 00: 03.54
    2 00: 00: 04.04 00: 00: 03.50
    3 00: 00: 04.06 00: 00: 03.47

    可以看去小表驱动大表的确实是快些,但优势还是不明显,下面让我把大表扩大些。

    我使用 以下SQL将bigtable的记录数扩成了两百万,这已经是我可怜的T440p能接受一次性插入的最大数据量了。

     Insert into bigtable
     select rownum,dbms_random.string('*',dbms_random.value(6,20)),dbms_random.value(0,20),sysdate from dual
     connect by level<=2000000
     order by dbms_random.random

    比较结果如下:

    # 常规方案耗时 小表驱动大表方案耗时
    1 00: 00: 07.29 00: 00: 06.75
    2 00: 00: 07.80 00: 00: 06.81
    3 00: 00: 07.37 00: 00: 06.87

    优势更明显些了。

    第二次实验再次确认了小表驱动大表的优势。

    第三次实验

    限于bigtable总量的限制,这回我把大表里含张姓的数据调多了,具体看看:

    SQL> select count(*) from smalltable where name like '张%';
    
      COUNT(*)
    ----------
           212
    
    已用时间:  00: 00: 00.03
    
    SQL> select count(*) from bigtable where name like '张%';
    
      COUNT(*)
    ----------
         48330
    
    已用时间:  00: 00: 00.12

    小表张姓数据是两百来条,大表是四万八千条。

    再比三次:

    # 常规方案 小表驱动大表方案
    1 00: 01: 16.40 00: 01: 12.54
    2 00: 01: 17.14 00: 01: 15.60
    3 00: 01: 15.73 00: 01: 15.30

    这把随着数据量的增多,优势反而越来越微弱了!!

    忽然想起小表驱动大表后面还有一句,大表走索引,再给大表的name字段加上索引试试:

    SQL> create index bigtable_name_index on bigtable(name);
    
    索引已创建。
    
    已用时间:  00: 00: 06.02

    再比三次:

    # 常规方案 小表驱动大表方案
    1 00: 01: 15.62 00: 01: 13.59
    2 00: 01: 16.27 00: 01: 16.24
    3 00: 01: 17.50 00: 01: 16.58

    让人晕菜的结果出来了,原以为加了索引两个方案都快了,而且驱动比常规方案优势明显,结果都没有。估计数据库还是走的全表查询。

    让我们看看select * from bigtable where name in (select name from smalltable where name like '张%')的执行计划:

    SQL> set autotrace trace exp
    SQL> select * from bigtable where name in (select name from smalltable where name like '张%');
    已用时间:  00: 00: 00.00
    
    执行计划
    ----------------------------------------------------------
    Plan hash value: 4026363122
    
    ----------------------------------------------------------------------------------------------------
    | Id  | Operation                    | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
    ----------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT             |                     |   311 | 50693 |   446   (1)| 00:00:06 |
    |   1 |  NESTED LOOPS                |                     |       |       |            |          |
    |   2 |   NESTED LOOPS               |                     |   311 | 50693 |   446   (1)| 00:00:06 |
    |   3 |    SORT UNIQUE               |                     |   178 | 11036 |   172   (2)| 00:00:03 |
    |*  4 |     TABLE ACCESS FULL        | SMALLTABLE          |   178 | 11036 |   172   (2)| 00:00:03 |
    |*  5 |    INDEX RANGE SCAN          | BIGTABLE_NAME_INDEX |     3 |       |     2   (0)| 00:00:01 |
    |   6 |   TABLE ACCESS BY INDEX ROWID| BIGTABLE            |     2 |   202 |     5   (0)| 00:00:01 |
    ----------------------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       4 - filter("NAME" LIKE U'5F20%')
       5 - access("NAME"="NAME")
           filter("NAME" LIKE U'5F20%')
    
    Note
    -----
       - dynamic sampling used for this statement (level=2)

    select /*+ leading(big@small) use_nl(big@small,big */ * from bigtable big where name in (select /*+ qb_name(small) */ name from smalltable where name like '张%')的执行计划:

    SQL> select /*+ leading(big@small) use_nl(big@small,big */ * from bigtable big 

    where name in (select /*+ qb_name(small) */ name from smalltable where name like '张%'); 已用时间: 00: 00: 00.00 执行计划 ---------------------------------------------------------- Plan hash value: 4026363122 ---------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 311 | 50693 | 446 (1)| 00:00:06 | | 1 | NESTED LOOPS | | | | | | | 2 | NESTED LOOPS | | 311 | 50693 | 446 (1)| 00:00:06 | | 3 | SORT UNIQUE | | 178 | 11036 | 172 (2)| 00:00:03 | |* 4 | TABLE ACCESS FULL | SMALLTABLE | 178 | 11036 | 172 (2)| 00:00:03 | |* 5 | INDEX RANGE SCAN | BIGTABLE_NAME_INDEX | 3 | | 2 (0)| 00:00:01 | | 6 | TABLE ACCESS BY INDEX ROWID| BIGTABLE | 2 | 202 | 5 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - filter("NAME" LIKE U'5F20%') 5 - access("NAME"="NAME") filter("NAME" LIKE U'5F20%') Note ----- - dynamic sampling used for this statement (level=2)

    最后我们发现执行计划是近乎一致的,看来是Oracle内部将普通方案进行优化的结果,优化的方向就是小表驱动大表。

    如果我强制让大表驱动,那么差别不就出来了?

    SQL> select /*+ leading(big) */ * from bigtable big where name in (select /*+ qb_name(small) */ name from smalltable where name like '张%');
    已用时间:  00: 00: 00.01
    
    执行计划
    ----------------------------------------------------------
    Plan hash value: 3046684681
    
    -----------------------------------------------------------------------------------------
    | Id  | Operation          | Name       | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
    -----------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |            |   312 | 50856 |       |  4297   (2)| 00:00:52 |
    |*  1 |  HASH JOIN SEMI    |            |   312 | 50856 |  4920K|  4297   (2)| 00:00:52 |
    |*  2 |   TABLE ACCESS FULL| BIGTABLE   | 44556 |  4394K|       |  3883   (2)| 00:00:47 |
    |*  3 |   TABLE ACCESS FULL| SMALLTABLE |   178 | 11036 |       |   172   (2)| 00:00:03 |
    -----------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - access("NAME"="NAME")
       2 - filter("NAME" LIKE U'5F20%')
       3 - filter("NAME" LIKE U'5F20%')
    
    Note
    -----
       - dynamic sampling used for this statement (level=2)

    这一步oralce让大表先筛选一遍,再是小表筛选一遍,两者再做哈希半连接。

    看这把它要执行多少时间:

    已选择48330行。
    
    已用时间: 00: 00: 09.75

    这比小表驱动大表还要快!

    再看看终极 sql要跑多久:

    select * from bigtable where name like '张%'

    已选择48330行。
    
    已用时间: 00: 00: 09.94

    看来数据不够复杂,Oracle内部优化器轻易找到了最优方案,这次实验没有达到预期效果。

    --END-- 2020年1月5日10点28分

  • 相关阅读:
    Vue-嵌套路由
    Vue-详解设置路由导航的两种方法: <router-link :to="..."> 和router.push(...)
    Python
    windows和linux下 Python2,Python3 的环境及安装
    Python那点事
    Linux
    Linux
    Django
    Redis
    Django
  • 原文地址:https://www.cnblogs.com/heyang78/p/12151584.html
Copyright © 2011-2022 走看看