zoukankan      html  css  js  c++  java
  • 从员工表和部门表联合查询的不同方式看CBO对不同SQL的优化

    名词解释:

    CBO Cost-Based Optimization 基于代价的优化器

    往下进入正题:

    有一张员工表这样设计:

    create table emp(
    id int,
    name nvarchar2(20),
    deptid int,
    primary key(id));

    可以这样给它塞入三十万条记录:

    insert into emp
    select rownum,
    dbms_random.string('*',dbms_random.value(6,20)),
    dbms_random.value(1,10)
    from dual
    connect by level<300001;

    还有一张部门表这样设计:

    create table dept(
    id int,
    name nvarchar2(20),
    primary key(id));

    这回选择逐条插值:

    insert into dept(id,name) values('1','dev');
    insert into dept(id,name) values('2','prod');
    insert into dept(id,name) values('3','sales');
    insert into dept(id,name) values('4','postsales');
    insert into dept(id,name) values('5','market');
    insert into dept(id,name) values('6','lab');
    insert into dept(id,name) values('7','research');
    insert into dept(id,name) values('8','adv');
    insert into dept(id,name) values('9','hr');
    insert into dept(id,name) values('10','mng');

    需求出来了,要找员工名以AK打头,属于生产prod、销售sales、售后postsales和市场market四个部的员工。

    按我机器上产生的结果,以AK开头的记录是466条,占emp表总数的0.15%;四个部的员工占dept表总数的4成。

    按常规理解,应该是先过滤再连接最高效,但既然是实验,就把能出来正确结果的各种SQL都试试,比较一下。

    满足这个需求的SQL不少,首先便是:

    1.
    select emp.name,dept.name from emp,dept where emp.deptid=dept.id and emp.name like 'AK%' and dept.id in (2,3,4,5)

    这条SQL是让员工表和部门表内联,然后让表连接条件和两个过滤条件统一写到where从句中。

    这是常规解法,效率未纳入考量范围。让我们看看解释计划是怎样的:

    SQL> explain plan for select emp.name,dept.name from emp,dept where emp.deptid=dept.id and emp.name like 'AK%' and dept.id in (2,3,4,5);
    
    已解释。
    
    SQL> select * from table(dbms_xplan.display);
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Plan hash value: 615168685
    
    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |    45 |  3150 |   486   (2)| 00:00:06 |
    |*  1 |  HASH JOIN         |      |    45 |  3150 |   486   (2)| 00:00:06 |
    |*  2 |   TABLE ACCESS FULL| DEPT |     4 |   140 |     3   (0)| 00:00:01 |
    |*  3 |   TABLE ACCESS FULL| EMP  |   111 |  3885 |   483   (2)| 00:00:06 |
    ---------------------------------------------------------------------------
    
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - access("EMP"."DEPTID"="DEPT"."ID")
       2 - filter("DEPT"."ID"=2 OR "DEPT"."ID"=3 OR "DEPT"."ID"=4 OR
                  "DEPT"."ID"=5)
       3 - filter("EMP"."NAME" LIKE U'AK%' AND ("EMP"."DEPTID"=2 OR
                  "EMP"."DEPTID"=3 OR "EMP"."DEPTID"=4 OR "EMP"."DEPTID"=5))
    
    Note
    -----
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
       - dynamic sampling used for this statement (level=2)
    
    已选择23行。

    从解释计划里,我们发现CBO是让两表先过滤再连接,cost是486. 也就是说这种SQL没考虑效率问题,CBO帮咱们考虑了,并按最优方案进行。

    第二种SQL

    2.

    select a.name,b.name from (select * from emp where name like 'AK%') a ,(select * from dept where id in (2,3,4,5)) b where a.deptid=b.id

    这句SQL知道先主动过滤两表再进行连接,表面上是比第一种要快的,让我们看看解释计划里它会如何表现:

    SQL> explain plan for select a.name,b.name from (select * from emp where name like 'AK%') a ,(select * from dept where id in (2,3,4,5)) b where a.deptid=b.id;
    
    已解释。
    
    SQL> select * from table(dbms_xplan.display);
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Plan hash value: 615168685
    
    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |    45 |  3150 |   486   (2)| 00:00:06 |
    |*  1 |  HASH JOIN         |      |    45 |  3150 |   486   (2)| 00:00:06 |
    |*  2 |   TABLE ACCESS FULL| DEPT |     4 |   140 |     3   (0)| 00:00:01 |
    |*  3 |   TABLE ACCESS FULL| EMP  |   111 |  3885 |   483   (2)| 00:00:06 |
    ---------------------------------------------------------------------------
    
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - access("EMP"."DEPTID"="DEPT"."ID")
       2 - filter("ID"=2 OR "ID"=3 OR "ID"=4 OR "ID"=5)
       3 - filter("NAME" LIKE U'AK%' AND ("EMP"."DEPTID"=2 OR
                  "EMP"."DEPTID"=3 OR "EMP"."DEPTID"=4 OR "EMP"."DEPTID"=5))
    
    Note
    -----
       - dynamic sampling used for this statement (level=2)
    
    已选择22行。

    我们发现,CBO知道该先过滤再连接,如果SQL这样做了,它也就顺其自然,于是解释计划和上面一样,Cost也是486.

    第三种SQL

    3.

    select a.name,a.dname from (select emp.name,dept.name as dname,dept.id from emp,dept where emp.deptid=dept.id) a where a.name like 'AK%' and a.id in (2,3,4,5)

    这种SQL是不管数据多少故意硬让emp和dept两表连接,形成一个逻辑大表,再在出来的结果集里筛选。

    理论上,我们知道这是最费力的方案,但是,Oracle有CBO,它会对SQL进行优化,看解释计划暴露的CBO意图如何?

    SQL> select * from table(dbms_xplan.display);
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Plan hash value: 615168685
    
    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |    45 |  3150 |   486   (2)| 00:00:06 |
    |*  1 |  HASH JOIN         |      |    45 |  3150 |   486   (2)| 00:00:06 |
    |*  2 |   TABLE ACCESS FULL| DEPT |     4 |   140 |     3   (0)| 00:00:01 |
    |*  3 |   TABLE ACCESS FULL| EMP  |   111 |  3885 |   483   (2)| 00:00:06 |
    ---------------------------------------------------------------------------
    
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - access("EMP"."DEPTID"="DEPT"."ID")
       2 - filter("DEPT"."ID"=2 OR "DEPT"."ID"=3 OR "DEPT"."ID"=4 OR
                  "DEPT"."ID"=5)
       3 - filter("EMP"."NAME" LIKE U'AK%' AND ("EMP"."DEPTID"=2 OR
                  "EMP"."DEPTID"=3 OR "EMP"."DEPTID"=4 OR "EMP"."DEPTID"=5))
    
    Note
    -----
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
       - dynamic sampling used for this statement (level=2)
    
    已选择23行。

    看,即使SQL里强制先连接表再过滤,知道最佳方案是怎样的CBO是不会舍近求远的,它还是选择最佳路径前进,所以cost依旧是486.

    看到这里,你会知道过于纠结一些SQL的写法无意义,CBO会知道如何优化,就像java虚拟机对字符串+的优化;另一方面,SQL跑得顺溜也未必是SQL写得正确高效,或许是CBO在背后默默奉献。

    这个例子只是简单情况,所以CBO能一直坚持最短路径,但复杂情况下未必合理,我们还是要秉承先过滤后连接的方针不变,就像第二种方案里的SQL那样写。

    如果给emp表的name单列加索引会是什么效果呢?cost是增长还是有效降低?。

    让我们先加索引,再对第二SQL(先过滤后连接)查看解释计划。

    SQL> create index idx_emp_name on emp(name);
    
    索引已创建。
    
    SQL> explain plan for select a.name,b.name from (select * from emp where name like 'AK%') a ,(select * from dept where id in (2,3,4,5)) b where a.deptid=b.id;
    
    已解释。
    
    SQL> select * from table(dbms_xplan.display);
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Plan hash value: 262175135
    
    --------------------------------------------------------------------------------
    -------------
    
    | Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU
    )| Time     |
    
    --------------------------------------------------------------------------------
    -------------
    
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT             |              |    45 |  3150 |   467   (1
    )| 00:00:06 |
    
    |*  1 |  HASH JOIN                   |              |    45 |  3150 |   467   (1
    )| 00:00:06 |
    
    |*  2 |   TABLE ACCESS FULL          | DEPT         |     4 |   140 |     3   (0
    )| 00:00:01 |
    
    |*  3 |   TABLE ACCESS BY INDEX ROWID| EMP          |   111 |  3885 |   463   (0
    )| 00:00:06 |
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    
    |*  4 |    INDEX RANGE SCAN          | IDX_EMP_NAME |   362 |       |     5   (0
    )| 00:00:01 |
    
    --------------------------------------------------------------------------------
    -------------
    
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
       1 - access("EMP"."DEPTID"="DEPT"."ID")
       2 - filter("ID"=2 OR "ID"=3 OR "ID"=4 OR "ID"=5)
       3 - filter("EMP"."DEPTID"=2 OR "EMP"."DEPTID"=3 OR "EMP"."DEPTID"=4 OR
                  "EMP"."DEPTID"=5)
       4 - access("NAME" LIKE U'AK%')
           filter("NAME" LIKE U'AK%')
    
    Note
    -----
       - dynamic sampling used for this statement (level=2)
    
    已选择25行。

    看来对name单列加索引对半模糊查询的优化效果极其有限。

    下面我们用instr函数取代 like ‘AK%’,看是否有惊喜。

    SQL:

    select a.name,b.name from (select * from emp where instr(name,'AK')=1) a ,(select * from dept where id in (2,3,4,5)) b where a.deptid=b.id;

    执行情况:

    SQL> create index idx_emp_name on emp(name);
    
    索引已创建。
    
    SQL> explain plan for select a.name,b.name from (select * from emp where instr(name,'AK')=1) a ,(select * from dept where id in (2,3,4,5)) b where a.deptid=b.id;
    
    已解释。
    
    SQL> select * from table(dbms_xplan.display);
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Plan hash value: 615168685
    
    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |    45 |  3150 |   487   (2)| 00:00:06 |
    |*  1 |  HASH JOIN         |      |    45 |  3150 |   487   (2)| 00:00:06 |
    |*  2 |   TABLE ACCESS FULL| DEPT |     4 |   140 |     3   (0)| 00:00:01 |
    |*  3 |   TABLE ACCESS FULL| EMP  |   111 |  3885 |   484   (2)| 00:00:06 |
    ---------------------------------------------------------------------------
    
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - access("EMP"."DEPTID"="DEPT"."ID")
       2 - filter("ID"=2 OR "ID"=3 OR "ID"=4 OR "ID"=5)
       3 - filter(INSTR("NAME",U'AK')=1 AND ("EMP"."DEPTID"=2 OR
                  "EMP"."DEPTID"=3 OR "EMP"."DEPTID"=4 OR "EMP"."DEPTID"=5))

    Cost反而上升了1,看来对name加单列索引,对查询的改变都很有限。

    下面我们对emp表的name和deptid字段加联合索引,看是否有改善。

    SQL> create index idx_emp_name_deptid on emp(name,deptid);
    
    索引已创建。
    
    SQL> explain plan for select a.name,b.name from (select * from emp where instr(name,'AK')=1) a ,(select * from dept where id in (2,3,4,5)) b where a.deptid=b.id;
    
    已解释。
    
    SQL> select * from table(dbms_xplan.display);
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Plan hash value: 615168685
    
    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |    45 |  3150 |   487   (2)| 00:00:06 |
    |*  1 |  HASH JOIN         |      |    45 |  3150 |   487   (2)| 00:00:06 |
    |*  2 |   TABLE ACCESS FULL| DEPT |     4 |   140 |     3   (0)| 00:00:01 |
    |*  3 |   TABLE ACCESS FULL| EMP  |   111 |  3885 |   484   (2)| 00:00:06 |
    ---------------------------------------------------------------------------
    
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - access("EMP"."DEPTID"="DEPT"."ID")
       2 - filter("ID"=2 OR "ID"=3 OR "ID"=4 OR "ID"=5)
       3 - filter(INSTR("NAME",U'AK')=1 AND ("EMP"."DEPTID"=2 OR
                  "EMP"."DEPTID"=3 OR "EMP"."DEPTID"=4 OR "EMP"."DEPTID"=5))
    
    Note
    -----
       - dynamic sampling used for this statement (level=2)
    
    已选择22行。

    结果和对name加单列索引是一样的,cost还上升了1.

    下面我们把dept的in查询改成范围查询,还是用上面创建的联合索引,看看情况如何。

    SQL:

    select a.name,b.name from (select * from emp where instr(name,'AK')=1) a ,(select * from dept where id>1 and id<6 ) b where a.deptid=b.id;

    执行:

    SQL> explain plan for select a.name,b.name from (select * from emp where instr(name,'AK')=1) a ,(select * from dept where id>1 and id<6 ) b where a.deptid=b.id;
    
    已解释。
    
    SQL> select * from table(dbms_xplan.display);
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Plan hash value: 615168685
    
    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |    45 |  3150 |   485   (2)| 00:00:06 |
    |*  1 |  HASH JOIN         |      |    45 |  3150 |   485   (2)| 00:00:06 |
    |*  2 |   TABLE ACCESS FULL| DEPT |     4 |   140 |     3   (0)| 00:00:01 |
    |*  3 |   TABLE ACCESS FULL| EMP  |   111 |  3885 |   482   (2)| 00:00:06 |
    ---------------------------------------------------------------------------
    
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - access("EMP"."DEPTID"="DEPT"."ID")
       2 - filter("ID">1 AND "ID"<6)
       3 - filter("EMP"."DEPTID">1 AND "EMP"."DEPTID"<6 AND
                  INSTR("NAME",U'AK')=1)
    
    Note
    -----
       - dynamic sampling used for this statement (level=2)
    
    已选择22行。

    从Cost上看,少了1,是聊胜于无的改善。

    再换一种, 用between and 取代<>, 注意between and是包括上下边界的。

    select a.name,b.name from (select * from emp where instr(name,'AK')=1) a ,(select * from dept where id between 2 and 5 ) b where a.deptid=b.id;

    执行情况:

    SQL> explain plan for select a.name,b.name from (select * from emp where instr(name,'AK')=1) a ,(select * from dept where id between 2 and 5 ) b where a.deptid=b.id;
    
    已解释。
    
    SQL> select * from table(dbms_xplan.display);
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Plan hash value: 615168685
    
    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |    45 |  3150 |   485   (2)| 00:00:06 |
    |*  1 |  HASH JOIN         |      |    45 |  3150 |   485   (2)| 00:00:06 |
    |*  2 |   TABLE ACCESS FULL| DEPT |     4 |   140 |     3   (0)| 00:00:01 |
    |*  3 |   TABLE ACCESS FULL| EMP  |   111 |  3885 |   482   (2)| 00:00:06 |
    ---------------------------------------------------------------------------
    
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - access("EMP"."DEPTID"="DEPT"."ID")
       2 - filter("ID">=2 AND "ID"<=5)
       3 - filter("EMP"."DEPTID">=2 AND "EMP"."DEPTID"<=5 AND
                  INSTR("NAME",U'AK')=1)
    
    Note
    -----
       - dynamic sampling used for this statement (level=2)
    
    已选择22行。

    从Cost上看,这个方案比大于小于方案没有改进,还是485.

    就这个简单例子而言,CBO的优化已经做到了极限,以致于改进的空间都极其有限了。

    END

  • 相关阅读:
    Java实现 蓝桥杯 算法训练 画图(暴力)
    Java实现 蓝桥杯 算法训练 画图(暴力)
    Java实现 蓝桥杯 算法训练 相邻数对(暴力)
    Java实现 蓝桥杯 算法训练 相邻数对(暴力)
    Java实现 蓝桥杯 算法训练 相邻数对(暴力)
    Java实现 蓝桥杯 算法训练 Cowboys
    Java实现 蓝桥杯 算法训练 Cowboys
    55. Jump Game
    54. Spiral Matrix
    50. Pow(x, n)
  • 原文地址:https://www.cnblogs.com/heyang78/p/15221470.html
Copyright © 2011-2022 走看看