zoukankan      html  css  js  c++  java
  • 从员工表和部门表联合查询的不同方式看CBO对不同SQL的优化

    名词解释:

    CBO Cost-Based Optimization 基于代价的优化器

    往下进入正题:

    有一张员工表这样设计:

    create table emp(
    id int,
    name nvarchar2(20),
    deptid int,
    primary key(id));

    可以这样给它塞入三十万条记录:

    insert into emp
    select rownum,
    dbms_random.string('*',dbms_random.value(6,20)),
    dbms_random.value(1,10)
    from dual
    connect by level<300001;

    还有一张部门表这样设计:

    create table dept(
    id int,
    name nvarchar2(20),
    primary key(id));

    这回选择逐条插值:

    insert into dept(id,name) values('1','dev');
    insert into dept(id,name) values('2','prod');
    insert into dept(id,name) values('3','sales');
    insert into dept(id,name) values('4','postsales');
    insert into dept(id,name) values('5','market');
    insert into dept(id,name) values('6','lab');
    insert into dept(id,name) values('7','research');
    insert into dept(id,name) values('8','adv');
    insert into dept(id,name) values('9','hr');
    insert into dept(id,name) values('10','mng');

    需求出来了,要找员工名以AK打头,属于生产prod、销售sales、售后postsales和市场market四个部的员工。

    按我机器上产生的结果,以AK开头的记录是466条,占emp表总数的0.15%;四个部的员工占dept表总数的4成。

    按常规理解,应该是先过滤再连接最高效,但既然是实验,就把能出来正确结果的各种SQL都试试,比较一下。

    满足这个需求的SQL不少,首先便是:

    1.
    select emp.name,dept.name from emp,dept where emp.deptid=dept.id and emp.name like 'AK%' and dept.id in (2,3,4,5)

    这条SQL是让员工表和部门表内联,然后让表连接条件和两个过滤条件统一写到where从句中。

    这是常规解法,效率未纳入考量范围。让我们看看解释计划是怎样的:

    SQL> explain plan for select emp.name,dept.name from emp,dept where emp.deptid=dept.id and emp.name like 'AK%' and dept.id in (2,3,4,5);
    
    已解释。
    
    SQL> select * from table(dbms_xplan.display);
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Plan hash value: 615168685
    
    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |    45 |  3150 |   486   (2)| 00:00:06 |
    |*  1 |  HASH JOIN         |      |    45 |  3150 |   486   (2)| 00:00:06 |
    |*  2 |   TABLE ACCESS FULL| DEPT |     4 |   140 |     3   (0)| 00:00:01 |
    |*  3 |   TABLE ACCESS FULL| EMP  |   111 |  3885 |   483   (2)| 00:00:06 |
    ---------------------------------------------------------------------------
    
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - access("EMP"."DEPTID"="DEPT"."ID")
       2 - filter("DEPT"."ID"=2 OR "DEPT"."ID"=3 OR "DEPT"."ID"=4 OR
                  "DEPT"."ID"=5)
       3 - filter("EMP"."NAME" LIKE U'AK%' AND ("EMP"."DEPTID"=2 OR
                  "EMP"."DEPTID"=3 OR "EMP"."DEPTID"=4 OR "EMP"."DEPTID"=5))
    
    Note
    -----
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
       - dynamic sampling used for this statement (level=2)
    
    已选择23行。

    从解释计划里,我们发现CBO是让两表先过滤再连接,cost是486. 也就是说这种SQL没考虑效率问题,CBO帮咱们考虑了,并按最优方案进行。

    第二种SQL

    2.

    select a.name,b.name from (select * from emp where name like 'AK%') a ,(select * from dept where id in (2,3,4,5)) b where a.deptid=b.id

    这句SQL知道先主动过滤两表再进行连接,表面上是比第一种要快的,让我们看看解释计划里它会如何表现:

    SQL> explain plan for select a.name,b.name from (select * from emp where name like 'AK%') a ,(select * from dept where id in (2,3,4,5)) b where a.deptid=b.id;
    
    已解释。
    
    SQL> select * from table(dbms_xplan.display);
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Plan hash value: 615168685
    
    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |    45 |  3150 |   486   (2)| 00:00:06 |
    |*  1 |  HASH JOIN         |      |    45 |  3150 |   486   (2)| 00:00:06 |
    |*  2 |   TABLE ACCESS FULL| DEPT |     4 |   140 |     3   (0)| 00:00:01 |
    |*  3 |   TABLE ACCESS FULL| EMP  |   111 |  3885 |   483   (2)| 00:00:06 |
    ---------------------------------------------------------------------------
    
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - access("EMP"."DEPTID"="DEPT"."ID")
       2 - filter("ID"=2 OR "ID"=3 OR "ID"=4 OR "ID"=5)
       3 - filter("NAME" LIKE U'AK%' AND ("EMP"."DEPTID"=2 OR
                  "EMP"."DEPTID"=3 OR "EMP"."DEPTID"=4 OR "EMP"."DEPTID"=5))
    
    Note
    -----
       - dynamic sampling used for this statement (level=2)
    
    已选择22行。

    我们发现,CBO知道该先过滤再连接,如果SQL这样做了,它也就顺其自然,于是解释计划和上面一样,Cost也是486.

    第三种SQL

    3.

    select a.name,a.dname from (select emp.name,dept.name as dname,dept.id from emp,dept where emp.deptid=dept.id) a where a.name like 'AK%' and a.id in (2,3,4,5)

    这种SQL是不管数据多少故意硬让emp和dept两表连接,形成一个逻辑大表,再在出来的结果集里筛选。

    理论上,我们知道这是最费力的方案,但是,Oracle有CBO,它会对SQL进行优化,看解释计划暴露的CBO意图如何?

    SQL> select * from table(dbms_xplan.display);
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Plan hash value: 615168685
    
    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |    45 |  3150 |   486   (2)| 00:00:06 |
    |*  1 |  HASH JOIN         |      |    45 |  3150 |   486   (2)| 00:00:06 |
    |*  2 |   TABLE ACCESS FULL| DEPT |     4 |   140 |     3   (0)| 00:00:01 |
    |*  3 |   TABLE ACCESS FULL| EMP  |   111 |  3885 |   483   (2)| 00:00:06 |
    ---------------------------------------------------------------------------
    
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - access("EMP"."DEPTID"="DEPT"."ID")
       2 - filter("DEPT"."ID"=2 OR "DEPT"."ID"=3 OR "DEPT"."ID"=4 OR
                  "DEPT"."ID"=5)
       3 - filter("EMP"."NAME" LIKE U'AK%' AND ("EMP"."DEPTID"=2 OR
                  "EMP"."DEPTID"=3 OR "EMP"."DEPTID"=4 OR "EMP"."DEPTID"=5))
    
    Note
    -----
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
       - dynamic sampling used for this statement (level=2)
    
    已选择23行。

    看,即使SQL里强制先连接表再过滤,知道最佳方案是怎样的CBO是不会舍近求远的,它还是选择最佳路径前进,所以cost依旧是486.

    看到这里,你会知道过于纠结一些SQL的写法无意义,CBO会知道如何优化,就像java虚拟机对字符串+的优化;另一方面,SQL跑得顺溜也未必是SQL写得正确高效,或许是CBO在背后默默奉献。

    这个例子只是简单情况,所以CBO能一直坚持最短路径,但复杂情况下未必合理,我们还是要秉承先过滤后连接的方针不变,就像第二种方案里的SQL那样写。

    如果给emp表的name单列加索引会是什么效果呢?cost是增长还是有效降低?。

    让我们先加索引,再对第二SQL(先过滤后连接)查看解释计划。

    SQL> create index idx_emp_name on emp(name);
    
    索引已创建。
    
    SQL> explain plan for select a.name,b.name from (select * from emp where name like 'AK%') a ,(select * from dept where id in (2,3,4,5)) b where a.deptid=b.id;
    
    已解释。
    
    SQL> select * from table(dbms_xplan.display);
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Plan hash value: 262175135
    
    --------------------------------------------------------------------------------
    -------------
    
    | Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU
    )| Time     |
    
    --------------------------------------------------------------------------------
    -------------
    
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT             |              |    45 |  3150 |   467   (1
    )| 00:00:06 |
    
    |*  1 |  HASH JOIN                   |              |    45 |  3150 |   467   (1
    )| 00:00:06 |
    
    |*  2 |   TABLE ACCESS FULL          | DEPT         |     4 |   140 |     3   (0
    )| 00:00:01 |
    
    |*  3 |   TABLE ACCESS BY INDEX ROWID| EMP          |   111 |  3885 |   463   (0
    )| 00:00:06 |
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    
    |*  4 |    INDEX RANGE SCAN          | IDX_EMP_NAME |   362 |       |     5   (0
    )| 00:00:01 |
    
    --------------------------------------------------------------------------------
    -------------
    
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
       1 - access("EMP"."DEPTID"="DEPT"."ID")
       2 - filter("ID"=2 OR "ID"=3 OR "ID"=4 OR "ID"=5)
       3 - filter("EMP"."DEPTID"=2 OR "EMP"."DEPTID"=3 OR "EMP"."DEPTID"=4 OR
                  "EMP"."DEPTID"=5)
       4 - access("NAME" LIKE U'AK%')
           filter("NAME" LIKE U'AK%')
    
    Note
    -----
       - dynamic sampling used for this statement (level=2)
    
    已选择25行。

    看来对name单列加索引对半模糊查询的优化效果极其有限。

    下面我们用instr函数取代 like ‘AK%’,看是否有惊喜。

    SQL:

    select a.name,b.name from (select * from emp where instr(name,'AK')=1) a ,(select * from dept where id in (2,3,4,5)) b where a.deptid=b.id;

    执行情况:

    SQL> create index idx_emp_name on emp(name);
    
    索引已创建。
    
    SQL> explain plan for select a.name,b.name from (select * from emp where instr(name,'AK')=1) a ,(select * from dept where id in (2,3,4,5)) b where a.deptid=b.id;
    
    已解释。
    
    SQL> select * from table(dbms_xplan.display);
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Plan hash value: 615168685
    
    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |    45 |  3150 |   487   (2)| 00:00:06 |
    |*  1 |  HASH JOIN         |      |    45 |  3150 |   487   (2)| 00:00:06 |
    |*  2 |   TABLE ACCESS FULL| DEPT |     4 |   140 |     3   (0)| 00:00:01 |
    |*  3 |   TABLE ACCESS FULL| EMP  |   111 |  3885 |   484   (2)| 00:00:06 |
    ---------------------------------------------------------------------------
    
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - access("EMP"."DEPTID"="DEPT"."ID")
       2 - filter("ID"=2 OR "ID"=3 OR "ID"=4 OR "ID"=5)
       3 - filter(INSTR("NAME",U'AK')=1 AND ("EMP"."DEPTID"=2 OR
                  "EMP"."DEPTID"=3 OR "EMP"."DEPTID"=4 OR "EMP"."DEPTID"=5))

    Cost反而上升了1,看来对name加单列索引,对查询的改变都很有限。

    下面我们对emp表的name和deptid字段加联合索引,看是否有改善。

    SQL> create index idx_emp_name_deptid on emp(name,deptid);
    
    索引已创建。
    
    SQL> explain plan for select a.name,b.name from (select * from emp where instr(name,'AK')=1) a ,(select * from dept where id in (2,3,4,5)) b where a.deptid=b.id;
    
    已解释。
    
    SQL> select * from table(dbms_xplan.display);
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Plan hash value: 615168685
    
    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |    45 |  3150 |   487   (2)| 00:00:06 |
    |*  1 |  HASH JOIN         |      |    45 |  3150 |   487   (2)| 00:00:06 |
    |*  2 |   TABLE ACCESS FULL| DEPT |     4 |   140 |     3   (0)| 00:00:01 |
    |*  3 |   TABLE ACCESS FULL| EMP  |   111 |  3885 |   484   (2)| 00:00:06 |
    ---------------------------------------------------------------------------
    
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - access("EMP"."DEPTID"="DEPT"."ID")
       2 - filter("ID"=2 OR "ID"=3 OR "ID"=4 OR "ID"=5)
       3 - filter(INSTR("NAME",U'AK')=1 AND ("EMP"."DEPTID"=2 OR
                  "EMP"."DEPTID"=3 OR "EMP"."DEPTID"=4 OR "EMP"."DEPTID"=5))
    
    Note
    -----
       - dynamic sampling used for this statement (level=2)
    
    已选择22行。

    结果和对name加单列索引是一样的,cost还上升了1.

    下面我们把dept的in查询改成范围查询,还是用上面创建的联合索引,看看情况如何。

    SQL:

    select a.name,b.name from (select * from emp where instr(name,'AK')=1) a ,(select * from dept where id>1 and id<6 ) b where a.deptid=b.id;

    执行:

    SQL> explain plan for select a.name,b.name from (select * from emp where instr(name,'AK')=1) a ,(select * from dept where id>1 and id<6 ) b where a.deptid=b.id;
    
    已解释。
    
    SQL> select * from table(dbms_xplan.display);
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Plan hash value: 615168685
    
    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |    45 |  3150 |   485   (2)| 00:00:06 |
    |*  1 |  HASH JOIN         |      |    45 |  3150 |   485   (2)| 00:00:06 |
    |*  2 |   TABLE ACCESS FULL| DEPT |     4 |   140 |     3   (0)| 00:00:01 |
    |*  3 |   TABLE ACCESS FULL| EMP  |   111 |  3885 |   482   (2)| 00:00:06 |
    ---------------------------------------------------------------------------
    
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - access("EMP"."DEPTID"="DEPT"."ID")
       2 - filter("ID">1 AND "ID"<6)
       3 - filter("EMP"."DEPTID">1 AND "EMP"."DEPTID"<6 AND
                  INSTR("NAME",U'AK')=1)
    
    Note
    -----
       - dynamic sampling used for this statement (level=2)
    
    已选择22行。

    从Cost上看,少了1,是聊胜于无的改善。

    再换一种, 用between and 取代<>, 注意between and是包括上下边界的。

    select a.name,b.name from (select * from emp where instr(name,'AK')=1) a ,(select * from dept where id between 2 and 5 ) b where a.deptid=b.id;

    执行情况:

    SQL> explain plan for select a.name,b.name from (select * from emp where instr(name,'AK')=1) a ,(select * from dept where id between 2 and 5 ) b where a.deptid=b.id;
    
    已解释。
    
    SQL> select * from table(dbms_xplan.display);
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Plan hash value: 615168685
    
    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |    45 |  3150 |   485   (2)| 00:00:06 |
    |*  1 |  HASH JOIN         |      |    45 |  3150 |   485   (2)| 00:00:06 |
    |*  2 |   TABLE ACCESS FULL| DEPT |     4 |   140 |     3   (0)| 00:00:01 |
    |*  3 |   TABLE ACCESS FULL| EMP  |   111 |  3885 |   482   (2)| 00:00:06 |
    ---------------------------------------------------------------------------
    
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - access("EMP"."DEPTID"="DEPT"."ID")
       2 - filter("ID">=2 AND "ID"<=5)
       3 - filter("EMP"."DEPTID">=2 AND "EMP"."DEPTID"<=5 AND
                  INSTR("NAME",U'AK')=1)
    
    Note
    -----
       - dynamic sampling used for this statement (level=2)
    
    已选择22行。

    从Cost上看,这个方案比大于小于方案没有改进,还是485.

    就这个简单例子而言,CBO的优化已经做到了极限,以致于改进的空间都极其有限了。

    END

  • 相关阅读:
    【转载】怎样使用ZEMAX导出高质量的图像动画
    shell中的单引号,双引号,反引号
    docker容器以非root用户启动应用
    js操作json的基本方法
    页岩油
    shell中使用ssh
    强一致性 弱一致性 最终一致性
    CSV和excel
    workbook sheetname最大长度
    ipvs了解
  • 原文地址:https://www.cnblogs.com/heyang78/p/15221470.html
Copyright © 2011-2022 走看看