  • [Repost] An Optimization of a Hash Join

    Original article: [Featured] An Optimization of a Hash Join

    A couple of days ago I worked through a SQL tuning case. The statement is shown below; BIG_TABLE is about 150 GB, while SMALL_TABLE is tiny: just over 9,000 rows, less than 1 MB. hash_area_size and sort_area_size were both set large enough to allow an optimal (fully in-memory) hash join and an in-memory sort.

    select /*+ leading(b) use_hash(a b) */ distinct a.ID
    from BIG_TABLE a, SMALL_TABLE b
    where (a.category  = b.from_cat or
           a.category2 = b.from_cat) and
           a.site_id  = b.site_id and
           a.sale_end >= sysdate;
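
    As an aside: the post does not show how hash_area_size and sort_area_size were set. With manual workarea sizing it would look roughly like the following; the values are my assumption:

    alter session set workarea_size_policy = manual;
    alter session set hash_area_size = 1073741824;  -- bytes; room for an optimal (in-memory) hash join
    alter session set sort_area_size = 1073741824;  -- bytes; lets the final sort stay in memory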

    The execution plan looked like this:

    --------------------------------------------------------------------------
    | Id  | Operation            |  Name        | Rows  | Bytes | Cost (%CPU)|
    --------------------------------------------------------------------------
    |   0 | SELECT STATEMENT     |              |     2 |   174 |    18  (17)|
    |   1 |  SORT UNIQUE         |              |     2 |   174 |    18  (17)|
    |*  2 |   HASH JOIN          |              |     2 |   174 |    17  (12)|
    |   3 |    TABLE ACCESS FULL | SMALL_TABLE  |  1879 | 48854 |    14   (8)|
    |*  4 |    TABLE ACCESS FULL | BIG_TABLE    |     4 |   244 |     3  (34)|
    --------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    --------------------------------------------------- 
       2 - access("A"."SITE_ID"="B"."SITE_ID")
           filter("A"."CATEGORY"="B"."FROM_CAT" OR
                  "A"."CATEGORY2"="B"."FROM_CAT")
       4 - filter("A"."SALE_END">=SYSDATE@!)

    At first glance the plan looks perfect and the hints are well placed: the small table is the build input, hashed into memory, and the big table is the probe input. Experience says the whole statement should take roughly as long as a full table scan (FTS) of BIG_TABLE.
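
    For readers less familiar with the mechanics, here is a toy PL/SQL sketch of what build and probe do conceptually. It is my illustration only, not Oracle's actual implementation, and it uses dbms_utility.get_hash_value merely as a stand-in for the internal hash function:

    declare
      type t_chain   is table of SMALL_TABLE.site_id%type;     -- one bucket's chain (keys only)
      type t_buckets is table of t_chain index by pls_integer; -- the hash table
      ht t_buckets;
      b  pls_integer;
    begin
      -- Build phase: hash each SMALL_TABLE join-key value into one of 16384 buckets.
      for r in (select site_id from SMALL_TABLE) loop
        b := dbms_utility.get_hash_value(to_char(r.site_id), 1, 16384);
        if not ht.exists(b) then
          ht(b) := t_chain();
        end if;
        ht(b).extend;
        ht(b)(ht(b).count) := r.site_id;
      end loop;
      dbms_output.put_line('non-empty buckets: ' || ht.count);
      -- Probe phase (conceptual): each BIG_TABLE row hashes its own key, goes to
      -- that one bucket, and walks the chain comparing keys row by row. The
      -- longer the chain, the more comparisons every probe row has to pay for.
    end;
    /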

    But an FTS of BIG_TABLE takes only about 8 minutes, while the whole statement ran for 3 to 4 hours.

    So where exactly was the problem?

    The FTS time itself should not vary much, so the problem has to be in the hash join. Set an event to trace what the hash join is doing:

    alter session set events '10104 trace name context forever, level 2';
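
    Re-run the query with the event set; the hash table layout is dumped into the session trace file. Finding the trace file is not shown in the original post; on 11g and later it can be located like this (my addition):

    select value from v$diag_info where name = 'Default Trace File';

    The relevant section of the dump: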
    ### Hash table ###
    # NOTE: The calculated number of rows in non-empty buckets may be smaller
    #       than the true number.
    Number of buckets with   0 rows:      16373
    Number of buckets with   1 rows:          0
    Number of buckets with   2 rows:          0
    Number of buckets with   3 rows:          1
    Number of buckets with   4 rows:          0
    Number of buckets with   5 rows:          0
    Number of buckets with   6 rows:          0
    Number of buckets with   7 rows:          1
    Number of buckets with   8 rows:          0
    Number of buckets with   9 rows:          0
    Number of buckets with between  10 and  19 rows:          1
    Number of buckets with between  20 and  29 rows:          1
    Number of buckets with between  30 and  39 rows:          3
    Number of buckets with between  40 and  49 rows:          0
    Number of buckets with between  50 and  59 rows:          0
    Number of buckets with between  60 and  69 rows:          0
    Number of buckets with between  70 and  79 rows:          0
    Number of buckets with between  80 and  89 rows:          0
    Number of buckets with between  90 and  99 rows:          0
    Number of buckets with 100 or more rows:          4
    ### Hash table overall statistics ###
    Total buckets: 16384 Empty buckets: 16373 Non-empty buckets: 11
    Total number of rows: 9232
    Maximum number of rows in a bucket: 2531
    Average number of rows in non-empty buckets: 839.272705

    Look closely: the fullest bucket holds as many as 2,531 rows. A bucket is a linked list, so all of those rows are strung on a single chain.
    This points to the hash key the table was built on having too few distinct values and far too many duplicates; otherwise this many rows should never land in the same bucket.
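
    A quick way to test that suspicion (my own check; the original goes straight to the group-bys below) is to count the distinct values of each candidate hash key in the build table:

    select count(distinct site_id)                    as site_ids,
           count(distinct from_cat)                   as from_cats,
           count(distinct site_id || ':' || from_cat) as key_pairs
    from   SMALL_TABLE;

    With 9,232 rows spread over only a handful of distinct site_id values (the dump above shows just 11 non-empty buckets), chains thousands of rows long are unavoidable if site_id alone is the key.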

    The join condition involves two columns, from_cat and site_id, so there are three candidate hash keys to enumerate:

    1. Build hash table based on (site_id, from_cat):

    SQL> select site_id,from_cat,count(*) from SMALL_TABLE group by site_id,from_cat having count(*)>100;
    
    no rows selected
    
    2. Build hash table based on (from_cat):
    
    SQL> select from_cat,count(*) from SMALL_TABLE group by from_cat having count(*)>100;
    
    no rows selected
    
    3. Build hash table based on (site_id):
    
    SQL> select site_id,count(*) from SMALL_TABLE group by site_id having count(*)>100;
    
       SITE_ID   COUNT(*)
    ---------- ----------
             0       2531
             2       2527
           146       1490
           210       2526

    At this point it is clear that case 3, building on site_id alone, matches these two lines from the trace file:

    Number of buckets with 100 or more rows: 4
    Maximum number of rows in a bucket: 2531

    Note: which columns serve as the hash key can be read straight from the 'Predicate Information' section of the first plan; the access predicate is the hash key, while the OR condition is only a filter applied afterwards:

    access("A"."SITE_ID"="B"."SITE_ID")

    So the hash table was built on site_id alone. BIG_TABLE contains a huge number of rows with site_id = 0, and every one of them probes into the bucket with the longest chain; most of them walk the entire linked list only to be thrown away at the end. The hash join is therefore extremely inefficient and degenerates into something close to a nested-loop join.
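
    To put rough numbers on it (illustrative only; the post does not give BIG_TABLE's row count): a probe row in a well-built hash table costs on the order of one key comparison, but every probe row with site_id = 0 here has to walk a 2,531-entry chain. Multiplying the probe rows that hit the hot buckets by chains that long can inflate the comparison count by three orders of magnitude, which is consistent with an 8-minute scan blowing up into a multi-hour join.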

    With the root cause identified, the problem practically solved itself.

    Ideally the hash table should be built on (site_id, from_cat), so the culprit has to be the OR predicate. Rewrite the OR as a UNION:

    select /*+ leading(b) use_hash(a b) */ distinct a.ID
    from BIG_TABLE a, SMALL_TABLE b
    where  a.category  = b.from_cat and
           a.site_id  = b.site_id and
           a.sale_end >= sysdate
    UNION
    select /*+ leading(b) use_hash(a b) */ distinct a.ID
    from BIG_TABLE a, SMALL_TABLE b
    where  a.category2 = b.from_cat and
           a.site_id  = b.site_id and
           a.sale_end >= sysdate;
    --------------------------------------------------------------------------
    | Id  | Operation            |  Name        | Rows  | Bytes | Cost (%CPU)|
    --------------------------------------------------------------------------
    |   0 | SELECT STATEMENT     |              |     2 |   148 |    36  (59)|
    |   1 |  SORT UNIQUE         |              |     2 |   148 |    36  (59)|
    |   2 |   UNION-ALL          |              |       |       |            |
    |*  3 |    HASH JOIN         |              |     1 |    74 |    17  (12)|
    |   4 |     TABLE ACCESS FULL| SMALL_TABLE  |  1879 | 48854 |    14   (8)|
    |*  5 |     TABLE ACCESS FULL| BIG_TABLE    |     4 |   192 |     3  (34)|
    |*  6 |    HASH JOIN         |              |     1 |    74 |    17  (12)|
    |   7 |     TABLE ACCESS FULL| SMALL_TABLE  |  1879 | 48854 |    14   (8)|
    |*  8 |     TABLE ACCESS FULL| BIG_TABLE    |     4 |   192 |     3  (34)|
    --------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    --------------------------------------------------- 
       3 - access("A"."CATEGORY"="B"."FROM_CAT" AND
                  "A"."SITE_ID"="B"."SITE_ID")
       5 - filter("A"."SALE_END">=SYSDATE@!)
       6 - access("A"."CATEGORY2"="B"."FROM_CAT" AND
                  "A"."SITE_ID"="B"."SITE_ID")
       8 - filter("A"."SALE_END">=SYSDATE@!)

    At first glance this plan looks worse than the first one, since it scans BIG_TABLE twice. But note the Predicate Information: each hash join now has an access predicate on both FROM_CAT and SITE_ID, so each hash table is built on the composite key. Let's look at the hash table structure again:

    ### Hash table ###
    # NOTE: The calculated number of rows in non-empty buckets may be smaller
    #       than the true number.
    Number of buckets with   0 rows:       9306
    Number of buckets with   1 rows:       5310
    Number of buckets with   2 rows:       1436
    Number of buckets with   3 rows:        285
    Number of buckets with   4 rows:         43
    Number of buckets with   5 rows:          4
    Number of buckets with   6 rows:          0
    Number of buckets with   7 rows:          0
    Number of buckets with   8 rows:          0
    Number of buckets with   9 rows:          0
    Number of buckets with between  10 and  19 rows:          0
    Number of buckets with between  20 and  29 rows:          0
    Number of buckets with between  30 and  39 rows:          0
    Number of buckets with between  40 and  49 rows:          0
    Number of buckets with between  50 and  59 rows:          0
    Number of buckets with between  60 and  69 rows:          0
    Number of buckets with between  70 and  79 rows:          0
    Number of buckets with between  80 and  89 rows:          0
    Number of buckets with between  90 and  99 rows:          0
    Number of buckets with 100 or more rows:          0
    ### Hash table overall statistics ###
    Total buckets: 16384 Empty buckets: 9306 Non-empty buckets: 7078
    Total number of rows: 9232
    Maximum number of rows in a bucket: 5
    Average number of rows in non-empty buckets: 1.304323

    This is the hash table we want: the longest chain now holds only 5 rows, and the average non-empty bucket holds about 1.3 rows instead of 839.

    Total execution time dropped from three or four hours to 16 minutes, far better than the developer expected, and right in line with the new plan: two full scans of BIG_TABLE at roughly 8 minutes each.

    This is the kind of problem that is very hard to see from the plan alone; diagnosing it takes some understanding of how hash joins work and a look one level deeper, at the hash table itself.
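
    One piece of housekeeping the post leaves out (my addition): switch the trace event back off when done:

    alter session set events '10104 trace name context off';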

  • Source: https://www.cnblogs.com/killkill/p/1782889.html