  • [20181130] How to guess which values have hash collisions.txt

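    --//In June this year a post by kerrycode described how scalar subquery results are cached in a hash table:
    --//Link: http://www.cnblogs.com/kerrycode/p/9099507.html. Excerpt:
    In plain terms, when a scalar subquery is used, Oracle caches the subquery result in a hash table. If a later row
    carries the same value, the optimizer finds it in the hash table and reuses the previous result instead of calling the
    function again, which reduces the number of function calls and improves performance. In addition, in Oracle 10 and 11
    the hash table has only 255 buckets, meaning it can hold 255 distinct values. Beyond that, hash collisions occur, and
    the values that collide call the function repeatedly. Even so, performance still improves substantially.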

    --//At the time I very much wanted to learn from the author where the claim "the hash table has only 255 buckets"
    --//came from. kerrycode gave me a link:
    https://blogs.oracle.com/oraclemagazine/on-caching-and-evangelizing-sql

    Oracle Database will use this hash table to remember the scalar subquery and the inputs to it—just :DEPTNO in this case
    —and the output from it. At the beginning of every query execution, this cache is empty, but suppose you run the query
    and the first PROJECTS row you retrieve has a DEPTNO value of 10. Oracle Database will assign the number 10 to a hash
    value between 1 and 255 (the size of the hash table cache in Oracle Database 10g and Oracle Database 11g currently) and
    will look in that hash table slot to see if the answer exists. In this case, it will not, so Oracle Database must run
    the scalar subquery with the input of 10 to get the answer. If that answer (count) is 42, the hash table may look
    something like this:

    Select count(*) from emp where emp.deptno = :deptno
    :deptno     Count(*)

    You'll have saved the DEPTNO value of 10 and the answer (count) of 42 in some slot—probably not the first or last slot,
    but whatever slot the hash value 10 is assigned to. Now suppose the second row you get back from the PROJECTS table
    includes a DEPTNO value of 20. Oracle Database will again look in the hash table after assigning the value 20, and it
    will discover "no result in the cache yet." So it will run the scalar subquery, get the result, and put it into the hash
    table cache. Now the cache may look like this:

    Select count(*) from emp where emp.deptno = :deptno
    :deptno     Count(*)
    …     …
    10     42

    Now suppose the query returns a third row and it again includes a DEPTNO value of 10. This time, Oracle Database will
    see DEPTNO = 10, find that it already has that value in the hash table cache, and will simply return 42 from the cache
    instead of executing the scalar subquery. In fact, it will never have to run that scalar subquery for the DEPTNO values
    of 10 or 20 again for that query—it will already have the answer.

    What happens if the number of unique DEPTNO values exceeds the size of the hash table? What if there are more than 255
    values? Or, more generally, if more than one DEPTNO value is assigned to the same slot in the hash table, what happens
    in a hash collision?

    The answer is the same for all these questions and is rather simple: Oracle Database will not be able to cache the
    second or nth value to that slot in the hash table. For example, what if the third row returned by the query contains
    the DEPTNO = 30 value? Further, suppose that DEPTNO = 30 is to be assigned to exactly the same hash table slot as DEPTNO
    = 10. The database won't be able to effectively cache DEPTNO = 30 in this case—the value will never make it into the
    hash table. It will, however, be "partially cached." Oracle Database still has the hash table with all the previous
    executions, but it also keeps the last scalar subquery result it had "next to" the hash table. That is, if the fourth
    row also includes a DEPTNO = 30 value, Oracle Database will discover that the result is not in the hash table but is
    "next to" the hash table, because the last time it ran the scalar subquery, it was run with an input of 30. On the other
    hand, if the fourth row includes a DEPTNO = 40 value, Oracle Database will run the scalar subquery with the DEPTNO = 40
    value (because it hasn't seen that value yet during this query execution) and overwrite the DEPTNO = 30 result. The next
    time Oracle Database sees DEPTNO = 30 in the result set, it'll have to run that scalar subquery again.
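
    --//The behaviour described in the quote can be sketched as a toy model in Python. This is only an illustration, not
    --//Oracle's implementation: the class name, slot_of (a stand-in for the unknown hash function) and the doubling
    --//"subquery" are all hypothetical. The side slot is modelled as "input and result of the previous row", which is
    --//one reading of the quote's "next to the hash table" and an assumption of this sketch.

```python
from typing import Callable, Dict, Optional, Tuple


class ScalarSubqueryCacheSketch:
    """Toy model: one entry per hash slot plus a single side slot."""

    def __init__(self, subquery: Callable[[int], int], slot_of: Callable[[int], int]):
        self.subquery = subquery   # stands in for the scalar subquery
        self.slot_of = slot_of     # hypothetical hash; Oracle's real one is unknown
        self.slots: Dict[int, Tuple[int, int]] = {}       # slot -> (input, result)
        self.previous: Optional[Tuple[int, int]] = None   # side slot (assumed: previous row)
        self.executions = 0        # how many times the subquery really ran

    def lookup(self, value: int) -> int:
        slot = self.slot_of(value)
        entry = self.slots.get(slot)
        if entry is not None and entry[0] == value:
            result = entry[1]                       # hit in the hash table
        elif self.previous is not None and self.previous[0] == value:
            result = self.previous[1]               # hit in the side slot
        else:
            self.executions += 1
            result = self.subquery(value)           # miss: run the subquery
            if entry is None:
                self.slots[slot] = (value, result)  # free slot: cache the result
            # occupied slot (collision): the value never enters the table
        self.previous = (value, result)
        return result


def run(values):
    # Force 30 into the same slot as 10, as in the quoted example (assumption)
    slot_of = lambda v: 10 if v == 30 else v
    cache = ScalarSubqueryCacheSketch(subquery=lambda deptno: deptno * 2, slot_of=slot_of)
    for v in values:
        cache.lookup(v)
    return cache.executions


print(run([10, 20, 10]))          # 2: the second 10 comes from the hash table
print(run([10, 20, 30, 30]))      # 3: the second 30 comes from the side slot
print(run([10, 20, 30, 40, 30]))  # 5: 40 overwrote the side slot, so 30 runs again
```

    --//The three calls follow the walk-through in the quote: a repeated cached value costs nothing, a colliding value is
    --//only reused when the row immediately before it had the same input.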

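    --//I started blindly trying all sorts of ways to verify whether the number of hash buckets is 255. I had assumed from
    --//the start that it was 255 (or 256) and went through a lot of confusion. In the end kerrycode gave me a test
    --//method; the links are: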
    http://blog.itpub.net/267265/viewspace-2156702/
    http://www.cnblogs.com/kerrycode/p/9223093.html

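    --//With this method it is easy to verify the number of hash buckets: 1024 on 11.2.0.4, 512 on 10.2.0.4, 1024 on 12.1.0.1.
    --//I remembered that in my early tests 75 and 48 collided. At the time I did not expect such small values to
    --//collide, and to verify it I tried values almost one at a time,
    --//because you simply do not know Oracle's hashing algorithm.
    --//Yesterday I read https://jonathanlewis.wordpress.com/2018/11/26/shrink-space-2/ and verified why 4 and 432 collide.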

    1. Environment:
    SCOTT@book> @ &r/ver1
    PORT_STRING                    VERSION        BANNER
    ------------------------------ -------------- --------------------------------------------------------------------------------
    x86_64/Linux 2.4.xx            11.2.0.4.0     Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

    create or replace function f( x in varchar2 ) return number
    as
    begin
            dbms_application_info.set_client_info(userenv('client_info')+1 );
            return length(x);
    end;
    /

    SCOTT@book> create table t as select rownum id1,mod(rownum-1,10000)+1 id2 from dual connect by level<=20000;
    Table created.

    SCOTT@book> create table t1 ( a number ,b number);
    Table created.
    --//Column a records the number of function calls; column b records the value being tested.


    2. Create the test scripts:
    --//Create the script cz.txt
    exec dbms_application_info.set_client_info(0);
    set term off
    exec :x := &&1;
    select count(distinct f_id2) from (select id2,(select f(id2) from dual) as f_id2 from t where id2 in (&&2,:x ));
    set term on
    insert into t1 values (userenv('client_info') ,:x) ;
    commit ;

    --//Create the shell script cz.sh:
    #! /bin/bash
    sqlplus -s -l scott/book <<EOF >> hz.txt
    variable x number;
    $(seq 500 | xargs -I{} echo @cz.txt {} $1)
    quit
    EOF

    3. Test:
    --//Run the script cz.sh:
    $ . cz.sh 4

    SCOTT@book> select * from t1 where a<>2;
             A          B
    ---------- ----------
             1          4
             3        432
    --//So 4 and 432 collide: the function was called 3 times.
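
    --//Why the count is 1, 2 or 3 can be seen from the order in which the query meets the values. The following is a
    --//minimal Python sketch, not part of the original test; the function name matching_id2 is hypothetical. It rebuilds
    --//the id2 column of t and lists the values that satisfy the IN-list, assuming the full scan returns rows in
    --//creation order (an assumption, not verified here).

```python
def matching_id2(probe: int, x: int, rows: int = 20000, distinct: int = 10000):
    """id2 values of table t that satisfy `id2 in (probe, x)`, in rownum order."""
    wanted = {probe, x}
    # t was created with id2 = mod(rownum-1, 10000) + 1 for rownum 1..20000,
    # so every id2 value appears exactly twice, 10000 rows apart.
    id2_column = ((rownum - 1) % distinct + 1 for rownum in range(1, rows + 1))
    return [id2 for id2 in id2_column if id2 in wanted]


print(matching_id2(4, 432))  # [4, 432, 4, 432]
print(matching_id2(4, 4))    # [4, 4]
```

    --//With x = 4 both rows carry the same value, so f runs once (a = 1). With two different values that both fit into
    --//the cache, each runs once (a = 2). A count of 3 is consistent with 432 failing to enter the hash table: under
    --//that reading it has to be recomputed on its second appearance, because the row in between carried the value 4.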

    SCOTT@book> delete t1;
    500 rows deleted.

    SCOTT@book> commit ;
    Commit complete.

    --//Check which value collides with 1.

    $ . cz.sh 1

    SCOTT@book> select * from t1 where a<>2;
             A          B
    ---------- ----------
             3        484
             1          1

    --//This confirms that 1 and 484 have a hash collision.

    4. Test again with the example from the link:
    --//Link: https://jonathanlewis.wordpress.com/2018/11/26/shrink-space-2/
    SCOTT@book> update emp set dept_no=484 where dept_no=432;
    1 row updated.

    SCOTT@book> commit ;
    Commit complete.

    SCOTT@book> alter session set statistics_level = all;
    Session altered.

    select
            /*+ gather_plan_statistics post-shrink  */
            count(*)
    from    (
            select  /*+ no_merge */
                    outer.*
            from emp outer
            where outer.sal >
                    (
                            select /*+ no_unnest */ avg(inner.sal)
                            from emp inner
                            where inner.dept_no = outer.dept_no
                    )
            )
    ;

      COUNT(*)
    ----------
          9498

    SCOTT@book> @ dpc '' ''
    PLAN_TABLE_OUTPUT
    -------------------------------------
    SQL_ID  gx7xb7rhfd2zf, child number 0
    -------------------------------------
    select         /*+ gather_plan_statistics post-shrink  */
    count(*) from    (         select  /*+ no_merge */
    outer.*         from emp outer         where outer.sal >
     (                         select /*+ no_unnest */ avg(inner.sal)
                      from emp inner                         where
    inner.dept_no = outer.dept_no                 )         )

    Plan hash value: 322796046

    ------------------------------------------------------------------------------------------------------------------------
    | Id  | Operation             | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
    ------------------------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT      |      |      1 |        |       |   569 (100)|          |      1 |00:00:03.43 |     783K|
    |   1 |  SORT AGGREGATE       |      |      1 |      1 |       |            |          |      1 |00:00:03.43 |     783K|
    |   2 |   VIEW                |      |      1 |    143 |       |   569   (1)| 00:00:07 |   9498 |00:00:03.42 |     783K|
    |*  3 |    FILTER             |      |      1 |        |       |            |          |   9498 |00:00:03.42 |     783K|
    |   4 |     TABLE ACCESS FULL | EMP  |      1 |  20001 |   156K|    71   (0)| 00:00:01 |  19001 |00:00:00.01 |     247 |
    |   5 |     SORT AGGREGATE    |      |   3173 |      1 |     8 |            |          |   3173 |00:00:03.41 |     783K|
    |*  6 |      TABLE ACCESS FULL| EMP  |   3173 |   2857 | 22856 |    71   (0)| 00:00:01 |     10M|00:00:02.71 |     783K|
    ------------------------------------------------------------------------------------------------------------------------
    --//The subquery (Id 5) was started 3173 times.

    SCOTT@book> select dept_no,count(*) from emp group by dept_no order by 1;
       DEPT_NO   COUNT(*)
    ---------- ----------
             0       3167
             1       3167
             2       3167
             3       3166
             4       3166
             5       3167
           484          1
    7 rows selected.
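    --//dept_no=1 has a hash collision (with 484, as found in step 3).
    --//dept_no=484  subquery executed 1 time
    --//dept_no=0    subquery executed 1 time
    --//dept_no=1    subquery executed 3167 times
    --//dept_no=2    subquery executed 1 time
    --//dept_no=3    subquery executed 1 time
    --//dept_no=4    subquery executed 1 time
    --//dept_no=5    subquery executed 1 time

    --//Adding these up: 1+1+3167+1+1+1+1 = 3173, so the two results confirm each other.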
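
    --//The same arithmetic as a small Python check (a sketch only; the names rows_per_dept, uncached and
    --//expected_starts are hypothetical). It assumes that a value which never enters the hash table reruns the subquery
    --//on every one of its rows, which only holds if no two consecutive rows share that value.

```python
# Row counts per dept_no, copied from the GROUP BY output above
rows_per_dept = {0: 3167, 1: 3167, 2: 3167, 3: 3166, 4: 3166, 5: 3167, 484: 1}

# dept_no values assumed never to enter the hash table (1 collides with 484)
uncached = {1}


def expected_starts(rows_per_dept, uncached):
    # Cached values cost one execution; uncached values cost one per row (assumption)
    return sum(n if dept in uncached else 1 for dept, n in rows_per_dept.items())


print(sum(rows_per_dept.values()))               # 19001, the A-Rows of the outer full scan (Id 4)
print(expected_starts(rows_per_dept, uncached))  # 3173, the Starts of the subquery (Id 5)
```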

    5. The tests above are pure brute force. Here is a rewrite as a PL/SQL script; I am really not fluent in PL/SQL....

    SCOTT@book> create table t2 ( a number ,b number,c number);
    Table created.
    --//Column a records the number of function calls, b the colliding value found, c the value passed to the script.

    --//Script cy.txt
    declare
    x number;
      begin
       for i in 1..10000 loop
         dbms_application_info.set_client_info(0);
         select count(distinct f_id2) into x from (select id2,(select f(id2) from dual) as f_id2 from t where id2 in (i, &&1 ) );
         if ( userenv('client_info') =3 ) then  
              insert into t2 values (userenv('client_info') ,i,&&1) ;
              commit ;
              exit;
         END IF;
       end loop;
    end;
    /
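    --//I added an exit as soon as a collision is found. You can comment it out or remove it to find every value in
    --//1..10000 that collides with the given value.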

    --//Run it as follows:
    @ cy.txt 4
    @ cy.txt 1
    @ cy.txt 3
    @ cy.txt 18
    @ cy.txt 48
    @ cy.txt 75

    SCOTT@book> select * from t2;
             A          B          C
    ---------- ---------- ----------
             3        432          4
             3        484          1
             3        735          3
             3       2071         18
             3         75         48
             3         48         75
    6 rows selected.

    --//This way you can quickly find out which values have hash collisions.
    --//I wonder whether anyone has a better method...

  • Original article: https://www.cnblogs.com/lfree/p/10043032.html