zoukankan      html  css  js  c++  java
  • Hash Join: Basic Steps

    Joins https://docs.oracle.com/database/121/TGSQL/tgsql_join.htm#TGSQL242

    tidb/index_lookup_hash_join.go at master · pingcap/tidb https://github.com/pingcap/tidb/blob/master/executor/index_lookup_hash_join.go

    Hash Join: Basic Steps

    The optimizer uses the smaller data source to build a hash table on the join key in memory, and then scans the larger table to find the joined rows.

    The basic steps are as follows:

    1. The database performs a full scan of the smaller data set, called the build table, and then applies a hash function to the join key in each row to build a hash table in the PGA.

      In pseudocode, the algorithm might look as follows:

      FOR small_table_row IN (SELECT * FROM small_table)
      LOOP
        slot_number := HASH(small_table_row.join_key);
        INSERT_HASH_TABLE(slot_number,small_table_row);
      END LOOP;
      
    2. The database probes the second data set, called the probe table, using whichever access mechanism has the lowest cost.

      Typically, the database performs a full scan of both the smaller and larger data set. The algorithm in pseudocode might look as follows:

      FOR large_table_row IN (SELECT * FROM large_table)
      LOOP
         slot_number := HASH(large_table_row.join_key);
         small_table_row = LOOKUP_HASH_TABLE(slot_number,large_table_row.join_key);
         IF small_table_row FOUND
         THEN
            output small_table_row + large_table_row;
         END IF;
      END LOOP;
      

      For each row retrieved from the larger data set, the database does the following:

      1. Applies the same hash function to the join column or columns to calculate the number of the relevant slot in the hash table.

        For example, to probe the hash table for department ID 30, the database applies the hash function to 30, which generates the hash value 4.

      2. Probes the hash table to determine whether rows exists in the slot.

        If no rows exist, then the database processes the next row in the larger data set. If rows exist, then the database proceeds to the next step.

      3. Checks the join column or columns for a match. If a match occurs, then the database either reports the rows or passes them to the next step in the plan, and then processes the next row in the larger data set.

        If multiple rows exist in the hash table slot, the database walks through the linked list of rows, checking each one. For example, if department 30 hashes to slot 4, then the database checks each row until it finds 30.

    Example 9-4 Hash Joins

    An application queries the oe.orders and oe.order_items tables, joining on the order_id column.

    SELECT o.customer_id, l.unit_price * l.quantity
    FROM   orders o, order_items l
    WHERE  l.order_id = o.order_id;
    

    The execution plan is as follows:

    --------------------------------------------------------------------------
    | Id  | Operation            |  Name        | Rows  | Bytes | Cost (%CPU)|
    --------------------------------------------------------------------------
    |   0 | SELECT STATEMENT     |              |   665 | 13300 |     8  (25)|
    |*  1 |  HASH JOIN           |              |   665 | 13300 |     8  (25)|
    |   2 |   TABLE ACCESS FULL  | ORDERS       |   105 |   840 |     4  (25)|
    |   3 |   TABLE ACCESS FULL  | ORDER_ITEMS  |   665 |  7980 |     4  (25)|
    --------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
       1 - access("L"."ORDER_ID"="O"."ORDER_ID")
    

    Because the orders table is small relative to the order_items table, which is 6 times larger, the database hashes orders. In a hash join, the data set for the build table always appears first in the list of operations (Step 2). In Step 3, the database performs a full scan of the larger order_items later, probing the hash table for each row.

    How Hash Joins Work When the Hash Table Does Not Fit in the PGA

    The database must use a different technique when the hash table does not fit entirely in the PGA. In this case, the database uses a temporary space to hold portions (called partitions) of the hash table, and sometimes portions of the larger table that probes the hash table.

    The basic process is as follows:

    1. The database performs a full scan of the smaller data set, and then builds an array of hash buckets in both the PGA and on disk.

      When the PGA hash area fills up, the database finds the largest partition within the hash table and writes it to temporary space on disk. The database stores any new row that belongs to this on-disk partition on disk, and all other rows in the PGA. Thus, part of the hash table is in memory and part of it on disk.

    2. The database takes a first pass at reading the other data set.

      For each row, the database does the following:

      1. Applies the same hash function to the join column or columns to calculate the number of the relevant hash bucket.

      2. Probes the hash table to determine whether rows exist in the bucket in memory.

        If the hashed value points to a row in memory, then the database completes the join and returns the row. If the value points to a hash partition on disk, however, then the database stores this row in the temporary tablespace, using the same partitioning scheme used for the original data set.

    3. The database reads each on-disk temporary partition one by one

    4. The database joins each partition row to the row in the corresponding on-disk temporary partition.

    Hash Join Controls

    The USE_HASH hint instructs the optimizer to use a hash join when joining two tables together.

  • 相关阅读:
    设计模式学习每日一记(16.模板方法模式)
    lrzsz工具移植交叉编译
    设计模式学习每日一记(17.解释器模式)
    鼠标键盘共享之Synergy工具
    wpf中textblock竖立显示文本
    c#中常会涉及到一些类,List的深度拷贝。
    SQL 2005 多表分页存储过程
    C#中的委托和事件(续)[转]
    制作sitemap与在线生成网站地图并提交
    操作EXcel
  • 原文地址:https://www.cnblogs.com/rsapaper/p/14166244.html
Copyright © 2011-2022 走看看