  • About the Implementation of the .NET Framework's Hashtable

    From Rotor (Shared Source CLI):
    /*
        Implementation Notes:
        Dictionary was copied from Hashtable's source - any bug fixes here
        probably need to be made to Dictionary as well.

        This Hashtable uses double hashing. There are hashsize buckets in the
        table, and each bucket can contain 0 or 1 element. We use a bit to mark
        whether there's been a collision when we inserted multiple elements
        (i.e., an inserted item was hashed at least a second time and we probed
        this bucket, but it was already in use). Using the collision bit, we
        can terminate lookups & removes for elements that aren't in the hash
        table more quickly. We steal the most significant bit from the hash code
        to store the collision bit.

        Our hash function is of the following form:

        h(key, n) = h1(key) + n*h2(key)

        where n is the number of times we've hit a collided bucket and rehashed
        (on this particular lookup). Here are our hash functions:

        h1(key) = GetHash(key);  // default implementation calls key.GetHashCode();
        h2(key) = 1 + (((h1(key) >> 5) + 1) % (hashsize - 1));

        The h1 can return any number. h2 must return a number between 1 and
        hashsize - 1 that is relatively prime to hashsize (not a problem if
        hashsize is prime). (Knuth's Art of Computer Programming, Vol. 3, p. 528-9)
        If this is true, then we are guaranteed to visit every bucket in exactly
        hashsize probes, since the least common multiple of hashsize and h2(key)
        will be hashsize * h2(key). (This is the first number where adding h2 to
        h1 mod hashsize will be 0 and we will search the same bucket twice.)

        We previously used a different h2(key, n) that was not constant. That is a
        horrifically bad idea, unless you can prove that series will never produce
        any identical numbers that overlap when you mod them by hashsize, for all
        subranges from i to i+hashsize, for all i. It's not worth investigating,
        since there was no clear benefit from using that hash function, and it was
        broken.

        For efficiency reasons, we've implemented this by storing h1 and h2 in a
        temporary, and setting a variable called seed equal to h1. We do a probe,
        and if we collided, we simply add h2 to seed each time through the loop.

        A good test for h2() is to subclass Hashtable, provide your own implementation
        of GetHash() that returns a constant, then add many items to the hash table.
        Make sure Count equals the number of items you inserted.

        Note that when we remove an item from the hash table, we set the key
        equal to buckets, if there was a collision in this bucket. Otherwise
        we'd either wipe out the collision bit, or we'd still have an item in
        the hash table.
    */

    The Insert method of Hashtable:

        // Inserts an entry into this hashtable. This method is called from the Set
        // and Add methods. If the add parameter is true and the given key already
        // exists in the hashtable, an exception is thrown.
        private void Insert (Object key, Object nvalue, bool add) {
            if (key == null) {
                throw new ArgumentNullException("key", Environment.GetResourceString("ArgumentNull_Key"));
            }
            if (count >= loadsize)
                expand();
            uint seed;
            uint incr;
            // Assume we only have one thread writing concurrently. Modify
            // buckets to contain new data, as long as we insert in the right order.
            uint hashcode = InitHash(key, buckets.Length, out seed, out incr);
            int ntry = 0;
            int emptySlotNumber = -1; // We use the empty slot number to cache the first empty slot. We chose to reuse slots
                                      // created by remove that have the collision bit set over using up new slots.

            do {
                int bucketNumber = (int) (seed % (uint)buckets.Length);

                if (emptySlotNumber == -1 && (buckets[bucketNumber].key == buckets) && (buckets[bucketNumber].hash_coll < 0)) // i.e. the collision (sign) bit is set
                    emptySlotNumber = bucketNumber;

                // We need to check if the collision bit is set because we have the possibility where the first
                // item in the hash-chain has been deleted.
                if ((buckets[bucketNumber].key == null) ||
                    (buckets[bucketNumber].key == buckets && ((buckets[bucketNumber].hash_coll & unchecked(0x80000000)) == 0))) {
                    if (emptySlotNumber != -1) // Reuse slot
                        bucketNumber = emptySlotNumber;

                    // We pretty much have to insert in this order. Don't set hash
                    // code until the value & key are set appropriately.
                    buckets[bucketNumber].val = nvalue;
                    buckets[bucketNumber].key = key;
                    buckets[bucketNumber].hash_coll |= (int) hashcode;
                    count++;
                    version++;
                    return;
                }
                if (((buckets[bucketNumber].hash_coll & 0x7FFFFFFF) == hashcode) &&
                    KeyEquals (key, buckets[bucketNumber].key)) {
                    if (add) {
                        throw new ArgumentException(Environment.GetResourceString("Argument_AddingDuplicate__", buckets[bucketNumber].key, key));
                    }
                    buckets[bucketNumber].val = nvalue;
                    version++;
                    return;
                }
                if (emptySlotNumber == -1) // We don't need to set the collision bit here since we already have an empty slot
                    buckets[bucketNumber].hash_coll |= unchecked((int)0x80000000);
                seed += incr;
            } while (++ntry < buckets.Length);

            if (emptySlotNumber != -1)
            {
                // We pretty much have to insert in this order. Don't set hash
                // code until the value & key are set appropriately.
                buckets[emptySlotNumber].val = nvalue;
                buckets[emptySlotNumber].key = key;
                buckets[emptySlotNumber].hash_coll |= (int) hashcode;
                count++;
                version++;
                return;
            }

            // If you see this assert, make sure load factor & count are reasonable.
            // Then verify that our double hash function (h2, described at top of file)
            // meets the requirements described above. You should never see this assert.
            BCLDebug.Assert(false, "hash table insert failed! Load factor too high, or our double hashing function is incorrect.");
            throw new InvalidOperationException(Environment.GetResourceString("InvalidOperation_HashInsertFailed"));
        }
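    The control flow of Insert can be modeled with a toy open-addressing table in Python. This is a simplified sketch, not the real CLR structure: the collision flag is a separate boolean instead of the hash code's stolen sign bit, and deleted slots use a DELETED sentinel in place of the "key == buckets" trick.

    ```python
    DELETED = object()  # stand-in for the CLR's "key == buckets" sentinel

    class ToyHashtable:
        def __init__(self, hashsize=17):  # prime, as the Rotor comment requires
            self.keys = [None] * hashsize
            self.vals = [None] * hashsize
            self.collided = [False] * hashsize  # models the stolen sign bit
            self.count = 0

        def _probes(self, key):
            h1 = hash(key) & 0x7FFFFFFF
            incr = 1 + (((h1 >> 5) + 1) % (len(self.keys) - 1))
            seed = h1
            for _ in range(len(self.keys)):
                yield seed % len(self.keys)
                seed += incr

        def insert(self, key, value, add=True):
            empty_slot = -1
            for b in self._probes(key):
                # Cache the first deleted slot whose collision bit is set, for reuse.
                if empty_slot == -1 and self.keys[b] is DELETED and self.collided[b]:
                    empty_slot = b
                # Truly free slot: never used, or deleted with nothing probed past it.
                if self.keys[b] is None or (self.keys[b] is DELETED and not self.collided[b]):
                    if empty_slot != -1:
                        b = empty_slot  # reuse the cached deleted slot instead
                    self.keys[b], self.vals[b] = key, value
                    self.count += 1
                    return
                if self.keys[b] == key:
                    if add:
                        raise KeyError("duplicate key")
                    self.vals[b] = value  # Set semantics: overwrite in place
                    return
                if empty_slot == -1:
                    self.collided[b] = True  # a probe passed through this bucket
            if empty_slot != -1:  # only deleted-and-collided slots were seen
                self.keys[empty_slot], self.vals[empty_slot] = key, value
                self.count += 1
                return
            raise RuntimeError("table full or second hash broken")

    t = ToyHashtable()
    for i in range(10):
        t.insert(i, str(i))
    assert t.count == 10
    ```

    As in the original, an insert first walks the whole probe sequence checking for a duplicate key before falling back to a cached deleted slot; reusing the slot too early could leave the same key stored twice.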

    Double Hashing in "Introduction to Algorithms" (CLRS):

    Double hashing is one of the best methods available for open addressing because the permutations produced have many of the characteristics of randomly chosen permutations. Double hashing uses a hash function of the form

    h(k, i) = (h1(k) + ih2(k)) mod m,

    where h1 and h2 are auxiliary hash functions. The initial probe is to position T[h1(k)]; successive probe positions are offset from previous positions by the amount h2(k), modulo m. Thus, unlike the case of linear or quadratic probing, the probe sequence here depends in two ways upon the key k, since the initial probe position, the offset, or both, may vary. Figure 11.5 gives an example of insertion by double hashing.

    Figure 11.5: Insertion by double hashing. Here we have a hash table of size 13 with h1(k) = k mod 13 and h2(k) = 1 + (k mod 11). Since 14 ≡ 1 (mod 13) and 14 ≡ 3 (mod 11), the key 14 is inserted into empty slot 9, after slots 1 and 5 are examined and found to be occupied.
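    The figure's probe sequence can be checked directly (m = 13, h1(k) = k mod 13, h2(k) = 1 + (k mod 11), as in the caption):

    ```python
    m = 13
    h1 = lambda k: k % m
    h2 = lambda k: 1 + (k % 11)

    def probes(k, n):
        """First n probe positions for key k under double hashing."""
        return [(h1(k) + i * h2(k)) % m for i in range(n)]

    # key 14: h1(14) = 1 and h2(14) = 4, so the probes go 1, 5, 9, ...
    # matching the figure: slots 1 and 5 occupied, 14 lands in slot 9.
    assert probes(14, 3) == [1, 5, 9]
    ```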

    The value h2(k) must be relatively prime to the hash-table size m for the entire hash table to be searched. (See Exercise 11.4-3.) A convenient way to ensure this condition is to let m be a power of 2 and to design h2 so that it always produces an odd number. Another way is to let m be prime and to design h2 so that it always returns a positive integer less than m. For example, we could choose m prime and let
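    The relative-primality condition is easy to see with small illustrative numbers: with m a power of 2, an odd step reaches every slot, while an even step cycles through only a fraction of the table.

    ```python
    from math import gcd

    def slots_visited(h1, h2, m):
        """Set of distinct slots probed in m steps of double hashing."""
        return {(h1 + i * h2) % m for i in range(m)}

    m = 16  # power of 2
    # Odd step: gcd(h2, m) == 1, so all m slots are reachable.
    assert len(slots_visited(h1=3, h2=5, m=m)) == m
    # Even step: gcd(h2, m) > 1, so only m / gcd(h2, m) slots are ever probed.
    assert len(slots_visited(h1=3, h2=6, m=m)) == m // gcd(6, m)
    ```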

    h1(k) = k mod m,
    h2(k) = 1 + (k mod m'),

    where m' is chosen to be slightly less than m (say, m - 1). For example, if k = 123456, m = 701, and m' = 700, we have h1(k) = 80 and h2(k) = 257, so the first probe is to position 80, and then every 257th slot (modulo m) is examined until the key is found or every slot is examined.
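    The book's numbers check out, and since m = 701 is prime, the probe sequence covers the whole table:

    ```python
    k, m, m_prime = 123456, 701, 700
    h1 = k % m            # 123456 mod 701
    h2 = 1 + (k % m_prime)  # 1 + (123456 mod 700)
    assert (h1, h2) == (80, 257)
    # gcd(257, 701) == 1, so stepping by 257 visits all 701 slots.
    assert len({(h1 + i * h2) % m for i in range(m)}) == m
    ```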

    Double hashing improves over linear or quadratic probing in that Θ(m²) probe sequences are used, rather than Θ(m), since each possible (h1(k), h2(k)) pair yields a distinct probe sequence. As a result, the performance of double hashing appears to be very close to the performance of the "ideal" scheme of uniform hashing.
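    The Θ(m²) claim can be seen by brute force on a small prime m: every (h1, h2) pair yields a distinct sequence, so there are m(m - 1) of them, versus only m for linear probing.

    ```python
    m = 7  # small prime table size, chosen just for this enumeration
    sequences = {
        tuple((a + i * b) % m for i in range(m))
        for a in range(m)       # every possible h1 value mod m
        for b in range(1, m)    # every possible h2 value in [1, m - 1]
    }
    # Distinct pairs give distinct sequences: the first element fixes a,
    # and the difference between the first two elements fixes b.
    assert len(sequences) == m * (m - 1)
    ```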

  • Original source: https://www.cnblogs.com/Dah/p/558430.html