  • Algorithm | hash

    A basic requirement is that the function should provide a uniform distribution of hash values. A non-uniform distribution increases the number of collisions and the cost of resolving them.

    A critical statistic for a hash table is called the load factor. This is simply the number of entries divided by the number of buckets, that is, n/k where n is the number of entries and k is the number of buckets.
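
    As a small illustration of the definition above, the load factor can be computed directly from the two counts. The helper name `load_factor` is an assumption made for this sketch, not something from the original post.

```cpp
#include <cassert>
#include <cstddef>

// Load factor n/k: number of stored entries divided by number of buckets.
// `load_factor` is a hypothetical helper name used only for illustration.
double load_factor(std::size_t entries, std::size_t buckets) {
    assert(buckets > 0);  // k must be positive, otherwise n/k is undefined
    return static_cast<double>(entries) / static_cast<double>(buckets);
}
```

    For example, 750 entries stored in 1000 buckets give a load factor of 0.75.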

    Different kinds of hashing use different methods, and their performance varies. The problem they all share is collisions. Collision resolution strategies include:

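    1. Open addressing: \(hash_i = (hash(key) + d_i) \bmod m,\ i = 1, 2, \dots, k\ (k \le m-1)\), where hash(key) is the hash function, m is the table length, d_i is the increment sequence, and i is the number of collisions that have occurred so far. The increment sequence can be chosen in the following ways:
    \(d_i = 1, 2, 3, \dots, m-1\) is called linear probing; that is, \(d_i = i\), or some other linear function of i. This amounts to probing the table slot by slot until an empty slot is found and storing the entry there.
    \(d_i = \pm 1^2, \pm 2^2, \pm 3^2, \dots, \pm k^2\ (k \le m/2)\) is called quadratic probing. Compared with linear probing, after a collision it checks whether the slot at a squared offset from the home position is empty and, if so, stores the entry there.
    \(d_i\) = a pseudo-random number sequence is called pseudo-random probing.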
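
    The linear and quadratic increment sequences above can be sketched as two small functions that return the i-th probe slot. The names `linear_probe` and `quadratic_probe` are hypothetical, and the ordering +1^2, -1^2, +2^2, -2^2, ... is one assumed reading of the ± notation.

```cpp
#include <cassert>
#include <cstddef>

// i-th probe slot under linear probing: (home + d_i) mod m with d_i = i.
std::size_t linear_probe(std::size_t home, std::size_t i, std::size_t m) {
    assert(m > 0 && home < m);
    return (home + i % m) % m;
}

// i-th probe slot under quadratic probing with alternating signs:
// d_1 = +1^2, d_2 = -1^2, d_3 = +2^2, d_4 = -2^2, ...
std::size_t quadratic_probe(std::size_t home, std::size_t i, std::size_t m) {
    assert(m > 0 && home < m && i >= 1);
    const std::size_t k = (i + 1) / 2;              // 1, 1, 2, 2, 3, 3, ...
    const std::size_t sq = ((k % m) * (k % m)) % m; // k^2 mod m
    // Odd i adds k^2; even i subtracts it (adding m first keeps the value non-negative).
    return (i % 2 == 1) ? (home + sq) % m : (home + m - sq) % m;
}
```

    With m = 11 and a home slot of 3, linear probing visits 4, 5, 6, ... while this quadratic variant visits 4, 2, 7, 10, ...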

    Well-known probe sequences include:

    • Linear probing, in which the interval between probes is fixed (usually 1)
    • Quadratic probing, in which the interval between probes is increased by adding the successive outputs of a quadratic polynomial to the starting value given by the original hash computation
    • Double hashing, in which the interval between probes is computed by another hash function
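
    Double hashing, the last scheme in the list, might look like the sketch below. Both hash functions (`dh_primary`, `dh_step`) are illustrative choices for integer keys, not taken from the original; the step is kept in [1, m-1] so it is never zero, and a prime m is assumed so that every slot is eventually visited.

```cpp
#include <cassert>
#include <cstddef>

// Primary hash: the home slot of the key (illustrative choice).
std::size_t dh_primary(std::size_t key, std::size_t m) { return key % m; }

// Secondary hash: the probe interval, always in [1, m-1] (illustrative choice).
std::size_t dh_step(std::size_t key, std::size_t m) { return 1 + key % (m - 1); }

// i-th probe slot under double hashing: (h1(key) + i * h2(key)) mod m.
std::size_t double_hash_probe(std::size_t key, std::size_t i, std::size_t m) {
    assert(m > 1);  // m - 1 must be positive for dh_step
    return (dh_primary(key, m) + (i % m) * dh_step(key, m)) % m;
}
```

    With m = 7, keys 10 and 17 share home slot 3 but use different intervals (5 and 6), so their probe sequences separate after the first collision.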

    A drawback of all these open addressing schemes is that the number of stored entries cannot exceed the number of slots in the bucket array. Open addressing schemes also put more stringent requirements on the hash function: besides distributing the keys more uniformly over the buckets, the function must also minimize the clustering of hash values that are consecutive in the probe order. 

    Open addressing only saves memory if the entries are small (less than four times the size of a pointer) and the load factor is not too small. If the load factor is close to zero (that is, there are far more buckets than stored entries), open addressing is wasteful even if each entry is just two words.

    Generally speaking, open addressing is better used for hash tables with small records that can be stored within the table (internal storage) and fit in a cache line. It is particularly suitable for elements of one word or less. If the table is expected to have a high load factor, the records are large, or the data is variable-sized, chained hash tables often perform as well or better.

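    Open addressing can perform very well, because the table is a single contiguous array and inserting an entry does not require allocating new memory.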
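
    A minimal sketch of an open-addressing table with linear probing is shown below, assuming integer keys and a fixed capacity. The class name `ProbeSet` and its members are hypothetical. Deletion is left out on purpose, since it would need tombstones or re-insertion to keep probe chains intact.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fixed-capacity set of ints stored directly in one contiguous array.
class ProbeSet {
public:
    explicit ProbeSet(std::size_t capacity)
        : keys_(capacity), used_(capacity, false), count_(0) {
        assert(capacity > 0);
    }

    // Returns false if the key is already present or every slot is occupied.
    bool insert(int key) {
        const std::size_t m = keys_.size();
        const std::size_t start = home(key);
        for (std::size_t i = 0; i < m; ++i) {
            const std::size_t idx = (start + i) % m;  // linear probing, interval 1
            if (!used_[idx]) {
                keys_[idx] = key;
                used_[idx] = true;
                ++count_;
                return true;
            }
            if (keys_[idx] == key) return false;      // duplicate
        }
        return false;                                 // table is full
    }

    bool contains(int key) const {
        const std::size_t m = keys_.size();
        const std::size_t start = home(key);
        for (std::size_t i = 0; i < m; ++i) {
            const std::size_t idx = (start + i) % m;
            if (!used_[idx]) return false;            // an empty slot ends the probe chain
            if (keys_[idx] == key) return true;
        }
        return false;
    }

    std::size_t size() const { return count_; }

private:
    // Converting to unsigned first keeps negative keys inside [0, m).
    std::size_t home(int key) const {
        return static_cast<unsigned>(key) % keys_.size();
    }

    std::vector<int> keys_;
    std::vector<bool> used_;
    std::size_t count_;
};
```

    Once all slots are taken, `insert` fails, which is the capacity limit of open addressing described earlier.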

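    2. Separate chaining: all elements that hash to the same slot are kept in a linked list. One implementation strategy is to store the colliding entries of each slot as a stack; whether a new element is inserted at the front or the back of the list depends entirely on which is more convenient.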

    Chained hash tables with linked lists are popular because they require only basic data structures with simple algorithms, and can use simple hash functions that are unsuitable for other methods.

    Chained hash tables also inherit the disadvantages of linked lists. When storing small keys and values, the space overhead of the next pointer in each entry record can be significant. An additional disadvantage is that traversing a linked list has poor cache performance, making the processor cache ineffective.

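    3. Rehashing: \(hash_i = hash_i(key),\ i = 1, 2, \dots, k\), where the \(hash_i\) are a series of hash functions. When a hash computation results in a collision, the next hash function is applied to produce a new address, and this repeats until no collision occurs. This method is less prone to clustering, but it increases computation time.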
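
    The rehashing idea can be sketched with a small family of hash functions that are tried in order. The family below (`family_hash`, with three arbitrary multipliers) and the helper `rehash_slot` are assumptions made for illustration; a real table would pick its functions more carefully.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// i-th function of a small hash family, i = 0, 1, 2.
// The multipliers are arbitrary illustrative constants.
std::size_t family_hash(unsigned key, std::size_t i, std::size_t m) {
    static const unsigned mult[] = {1u, 31u, 131u};
    assert(i < 3 && m > 0);
    return (key * mult[i] + static_cast<unsigned>(i)) % m;
}

// Returns the first free slot among hash_0(key), hash_1(key), hash_2(key),
// or -1 if all three candidate slots are already occupied.
long rehash_slot(const std::vector<bool>& used, unsigned key) {
    for (std::size_t i = 0; i < 3; ++i) {
        const std::size_t idx = family_hash(key, i, used.size());
        if (!used[idx]) return static_cast<long>(idx);
    }
    return -1;
}
```

    In a table of 10 slots, keys 7, 17 and 27 all map to slot 7 under the first function; with this family the second and third keys end up in slots 8 and 9, at the cost of one or two extra hash computations.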

    #include <cstddef>   // NULL
    #include <iostream>

    // Hash table of ints that resolves collisions by separate chaining.
    class HashTable {
    public:
        // Node of a bucket's singly linked list.
        struct List {
            int val;
            List* next;
            List(int val) : val(val), next(NULL) {}
        };

        // The trailing () zero-initializes every bucket pointer.
        HashTable() : table(new List*[TABLE_SIZE]()), count(0) {}

        ~HashTable() {
            for (int i = 0; i < TABLE_SIZE; ++i) {
                List* p = table[i];
                while (p) {
                    List* tmp = p->next;
                    delete p;
                    p = tmp;
                }
            }
            delete[] table;
        }

        // Inserts val at the head of its bucket's list; duplicates are allowed.
        void insert(int val) {
            int index = hash(val);
            List* elem = new List(val);
            elem->next = table[index];
            table[index] = elem;
            count++;
        }

        // Removes the first node holding val, if there is one.
        void remove(int val) {
            int index = hash(val);
            List** p = &table[index];  // address of the link leading to the current node
            while (*p) {
                if ((*p)->val == val) {
                    List* tmp = (*p)->next;
                    delete *p;
                    *p = tmp;
                    count--;
                    return;
                }
                p = &((*p)->next);
            }
        }

        int size() const {
            return count;
        }

        // Prints every bucket index followed by the values chained in it.
        void print() const {
            for (int i = 0; i < TABLE_SIZE; ++i) {
                List* p = table[i];
                std::cout << i << ": ";
                while (p) {
                    std::cout << p->val << " ";
                    p = p->next;
                }
                std::cout << std::endl;
            }
        }

    private:
        // Copying is disabled: a shallow copy would delete the same nodes twice.
        HashTable(const HashTable&);
        HashTable& operator=(const HashTable&);

        enum { TABLE_SIZE = 1000 };
        List** table;
        int count;

        // val % TABLE_SIZE is negative for negative val, so shift it into range.
        int hash(int val) const {
            int index = val % TABLE_SIZE;
            return index < 0 ? index + TABLE_SIZE : index;
        }
    };
  • Original article: https://www.cnblogs.com/linyx/p/3773922.html