  • Java Map hashCode in Depth

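    In an earlier summary of mine, [Java Notes 7] Java Containers, Part 2: Map, I mentioned hashCode without going into detail. Today I'll dig into the questions around hashCode and sort them out.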

    1. hashCode and equals

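      As we all know, hashCode() and equals() are both methods of Object, the root class of every Java class. I looked both of them up in the Java 7 documentation.

      Reading the description of hashCode there, a few points stand out: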

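    • First, hashCode returns an integer, i.e. a value of the primitive type int (not Integer)
    • Second, it must be consistent: during a single execution of an application, calling it on the same object any number of times returns the same integer, as long as nothing used in that object's equals comparisons changes. (This guarantee covers only one execution. Different executions are not guaranteed to return the same value, because the default hashCode is typically derived from the object's internal address, and objects end up at different memory locations in different runs.)
    • Third, it must agree with equals: if two objects are equal according to equals, they must have the same hashCode. The converse does not hold: two objects that are unequal according to equals do not necessarily have different hash codes. (The documentation does point out that producing distinct hash codes for unequal objects can improve hash table performance, since it reduces collisions.)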
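The third point is easy to see with String, whose hashCode() is documented as s[0]*31^(n-1) + ... + s[n-1]. The short sketch below uses a class name, HashCodeContractDemo, that is only for illustration. It shows two unequal strings that share a hash code, and it checks the other two rules along the way:

```java
public class HashCodeContractDemo {
    public static void main(String[] args) {
        // String.hashCode() is s[0]*31^(n-1) + ... + s[n-1], so
        // "Aa" -> 65*31 + 97 = 2112 and "BB" -> 66*31 + 66 = 2112.
        String a = "Aa";
        String b = "BB";
        System.out.println(a.equals(b));                      // false: unequal objects...
        System.out.println(a.hashCode() == b.hashCode());     // true: ...may share a hash code

        // Equal objects must have equal hash codes, even as distinct instances.
        String c = new String("Aa");
        System.out.println(a.equals(c) && a.hashCode() == c.hashCode()); // true

        // Consistency: repeated calls on the same object within one run agree.
        Object o = new Object();
        System.out.println(o.hashCode() == o.hashCode());     // true
    }
}
```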

      

      I also realized that the rules for equals listed in Thinking in Java come straight from this documentation:

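    • Reflexive: for any non-null x, x.equals(x) must return true
    • Symmetric: for any non-null x and y, x.equals(y) returns true if and only if y.equals(x) returns true
    • Transitive: for any non-null x, y, z, if x.equals(y) and y.equals(z) both return true, then x.equals(z) must return true
    • Consistent: for any non-null x and y, repeated calls to equals return the same result, as long as nothing used in the comparison changes
    • In addition, for any non-null x, x.equals(null) must return false
    • Finally, whenever equals is overridden, hashCode must be overridden too, so that the rules above (equal objects have equal hash codes) still hold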
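Here is a minimal sketch of a key class that follows these rules. It assumes a hypothetical immutable Point type. Because equals and hashCode are derived from the same two fields, equal points always hash alike, and HashMap treats them as the same key:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class EqualsHashCodeDemo {
    // Hypothetical immutable key type; equals and hashCode use the same fields.
    static final class Point {
        private final int x;
        private final int y;

        Point(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;                 // reflexive
            if (!(o instanceof Point)) return false;    // also makes equals(null) false
            Point p = (Point) o;
            return x == p.x && y == p.y;                // symmetric and transitive
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);                  // equal points -> equal hash codes
        }
    }

    public static void main(String[] args) {
        Map<Point, String> names = new HashMap<>();
        names.put(new Point(1, 2), "A");
        System.out.println(names.get(new Point(1, 2))); // A: a different but equal instance

        names.put(new Point(1, 2), "B");                // equal key: value is replaced
        System.out.println(names.size());               // 1
        System.out.println(names.get(new Point(1, 2))); // B
    }
}
```

Suppose hashCode were not overridden here. Object's identity-based hashCode would then usually send the two equal Point instances to different buckets, and get would return null even though equals says the keys match.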

    2. How Map uses hashCode

      Below is part of the HashMap implementation in Java (JDK 7) that I excerpted:

    Code listing 1

      1 /* Excerpt from the HashMap implementation */
      2 public class HashMap<K,V>
      3     extends AbstractMap<K,V>
      4     implements Map<K,V>, Cloneable, Serializable
      5 {
      6     /**
      7      * The default initial capacity - MUST be a power of two.
      8      */
      9     static final int DEFAULT_INITIAL_CAPACITY = 16;
     10 
     11     /**
     12      * The maximum capacity, used if a higher value is implicitly specified
     13      * by either of the constructors with arguments.
     14      * MUST be a power of two <= 1<<30.
     15      */
     16     static final int MAXIMUM_CAPACITY = 1 << 30;
     17 
     18     /**
     19      * The load factor used when none specified in constructor.
     20      */
     21     static final float DEFAULT_LOAD_FACTOR = 0.75f;
     22 
     23     /**
     24      * The table, resized as necessary. Length MUST Always be a power of two.
     25      */
     26     transient Entry<K,V>[] table;
     27 
     28     /**
     29      * The number of key-value mappings contained in this map.
     30      */
     31     transient int size;
     32 
     33     /**
     34      * The next size value at which to resize (capacity * load factor).
     35      * @serial
     36      */
     37     int threshold;
     38 
     39     /**
     40      * The load factor for the hash table.
     41      *
     42      * @serial
     43      */
     44     final float loadFactor;
     45 
     46     /**
     47      * Retrieve object hash code and applies a supplemental hash function to the
     48      * result hash, which defends against poor quality hash functions.  This is
     49      * critical because HashMap uses power-of-two length hash tables, that
     50      * otherwise encounter collisions for hashCodes that do not differ
     51      * in lower bits. Note: Null keys always map to hash 0, thus index 0.
     52      */
     53     final int hash(Object k) {
     54         int h = 0;
     55         if (useAltHashing) {
     56             if (k instanceof String) {
     57                 return sun.misc.Hashing.stringHash32((String) k);
     58             }
     59             h = hashSeed;
     60         }
     61 
     62         h ^= k.hashCode();
     63 
     64         // This function ensures that hashCodes that differ only by
     65         // constant multiples at each bit position have a bounded
     66         // number of collisions (approximately 8 at default load factor).
     67         h ^= (h >>> 20) ^ (h >>> 12);
     68         return h ^ (h >>> 7) ^ (h >>> 4);
     69     }
     70     
     71     /**
     72      * Returns index for hash code h.
     73      */
     74     static int indexFor(int h, int length) {
     75         return h & (length-1);
     76     }
     77 
     78     /**
     79      * Adds a new entry with the specified key, value and hash code to
     80      * the specified bucket.  It is the responsibility of this
     81      * method to resize the table if appropriate.
     82      *
     83      * Subclass overrides this to alter the behavior of put method.
     84      */
     85     void addEntry(int hash, K key, V value, int bucketIndex) {
     86         if ((size >= threshold) && (null != table[bucketIndex])) {
     87             resize(2 * table.length);
     88             hash = (null != key) ? hash(key) : 0;
     89             bucketIndex = indexFor(hash, table.length);
     90         }
     91 
     92         createEntry(hash, key, value, bucketIndex);
     93     }
     94     
     95     /**
     96      * Like addEntry except that this version is used when creating entries
     97      * as part of Map construction or "pseudo-construction" (cloning,
     98      * deserialization).  This version needn't worry about resizing the table.
     99      *
    100      * Subclass overrides this to alter the behavior of HashMap(Map),
    101      * clone, and readObject.
    102      */
    103     void createEntry(int hash, K key, V value, int bucketIndex) {
    104         Entry<K,V> e = table[bucketIndex];
    105         table[bucketIndex] = new Entry<>(hash, key, value, e);
    106         size++;
    107     }
    108     
    109     /**
    110      * Associates the specified value with the specified key in this map.
    111      * If the map previously contained a mapping for the key, the old
    112      * value is replaced.
    113      *
    114      * @param key key with which the specified value is to be associated
    115      * @param value value to be associated with the specified key
    116      * @return the previous value associated with <tt>key</tt>, or
    117      *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
    118      *         (A <tt>null</tt> return can also indicate that the map
    119      *         previously associated <tt>null</tt> with <tt>key</tt>.)
    120      */
    121     public V put(K key, V value) {
    122         if (key == null)
    123             return putForNullKey(value);
    124         int hash = hash(key);
    125         int i = indexFor(hash, table.length);
    126         for (Entry<K,V> e = table[i]; e != null; e = e.next) {
    127             Object k;
    128             if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
    129                 V oldValue = e.value;
    130                 e.value = value;
    131                 e.recordAccess(this);
    132                 return oldValue;
    133             }
    134         }
    135 
    136         modCount++;
    137         addEntry(hash, key, value, i);
    138         return null;
    139     }
    140     
    141     /**
    142      * Returns the entry associated with the specified key in the
    143      * HashMap.  Returns null if the HashMap contains no mapping
    144      * for the key.
    145      */
    146     final Entry<K,V> getEntry(Object key) {
    147         int hash = (key == null) ? 0 : hash(key);
    148         for (Entry<K,V> e = table[indexFor(hash, table.length)];
    149              e != null;
    150              e = e.next) {
    151             Object k;
    152             if (e.hash == hash &&
    153                 ((k = e.key) == key || (key != null && key.equals(k))))
    154                 return e;
    155         }
    156         return null;
    157     }
    158     
    159     /**
    160      * Removes and returns the entry associated with the specified key
    161      * in the HashMap.  Returns null if the HashMap contains no mapping
    162      * for this key.
    163      */
    164     final Entry<K,V> removeEntryForKey(Object key) {
    165         int hash = (key == null) ? 0 : hash(key);
    166         int i = indexFor(hash, table.length);
    167         Entry<K,V> prev = table[i];
    168         Entry<K,V> e = prev;
    169 
    170         while (e != null) {
    171             Entry<K,V> next = e.next;
    172             Object k;
    173             if (e.hash == hash &&
    174                 ((k = e.key) == key || (key != null && key.equals(k)))) {
    175                 modCount++;
    176                 size--;
    177                 if (prev == e)
    178                     table[i] = next;
    179                 else
    180                     prev.next = next;
    181                 e.recordRemoval(this);
    182                 return e;
    183             }
    184             prev = e;
    185             e = next;
    186         }
    187 
    188         return e;
    189     }
    190     
    191     /**
    192      * Rehashes the contents of this map into a new array with a
    193      * larger capacity.  This method is called automatically when the
    194      * number of keys in this map reaches its threshold.
    195      *
    196      * If current capacity is MAXIMUM_CAPACITY, this method does not
    197      * resize the map, but sets threshold to Integer.MAX_VALUE.
    198      * This has the effect of preventing future calls.
    199      *
    200      * @param newCapacity the new capacity, MUST be a power of two;
    201      *        must be greater than current capacity unless current
    202      *        capacity is MAXIMUM_CAPACITY (in which case value
    203      *        is irrelevant).
    204      */
    205     void resize(int newCapacity) {
    206         Entry[] oldTable = table;
    207         int oldCapacity = oldTable.length;
    208         if (oldCapacity == MAXIMUM_CAPACITY) {
    209             threshold = Integer.MAX_VALUE;
    210             return;
    211         }
    212 
    213         Entry[] newTable = new Entry[newCapacity];
    214         boolean oldAltHashing = useAltHashing;
    215         useAltHashing |= sun.misc.VM.isBooted() &&
    216                 (newCapacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
    217         boolean rehash = oldAltHashing ^ useAltHashing;
    218         transfer(newTable, rehash);
    219         table = newTable;
    220         threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    221     }
    222     
    223     /**
    224      * Transfers all entries from current table to newTable.
    225      */
    226     void transfer(Entry[] newTable, boolean rehash) {
    227         int newCapacity = newTable.length;
    228         for (Entry<K,V> e : table) {
    229             while(null != e) {
    230                 Entry<K,V> next = e.next;
    231                 if (rehash) {
    232                     e.hash = null == e.key ? 0 : hash(e.key);
    233                 }
    234                 int i = indexFor(e.hash, newCapacity);
    235                 e.next = newTable[i];
    236                 newTable[i] = e;
    237                 e = next;
    238             }
    239         }
    240     }
    241 }

     Code listing 2

     1 static class Entry<K,V> implements Map.Entry<K,V> {
     2         final K key;
     3         V value;
     4         Entry<K,V> next;
     5         int hash;
     6 
     7         /**
     8          * Creates new entry.
     9          */
    10         Entry(int h, K k, V v, Entry<K,V> n) {
    11             value = v;
    12             next = n;
    13             key = k;
    14             hash = h;
    15         }
    16 }

      I extracted HashMap's insert, delete, update and lookup operations, together with the helper functions they use, for analysis:

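    • Storage: the JDK HashMap stores its mappings in Entry<K,V>[], an array of Entry objects (line 26). The transient modifier there has nothing to do with multithreading. It excludes the field from default serialization, and HashMap writes out and rebuilds its entries itself in its private writeObject/readObject methods (not shown);
    • Hash function: the hash method at line 53 computes the hash value from the key's hashCode, so it can be regarded as the table's hash function;
    • Growing the table: in addEntry (line 85), once size has reached threshold (capacity × load factor) and the target bucket is already occupied, Java calls resize (line 205) with twice the current capacity. resize allocates a new Entry array of that size, and transfer (line 226) moves every entry from the old table into it, recomputing each entry's bucket index for the new length. Java trades space against performance here: the array is grown only when size is at least threshold and the new entry would also cause a collision;
    • Storage position: the array slot for a new Entry is chosen by indexFor (line 74). It is called from put (line 125), and called again in addEntry (line 89) after a resize. indexFor ANDs the hash value with the array length minus 1. The length is always a power of two, so this keeps only the low-order bits of the hash. The result equals the non-negative remainder of the hash modulo the length, and the index always stays within the array bounds;
    • Collision resolution: no discussion of hash tables is complete without collisions. It is practically impossible to find a perfect hash function that scatters all the data across the table with no collisions at all (unless the storage space is large enough), so collisions are unavoidable. Listing 2 shows that every Entry holds a next pointer to another Entry. In createEntry (line 103 of listing 1), the newly inserted Entry is stored in table (line 105) and made to point to the existing chain. In other words, Java's HashMap resolves collisions the classic way, by separate chaining: colliding entries hang off a linked list at their bucket;
    • put: put (line 121) is the insertion method we use most. Given a key and a value, it locates the key's bucket and walks that bucket's chain to check whether the key is already present. Two keys match when their hashes are equal and they are either the same reference or equal according to equals (line 128). So if you use a custom class as a key, be sure to override equals (and, per the contract above, hashCode). If the key is absent, a new entry is added; otherwise its old value is overwritten and returned;
    • getEntry: getEntry (line 146) recomputes the hash of the given key and again uses indexFor (line 74) to find the element's position in the array. The lookup is not a single O(1) array access but a loop, because collisions mean a bucket may hold a chain that has to be searched. With a well-spread hash, chains stay short and lookups are O(1) on average;
    • removeEntryForKey: removal (line 164) involves the same lookup. If the removed entry sits in a collision chain, its predecessor's next pointer (or the bucket head, for the first entry) must be redirected to the entry's successor.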
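The sketch below ties these pieces together in a stripped-down version of the same design. It uses an array of singly linked buckets with a power-of-two capacity and head insertion, and it doubles the table when size reaches the threshold and a collision occurs. SimpleChainedMap and its methods are hypothetical names for illustration, not JDK API. The sketch also leaves out the JDK's alternative hashing, modCount and dedicated null-key path, and it is not thread-safe:

```java
import java.util.Objects;

public class SimpleChainedMap<K, V> {
    // One node of a bucket's singly linked collision chain (compare listing 2).
    static final class Entry<K, V> {
        final K key;
        final int hash;
        V value;
        Entry<K, V> next;

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    private static final float LOAD_FACTOR = 0.75f;
    @SuppressWarnings("unchecked")
    private Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16]; // power-of-two length
    private int threshold = (int) (16 * LOAD_FACTOR);
    private int size;

    // The JDK 7 shift-XOR mixing, without hashSeed; null keys hash to 0.
    static int hash(Object key) {
        int h = (key == null) ? 0 : key.hashCode();
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public V put(K key, V value) {
        int h = hash(key);
        int i = indexFor(h, table.length);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == h && Objects.equals(key, e.key)) {
                V old = e.value;                            // existing key: overwrite
                e.value = value;
                return old;
            }
        }
        if (size >= threshold && table[i] != null) {        // grow only on a collision
            resize(2 * table.length);
            i = indexFor(h, table.length);
        }
        table[i] = new Entry<>(h, key, value, table[i]);   // insert at the chain head
        size++;
        return null;
    }

    public V get(Object key) {
        int h = hash(key);
        for (Entry<K, V> e = table[indexFor(h, table.length)]; e != null; e = e.next) {
            if (e.hash == h && Objects.equals(key, e.key)) return e.value;
        }
        return null;                                        // walked the whole chain
    }

    public V remove(Object key) {
        int h = hash(key);
        int i = indexFor(h, table.length);
        for (Entry<K, V> prev = null, e = table[i]; e != null; prev = e, e = e.next) {
            if (e.hash == h && Objects.equals(key, e.key)) {
                if (prev == null) table[i] = e.next;        // removing the chain head
                else prev.next = e.next;                    // only the predecessor changes
                size--;
                return e.value;
            }
        }
        return null;
    }

    public int size() { return size; }

    // Like transfer(): relink every entry into a table twice as large.
    @SuppressWarnings("unchecked")
    private void resize(int newCapacity) {
        Entry<K, V>[] newTable = (Entry<K, V>[]) new Entry[newCapacity];
        for (Entry<K, V> e : table) {
            while (e != null) {
                Entry<K, V> next = e.next;
                int i = indexFor(e.hash, newCapacity);     // cached hash, new mask
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
        table = newTable;
        threshold = (int) (newCapacity * LOAD_FACTOR);
    }

    public static void main(String[] args) {
        SimpleChainedMap<String, Integer> m = new SimpleChainedMap<>();
        for (int i = 0; i < 100; i++) m.put("key" + i, i);
        m.put("key7", 700);                                 // overwrite: size stays 100
        m.remove("key3");
        System.out.println(m.size());                       // 99
        System.out.println(m.get("key7"));                  // 700
        System.out.println(m.get("key3"));                  // null
    }
}
```

Keeping the cached hash in each Entry is what lets resize skip calling hashCode again. With a power-of-two length, the mask in indexFor gives the same result as Math.floorMod(h, length).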

     3. Hash functions

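       The analysis above shows how important it is to build a good hash function. The basic principles are to minimize collisions and to "scatter" elements as evenly as possible across the storage space.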

      Below are some methods I found on Wikipedia; I'll add to them later if I come up with good ideas:

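    1. Direct addressing: use the key itself, or a linear function of the key, as the hash address, i.e. $hash(k) = k$ or $hash(k) = a \cdot k + b$, where $a$ and $b$ are constants (such a hash function is called an identity function).
    2. Digit analysis: if the keys are numbers in base $x$ and every key that may appear in the table is known in advance, take a few of the key's digits as the hash address.
    3. Mid-square: square the key and take the middle few digits as the hash address. When choosing a hash function we often don't know the full distribution of the keys, and it is not clear which digits to pick. The middle digits of a square, however, depend on every digit of the original number, so randomly distributed keys yield hash addresses that are random as well. The number of digits taken is determined by the table length.
    4. Folding: split the key into parts with the same number of digits (the last part may be shorter), then take the sum of the parts (discarding carries) as the hash address.
    5. Random number method
    6. Division (remainder) method: take the remainder of the key divided by a number $p$ no larger than the table length $m$ as the hash address, i.e. $hash(k) = k \bmod p$ with $p \le m$. The modulus can be applied to the key directly, or after folding, mid-square and similar transformations. The choice of $p$ matters: it is usually a prime or $m$ itself, and a poor choice of $p$ easily causes collisions.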
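Three of these methods are easy to sketch for non-negative integer keys. The class ClassicHashFunctions and its parameters (the divisor p, the number of middle digits, the chunk width) are illustrative assumptions. The mid-square and folding variants here work on decimal digits. As the last item suggests, the folded sum can then be reduced with the division method:

```java
public class ClassicHashFunctions {
    // Division method: hash(k) = k mod p, where p <= m (the table length).
    static int division(int k, int p) {
        return Math.floorMod(k, p);                 // stays in [0, p) even for negative k
    }

    // Mid-square method: square the key and keep `digits` decimal digits
    // from the middle of the result (`digits` assumed small, at most 9).
    static int midSquare(int k, int digits) {
        String s = Long.toString((long) k * k);     // long avoids int overflow
        int start = Math.max(0, (s.length() - digits) / 2);
        int end = Math.min(s.length(), start + digits);
        return Integer.parseInt(s.substring(start, end));
    }

    // Folding method (shift folding): split the decimal key into `width`-digit
    // chunks (the last may be shorter), add them, drop any carry past `width` digits.
    static int fold(long k, int width) {
        if (k < 0) throw new IllegalArgumentException("k must be non-negative");
        String s = Long.toString(k);
        long sum = 0;
        for (int i = 0; i < s.length(); i += width) {
            sum += Long.parseLong(s.substring(i, Math.min(s.length(), i + width)));
        }
        long limit = 1;
        for (int i = 0; i < width; i++) limit *= 10;
        return (int) (sum % limit);                 // discard the carry
    }

    public static void main(String[] args) {
        System.out.println(division(1234, 97));                  // 70
        System.out.println(midSquare(1234, 2));                  // 22 (1234^2 = 1522756)
        System.out.println(fold(987654321L, 3));                 // 962 (987 + 654 + 321 = 1962)
        System.out.println(division(fold(987654321L, 3), 97));   // 89
    }
}
```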

     Now look back at Java's own hash function:

    Code listing 3

     1 /**
     2      * A randomizing value associated with this instance that is applied to
     3      * hash code of keys to make hash collisions harder to find.
     4      */
     5     transient final int hashSeed = sun.misc.Hashing.randomHashSeed(this);
     6     
     7     /**
     8      * Retrieve object hash code and applies a supplemental hash function to the
     9      * result hash, which defends against poor quality hash functions.  This is
    10      * critical because HashMap uses power-of-two length hash tables, that
    11      * otherwise encounter collisions for hashCodes that do not differ
    12      * in lower bits. Note: Null keys always map to hash 0, thus index 0.
    13      */
    14     final int hash(Object k) {
    15         int h = 0;
    16         if (useAltHashing) {
    17             if (k instanceof String) {
    18                 return sun.misc.Hashing.stringHash32((String) k);
    19             }
    20             h = hashSeed;
    21         }
    22 
    23         h ^= k.hashCode();
    24 
    25         // This function ensures that hashCodes that differ only by
    26         // constant multiples at each bit position have a bounded
    27         // number of collisions (approximately 8 at default load factor).
    28         h ^= (h >>> 20) ^ (h >>> 12);
    29         return h ^ (h >>> 7) ^ (h >>> 4);
    30     }
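    1. Java generates a random number, hashSeed, for each map instance (via sun.misc.Hashing.randomHashSeed). The seed is used only when useAltHashing is true. In the code above, resize switches that flag on once the VM has booted and the new capacity reaches Holder.ALTERNATIVE_HASHING_THRESHOLD. In that mode, String keys skip the rest of the function and use sun.misc.Hashing.stringHash32 instead.
    2. The seed (or 0, when alternative hashing is off) is XORed with the key's hashCode.
    3. A series of unsigned shifts and XORs then produces the final hash value. According to the comment in the code, this spreads the high-order bits of hashCode into the low-order bits. indexFor keeps only the low bits for a power-of-two table, so without this step, hash codes that differ only in their high bits would always land in the same bucket. (I haven't yet worked out why these particular shift amounts were chosen; I'll fill that in after reading more.)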
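The effect of step 3 can be checked directly. The sketch below, a hypothetical class SupplementalHashDemo that copies only the shift-XOR part of hash() without hashSeed, feeds 16 hash codes that differ only in bits 16 to 19 into a 16-slot table:

```java
import java.util.HashSet;
import java.util.Set;

public class SupplementalHashDemo {
    // The shift-XOR step of the JDK 7 hash() shown above, without hashSeed.
    static int supplementalHash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Same as indexFor in listing 1: keep the low bits for a power-of-two length.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Places 16 hash codes that differ only in bits 16..19 into a 16-slot table
    // and counts how many distinct buckets they occupy.
    static int distinctBuckets(boolean mix) {
        Set<Integer> buckets = new HashSet<>();
        for (int i = 0; i < 16; i++) {
            int h = i << 16;                         // low 16 bits are all zero
            buckets.add(indexFor(mix ? supplementalHash(h) : h, 16));
        }
        return buckets.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctBuckets(false));  // 1: every key lands in bucket 0
        System.out.println(distinctBuckets(true));   // 16: one key per bucket
    }
}
```

Without the mixing, indexFor sees only the four low bits, which are all zero. With the mixing, the high bits are folded down, and each key gets its own bucket.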

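    So Java seems to combine several of the methods above. The mask in indexFor is effectively the division method with $p = m$, the table length (a power of two rather than a prime). The shift-XOR mixing in hash() compensates for that choice of $p$ by making the low bits depend on the whole hashCode.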

    Some of the above is my own interpretation. If anyone happens to spot a mistake, please do point it out~

  • Original article: https://www.cnblogs.com/xlturing/p/4445574.html