
HashMap 1.8

Differences between HashMap in 1.8 and 1.7

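The HashMap implementation changed in JDK 1.8. The basic idea is the same, but several places were optimized. The main change is the storage structure: array + linked list became array + linked list + red-black tree. When the linked list in a bin grows past the threshold (TREEIFY_THRESHOLD = 8), the list is converted to a red-black tree, provided the table has at least 64 slots (MIN_TREEIFY_CAPACITY); a smaller table is resized instead. This bounds a lookup in a crowded bin at O(log n) rather than O(n).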

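The source comments note that the number of entries per bin (each slot of the array) follows a Poisson distribution. Unless the hash function is very poor, a bin rarely reaches 8 entries. As the comment quoted below puts it, tree bins are rarely used.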

     * Because TreeNodes are about twice the size of regular nodes, we
     * use them only when bins contain enough nodes to warrant use
     * (see TREEIFY_THRESHOLD). And when they become too small (due to
     * removal or resizing) they are converted back to plain bins.  In
     * usages with well-distributed user hashCodes, tree bins are
     * rarely used.  Ideally, under random hashCodes, the frequency of
     * nodes in bins follows a Poisson distribution
     * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
     * parameter of about 0.5 on average for the default resizing
     * threshold of 0.75, although with a large variance because of
     * resizing granularity. Ignoring variance, the expected
     * occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
     * factorial(k)). The first values are:
     *
     * 0:    0.60653066
     * 1:    0.30326533
     * 2:    0.07581633
     * 3:    0.01263606
     * 4:    0.00157952
     * 5:    0.00015795
     * 6:    0.00001316
     * 7:    0.00000094
     * 8:    0.00000006
     * more: less than 1 in ten million
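The table in the quoted comment can be reproduced directly from the formula it gives. The following is a minimal sketch; the class name PoissonBins and the method name probability are made up for illustration, and the mean of 0.5 is taken from the comment itself.

```java
class PoissonBins {
    // Probability that a bin holds exactly k nodes, assuming a Poisson
    // distribution with mean lambda (0.5 in the JDK comment).
    static double probability(int k, double lambda) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
    }

    public static void main(String[] args) {
        // Prints one line per bin size, in the same layout as the comment.
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, probability(k, 0.5));
        }
    }
}
```

At k = 8 the probability is already below one in ten million, which is why 8 is a reasonable point to pay for the larger TreeNode.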

What is clever about HashMap's hash() method

First, look at how the hash is computed in JDK 1.8. It is a neat trick.

    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
    index = (n - 1) & hash(key) // n is the table length

If you implemented the hashing yourself, the simplest approach would be to take the hashCode modulo the table length:

index = key.hashCode() % n

HashMap requires the table length n to be a power of two.
For a power-of-two modulus, the remainder can be computed more efficiently:

index = key.hashCode() & (n-1)
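The equivalence of the two formulas can be checked with a short sketch. The class and method names here are hypothetical. One detail is an addition to the text above: Java's % operator returns a negative result for a negative hashCode, so the sketch uses Math.floorMod for the remainder form, which is what the mask form matches.

```java
class MaskVsMod {
    // Index via remainder; floorMod keeps the result non-negative
    // even when the hash code is negative.
    static int indexByMod(int hash, int n) {
        return Math.floorMod(hash, n);
    }

    // Index via bit mask; only valid when n is a power of two.
    static int indexByMask(int hash, int n) {
        return hash & (n - 1);
    }

    public static void main(String[] args) {
        int n = 16;
        int[] samples = {117, 85, 0, -1, Integer.MIN_VALUE, "hello".hashCode()};
        for (int h : samples) {
            // Both columns are equal for every sample.
            System.out.println(h + " -> " + indexByMod(h, n) + " / " + indexByMask(h, n));
        }
    }
}
```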

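Both approaches share one flaw: the result depends only on the low bits, and the high bits have no effect. When two hashCode() values differ only in their high bits, the resulting index is the same.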

For example:

int hashCode1 = 01110101
int hashCode2 = 01010101

int index1 = 01110101 & 1111 -> 0101
int index2 = 01010101 & 1111 -> 0101

// the same values in decimal
int hashCode1 = 117
int hashCode2 = 85

int index1 = 117 % 16  -> 5
int index2 = 85 % 16  -> 5

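As the example shows, when the hashCode() values of two keys differ only in the high bits, they end up with the same index, which is a hash collision. With too many collisions, HashMap becomes very inefficient.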
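To get a feel for how bad this can become, the sketch below builds 100 hash codes that differ only above the low 4 bits and counts how many bins of a 16-slot table they occupy. The class name, the method name distinctBuckets and the chosen bit pattern are illustrative assumptions, not taken from the JDK.

```java
import java.util.HashSet;
import java.util.Set;

class HighBitCollisions {
    // Counts how many distinct bins the hash codes occupy when the index
    // is computed from the low bits only.
    static int distinctBuckets(int[] hashCodes, int n) {
        Set<Integer> used = new HashSet<>();
        for (int h : hashCodes) {
            used.add(h & (n - 1));
        }
        return used.size();
    }

    public static void main(String[] args) {
        int n = 16;
        int[] hashCodes = new int[100];
        for (int i = 0; i < hashCodes.length; i++) {
            // Low 4 bits fixed at 0101; only the higher bits vary.
            hashCodes[i] = (i << 4) | 0b0101;
        }
        System.out.println(distinctBuckets(hashCodes, n)); // prints 1
    }
}
```

All 100 values pile into a single bin, so every lookup has to search that one bin.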

    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

Look at this code again: it recomputes the key's hashCode value. In Java, a hashCode is 32 bits.

First, the hashCode is unsigned-right-shifted by 16 bits.

(Our example assumes an 8-bit hashCode, so the shift is 4 bits instead of 16.)

int hashCode1 = 01110101 >>> 4
--> hashCode1 = 00000111

int hashCode2 = 01010101 >>> 4
--> hashCode2 = 00000101

Then the shifted value is XORed with the original value.

hashCode1 = 01110101 ^ 00000111
--> hashCode1 = 01110010

hashCode2 = 01010101 ^ 00000101
--> hashCode2 = 01010000

Finally, mask to get the index:

index1 = 01110010 & 1111 = 0010
index2 = 01010000 & 1111 = 0000

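As the analysis shows, recomputing the hash lets differences in the high bits reach the low bits, so the two keys now land in different bins.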
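The same walk-through can be repeated with real 32-bit values and the real 16-bit shift. In this sketch, spread applies the expression from hash(Object) to a raw int; the class name, the helper names and the two sample values are chosen for illustration so that they mirror the 8-bit example above.

```java
class SpreadDemo {
    // Same expression as in JDK 1.8 HashMap.hash(Object), applied to a raw int.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    static int index(int hash, int n) {
        return hash & (n - 1);
    }

    public static void main(String[] args) {
        int n = 16;
        int h1 = 0x00070005; // the two values differ only in the high 16 bits
        int h2 = 0x00050005;

        // Without spreading, both keys collide in bin 5.
        System.out.println(index(h1, n) + " " + index(h2, n));                 // prints 5 5
        // With spreading, the high bits change the low bits.
        System.out.println(index(spread(h1), n) + " " + index(spread(h2), n)); // prints 2 0
    }
}
```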

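Author: 曾泽浩
Link: https://www.jianshu.com/p/e1d3ba0c733a
Source: Jianshu (简书)
Copyright belongs to the author. For commercial reuse, contact the author for authorization; for non-commercial reuse, cite the source.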

The tableSizeFor(int cap) method

https://www.imooc.com/article/267756

    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

Note: HashMap requires the capacity to be a power of two. This method finds the smallest power of two that is greater than or equal to the given cap.

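First, int n = cap - 1 guards against the case where cap is already a power of two. Without it, the unsigned right shifts that follow would return twice cap, even though cap already satisfies the requirement. If this is unclear, read through the shifts below and then come back (working it out on paper helps).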

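If n is 0 at this point (after cap - 1), it is still 0 after all the unsigned right shifts, and the returned capacity is 1 (because of the final n + 1). The discussion below covers only the case where n is not 0.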

Take 16 bits as an example, and suppose n starts as 0000 1xxx xxxx xxxx (x means the bit may be either 0 or 1).

First shift: n |= n >>> 1;

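Since n is not 0, its binary representation has at least one 1 bit. Consider the highest 1 bit. The unsigned right shift by 1 moves that bit one position to the right, and the OR then sets the bit immediately to the right of the highest 1, giving for example 0000 11xx xxxx xxxx.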

Second shift: n |= n >>> 2;

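Note that this n has already been through n |= n >>> 1; and is now 0000 11xx xxxx xxxx. Shifting it right by two moves the two leading 1s two positions to the right, and ORing with the previous n leaves 4 consecutive 1s at the top, for example 0000 1111 xxxx xxxx.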

Third shift: n |= n >>> 4;

This time the 4 consecutive leading 1s are shifted right by 4 and ORed in, which leaves 8 consecutive 1s at the top, for example 0000 1111 1111 xxxx.

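The remaining shifts follow the same pattern, so there is no need to spell them out. An int has 32 bits, so after the last step, n |= n >>> 16;, every bit below the highest 1 is guaranteed to be 1, and n + 1 is a power of two. If the result reaches or exceeds MAXIMUM_CAPACITY (1 << 30), the method returns MAXIMUM_CAPACITY instead.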
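A quick way to check the reasoning is to copy the method into a standalone class and run it on a few inputs. The method body below is the one quoted above; the wrapper class TableSizeForDemo, the sample inputs and the local MAXIMUM_CAPACITY constant (1 << 30, as in HashMap) are added for this sketch.

```java
class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copied from the JDK 1.8 method quoted above.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        int[] caps = {0, 1, 2, 3, 16, 17, 1000, (1 << 30) + 1};
        for (int cap : caps) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
        // 16 stays 16 thanks to cap - 1, while 17 is rounded up to 32.
    }
}
```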

Where this method is called

    public HashMap(int initialCapacity, float loadFactor) {
        /** code omitted here **/
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }

Note that the resulting capacity is assigned to threshold.

this.threshold = tableSizeFor(initialCapacity);

At first this looks like a bug, and it seems it should have been written like this:

this.threshold = tableSizeFor(initialCapacity) * this.loadFactor;

That would match the meaning of threshold (HashMap resizes when size reaches this threshold).

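However, the constructor does not initialize the table field. Initialization of table is deferred to the put method: the first put calls resize(), which uses the value stored in threshold as the table capacity and then recomputes threshold as capacity * loadFactor.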
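The two-phase use of threshold is easier to see in a stripped-down model. The class below is not the JDK code: LazyTableModel, nextPowerOfTwo and allocateIfNeeded are hypothetical names, nextPowerOfTwo is a simple stand-in for tableSizeFor, and only the first-allocation path of resize() is modeled.

```java
class LazyTableModel {
    static final int DEFAULT_INITIAL_CAPACITY = 16;

    final float loadFactor;
    int threshold;   // holds the initial capacity until the table exists
    Object[] table;  // stays null until the first insertion

    LazyTableModel(int initialCapacity, float loadFactor) {
        this.loadFactor = loadFactor;
        // As in the HashMap constructor: the rounded capacity is parked in threshold.
        this.threshold = nextPowerOfTwo(initialCapacity);
    }

    // Simple stand-in for tableSizeFor (assumption: small positive inputs).
    static int nextPowerOfTwo(int cap) {
        int n = 1;
        while (n < cap) {
            n <<= 1;
        }
        return n;
    }

    // Models what the first resize() does when the table is still null.
    void allocateIfNeeded() {
        if (table == null) {
            int newCap = (threshold > 0) ? threshold : DEFAULT_INITIAL_CAPACITY;
            table = new Object[newCap];
            threshold = (int) (newCap * loadFactor); // now a real resize threshold
        }
    }

    public static void main(String[] args) {
        LazyTableModel m = new LazyTableModel(10, 0.75f);
        System.out.println(m.threshold + " " + (m.table == null)); // prints 16 true
        m.allocateIfNeeded();
        System.out.println(m.threshold + " " + m.table.length);    // prints 12 16
    }
}
```

Parking the capacity in threshold avoids an extra field on every map, at the cost of the field meaning two different things over the map's lifetime.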

The put method

public V put(K key, V value) {
    //delegates the work to putVal()
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
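    //check whether table has been initialized; if not, initialize it
    //resize() also sets threshold to loadFactor * capacity
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    //compute the bin index; if the bin is empty, place the new node there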
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        //the first node in the bin already has this key: remember it in e
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        //check whether the bin is a red-black tree
        else if (p instanceof TreeNode)
            //insert through the red-black tree
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            //otherwise the bin is a linked list; walk it
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
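                    //the list already held at least 8 nodes before this insert
                    //(binCount runs from 0 to 7): convert the list to a red-black tree
                    //https://blog.csdn.net/qsdnmd/article/details/82914151  walkthrough of treeifyBin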
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                //the key already exists in the list: stop here, the value is overwritten below
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
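    //record the structural modification
    ++modCount;
    //check whether the table needs to grow
    if (++size > threshold)
        resize();
    //no-op in HashMap; a hook that LinkedHashMap overrides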
    afterNodeInsertion(evict);
    return null;
}
//If a node with this key already exists, the old value is returned; otherwise null is returned.
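The return value described above is easy to observe through the public API. This is a small usage sketch against java.util.HashMap; the class name PutReturnDemo and the sample keys are arbitrary. putIfAbsent is included because it calls putVal with onlyIfAbsent set to true.

```java
import java.util.HashMap;
import java.util.Map;

class PutReturnDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();

        System.out.println(map.put("a", 1));         // prints null: no previous mapping
        System.out.println(map.put("a", 2));         // prints 1: old value returned, then replaced
        System.out.println(map.putIfAbsent("a", 3)); // prints 2: key present, value left alone
        System.out.println(map.get("a"));            // prints 2
        System.out.println(map.size());              // prints 1
    }
}
```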

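When the table doubles, the 1.8 resize method only tests one extra bit of the hash (hash & oldCap): if the bit is 0 the node keeps its index, and if it is 1 the node moves to index + oldCap. In 1.7, by contrast, the index of every node is recomputed with & against the whole new table length.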
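The two ways of finding the new index always agree, which the following sketch checks by brute force. It is not the JDK's resize() code; the class ResizeSplitDemo and both method names are hypothetical, and oldCap is assumed to be a power of two.

```java
class ResizeSplitDemo {
    // 1.8 style: test the single bit that the doubled table adds to the mask.
    static int newIndexByBit(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return ((hash & oldCap) == 0) ? oldIndex : oldIndex + oldCap;
    }

    // 1.7 style: recompute the index against the new table length.
    static int newIndexByMask(int hash, int oldCap) {
        int newCap = oldCap << 1;
        return hash & (newCap - 1);
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int[] hashes = {5, 21, 37, 53, -1, 0x00070002};
        for (int h : hashes) {
            // Both columns are equal; a node either stays put or moves by oldCap.
            System.out.println(h + ": " + newIndexByBit(h, oldCap) + " / " + newIndexByMask(h, oldCap));
        }
    }
}
```

Because each node goes to one of only two places, a bin can be split into a "low" list and a "high" list in a single pass.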

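References

HashMap source code analysis
The HashMap methods hash() and tableSizeFor()
On what is clever about HashMap's hash() method
Java collections: HashMap internals and principles (source analysis). Note: that article is based on JDK 1.7; the changes made in 1.8 are covered at its end.
JDK 1.8 HashMap source walkthrough: the put method
JDK 1.8 HashMap source walkthrough: the constructors (https://www.jianshu.com/p/6771eacb3802)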


Original article: https://www.cnblogs.com/islch/p/12817125.html