zoukankan html css js c++ java

HashMap1.7 问题总结

　　Java SE 源码在面试中的考查也算是重中之重了，最近看了HashMap1.7 的源码，对其中一些代码的设计以及线程不安全等引发的问题，在此记录随笔，如有不正之处望指出。

为什么hashMap的负载因子是0.75

　　 hashMap 的负载因子为0.75是一个常识性问题，但是为什么负载因子不为0.5、0.6或者是1和0呢？
查看API 源码：
在HashMap注释中有这么一段：

Ideally, under random hashCodes, the frequency of
* nodes in bins follows a Poisson distribution
* (http://en.wikipedia.org/wiki/Poisson_distribution) with a
* parameter of about 0.5 on average for the default resizing
* threshold of 0.75, although with a large variance because of
* resizing granularity. Ignoring variance, the expected
* occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
* factorial(k)). The first values are:
*
* 0: 0.60653066
* 1: 0.30326533
* 2: 0.07581633
* 3: 0.01263606
* 4: 0.00157952
* 5: 0.00015795
* 6: 0.00001316
* 7: 0.00000094
* 8: 0.00000006
* more: less than 1 in ten million

翻译过来大概的意思就是：

容器中的节点遵循泊松分布，默认调整大小的参数平均约为0.5，阈值为0.75，虽然由于方差较大调整粒度。忽略方差列表大小k的出现次数是(exp(-0.5) * pow(0.5, k) /
阶乘(k))。（——有道）

可见当负载因子为0.75 的时候，桶中同一节点上链表超过8个元素的情况几乎不可能发生。负载因子的作用在于判断map 是否需要扩容，而扩容是一个比较耗时的操作，因为它需要遍历赋值。

如果负载因子过小，比如为0.5。在初始容量为16 的情况下，超过8个就会扩容到32，而未使用的空间为24，利用率仅为1/4。频繁的扩容不仅耗时而且浪费空间。

如果负载因子过大，比如为1 。桶中的元素达到16 过后才会扩容，虽然空间利用率大，但是桶中元素的冲突肯定变多。而map 数据结构的好处就在于理想情况下获取元素的时间复杂度为O(1)，负载增大会影响map 的查询等操作。

简而言之HashMap负载因子为0.75是空间和时间成本的一种折中。

为什么HashMap 的初始容量为2 的N 次幂

　　关于为什么HashMap 的初始容量为2 的N 次幂这个问题，网上的答案有很多，但是大多数都只是答道了：再次hash 的值 & (length - 1) 就相当于取余，保证下标在容量之下。但是这样的回答，我个人感觉有点牛头不对马嘴（没有嘲讽的意思，只是感觉怪怪的）。因为 & 操作肯定小于或等于最小的那个数。
　　哈希的意义在于使分布平均，前面两次哈希就可以看出这个思想。理所因当，我们在取下标的时候也应该如此。后来我最算是看到了比较合理的解释：

之所以取 2 的 N 次幂的原因是，(length - 1) ---> 它的二进制位都为1
而 1 & （0 /1）的结果也是（0 / 1），如果是 0 那结果一定为0
如果一定为0 就意味着一些下标值是取不到的，不够平均

比如我们取容量为 15

public class Test {
    int hash(Object k) {
        int h = 0;
        if (0 != h && k instanceof String) {
            return sun.misc.Hashing.stringHash32((String) k);
        }

        h ^= k.hashCode();
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    public static void main(String[] args) {
        for (int i = 1 ; i <= 12 ; i++){
            System.out.println((new Test().hash(new Test())) & 14);
        }
    }
}

输出：
 4 
14 
2 
14 
8 
14 
2 
4 
14 
12
 4
 0 
Process finished with exit code 0

输出的结果可以看出取值都为偶数，因为14 的最后一个二进制位为0 。

讲一讲HashMap 的put 方法引发的问题

public V put(K key, V value) {
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

void createEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        size++;
    }

//添加节点
void addEntry(int hash, K key, V value, int bucketIndex) {
        if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }

        createEntry(hash, key, value, bucketIndex);
    }

void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }

        Entry[] newTable = new Entry[newCapacity];
        transfer(newTable, initHashSeedAsNeeded(newCapacity));
        table = newTable;
        threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        for (Entry<K,V> e : table) {
            while(null != e) {
                Entry<K,V> next = e.next;
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
    }

　　在讲put 引发的问题之前我们先总结一下put 函数做了哪些事情

判断map 有没有初始化
判断插入的key 是不是为null ，如果是插入null 的位置（table[0]）
取得下标，遍历链表判断有没相同的key ，有则添加新值返回旧值
没有则添加元素
添加元素先判断需不需要扩容，扩容采用头插法
创建新节点添加对应位置

HashMap 并不是线程安全的类，所以在resize() 操作时，并发下可能会造成的问题

形成环形链表

并发插入：

丢失值

就不说明原因了，网上多。

查看全文

相关阅读:
父亲节前参考四级考试
 rpm小解
 oracle忘记sys/system/scott用户的密码怎么办
 yum 小解
 linux下设置swap文件
 启动mysql 报错： ERROR 2002 (HY000): Can’t connect to local MySQL server through socket ‘/var/lib/mysql/mysql.sock’ (2)
mysql 常用命令
 wget安装
 删除mysql
什么是swap分区

原文地址：https://www.cnblogs.com/tanwt/p/9829647.html