
7. The underlying implementation of HashMap

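Storage structure: fields

HashMap is implemented as an array plus linked lists plus red-black trees (the red-black tree part was added in JDK 1.8), as shown in the figure below:

The HashMap class has one very important field, Node[] table, the hash bucket array. It is an array of Node.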

/**
 * The table, initialized on first use, and resized as
 * necessary. When allocated, length is always a power of two.
 * (We also tolerate length zero in some operations to allow
 * bootstrapping mechanics that are currently not needed.)
 */

transient Node<K,V>[] table;

Let's first look at the Node class:

/**
 * Basic hash bin node, used for most entries.  (See below for
 * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
 */
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}

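To resolve collisions, a hash table can use open addressing, separate chaining, or other schemes; Java's HashMap uses separate chaining. Put simply, it combines an array with linked lists: every array slot holds a linked list, and once a key has been hashed to an array index, the entry is placed on the list at that index.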

map.put("Mjx", "Master");

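The system calls hashCode() on the key "Mjx" to get its hashCode value, then uses the hash algorithm to locate where the key-value pair is stored. Sometimes two keys are located at the same position, which means a hash collision has occurred. The more evenly the hash algorithm spreads its results, the lower the probability of collisions and the more efficient the map's reads and writes.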
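To make the collision case concrete, here is a minimal sketch. It assumes a table of length 16 and reuses the JDK 8 spreading step h ^ (h >>> 16); the class name CollisionDemo and the helper bucketIndex are made up for illustration. The strings "Aa" and "BB" are a well-known pair with identical hashCode() values, so they always map to the same bucket.

```java
import java.util.HashMap;
import java.util.Map;

final class CollisionDemo {
    // Same spreading step as HashMap.hash(Object) in JDK 8.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a table whose length is a power of two (illustrative helper).
    static int bucketIndex(Object key, int tableLength) {
        return (tableLength - 1) & spread(key);
    }

    public static void main(String[] args) {
        // Both strings hash to 2112, so they collide in every table size.
        System.out.println("Aa".hashCode() == "BB".hashCode());             // true
        System.out.println(bucketIndex("Aa", 16) == bucketIndex("BB", 16)); // true

        // The map still keeps both entries: equals() tells the keys apart.
        Map<String, String> map = new HashMap<>();
        map.put("Aa", "first");
        map.put("BB", "second");
        System.out.println(map.get("Aa") + " " + map.get("BB") + " " + map.size());
        // prints "first second 2"
    }
}
```

A collision therefore costs an extra equals() comparison along the chain, but never loses an entry.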

A good hash algorithm together with a good resize mechanism keeps the collision probability low while also keeping the bucket array small.

int threshold;

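First, the Node[] table array has an initial length (16 by default), and loadFactor is the load factor (0.75 by default). threshold is the maximum number of Nodes (key-value pairs) the HashMap holds before it resizes. Normally threshold = length * loadFactor. If a capacity is passed to the constructor, threshold temporarily holds the smallest power of two greater than or equal to that capacity; when the table is allocated on first use, that value becomes the table length and threshold is recomputed as length * loadFactor.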
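The two calculations above can be sketched as follows. tableSizeFor is written the way the JDK 8 helper of the same name rounds a requested capacity up to a power of two; the class name ThresholdDemo and the helper thresholdFor are illustrative assumptions, and thresholdFor ignores the MAXIMUM_CAPACITY edge case that the real resize() handles.

```java
final class ThresholdDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smallest power of two >= cap, following JDK 8's HashMap.tableSizeFor.
    static int tableSizeFor(int cap) {
        int n = cap - 1;   // so that an exact power of two maps to itself
        n |= n >>> 1;      // smear the highest set bit into every lower bit
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    // Threshold once the table exists: capacity * loadFactor, truncated to int.
    static int thresholdFor(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10));        // 16
        System.out.println(tableSizeFor(16));        // 16
        System.out.println(tableSizeFor(17));        // 32
        System.out.println(thresholdFor(16, 0.75f)); // 12
    }
}
```

With the defaults, the 13th distinct key pushes size past 12 and triggers the first doubling.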

final float loadFactor;

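The default load factor of 0.75 is a balanced trade-off between space and time efficiency. Changing it is not recommended unless your time and space requirements are unusual: if memory is plentiful and speed matters a great deal, you can lower the load factor; conversely, if memory is tight and speed matters less, you can raise it, and the value may even be greater than 1.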

int modCount;

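The modCount field records how many times the internal structure of the HashMap has changed. A structural change means a change to the set of mappings, for example a put that adds a new key-value pair; overwriting the value of an existing key is not a structural change. In the two calls below, the first put adds key 1 and counts, while the second only overwrites its value and does not: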

map.put(1,2);

map.put(1,3);
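modCount is not directly visible from outside, but the fail-fast iterators of HashMap compare it against the value they saw when they were created. The sketch below uses that to show the difference between the two kinds of put; the class and method names are hypothetical, and the behaviour shown is for single-threaded use.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

final class ModCountDemo {
    // Overwrites values of existing keys while iterating: not a structural change.
    static boolean overwriteWhileIterating() {
        Map<Integer, Integer> map = new HashMap<>();
        map.put(1, 2);
        map.put(2, 3);
        try {
            for (Integer key : map.keySet()) {
                map.put(key, 0); // key already present, only the value changes
            }
            return true;         // iteration finished without an exception
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    // Adds a new key while iterating: a structural change the iterator detects.
    static boolean insertWhileIterating() {
        Map<Integer, Integer> map = new HashMap<>();
        map.put(1, 2);
        map.put(2, 3);
        try {
            for (Integer key : map.keySet()) {
                map.put(key + 100, 0); // new key, so modCount is incremented
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;               // the next call to next() noticed the change
        }
    }

    public static void main(String[] args) {
        System.out.println(overwriteWhileIterating()); // true
        System.out.println(insertWhileIterating());    // true
    }
}
```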

int size;

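size is the number of key-value pairs actually stored in the HashMap: each put of a new key increases it by one, while overwriting an existing key leaves it unchanged. Take care not to confuse size, threshold, and length (the length of the table array).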

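One problem remains: even with a well-chosen load factor and hash algorithm, overly long chains cannot be ruled out, and once a chain grows too long it seriously hurts HashMap's performance. So in JDK 1.8, when a chain grows too long (more than 8 nodes by default, and only once the table has at least 64 buckets; a smaller table is resized instead), the linked list is converted into a red-black tree, whose fast insert, delete, update, and lookup improve HashMap's performance.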

Red-black tree references:

http://www.sohu.com/a/201923614_466939

https://blog.csdn.net/v_july_v/article/details/6105630

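Functionality: methods

The analysis focuses on three representative points: computing the hash bucket array index from a key, the detailed execution of put, and the resize process.

1. Determining the index in the hash bucket array

Naturally we want the elements in the HashMap to be distributed as evenly as possible, ideally with only one element per slot. Then, once the hash algorithm gives us a slot, we know immediately that the element there is the one we want, with no list to traverse, which greatly speeds up lookups.

The hash algorithm in HashMap, used to determine the array position: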

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

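Step 1: h = key.hashCode(), get the hashCode value

Step 2: h ^ (h >>> 16), mix the high 16 bits into the low 16 bits

Step 3: hash & (n - 1), AND the hash with the array length minus 1; because the length is always a power of two, this is equivalent to taking the hash modulo the length. This step is not part of hash() itself; it happens wherever the table is indexed, for example in putVal

In essence there are three steps: take the key's hashCode, mix in the high bits, and take the modulus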

An example follows, where n is the length of table.
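A minimal sketch of the three steps with n = 16. The class name IndexDemo and the two sample hashCode values are assumptions chosen so that the values differ only above bit 15, which is exactly the case the high-bit mixing is meant to help with.

```java
final class IndexDemo {
    // Step 2: fold the high 16 bits into the low 16 bits.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Step 3: mask with n - 1, valid because n is a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int a = 0x10000; // two hashCode values that differ only in their high bits
        int b = 0x20000;

        // Without spreading, the low 4 bits are identical: both land in bucket 0.
        System.out.println(indexFor(a, n) + " " + indexFor(b, n));                 // 0 0

        // With spreading, the high bits influence the index.
        System.out.println(indexFor(spread(a), n) + " " + indexFor(spread(b), n)); // 1 2

        // The mask equals a non-negative modulus, even for negative hashes.
        System.out.println(indexFor(-7, n) + " " + Math.floorMod(-7, n));          // 9 9
    }
}
```

The mask is used instead of % because it is a single AND instruction and never produces a negative index.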

2. Analyzing HashMap's put method

The execution of put can be understood through the figure below:

/**
 * Implements Map.put and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

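1. Check whether the array table is null or empty; if so, call resize() to allocate it;

2. Compute the hash of the key to get the insertion index i. If table[i] == null, create a new node directly and go to 6; otherwise go to 3;

3. Check whether the first element of table[i] has the same key. If so, overwrite its value; otherwise go to 4. "Same" here means that the hash values match and the keys are identical or equals() returns true;

4. Check whether table[i] is a TreeNode, that is, whether the bucket is a red-black tree. If it is, insert the key-value pair into the tree directly; otherwise go to 5;

5. Traverse the list at table[i]. If the key is found during the traversal, overwrite its value. Otherwise append the new node at the tail, and if the list is now longer than 8, call treeifyBin, which converts the list into a red-black tree (or resizes the table instead while it has fewer than 64 buckets);

6. After a successful insertion, check whether the actual number of key-value pairs, size, exceeds threshold; if it does, resize.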
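The flow above can be sketched with a deliberately simplified map. TinyChainedMap is a hypothetical teaching class, not the JDK implementation: it assumes a fixed table of 16 buckets and leaves out resizing, treeification, and the onlyIfAbsent and evict hooks, keeping only the bucket lookup, the chain walk, and tail insertion.

```java
import java.util.Objects;

final class TinyChainedMap<K, V> {
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value) {
            this.hash = hash;
            this.key = key;
            this.value = value;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16]; // fixed power-of-two length
    private int size;

    private static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    /** Returns the previous value, or null if the key was absent. */
    public V put(K key, V value) {
        int hash = hash(key);
        int i = (table.length - 1) & hash;
        Node<K, V> p = table[i];
        if (p == null) {                        // empty bucket: create the first node
            table[i] = new Node<>(hash, key, value);
            size++;
            return null;
        }
        while (true) {                          // walk the chain
            if (p.hash == hash && Objects.equals(p.key, key)) {
                V old = p.value;                // key exists: overwrite, size unchanged
                p.value = value;
                return old;
            }
            if (p.next == null) {               // reached the tail: append, as JDK 8 does
                p.next = new Node<>(hash, key, value);
                size++;
                return null;
            }
            p = p.next;
        }
    }

    public V get(Object key) {
        int hash = hash(key);
        for (Node<K, V> e = table[(table.length - 1) & hash]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    public int size() {
        return size;
    }
}
```

For example, after put("Aa", 1), put("BB", 2), and put("Aa", 3), the third call returns 1, size() is 2, and both keys sit in the same chain because their hash codes are equal.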

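3. Resize mechanism

Resizing means recomputing the capacity. A Java array cannot grow on its own, so a new, larger array has to replace the existing small one, just as we would swap a small bucket for a big one if we wanted to carry more water.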

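The JDK 1.7 approach: the reference newTable[i] is assigned to e.next, that is, entries are inserted at the head of a singly linked list, so a newly transferred element always ends up at the head of the list at its slot. As a result, the element that was first placed at an index ends up at the tail of the Entry chain (if hash collisions occurred).

As shown in the figure: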

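The JDK 1.8 optimization: because the table grows by powers of two, each element either stays at its original index or moves by a power-of-two offset from it. See the figure below. Figure (a) shows how key1 and key2 determine their index before the resize, and figure (b) shows the same after the resize, where hash1 is the result of the hash and high-bit mixing for key1.

When the index is recomputed after the resize, n has doubled, so the mask n - 1 covers one more bit at the high end (marked in red in the figure), and the new index changes as follows: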

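So the biggest difference between JDK 7 and JDK 8 in resizing is that JDK 8 does not need to recompute the hash. When a HashMap is resized, instead of rehashing as the JDK 1.7 implementation does, we only need to check whether the newly significant bit of the original hash is 1 or 0: if it is 0 the index is unchanged, and if it is 1 the index becomes "old index + oldCap". Diagram: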

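This saves the time spent recomputing hash values, and during the resize the previously colliding nodes are spread fairly evenly across the new buckets. This is the optimization added in JDK 1.8. One thing to note: in JDK 1.7, a resize reverses the order of the elements in a list, whereas JDK 1.8 preserves it.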
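The ordering difference comes purely from head insertion versus tail insertion. The sketch below isolates that effect on a bare singly linked list; the Node type and method names are invented for the illustration, and it assumes every node stays in the same bucket after the transfer.

```java
import java.util.ArrayList;
import java.util.List;

final class InsertionOrderDemo {
    static final class Node {
        final String key;
        Node next;

        Node(String key) {
            this.key = key;
        }
    }

    // Builds a chain in the given order.
    static Node chain(String... keys) {
        Node head = null, tail = null;
        for (String k : keys) {
            Node n = new Node(k);
            if (tail == null) head = n; else tail.next = n;
            tail = n;
        }
        return head;
    }

    // JDK 1.7 style: each node is pushed onto the head of the new bucket.
    static Node transferHeadInsert(Node oldHead) {
        Node newHead = null;
        for (Node e = oldHead; e != null; ) {
            Node next = e.next;
            e.next = newHead;
            newHead = e;
            e = next;
        }
        return newHead;
    }

    // JDK 1.8 style: each node is appended at the tail of the new bucket.
    static Node transferTailInsert(Node oldHead) {
        Node head = null, tail = null;
        for (Node e = oldHead; e != null; ) {
            Node next = e.next;
            e.next = null;
            if (tail == null) head = e; else tail.next = e;
            tail = e;
            e = next;
        }
        return head;
    }

    static List<String> keys(Node head) {
        List<String> out = new ArrayList<>();
        for (Node e = head; e != null; e = e.next) out.add(e.key);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(keys(transferHeadInsert(chain("A", "B", "C")))); // [C, B, A]
        System.out.println(keys(transferTailInsert(chain("A", "B", "C")))); // [A, B, C]
    }
}
```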

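JDK 1.7 walks the whole list and moves every entry into the new array, whereas JDK 1.8 works as described above: each list is split into at most two lists, which are inserted separately, one for nodes whose index is unchanged and one for nodes whose index changes. I once had doubts about the figure above, but no longer; the figure is correct.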
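The claim that a node either stays at index j or moves to j + oldCap can be checked directly. In this sketch the class name ResizeSplitDemo, the helper names, and the sample hashes are assumptions; oldCap is 16, and the shortcut is compared against a full index computation for the doubled table.

```java
final class ResizeSplitDemo {
    static int indexFor(int hash, int capacity) {
        return (capacity - 1) & hash;
    }

    // Index after doubling, derived only from the old index and one extra hash bit.
    static int indexAfterDoubling(int hash, int oldIndex, int oldCap) {
        return ((hash & oldCap) == 0) ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int[] hashes = {5, 21, 37, 53}; // all four map to index 5 while capacity is 16
        for (int h : hashes) {
            int oldIndex = indexFor(h, oldCap);
            int shortcut = indexAfterDoubling(h, oldIndex, oldCap);
            int full = indexFor(h, oldCap << 1);
            System.out.println(h + ": " + oldIndex + " -> " + shortcut + " (full: " + full + ")");
        }
        // prints:
        // 5: 5 -> 5 (full: 5)
        // 21: 5 -> 21 (full: 21)
        // 37: 5 -> 5 (full: 5)
        // 53: 5 -> 21 (full: 21)
    }
}
```

The one chain at index 5 splits into a "lo" list that stays at 5 and a "hi" list that moves to 5 + 16 = 21, which matches the loHead and hiHead lists in resize().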

/**
 * Initializes or doubles table size.  If null, allocates in
 * accord with initial capacity target held in field threshold.
 * Otherwise, because we are using power-of-two expansion, the
 * elements from each bin must either stay at same index, or move
 * with a power of two offset in the new table.
 *
 * @return the table
 */
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

Link: http://www.importnew.com/20386.html


Original article: https://www.cnblogs.com/GrimMjxCl/p/9320592.html