  • Nobody understands HashMap better than I do :)

    Ha, the title is just a joke, a meme from the year 0202.

    1. First, a look at the HashMap constructors

    /**
         * Constructs an empty <tt>HashMap</tt> with the specified initial
         * capacity and load factor.
         *
         * @param  initialCapacity the initial capacity
         * @param  loadFactor      the load factor
         * @throws IllegalArgumentException if the initial capacity is negative
         *         or the load factor is nonpositive
         */
        public HashMap(int initialCapacity, float loadFactor) {
            if (initialCapacity < 0)
                throw new IllegalArgumentException("Illegal initial capacity: " +
                                                   initialCapacity);
            if (initialCapacity > MAXIMUM_CAPACITY)
                initialCapacity = MAXIMUM_CAPACITY;
            if (loadFactor <= 0 || Float.isNaN(loadFactor))
                throw new IllegalArgumentException("Illegal load factor: " +
                                                   loadFactor);
            this.loadFactor = loadFactor;
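            this.threshold = tableSizeFor(initialCapacity); // The load factor is not applied here: threshold temporarily holds the initial capacity until resize() allocates the table.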
        }

    The first parameter is the initial capacity; the second is the load factor.
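
    As a quick check of the validation in that constructor, the sketch below probes which argument combinations are rejected. The class and method names (ConstructorArgsDemo, rejects) are made up for illustration.

```java
import java.util.HashMap;

final class ConstructorArgsDemo {
    /** Returns true if HashMap rejects the arguments with IllegalArgumentException. */
    static boolean rejects(int initialCapacity, float loadFactor) {
        try {
            new HashMap<String, Integer>(initialCapacity, loadFactor);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejects(-1, 0.75f));      // negative capacity
        System.out.println(rejects(16, 0f));         // load factor must be positive
        System.out.println(rejects(16, Float.NaN));  // NaN load factor
        System.out.println(rejects(0, 0.75f));       // zero capacity is accepted
    }
}
```

    A capacity of zero passes the check, because only negative values are treated as illegal.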

    The class Javadoc describes these two parameters as follows:

    * <p>An instance of <tt>HashMap</tt> has two parameters that affect its
     * performance: <i>initial capacity</i> and <i>load factor</i>.  The
     * <i>capacity</i> is the number of buckets in the hash table, and the initial
     * capacity is simply the capacity at the time the hash table is created.  The
     * <i>load factor</i> is a measure of how full the hash table is allowed to
     * get before its capacity is automatically increased.  When the number of
     * entries in the hash table exceeds the product of the load factor and the
     * current capacity, the hash table is <i>rehashed</i> (that is, internal data
     * structures are rebuilt) so that the hash table has approximately twice the
     * number of buckets.

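    In other words:

    A HashMap instance has two parameters that affect its performance: the initial capacity and the load factor.

    The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the table is created.

    The load factor measures how full the table may get before its capacity is increased automatically.

    Once the number of entries exceeds the product of the load factor and the current capacity, the table is rehashed (its internal data structures are rebuilt),

    so that it ends up with roughly twice as many buckets.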
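
    The arithmetic in that description can be sketched as follows. This is a simplified model, assuming the threshold is just capacity times load factor truncated to an int; it ignores the clamping near the maximum capacity, and the names ThresholdDemo, threshold and exceeds are hypothetical.

```java
final class ThresholdDemo {
    /** Entry count above which a table of the given capacity would be resized. */
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    /** True if holding `size` entries would push the table past its threshold. */
    static boolean exceeds(int size, int capacity, float loadFactor) {
        return size > threshold(capacity, loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f));   // 12
        System.out.println(exceeds(12, 16, 0.75f)); // false: the 12th entry still fits
        System.out.println(exceeds(13, 16, 0.75f)); // true: the 13th entry triggers a resize
    }
}
```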

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    The maximum capacity is 2^30. The capacity must be a power of two, and an int can hold at most 2^31 - 1, so 2^30 is the largest power of two that fits.
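
    A small sketch of why 1 << 30 is the limit. The helper isPowerOfTwo is an illustrative addition, not part of HashMap.

```java
final class MaxCapacityDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** True for positive ints with exactly one bit set. */
    static boolean isPowerOfTwo(int x) {
        return x > 0 && (x & (x - 1)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(MAXIMUM_CAPACITY);               // 1073741824
        System.out.println(1 << 31);                        // -2147483648, the sign bit
        System.out.println(isPowerOfTwo(MAXIMUM_CAPACITY)); // true
        System.out.println(isPowerOfTwo(1 << 31));          // false: it is negative
    }
}
```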

     * <p>This implementation provides constant-time performance for the basic
     * operations (<tt>get</tt> and <tt>put</tt>), assuming the hash function
     * disperses the elements properly among the buckets.  Iteration over
     * collection views requires time proportional to the "capacity" of the
     * <tt>HashMap</tt> instance (the number of buckets) plus its size (the number
     * of key-value mappings).  Thus, it's very important not to set the initial
     * capacity too high (or the load factor too low) if iteration performance is
     * important.

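    So avoid setting the initial capacity too high or the load factor too low: iterating over the map takes time proportional to the capacity plus the size, and either choice inflates the capacity relative to the number of entries.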

    // The next size value at which to resize (capacity * load factor).
    int threshold;

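    The threshold equals the capacity times the load factor. Once size (the number of key-value mappings) exceeds the threshold, the table is doubled.

    Next, the tableSizeFor method, which the constructor uses to compute the value it stores in threshold: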

    /**
     * Returns a power of two size for the given target capacity.
     */
    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

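    The Javadoc says it returns a power-of-two size for the given target capacity, that is, the smallest power of two greater than or equal to the target.

    The code is not obvious at first glance. Suppose the highest 1 bit of n is at position i; >>> is the unsigned right shift.

    (>>> differs from >> in that >>> always fills the high bits with 0, whereas >> fills them with 0 for non-negative numbers and with 1 for negative ones.)

    Shifting right by one and OR-ing with the original value guarantees that bits i and i-1 are both 1. The following shifts by 2, 4, 8 and 16 extend that run of 1 bits in the same way.

    After the last step, bits 0 through i are all 1, so adding 1 gives a power of two. Because n starts at cap - 1, a cap that is already a power of two maps to itself rather than to the next one.

    If cap is 0, n starts at -1, in which every bit is 1; n stays negative, so the method returns 1.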
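
    To make the bit smearing concrete, here is a standalone copy of the method with a few sample inputs, plus the two shift operators side by side. The wrapper class TableSizeForDemo and the chosen inputs are only for illustration.

```java
final class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same steps as the method quoted above: smear the highest 1 bit
    // of (cap - 1) into every lower position, then add 1.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        for (int cap : new int[] {0, 1, 10, 16, 17}) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
        System.out.println(-8 >> 1);   // -4: the sign bit is copied in
        System.out.println(-8 >>> 1);  // 2147483644: a zero is shifted in
    }
}
```

    For cap = 10, n starts at 9 (binary 1001), becomes 1101 after the first step and 1111 after the second, so the result is 16.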

    Now for the other three constructors:

        /**
         * Constructs an empty <tt>HashMap</tt> with the specified initial
         * capacity and the default load factor (0.75).
         *
         * @param  initialCapacity the initial capacity.
         * @throws IllegalArgumentException if the initial capacity is negative.
         */
        public HashMap(int initialCapacity) {
            this(initialCapacity, DEFAULT_LOAD_FACTOR);
        }
    
        /**
         * Constructs an empty <tt>HashMap</tt> with the default initial capacity
         * (16) and the default load factor (0.75).
         */
        public HashMap() {
            this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
        }
    
        /**
         * Constructs a new <tt>HashMap</tt> with the same mappings as the
         * specified <tt>Map</tt>.  The <tt>HashMap</tt> is created with
         * default load factor (0.75) and an initial capacity sufficient to
         * hold the mappings in the specified <tt>Map</tt>.
         *
         * @param   m the map whose mappings are to be placed in this map
         * @throws  NullPointerException if the specified map is null
         */
        public HashMap(Map<? extends K, ? extends V> m) {
            this.loadFactor = DEFAULT_LOAD_FACTOR;
            putMapEntries(m, false);
        }

    The first two need little comment. The default load factor is 0.75; why that value?

     * <p>As a general rule, the default load factor (.75) offers a good
     * tradeoff between time and space costs.  Higher values decrease the
     * space overhead but increase the lookup cost (reflected in most of
     * the operations of the <tt>HashMap</tt> class, including
     * <tt>get</tt> and <tt>put</tt>).  The expected number of entries in
     * the map and its load factor should be taken into account when
     * setting its initial capacity, so as to minimize the number of
     * rehash operations.  If the initial capacity is greater than the
     * maximum number of entries divided by the load factor, no rehash
     * operations will ever occur.

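    In other words:

    As a general rule, the default load factor (0.75) is a good tradeoff between time and space costs.

    Higher values reduce the space overhead but increase the lookup cost (which shows up in most HashMap operations, including get and put).

    When choosing the initial capacity, take the expected number of entries and the load factor into account so as to minimize the number of rehash operations.

    If the initial capacity is greater than the maximum number of entries divided by the load factor (that is, initial capacity times load factor exceeds the maximum number of entries), no rehash will ever occur.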
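
    One way to apply that rule when the number of entries is known up front is sketched below. The helper initialCapacityFor is hypothetical (it is not a JDK method); it simply picks an int strictly greater than expectedEntries / loadFactor.

```java
import java.util.HashMap;
import java.util.Map;

final class PresizeDemo {
    /** Hypothetical helper: an int strictly greater than expectedEntries / loadFactor. */
    static int initialCapacityFor(int expectedEntries, float loadFactor) {
        return (int) (expectedEntries / loadFactor) + 1;
    }

    /** Builds a map sized for n entries up front and fills it. */
    static Map<Integer, String> filled(int n) {
        Map<Integer, String> map = new HashMap<>(initialCapacityFor(n, 0.75f), 0.75f);
        for (int i = 0; i < n; i++) {
            map.put(i, "v" + i);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(initialCapacityFor(100, 0.75f)); // 134
        System.out.println(filled(100).size());             // 100
    }
}
```

    With 100 entries and a load factor of 0.75 the requested capacity is 134, and 134 * 0.75 is 100.5, so the rule above is satisfied.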

    Next, the putMapEntries method. Its evict argument is false when the map is first being constructed and true otherwise.

        /**
         * Implements Map.putAll and Map constructor.
         *
         * @param m the map
         * @param evict false when initially constructing this map, else
         * true (relayed to method afterNodeInsertion).
         */
        final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
            int s = m.size();
            if (s > 0) {
                if (table == null) { // pre-size // The table has not been allocated yet (for example, when called from the constructor), so pre-size by setting the threshold.
                    float ft = ((float)s / loadFactor) + 1.0F;
                    int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                             (int)ft : MAXIMUM_CAPACITY);
                    if (t > threshold)
                        threshold = tableSizeFor(t);
                }
                else if (s > threshold)   // The table is already allocated and the incoming map has more entries than the threshold, so resize first.
                    resize();
                for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
                    K key = e.getKey();
                    V value = e.getValue();
                    putVal(hash(key), key, value, false, evict);
                }
            }
        }
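
    The pre-size branch can be traced by hand. The sketch below repeats its arithmetic for a few source-map sizes, assuming the default load factor and a threshold that is still 0 (as in the copy constructor), so the `t > threshold` check always passes. CopyPresizeDemo and presizedCapacity are illustrative names.

```java
final class CopyPresizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copy of the tableSizeFor method quoted earlier in this post.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    /** Capacity the pre-size branch would pick for s incoming entries. */
    static int presizedCapacity(int s, float loadFactor) {
        float ft = ((float) s / loadFactor) + 1.0F;
        int t = (ft < (float) MAXIMUM_CAPACITY) ? (int) ft : MAXIMUM_CAPACITY;
        return tableSizeFor(t);
    }

    public static void main(String[] args) {
        for (int s : new int[] {1, 6, 12, 13}) {
            System.out.println(s + " entries -> capacity " + presizedCapacity(s, 0.75f));
        }
    }
}
```

    Note the + 1.0F: for 12 entries, 12 / 0.75 is exactly 16, so t becomes 17 and the chosen capacity is 32 rather than 16.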

    The table is the array that stores the map's key-value pairs. It is resized as needed, and its length is always a power of two.

        /**
         * The table, initialized on first use, and resized as
         * necessary. When allocated, length is always a power of two.
         * (We also tolerate length zero in some operations to allow
         * bootstrapping mechanics that are currently not needed.)
         */
        transient Node<K,V>[] table;
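
    The power-of-two length is what lets a bucket index be computed with a bit mask instead of a remainder. A minimal sketch, where indexFor is an illustrative name for the `(n - 1) & hash` expression used throughout the class:

```java
final class IndexMaskDemo {
    /** Bucket index for a hash in a table whose length is a power of two. */
    static int indexFor(int hash, int tableLength) {
        return (tableLength - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        for (int hash : new int[] {5, 21, -1, 1 << 20}) {
            // For power-of-two n the mask agrees with the non-negative remainder.
            System.out.println(hash + ": " + indexFor(hash, n)
                    + " == " + Math.floorMod(hash, n));
        }
    }
}
```

    The mask also works for negative hashes, where the plain % operator would give a negative result.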

    Node is a static nested class; each instance holds one key-value mapping of the map.

        /**
         * Basic hash bin node, used for most entries.  (See below for
         * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
         */
        static class Node<K,V> implements Map.Entry<K,V> {
            final int hash;
            final K key;
            V value;
            Node<K,V> next;
    
            Node(int hash, K key, V value, Node<K,V> next) {
                this.hash = hash;
                this.key = key;
                this.value = value;
                this.next = next;
            }
    
            public final K getKey()        { return key; }
            public final V getValue()      { return value; }
            public final String toString() { return key + "=" + value; }
    
            public final int hashCode() {
                return Objects.hashCode(key) ^ Objects.hashCode(value);
            }
    
            public final V setValue(V newValue) {
                V oldValue = value;
                value = newValue;
                return oldValue;
            }
    
            public final boolean equals(Object o) {
                if (o == this)
                    return true;
                if (o instanceof Map.Entry) {
                    Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                    if (Objects.equals(key, e.getKey()) &&
                        Objects.equals(value, e.getValue()))
                        return true;
                }
                return false;
            }
        }
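
    The hashCode, equals and toString methods of Node can be observed from outside through the entry set. The sketch below compares an entry taken from a HashMap with an AbstractMap.SimpleEntry holding the same pair; EntryContractDemo and firstEntry are illustrative names.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;

final class EntryContractDemo {
    /** Returns the single entry of a one-element HashMap. */
    static Map.Entry<String, Integer> firstEntry() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        return map.entrySet().iterator().next();
    }

    public static void main(String[] args) {
        Map.Entry<String, Integer> e = firstEntry();
        System.out.println(e); // a=1
        // Equal to any other Map.Entry with an equal key and value.
        System.out.println(e.equals(new AbstractMap.SimpleEntry<String, Integer>("a", 1)));
        // Hash is the XOR of the key hash and the value hash.
        System.out.println(e.hashCode() == ("a".hashCode() ^ Integer.valueOf(1).hashCode()));
    }
}
```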

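    For now we only follow the path taken during construction. Next, the putVal and hash methods:

    The hash method (a given key always produces the same hash; XOR-ing in the value shifted right by 16 bits, unsigned, reduces collisions):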

    /**
    * Computes key.hashCode() and spreads (XORs) higher bits of hash
    * to lower. Because the table uses power-of-two masking, sets of
    * hashes that vary only in bits above the current mask will
    * always collide. (Among known examples are sets of Float keys
    * holding consecutive whole numbers in small tables.) So we
    * apply a transform that spreads the impact of higher bits
    * downward. There is a tradeoff between speed, utility, and
    * quality of bit-spreading. Because many common sets of hashes
    * are already reasonably distributed (so don't benefit from
    * spreading), and because we use trees to handle large sets of
    * collisions in bins, we just XOR some shifted bits in the
    * cheapest possible way to reduce systematic lossage, as well as
    * to incorporate impact of the highest bits that would otherwise
    * never be used in index calculations because of table bounds.
    */
    static final int hash(Object key) {
            int h;
            return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16); // The high 16 bits are kept as-is; the low 16 bits become the XOR of the original high and low halves.
        }
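
    The effect of the spreading step is easiest to see with hash codes that differ only in their high bits. In the sketch below, the two sample values 0x10000 and 0x20000 and the table length 16 are chosen purely for illustration.

```java
final class HashSpreadDemo {
    /** The XOR step from hash(): fold the high 16 bits into the low 16 bits. */
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    /** Bucket index in a table of power-of-two length n. */
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int h1 = 0x10000, h2 = 0x20000, n = 16;
        // Without spreading, only the low 4 bits matter and both land in bucket 0.
        System.out.println(index(h1, n) + " " + index(h2, n));                 // 0 0
        // With spreading, the high bits reach the index and the keys separate.
        System.out.println(index(spread(h1), n) + " " + index(spread(h2), n)); // 1 2
    }
}
```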

    The putVal method:

        /**
         * Implements Map.put and related methods.
         *
         * @param hash hash for key
         * @param key the key
         * @param value the value to put
         * @param onlyIfAbsent if true, don't change existing value
         * @param evict if false, the table is in creation mode.
         * @return previous value, or null if none
         */
        final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                       boolean evict) {
            Node<K,V>[] tab; Node<K,V> p; int n, i;
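            if ((tab = table) == null || (n = tab.length) == 0)  // The table has not been allocated yet (or has length zero), so allocate it via resize().
                n = (tab = resize()).length;
            if ((p = tab[i = (n - 1) & hash]) == null)  // The bucket for this hash is empty, so store the new node there. Because n is a power of two, (n - 1) & hash keeps the low bits of the hash as the index.
                tab[i] = newNode(hash, key, value, null);
            else { // The bucket is already occupied; compare further to see whether the key is present.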
                Node<K,V> e; K k;
                if (p.hash == hash && 
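                    ((k = p.key) == key || (key != null && key.equals(k)))) // Compare against the key of the first node in the bucket.
                    e = p;
                else if (p instanceof TreeNode) // The first node does not match; if the bin is a red-black tree, search or insert there.
                    e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
                else { // Otherwise the bin is a linked list; walk it.
                    for (int binCount = 0; ; ++binCount) {
                        if ((e = p.next) == null) {  // Reached the end without finding the key: append the new node, then check whether the bin is now long enough to be converted to a tree.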
                            p.next = newNode(hash, key, value, null);
                            if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                                treeifyBin(tab, hash);
                            break;
                        }
                        if (e.hash == hash &&
                            ((k = e.key) == key || (key != null && key.equals(k))))
                            break;
                        p = e;
                    }
                }
                if (e != null) { // existing mapping for key
                    V oldValue = e.value;
                    if (!onlyIfAbsent || oldValue == null) // When onlyIfAbsent is true, an existing non-null value is left unchanged.
                        e.value = value;
                    afterNodeAccess(e);
                    return oldValue;
                }
            }
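            ++modCount;   // Counts structural modifications; iterators over the HashMap's collection views use it to fail fast.
            if (++size > threshold) // Reaching this point means a new mapping was inserted; resize if size now exceeds the threshold.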
                resize();
            afterNodeInsertion(evict);
            return null;
        }
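
    The return value and the onlyIfAbsent flag are visible through the public API. The sketch below uses put and putIfAbsent; that putIfAbsent reaches putVal with onlyIfAbsent set to true is my reading of the JDK 8 source and is not shown in the listing above. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PutSemanticsDemo {
    /** Runs a fixed sequence of writes on m and collects what each call returned. */
    static List<Integer> returnedValues(Map<String, Integer> m) {
        List<Integer> out = new ArrayList<>();
        out.add(m.put("k", 1));          // null: no previous mapping
        out.add(m.put("k", 2));          // 1: old value returned, value replaced
        out.add(m.putIfAbsent("k", 3));  // 2: existing non-null value is kept
        m.put("n", null);
        out.add(m.putIfAbsent("n", 4));  // null: a null value is overwritten
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        System.out.println(returnedValues(m)); // [null, 1, 2, null]
        System.out.println(m.get("k"));        // 2
        System.out.println(m.get("n"));        // 4
    }
}
```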

    If the table is null, resize is called. It is fairly long; it either initializes the table or doubles its length:

        /**
         * Initializes or doubles table size.  If null, allocates in
         * accord with initial capacity target held in field threshold.
         * Otherwise, because we are using power-of-two expansion, the
         * elements from each bin must either stay at same index, or move
         * with a power of two offset in the new table.
         *
         * @return the table
         */
        final Node<K,V>[] resize() {
            Node<K,V>[] oldTab = table;
            int oldCap = (oldTab == null) ? 0 : oldTab.length;
            int oldThr = threshold;
            int newCap, newThr = 0;
            if (oldCap > 0) {    // The old table has already been allocated.
                if (oldCap >= MAXIMUM_CAPACITY) { // Already at the maximum capacity: set the threshold to Integer.MAX_VALUE and stop growing.
                    threshold = Integer.MAX_VALUE;
                    return oldTab;
                }
                else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY && // The doubled capacity is below the maximum and the old capacity is at least the default (16): double the threshold too.
                         oldCap >= DEFAULT_INITIAL_CAPACITY)
                    newThr = oldThr << 1; // double threshold
            }
            else if (oldThr > 0) // initial capacity was placed in threshold // No table yet, but a constructor stored the initial capacity in threshold: use it as the new capacity.
                newCap = oldThr;
            else {               // zero initial threshold signifies using defaults // No table and no stored capacity (no-arg constructor): use the defaults.
                newCap = DEFAULT_INITIAL_CAPACITY;
                newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
            }
            if (newThr == 0) { // No new threshold was set above: compute it from the new capacity and the load factor.
                float ft = (float)newCap * loadFactor;
                newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                          (int)ft : Integer.MAX_VALUE);
            }
            threshold = newThr;  // Store the new threshold.
            @SuppressWarnings({"rawtypes","unchecked"})
            Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
            table = newTab;
            if (oldTab != null) {   // If there was an old table, move its entries into the new one.
                for (int j = 0; j < oldCap; ++j) {
                    Node<K,V> e;
                    if ((e = oldTab[j]) != null) {
                        oldTab[j] = null;
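                        if (e.next == null)  // Only one node in this bin (no collision).
                            newTab[e.hash & (newCap - 1)] = e;
                        else if (e instanceof TreeNode) // Collisions occurred and the bin is a red-black tree.
                            ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                        else { // preserve order  // Collisions occurred and the bin is a linked list.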
                            Node<K,V> loHead = null, loTail = null;
                            Node<K,V> hiHead = null, hiTail = null;
                            Node<K,V> next;
                            do {
                                next = e.next;
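                                if ((e.hash & oldCap) == 0) {    // Split into two lists by the oldCap bit of the hash: the new index mask has exactly this one extra bit, so a node either stays at index j (bit is 0) or moves to j + oldCap (bit is 1).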
                                    if (loTail == null)
                                        loHead = e;
                                    else
                                        loTail.next = e;
                                    loTail = e;
                                }
                                else {
                                    if (hiTail == null)
                                        hiHead = e;
                                    else
                                        hiTail.next = e;
                                    hiTail = e;
                                }
                            } while ((e = next) != null);
                            if (loTail != null) {
                                loTail.next = null;
                                newTab[j] = loHead;
                            }
                            if (hiTail != null) {
                                hiTail.next = null;
                                newTab[j + oldCap] = hiHead;
                            }
                        }
                    }
                }
            }
            return newTab;
        }
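
    The lo/hi split in the linked-list branch follows from the index mask gaining one bit when the capacity doubles. A small sketch, assuming an old capacity of 16 and four sample hashes that all share bucket 5; the names ResizeSplitDemo and newIndex are illustrative.

```java
final class ResizeSplitDemo {
    /** New bucket for a node that sat at oldIndex before the table doubled. */
    static int newIndex(int hash, int oldIndex, int oldCap) {
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16, newCap = oldCap << 1;
        for (int hash : new int[] {5, 21, 37, 53}) {
            int j = hash & (oldCap - 1);            // old index, 5 for every sample
            int viaSplit = newIndex(hash, j, oldCap);
            int viaMask = hash & (newCap - 1);      // index computed from scratch
            System.out.println(hash + ": " + j + " -> " + viaSplit + " (mask gives " + viaMask + ")");
        }
    }
}
```

    Hashes 5 and 37 have the 16 bit clear and stay in bucket 5, while 21 and 53 have it set and move to bucket 21, which matches the index recomputed with the wider mask.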

    2. Now the commonly used get and put methods (these need little explanation):

        public V get(Object key) {
            Node<K,V> e;
            return (e = getNode(hash(key), key)) == null ? null : e.value;
        }
        public V put(K key, V value) {
            return putVal(hash(key), key, value, false, true);
        }
        final Node<K,V> getNode(int hash, Object key) {
            Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
            if ((tab = table) != null && (n = tab.length) > 0 &&
                (first = tab[(n - 1) & hash]) != null) {
                if (first.hash == hash && // always check first node
                    ((k = first.key) == key || (key != null && key.equals(k))))
                    return first;
                if ((e = first.next) != null) {
                    if (first instanceof TreeNode)
                        return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                    do {
                        if (e.hash == hash &&
                            ((k = e.key) == key || (key != null && key.equals(k))))
                            return e;
                    } while ((e = e.next) != null);
                }
            }
            return null;
        }
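
    One consequence of get returning null when getNode finds nothing: a missing key and a key mapped to null look the same to the caller. The sketch below tells them apart with containsKey; GetDemo and describe are illustrative names.

```java
import java.util.HashMap;
import java.util.Map;

final class GetDemo {
    /** Distinguishes a missing key from a key that is mapped to null. */
    static String describe(Map<String, Integer> m, String key) {
        if (!m.containsKey(key)) {
            return "absent";
        }
        Integer v = m.get(key);
        return v == null ? "present, null value" : "present, " + v;
    }

    /** Sample map: a regular mapping, a null value, and a null key. */
    static Map<String, Integer> sample() {
        Map<String, Integer> m = new HashMap<>();
        m.put("a", 1);
        m.put("b", null);
        m.put(null, 7);  // hash(null) is 0, so a null key is allowed
        return m;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = sample();
        System.out.println(describe(m, "a"));  // present, 1
        System.out.println(describe(m, "b"));  // present, null value
        System.out.println(describe(m, "c"));  // absent
        System.out.println(m.get(null));       // 7
    }
}
```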

    3. A more detailed HashMap source code analysis

    4. The ArrayList growth mechanism

  • Original article: https://www.cnblogs.com/M-Anonymous/p/13926220.html