
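[Java] A Brief Look at HashMap

HashMap is a commonly used collection class that stores values as key-value pairs. Let's walk through its implementation at the source level.

Constructors

It has several constructors, but almost all of them delegate to this one: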

    public HashMap(int initialCapacity, float loadFactor) { // initial capacity, load factor
        if (initialCapacity < 0) // reject a negative initial capacity
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY; // clamp to the maximum capacity
        if (loadFactor <= 0 || Float.isNaN(loadFactor)) // validate the load factor
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        // Find a power of 2 >= initialCapacity
        // The smallest power of two >= initialCapacity becomes the real capacity.
        // Why? Probably because the index can then be computed with a cheap & operation.
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;

        this.loadFactor = loadFactor;
        threshold = (int)Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1); // resize threshold = capacity * load factor
        table = new Entry[capacity]; // allocate the bucket array
        useAltHashing = sun.misc.VM.isBooted() &&
                (capacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
        init(); // hook called once construction is done, before any entry is inserted
    }
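To make the capacity rounding and the threshold concrete, here is a minimal sketch that repeats the two calculations outside of HashMap. The class CapacityDemo and its method names are made up for illustration; only the loop, the formula and the value 1 << 30 for MAXIMUM_CAPACITY come from HashMap.

```java
// Sketch only: CapacityDemo and its helpers are hypothetical names, not JDK code.
class CapacityDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30; // same value as HashMap's constant

    // Smallest power of two >= initialCapacity, mirroring the constructor's loop.
    static int capacityFor(int initialCapacity) {
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        return capacity;
    }

    // Resize threshold, using the same expression as the constructor.
    static int thresholdFor(int capacity, float loadFactor) {
        return (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    public static void main(String[] args) {
        int capacity = capacityFor(10);
        System.out.println(capacity);                      // 16
        System.out.println(thresholdFor(capacity, 0.75f)); // 12
    }
}
```

So a requested capacity of 10 with a load factor of 0.75 gives a 16-slot array that becomes a resize candidate once it holds 12 entries.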

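Adding an entry: put(K key, V value)

The lookup-and-overwrite logic sits in put itself, while inserting a new entry is delegated to addEntry: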

    public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value); // a null key is stored in table[0]
        int hash = hash(key); // compute the hash
        int i = indexFor(hash, table.length); // compute the array index
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            // Compare hashes first (different hashes can map to the same index),
            // then check reference equality or equals()
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value; // keep the old value
                e.value = value; // store the new value
                e.recordAccess(this); // hook invoked when a value is overwritten
                return oldValue; // return the old value
            }
        }

        modCount++; // count a structural modification
        addEntry(hash, key, value, i); // build an Entry and add it to the chain at index i
        return null;
    }
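The return values described above can be observed through the public API. This is a small sketch using java.util.HashMap; the class PutDemo and the method putResults are illustrative names only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: PutDemo is a hypothetical name used for illustration.
class PutDemo {
    // Collects the return values of a few put/get calls.
    static List<Integer> putResults() {
        Map<String, Integer> map = new HashMap<>();
        Integer first = map.put("a", 1);   // no previous mapping
        Integer second = map.put("a", 2);  // overwrites and returns the old value
        Integer third = map.put(null, 3);  // a null key is accepted
        Integer fourth = map.get(null);    // reads the value stored under the null key
        return Arrays.asList(first, second, third, fourth);
    }

    public static void main(String[] args) {
        System.out.println(putResults()); // [null, 1, null, 3]
    }
}
```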

Computing the array index from the hash and the array length: indexFor(int h, int length)

    static int indexFor(int h, int length) {
        return h & (length-1); // AND the hash with (length - 1) to get the index
    }

Hook invoked when put(k,v) overwrites an existing value: recordAccess()

        /**
         * This method is invoked whenever the value in an entry is
         * overwritten by an invocation of put(k,v) for a key k that's already
         * in the HashMap.
         */
        void recordAccess(HashMap<K,V> m) {
        }

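Structural modification count: modCount

This field counts structural modifications to the HashMap, such as adding a new entry, rehashing, or removing an entry. It backs the fail-fast behavior of iterators.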

    /**
     * The number of times this HashMap has been structurally modified
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash).  This field is used to make iterators on Collection-views of
     * the HashMap fail-fast.  (See ConcurrentModificationException).
     */
    transient int modCount;
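The fail-fast behavior is easy to trigger from a single thread. The sketch below (FailFastDemo is an illustrative name) adds a mapping while iterating, which is a structural modification, and then merely overwrites existing values, which is not.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Sketch only: FailFastDemo is a hypothetical name used for illustration.
class FailFastDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    // Adds new keys during iteration; returns true if the iterator noticed.
    static boolean addWhileIterating() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.put("new-" + key, 0); // structural change: a mapping is added
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    // Overwrites values of existing keys during iteration; returns true if the iterator complained.
    static boolean overwriteWhileIterating() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.put(key, 0); // existing key: value replaced, no structural change
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(addWhileIterating());       // true
        System.out.println(overwriteWhileIterating()); // false
    }
}
```

This matches the put listing: the overwrite path returns before modCount is incremented, so only the insertion path invalidates a running iterator.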

Adding an entry: addEntry()

This method decides whether the array needs to grow and, if so, calls the resize method:

    /**
     * Adds a new entry with the specified key, value and hash code to
     * the specified bucket.  It is the responsibility of this
     * method to resize the table if appropriate.
     *
     * Subclass overrides this to alter the behavior of put method.
     */
    void addEntry(int hash, K key, V value, int bucketIndex) {
        // Resize only if size has reached the threshold and the target bucket is already occupied
        if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length); // double the capacity
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length); // recompute the index for the resized array
        }

        createEntry(hash, key, value, bucketIndex);
    }

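Actually creating the entry: createEntry()

    void createEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex]; // current head of the chain
        // The new node's next points at the old head, and the new node becomes the head (insertion at the front)
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        size++; // one more entry (size counts entries, not capacity)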
    }

Resizing the array: resize()

    void resize(int newCapacity) {
        Entry[] oldTable = table; // keep a reference to the old array
        int oldCapacity = oldTable.length; // capacity of the old array
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }

        Entry[] newTable = new Entry[newCapacity]; // allocate the array with the new capacity
        boolean oldAltHashing = useAltHashing;
        useAltHashing |= sun.misc.VM.isBooted() &&
                (newCapacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
        boolean rehash = oldAltHashing ^ useAltHashing; // whether hashes must be recomputed
        transfer(newTable, rehash); // move every entry to the new array
        table = newTable; // switch to the new array
        threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1); // recompute the threshold
    }

Moving all entries to the new array: transfer()

Each entry is visited in turn and mapped into a chain of the new array:

    void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        for (Entry<K,V> e : table) { // walk the array
            while(null != e) { // walk the chain
                Entry<K,V> next = e.next;
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key); // recompute the hash
                }
                int i = indexFor(e.hash, newCapacity); // recompute the index
                e.next = newTable[i]; // current node's next points at the bucket's head (insert at the front)
                newTable[i] = e; // current node becomes the new head
                e = next;
            }
        }
    }
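One consequence of inserting at the head is that entries which stay in the same bucket end up in reverse order after a transfer. The following sketch reproduces the loop with a stripped-down node type. TransferDemo, Node, chain and demoTable are hypothetical names, and the rehash branch is left out.

```java
// Sketch only: a simplified stand-in for HashMap's transfer, without the rehash branch.
class TransferDemo {
    // Hypothetical, minimal replacement for HashMap.Entry: just a hash and a next pointer.
    static final class Node {
        final int hash;
        Node next;
        Node(int hash, Node next) { this.hash = hash; this.next = next; }
    }

    static int indexFor(int h, int length) { return h & (length - 1); }

    // Moves every node of oldTable into newTable by inserting at the bucket head.
    static void transfer(Node[] oldTable, Node[] newTable) {
        for (Node e : oldTable) {
            while (e != null) {
                Node next = e.next;
                int i = indexFor(e.hash, newTable.length);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
    }

    // Renders one bucket's chain, e.g. "9 -> 1".
    static String chain(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node e = head; e != null; e = e.next) {
            if (sb.length() > 0) sb.append(" -> ");
            sb.append(e.hash);
        }
        return sb.toString();
    }

    // Hashes 1, 5 and 9 all map to bucket 1 when the length is 4: chain 1 -> 5 -> 9.
    static Node[] demoTable() {
        Node[] table = new Node[4];
        table[1] = new Node(1, new Node(5, new Node(9, null)));
        return table;
    }

    public static void main(String[] args) {
        Node[] newTable = new Node[8];
        transfer(demoTable(), newTable);
        System.out.println(chain(newTable[1])); // 9 -> 1
        System.out.println(chain(newTable[5])); // 5
    }
}
```

With length 8, hash 5 moves to bucket 5, while 1 and 9 stay in bucket 1 but swap their relative order.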

Removing an entry: remove()

The entry point is shown below; it delegates to removeEntryForKey:

    public V remove(Object key) {
        Entry<K,V> e = removeEntryForKey(key);
        return (e == null ? null : e.value);
    }

The actual removal: removeEntryForKey()

    final Entry<K,V> removeEntryForKey(Object key) {
        int hash = (key == null) ? 0 : hash(key); // compute the hash
        int i = indexFor(hash, table.length); // compute the index
        Entry<K,V> prev = table[i]; // head of the chain at that index
        Entry<K,V> e = prev;

        while (e != null) {
            Entry<K,V> next = e.next;
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k)))) {
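                modCount++; // removal is also a structural modification
                size--; // one fewer entry
                if (prev == e) // the current node is the head of the chain
                    table[i] = next; // the next node becomes the head
                else
                    prev.next = next; // the previous node now skips over the current node, unlinking it
                e.recordRemoval(this); // hook invoked after removal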
                return e;
            }
            prev = e;
            e = next;
        }

        return e;
    }
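Seen from the caller's side, the removal path behaves as follows. This is a short sketch against java.util.HashMap; RemoveDemo and removeResults are illustrative names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: RemoveDemo is a hypothetical name used for illustration.
class RemoveDemo {
    // Returns [value of first remove, value of second remove, remaining size].
    static List<Integer> removeResults() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        Integer removed = map.remove("a"); // entry found: its value is returned
        Integer missing = map.remove("a"); // nothing left to remove: null
        return Arrays.asList(removed, missing, map.size());
    }

    public static void main(String[] args) {
        System.out.println(removeResults()); // [1, null, 1]
    }
}
```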

Looking up an entry: get()

    public V get(Object key) {
        if (key == null)
            return getForNullKey(); // look in table[0]
        Entry<K,V> entry = getEntry(key); // compute the index and walk the chain, as put and remove do

        return null == entry ? null : entry.getValue();
    }
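Because get returns null both when no entry exists and when the entry's value is null, a null result alone cannot tell the two cases apart. A brief sketch, with GetDemo and sample as illustrative names:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: GetDemo is a hypothetical name used for illustration.
class GetDemo {
    // A map holding one key whose value is null.
    static Map<String, String> sample() {
        Map<String, String> map = new HashMap<>();
        map.put("k", null);
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = sample();
        System.out.println(map.get("k"));              // null (mapping exists, value is null)
        System.out.println(map.get("absent"));         // null (no mapping)
        System.out.println(map.containsKey("k"));      // true
        System.out.println(map.containsKey("absent")); // false
    }
}
```

containsKey is the call to use when the distinction matters.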

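Some questions

Rounding up to a power of two: roundUpToPowerOf2(int number)

This method returns the smallest power of two that is greater than or equal to number.
Integer.highestOneBit() returns the value of the highest set bit: for a positive number that is the largest power of two less than or equal to it; for a negative number it is -2147483648, i.e. Integer.MIN_VALUE.
So why subtract 1 before shifting and calling highestOneBit()?
Shifting without the subtraction shows the problem in binary:
00001111 = 15 -> 00011110 = 30 -> highestOneBit(30) = 16
00010000 = 16 -> 00100000 = 32 -> highestOneBit(32) = 32
A number that is already a power of two, such as 16, would be doubled to 32. With the subtraction, (16 - 1) << 1 = 30 and highestOneBit(30) = 16.
So subtracting 1 first guarantees the smallest power of two that is not less than number.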

    private static int roundUpToPowerOf2(int number) {
        // assert number >= 0 : "number must be non-negative";
        return number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }
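The effect of the subtraction can be checked by running the expression side by side with a variant that omits it. In this sketch RoundUpDemo and withoutMinusOne are made-up names; roundUpToPowerOf2 repeats the expression quoted above, with MAXIMUM_CAPACITY set to 1 << 30 as in HashMap.

```java
// Sketch only: RoundUpDemo and withoutMinusOne are hypothetical names.
class RoundUpDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same expression as the method quoted above.
    static int roundUpToPowerOf2(int number) {
        return number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    // Variant without the "- 1", to show what goes wrong for exact powers of two.
    static int withoutMinusOne(int number) {
        return Integer.highestOneBit(number << 1);
    }

    public static void main(String[] args) {
        for (int n : new int[] {15, 16, 17}) {
            System.out.println(n + ": " + roundUpToPowerOf2(n) + " vs " + withoutMinusOne(n));
        }
        // 15: 16 vs 16
        // 16: 16 vs 32
        // 17: 32 vs 32
    }
}
```

Only the exact power of two differs: without the subtraction, 16 would needlessly round up to 32.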

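Mapping a hash into a given range: indexFor(int h, int length)

This maps h into the range [0, length), much like a modulo operation.

    return h & (length-1);

A bitwise AND of h with length - 1 is all that is needed.
For example, with length 16:
16 = 00010000
15 = 00001111
ANDing with 15 keeps only the low four bits of h, so the result always falls between 0 and 15.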
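The "much like a modulo" claim can be checked directly. The sketch below compares indexFor with Math.floorMod for a power-of-two length, including negative hashes; IndexForDemo is an illustrative name.

```java
// Sketch only: IndexForDemo is a hypothetical name; indexFor repeats HashMap's one-liner.
class IndexForDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        for (int h : new int[] {5, 21, 37, -1, Integer.MIN_VALUE}) {
            System.out.println(h + " -> " + indexFor(h, length)
                    + " (floorMod: " + Math.floorMod(h, length) + ")");
        }
        // 5 -> 5 (floorMod: 5)
        // 21 -> 5 (floorMod: 5)
        // 37 -> 5 (floorMod: 5)
        // -1 -> 15 (floorMod: 15)
        // -2147483648 -> 0 (floorMod: 0)
    }
}
```

Note that the plain % operator would give -1 for a hash of -1, which is not a valid array index, whereas the AND always yields a non-negative result.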

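Why is the hash table's length a power of two?

There are generally two ways to map a hash value into a fixed range: take it modulo the length, or AND it with a power of two minus one. Division and modulo are comparatively slow on modern CPUs, so the second approach is faster, and that is why the array length is designed to be a power of two.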
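The AND trick only works when the length really is a power of two. As a rough illustration (PowerOfTwoDemo and reachableBuckets are made-up names), the sketch below counts how many distinct indexes h & (length - 1) can produce for a power-of-two length and for a length of 10.

```java
import java.util.BitSet;

// Sketch only: PowerOfTwoDemo and reachableBuckets are hypothetical names.
class PowerOfTwoDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Counts the distinct indexes produced for the hashes 0..999.
    static int reachableBuckets(int length) {
        BitSet seen = new BitSet(length);
        for (int h = 0; h < 1000; h++) {
            seen.set(indexFor(h, length));
        }
        return seen.cardinality();
    }

    public static void main(String[] args) {
        System.out.println(reachableBuckets(16)); // 16
        System.out.println(reachableBuckets(10)); // 4
    }
}
```

With length 10 the mask is 9 (binary 1001), so only the indexes 0, 1, 8 and 9 are ever used and the remaining six slots stay empty.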

Going further: LinkedHashMap

Its declaration shows that it extends HashMap, and the actual storage logic is still provided by HashMap:

    public class LinkedHashMap<K,V>
        extends HashMap<K,V>
        implements Map<K,V>

Order maintained by the linked list

LinkedHashMap maintains iteration order through a separate doubly linked list. For example, the list's header node:

    /**
     * The head of the doubly linked list.
     */
    private transient Entry<K,V> header;

Links between entries:

        // These fields comprise the doubly linked list used for iteration.
        Entry<K,V> before, after;

This field selects the order the list maintains: true for access order, false for insertion order:

    private final boolean accessOrder;
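The flag is set through the three-argument constructor of LinkedHashMap. A small sketch of the difference, with OrderDemo and keysAfterAccess as illustrative names:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: OrderDemo is a hypothetical name used for illustration.
class OrderDemo {
    // Inserts a, b, c, reads "a", and returns the resulting iteration order.
    static List<String> keysAfterAccess(boolean accessOrder) {
        Map<String, Integer> map = new LinkedHashMap<>(16, 0.75f, accessOrder);
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        map.get("a"); // in access order, this moves "a" to the end of the list
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keysAfterAccess(false)); // [a, b, c]
        System.out.println(keysAfterAccess(true));  // [b, c, a]
    }
}
```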

Adding an entry

It overrides HashMap's addEntry and createEntry methods:

    /**
     * This override alters behavior of superclass put method. It causes newly
     * allocated entry to get inserted at the end of the linked list and
     * removes the eldest entry if appropriate.
     */
    void addEntry(int hash, K key, V value, int bucketIndex) {
        super.addEntry(hash, key, value, bucketIndex); // reuse HashMap's logic

        // Remove eldest entry if instructed
        Entry<K,V> eldest = header.after;
        if (removeEldestEntry(eldest)) { // should the eldest entry be removed? (the hook for LRU-style eviction)
            removeEntryForKey(eldest.key); // remove the eldest entry
        }
    }

    /**
     * This override differs from addEntry in that it doesn't resize the
     * table or remove the eldest entry.
     */
    void createEntry(int hash, K key, V value, int bucketIndex) {
        HashMap.Entry<K,V> old = table[bucketIndex];
        Entry<K,V> e = new Entry<>(hash, key, value, old);
        table[bucketIndex] = e;
        e.addBefore(header); // link in just before the header node, i.e. at the end of the list
        size++;
    }

        /**
         * Inserts this entry before the specified existing entry in the list.
         */
        private void addBefore(Entry<K,V> existingEntry) {
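            after  = existingEntry; // this entry's successor is the existing entry
            before = existingEntry.before; // this entry's predecessor is the existing entry's old predecessor
            before.after = this; // the predecessor now points forward to this entry
            after.before = this; // the existing entry now points back to this entry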
        }
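The removeEldestEntry hook called from addEntry is what makes a small LRU cache possible. The sketch below is one common way to use it, not part of the JDK: LruCache and maxEntries are names chosen for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: LruCache is a hypothetical class built on LinkedHashMap's public hook.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // true = access order, so reads refresh an entry
        this.maxEntries = maxEntries;
    }

    // Called after each insertion; returning true evicts the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" becomes the most recently used entry
        cache.put("c", 3); // exceeds the limit, so the eldest entry "b" is evicted
        System.out.println(cache.keySet()); // [a, c]
    }
}
```

The default removeEldestEntry returns false, so a plain LinkedHashMap never evicts anything on its own.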

Looking up an entry

    public V get(Object key) {
        Entry<K,V> e = (Entry<K,V>)getEntry(key);
        if (e == null)
            return null;
        e.recordAccess(this); // update the list order
        return e.value;
    }

        void recordAccess(HashMap<K,V> m) {
            LinkedHashMap<K,V> lm = (LinkedHashMap<K,V>)m;
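            if (lm.accessOrder) { // only when tracking access order
                lm.modCount++;
                remove(); // unlink this entry from the list
                addBefore(lm.header); // re-link it just before the header, i.e. at the end of the list (most recently accessed)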
            }
        }

        /**
         * Removes this entry from the linked list.
         */
        private void remove() {
            before.after = after; // the predecessor now points forward to this entry's successor
            after.before = before; // the successor now points back to this entry's predecessor
        }


Original article: https://www.cnblogs.com/nick-huang/p/7405015.html