Analysis of the hash function of Java's String

Reposted from: http://blog.csdn.net/hengyunabc/article/details/7198533

JDK 6 source code:

   /**
    * Returns a hash code for this string. The hash code for a
    * <code>String</code> object is computed as
    * <blockquote><pre>
    * s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
    * </pre></blockquote>
    * using <code>int</code> arithmetic, where <code>s[i]</code> is the
    * <i>i</i>th character of the string, <code>n</code> is the length of
    * the string, and <code>^</code> indicates exponentiation.
    * (The hash value of the empty string is zero.)
    *
    * @return  a hash code value for this object.
    */
   public int hashCode() {
       int h = hash;
       if (h == 0) {
           int off = offset;
           char val[] = value;
           int len = count;

           for (int i = 0; i < len; i++) {
               h = 31*h + val[off++];
           }
           hash = h;
       }
       return h;
   }
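The loop can be reproduced outside String to check that it really yields String.hashCode(). The following is a minimal sketch, assuming charAt in place of the private value, offset and count fields of JDK 6's String; the class name PolyHashDemo and the method polyHash are hypothetical.

```java
class PolyHashDemo {
    // Same recurrence as the JDK 6 loop: h = 31 * h + s[i], in wrapping int arithmetic.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String[] samples = {"", "123", "hello", "a much longer string that overflows int"};
        for (String s : samples) {
            // Compare the hand-written loop with the library implementation.
            System.out.println("\"" + s + "\": " + polyHash(s) + " vs " + s.hashCode());
        }
    }
}
```

Each pair printed should match, including the long string whose intermediate values overflow int, because String.hashCode() is specified with exactly this int arithmetic.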
Take the string "123" as an example:

The ASCII code of the character '1' is 49 ('2' is 50 and '3' is 51).

hashCode = (49 * 31 + 50) * 31 + 51 = 48690

Or, viewed another way:

hashCode = ('1' * 31 + '2') * 31 + '3'
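The arithmetic for "123" can be traced step by step. A small sketch (the class TraceHash123 and its trace method are hypothetical names) records the value of h after each character is folded in: 49, then 49 * 31 + 50 = 1569, then 1569 * 31 + 51 = 48690.

```java
class TraceHash123 {
    // Returns the value of h after each character is folded in.
    static int[] trace(String s) {
        int[] steps = new int[s.length()];
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
            steps[i] = h;
        }
        return steps;
    }

    public static void main(String[] args) {
        // For "123": 49, then 1569, then 48690.
        for (int step : trace("123")) {
            System.out.println(step);
        }
        System.out.println("123".hashCode()); // also 48690
    }
}
```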

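So the computation is effectively a weighted sum: each character is multiplied by a power of 31, and characters nearer the front carry larger weights.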
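The weights become explicit if the Javadoc formula s[0]*31^(n-1) + ... + s[n-1] is evaluated directly instead of with the loop's Horner-style recurrence. In this minimal sketch (the class WeightedSumDemo and the method weightedSum are hypothetical), character i is multiplied by 31^(n-1-i). Because int multiplication and addition both wrap modulo 2^32, the result should still equal String.hashCode() when the powers of 31 overflow.

```java
class WeightedSumDemo {
    // Direct form from the Javadoc: s[0]*31^(n-1) + ... + s[n-1], in int arithmetic.
    static int weightedSum(String s) {
        int n = s.length();
        int sum = 0;
        for (int i = 0; i < n; i++) {
            int weight = 1; // 31^(n-1-i), wrapping on overflow
            for (int k = 0; k < n - 1 - i; k++) {
                weight *= 31;
            }
            sum += s.charAt(i) * weight;
        }
        return sum;
    }

    public static void main(String[] args) {
        String[] samples = {"123", "abc", "a longer string whose weights overflow int"};
        for (String s : samples) {
            System.out.println(s + ": " + weightedSum(s) + " == " + s.hashCode());
        }
    }
}
```

The nested loop is quadratic and only meant to make the weights visible; the single-pass recurrence in the JDK code computes the same value in linear time.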

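One apparent benefit is that strings sharing a prefix get hash values in a nearby range. For example, two strings of the same length that differ only in the last character have hash codes that differ by exactly the difference between those characters (modulo int overflow).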
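The prefix effect can be checked with a few keys. In this sketch (the class PrefixNeighbors and its hashes helper are hypothetical), "123", "124" and "125" differ only in the last term, whose weight is 31^0 = 1. Changing the second-to-last character instead moves the hash by a multiple of 31.

```java
import java.util.Arrays;

class PrefixNeighbors {
    // Hash codes of several keys, in order.
    static int[] hashes(String... keys) {
        int[] out = new int[keys.length];
        for (int i = 0; i < keys.length; i++) {
            out[i] = keys[i].hashCode();
        }
        return out;
    }

    public static void main(String[] args) {
        // Same length, same prefix "12": only the last term changes.
        System.out.println(Arrays.toString(hashes("123", "124", "125"))); // [48690, 48691, 48692]
        // Second-to-last character changes by 1, so the hash moves by 31.
        System.out.println(Arrays.toString(hashes("123", "133")));        // [48690, 48721]
    }
}
```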

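The benefits are twofold:

1. It can save memory: because the hash values are adjacent, the hash array can be relatively small, for example when a HashMap uses String keys.

2. Because the hash values are adjacent, entries stored in a container such as HashSet or HashMap also end up in adjacent memory positions, which makes access more efficient (the principle of locality).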
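What "adjacent positions" means for a hash table depends on how the table maps a hash code to a bucket. The sketch below assumes the scheme used by JDK 8's HashMap, not JDK 6's: it XORs the high 16 bits into the low 16, then masks with a power-of-two table length minus 1. The helpers spread and bucketIndex are hypothetical names, not public HashMap API, and JDK 6 applies a different supplemental hash. For small hash codes such as those of "123", "124" and "125", the spreading step changes nothing, so the keys land in neighbouring buckets 2, 3 and 4 of a 16-slot table.

```java
class BucketDemo {
    // Assumed to mirror JDK 8 HashMap's spreading step: fold the high 16 bits into the low 16.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // tableLength must be a power of two, so (tableLength - 1) acts as a bit mask.
    static int bucketIndex(String key, int tableLength) {
        return (tableLength - 1) & spread(key.hashCode());
    }

    public static void main(String[] args) {
        for (String key : new String[] {"123", "124", "125"}) {
            System.out.println(key + " -> bucket " + bucketIndex(key, 16));
        }
    }
}
```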

31 is used as the multiplier because its binary representation is all ones (11111), which helps scatter the data effectively.
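Two properties of 31 are easy to check in code. Its binary form is 11111. And because 31 = 32 - 1, the product 31 * h equals (h << 5) - h in wrapping int arithmetic, a shift-and-subtract the JVM may use as an optimization. The small check below (class and method names are hypothetical) tests that identity on a few values, including ones that overflow.

```java
class ThirtyOneDemo {
    // 31 * h and (h << 5) - h agree for every int, overflow included.
    static boolean shiftMatches(int h) {
        return 31 * h == (h << 5) - h;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(31)); // 11111
        int[] samples = {0, 1, -1, 48690, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int h : samples) {
            System.out.println(h + ": " + shiftMatches(h));
        }
    }
}
```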

Finally, let's look at how code generated by Eclipse computes the hash value of a class with two string fields:

public class Name{
    String firstName;
    String lastName;
    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result
                + ((firstName == null) ? 0 : firstName.hashCode());
        result = prime * result
                + ((lastName == null) ? 0 : lastName.hashCode());
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        Name other = (Name) obj;
        if (firstName == null) {
            if (other.firstName != null)
                return false;
        } else if (!firstName.equals(other.firstName))
            return false;
        if (lastName == null) {
            if (other.lastName != null)
                return false;
        } else if (!lastName.equals(other.lastName))
            return false;
        return true;
    }
}
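So 31 is again the multiplier. Because result starts at 1, the generated method computes hashCode = 31 * (31 + firstName.hashCode()) + lastName.hashCode() = 961 + 31 * firstName.hashCode() + lastName.hashCode(), in int arithmetic, with a null field contributing 0.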
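The expanded formula can be checked against the generated code. This sketch (the class NameHashCheck, its methods and the sample values "John" and "Smith" are hypothetical) also compares with java.util.Objects.hash (Java 7+). That method delegates to Arrays.hashCode, which uses the same start-at-1, multiply-by-31 scheme.

```java
import java.util.Objects;

class NameHashCheck {
    // Eclipse-style combination: start at 1, then result = 31 * result + fieldHash.
    static int eclipseStyle(String firstName, String lastName) {
        int result = 1;
        result = 31 * result + (firstName == null ? 0 : firstName.hashCode());
        result = 31 * result + (lastName == null ? 0 : lastName.hashCode());
        return result;
    }

    // Expanded form: 31 * 31 + 31 * h(firstName) + h(lastName), nulls counted as 0.
    static int expanded(String firstName, String lastName) {
        int f = firstName == null ? 0 : firstName.hashCode();
        int l = lastName == null ? 0 : lastName.hashCode();
        return 961 + 31 * f + l;
    }

    public static void main(String[] args) {
        String first = "John", last = "Smith";
        System.out.println(eclipseStyle(first, last));
        System.out.println(expanded(first, last));
        System.out.println(Objects.hash(first, last));
    }
}
```

All three lines should print the same number. The constant 961 is why the shorter formula firstName.hashCode() * 31 + lastName.hashCode() never matches the generated code.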

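BTW: Java caches a string's hash code. It is actually computed only on the first call, and later calls return the cached value. (In the JDK 6 code above, the cache is the hash field and 0 means "not yet computed", so a string whose hash code is 0, such as the empty string, is recomputed on every call.)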
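The caching pattern from the JDK 6 code can be sketched in a small immutable class. CachedKey and its computeCount counter are hypothetical. The counter exists only to make the caching visible: a key with a non-zero hash is computed once, while the empty string, whose hash is 0, is recomputed on each call under this scheme.

```java
class CachedKey {
    private final String value;
    private int hash;  // 0 means "not computed yet", as in JDK 6's String
    int computeCount;  // instrumentation only: how often the loop actually ran

    CachedKey(String value) {
        this.value = value;
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            computeCount++;
            for (int i = 0; i < value.length(); i++) {
                h = 31 * h + value.charAt(i);
            }
            hash = h;
        }
        return h;
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof CachedKey && value.equals(((CachedKey) obj).value);
    }

    public static void main(String[] args) {
        CachedKey abc = new CachedKey("abc");
        CachedKey empty = new CachedKey("");
        for (int i = 0; i < 3; i++) {
            abc.hashCode();
            empty.hashCode();
        }
        System.out.println("abc computed " + abc.computeCount + " time(s)");    // 1
        System.out.println("\"\" computed " + empty.computeCount + " time(s)"); // 3
    }
}
```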

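The equals method generated by Eclipse is also of high quality. It covers every case: the same reference, a null argument, a different class, and null fields.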
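Since Java 7 the same null-safe checks can be written more compactly with java.util.Objects. This is an alternative style, not what Eclipse emits. CompactName is a hypothetical class mirroring the Name example. It keeps the getClass() comparison from the generated code, and its hashCode gives the same value as the Eclipse version because Objects.hash uses the same scheme.

```java
import java.util.Objects;

class CompactName {
    final String firstName;
    final String lastName;

    CompactName(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    @Override
    public int hashCode() {
        // Same result as the Eclipse version: start at 1, then 31 * result + field hash (null -> 0).
        return Objects.hash(firstName, lastName);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        // getClass() comparison (rather than instanceof) matches the Eclipse-generated code.
        if (obj == null || getClass() != obj.getClass()) return false;
        CompactName other = (CompactName) obj;
        // Objects.equals is null-safe: true if both are null, otherwise a.equals(b).
        return Objects.equals(firstName, other.firstName)
                && Objects.equals(lastName, other.lastName);
    }

    public static void main(String[] args) {
        System.out.println(new CompactName("John", null).equals(new CompactName("John", null))); // true
        System.out.println(new CompactName("John", null).equals(new CompactName(null, null)));   // false
    }
}
```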

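Summary: a string hash function should not only reduce collisions but also make sure that strings with the same prefix produce adjacent hash values.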

Original article: https://www.cnblogs.com/ycpanda/p/3637288.html