Hash table lengths and prime numbers

Website: http://srinvis.blogspot.ca/2006/07/hash-table-lengths-and-prime-numbers.html

This has been bugging me for some time now...

The first thing you do when inserting into or retrieving from a hash table is calculate the hashCode for the given key, then find the correct bucket by trimming the hashCode to the size of the table with hashCode % table_length. Here are 2 statements that you have most probably read somewhere:
- If you use a power of 2 for table_length, finding (hashCode(key) % 2^n) is as simple and quick as (hashCode(key) & (2^n - 1)), at least for non-negative hashCodes. But this keeps only the low n bits of the hashCode, so if your function to calculate hashCode for a given key isn't good, you are likely to suffer from clustering of many keys in a few hash buckets.
- But if you use a prime number for table_length, the hashCodes calculated can still spread across the different hash buckets even if you have a slightly stupid hashCode function.
OK, but can someone tell me why this is so and give me a proof of the above statements? I couldn't find much on Google. Most of the reasons given are again just statements, and I cannot accept statements (there are also some stupid proofs on the net, so beware). And don't even try telling me that the proof is experimental.
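
Before getting to the proof, the first statement is easy to check by hand. The sketch below is only an illustration, not a proof: the helper names bucket_mod and bucket_mask and the deliberately weak set of hashCodes (all multiples of 16) are assumptions made up for this example.

```python
def bucket_mod(hash_code: int, table_length: int) -> int:
    """Bucket index using the general modulo operation."""
    return hash_code % table_length


def bucket_mask(hash_code: int, n: int) -> int:
    """Bucket index for a table of length 2**n, keeping only the low n bits."""
    return hash_code & ((1 << n) - 1)


n = 4
table_length = 1 << n  # 16 buckets

# For non-negative hashCodes the two computations agree.
mismatches = [h for h in range(10_000)
              if bucket_mod(h, table_length) != bucket_mask(h, n)]
print(mismatches)

# A weak hash function (hypothetical) whose results are all multiples of 16:
# the low 4 bits never change, so every key lands in the same bucket.
weak_hashes = [16 * k for k in range(1000)]
buckets_used = {bucket_mask(h, n) for h in weak_hashes}
print(buckets_used)
```

The mask only ever looks at the low n bits, which is why a hashCode function that does not vary those bits clusters so badly in a power-of-2 table.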

Today I have finally been able to get this thought out of my head. Below is the proof I came up with (I hope it doesn't fall into the stupid category; it doesn't seem to, at least for now. If you think otherwise, put in a comment, at least for the sake of others). I suggest you think about the solution on your own before reading further...

Suppose your hashCode function produces, among others, the hashCodes {x, 2x, 3x, 4x, 5x, 6x...}. Then all of these are going to be clustered in just m buckets, where m = table_length/GreatestCommonFactor(table_length, x). (It is trivial to verify/derive this: with g = GreatestCommonFactor(table_length, x), every kx % table_length is a multiple of g, and there are only table_length/g multiples of g below table_length.) Now you can do one of the following to avoid clustering:
1. Make sure that you don't generate too many hashCodes that are multiples of another hashCode, as in {x, 2x, 3x, 4x, 5x, 6x...}. But this may be rather difficult if your hash table is supposed to hold millions of entries.
2. Or simply make m equal to table_length by making GreatestCommonFactor(table_length, x) equal to 1, i.e. by making table_length coprime with x. And if x can be just about any number, make sure that table_length is a prime number, since a prime is coprime with every x that is not a multiple of it.
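
The formula for m can be checked numerically. The following is a minimal sketch, not part of the original argument: the function names buckets_hit and predicted_buckets and the sample values of x are assumptions chosen for illustration. It counts the distinct buckets actually hit by the multiples of x and compares that with table_length/GreatestCommonFactor(table_length, x), for a power-of-2 table (16) and a prime table (17).

```python
from math import gcd


def buckets_hit(x: int, table_length: int, count: int = 10_000) -> int:
    """Count distinct buckets used by the hashCodes x, 2x, ..., count*x."""
    return len({(k * x) % table_length for k in range(1, count + 1)})


def predicted_buckets(x: int, table_length: int) -> int:
    """m = table_length / gcd(table_length, x), as derived in the text."""
    return table_length // gcd(table_length, x)


for table_length in (16, 17):
    for x in (1, 2, 4, 6, 8, 12, 34):
        actual = buckets_hit(x, table_length)
        expected = predicted_buckets(x, table_length)
        assert actual == expected
        print(f"table_length={table_length:2d} x={x:2d} buckets used={actual}")
```

With table_length 16, x = 12 reaches only 4 buckets and x = 8 only 2, while with table_length 17 every sampled x reaches all 17 buckets except x = 34, which is a multiple of 17 and collapses into a single bucket. That is the one case a prime table length cannot help with.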
If you need some more kick from hash maps, take a look at:

Doug Lea's highly performant ConcurrentHashMap: http://www-128.ibm.com/developerworks/java/library/j-jtp08223/

Google's space/time efficient hashmap implementations: http://goog-sparsehash.sourceforge.net/doc/implementation.html

Original article: https://www.cnblogs.com/vigorz/p/10499135.html