  • bloom filter + murmurhash

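    A Bloom filter is a hashing technique. The core idea is to map a string to several positions in a hash table (a bit array) using multiple ordinary hash functions and to mark those positions. On lookup, the same hash functions are computed: if every position is marked, the string has very probably been seen before; if any position is unmarked, the string has definitely never been seen.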
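
    The insert and lookup steps described above can be sketched as follows. This is a minimal illustration, not a tuned implementation: the class name `BloomFilter`, the FNV-1a base hash, and the way the seed is mixed in to imitate several "ordinary" hash functions are all assumptions made for the example. `num_bits` and `num_hashes` must both be greater than zero.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Minimal Bloom filter sketch (illustrative names, not from the original post).
class BloomFilter {
public:
    BloomFilter(std::size_t num_bits, unsigned num_hashes)
        : bits_(num_bits, false), num_hashes_(num_hashes) {}

    // Mark every position the item hashes to.
    void add(const std::string& item) {
        for (unsigned i = 0; i < num_hashes_; ++i)
            bits_[index(item, i)] = true;
    }

    // false => the item was definitely never added
    // true  => the item was probably added (false positives are possible)
    bool might_contain(const std::string& item) const {
        for (unsigned i = 0; i < num_hashes_; ++i)
            if (!bits_[index(item, i)]) return false;
        return true;
    }

private:
    // FNV-1a with the seed folded into the initial state, so that each seed
    // behaves like a different hash function (an assumption of this sketch).
    std::size_t index(const std::string& item, unsigned seed) const {
        std::uint64_t h = 14695981039346656037ULL ^ (seed * 0x9e3779b97f4a7c15ULL);
        for (unsigned char c : item) {
            h ^= c;
            h *= 1099511628211ULL;
        }
        return static_cast<std::size_t>(h % bits_.size());
    }

    std::vector<bool> bits_;
    unsigned num_hashes_;
};
```

    A caller would construct it as `BloomFilter f(1024, 3);`, call `f.add("apple")`, and then query `f.might_contain("apple")`. A `true` answer for a string that was never added is a false positive; a `false` answer is always reliable.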

    A formula gives the number of ordinary hash functions and the size of the table that minimize the error rate (the false positive rate).
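
    The post does not spell the formula out. The sketch below assumes the standard sizing results for a Bloom filter holding n items with target false positive rate p: m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The names `BloomParams`, `optimal_params`, and `false_positive_rate` are hypothetical helpers introduced for illustration.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper type: the sizes chosen for a Bloom filter.
struct BloomParams {
    std::size_t num_bits;    // m, size of the bit array
    unsigned    num_hashes;  // k, number of hash functions
};

// n = expected number of items (> 0), p = target false positive rate (0 < p < 1).
BloomParams optimal_params(std::size_t n, double p) {
    const double ln2 = std::log(2.0);
    // m = -n * ln(p) / (ln 2)^2, rounded up to a whole bit
    const double m = std::ceil(-static_cast<double>(n) * std::log(p) / (ln2 * ln2));
    // k = (m / n) * ln 2, rounded to the nearest integer and kept at least 1
    const double k = std::round(m / static_cast<double>(n) * ln2);
    return { static_cast<std::size_t>(m), k < 1.0 ? 1u : static_cast<unsigned>(k) };
}

// Approximate false positive rate for given m, k, n: (1 - e^{-k n / m})^k
double false_positive_rate(std::size_t m, unsigned k, std::size_t n) {
    return std::pow(1.0 - std::exp(-static_cast<double>(k) * n / m), k);
}
```

    For a 1% target this works out to a little under 10 bits per stored item and 7 hash functions, which is why Bloom filters are much smaller than storing the strings themselves.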

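    There is a hash function, MurmurHash, that is described as a general-purpose hash function. Its description looked impressive, so I am recording it here.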

    //-----------------------------------------------------------------------------
    // MurmurHash2, 64-bit versions, by Austin Appleby
    
    // The same caveats as 32-bit MurmurHash2 apply here - beware of alignment 
    // and endian-ness issues if used across multiple platforms.
    
    #include <stdint.h>  // uint64_t; "unsigned long" is only 32 bits on some platforms
    
    // 64-bit hash for 64-bit platforms
    uint64_t MurmurHash64A ( const void * key, int len, unsigned int seed )
    {
            const uint64_t m = 0xc6a4a7935bd1e995ULL;
            const int r = 47;
    
            uint64_t h = seed ^ (len * m);
    
            const uint64_t * data = (const uint64_t *)key;
            const uint64_t * end = data + (len/8);
    
            while(data != end)
            {
                    uint64_t k = *data++;
    
                    k *= m; 
                    k ^= k >> r; 
                    k *= m; 
    
                    h ^= k;
                    h *= m; 
            }
    
            const unsigned char * data2 = (const unsigned char*)data;
    
            // Mix in the remaining 1..7 tail bytes; the cases fall through on purpose.
            switch(len & 7)
            {
            case 7: h ^= uint64_t(data2[6]) << 48;
            case 6: h ^= uint64_t(data2[5]) << 40;
            case 5: h ^= uint64_t(data2[4]) << 32;
            case 4: h ^= uint64_t(data2[3]) << 24;
            case 3: h ^= uint64_t(data2[2]) << 16;
            case 2: h ^= uint64_t(data2[1]) << 8;
            case 1: h ^= uint64_t(data2[0]);
                    h *= m;
            };
     
            h ^= h >> r;
            h *= m;
            h ^= h >> r;
    
            return h;
    } 
    
    
    // 64-bit hash for 32-bit platforms
    uint64_t MurmurHash64B ( const void * key, int len, unsigned int seed )
    {
            const unsigned int m = 0x5bd1e995;
            const int r = 24;
    
            unsigned int h1 = seed ^ len;
            unsigned int h2 = 0;
    
            const unsigned int * data = (const unsigned int *)key;
    
            while(len >= 8)
            {
                    unsigned int k1 = *data++;
                    k1 *= m; k1 ^= k1 >> r; k1 *= m;
                    h1 *= m; h1 ^= k1;
                    len -= 4;
    
                    unsigned int k2 = *data++;
                    k2 *= m; k2 ^= k2 >> r; k2 *= m;
                    h2 *= m; h2 ^= k2;
                    len -= 4;
            }
    
            if(len >= 4)
            {
                    unsigned int k1 = *data++;
                    k1 *= m; k1 ^= k1 >> r; k1 *= m;
                    h1 *= m; h1 ^= k1;
                    len -= 4;
            }
    
            // Mix in the remaining 1..3 tail bytes; the cases fall through on purpose.
            switch(len)
            {
            case 3: h2 ^= ((unsigned char*)data)[2] << 16;
            case 2: h2 ^= ((unsigned char*)data)[1] << 8;
            case 1: h2 ^= ((unsigned char*)data)[0];
                            h2 *= m;
            };
    
            h1 ^= h2 >> 18; h1 *= m;
            h2 ^= h1 >> 22; h2 *= m;
            h1 ^= h2 >> 17; h1 *= m;
            h2 ^= h1 >> 19; h2 *= m;
    
            uint64_t h = h1;
    
            h = (h << 32) | h2;
    
            return h;
    } 
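
    The header comment warns about alignment: `MurmurHash64A` reads the key through a `const uint64_t *`, which is only safe when the buffer is suitably aligned. One common way around this, shown here as a sketch rather than as part of Austin Appleby's code, is to load each 8-byte block with `memcpy`. The name `MurmurHash64A_memcpy` is hypothetical. It is intended to return the same value as `MurmurHash64A` on the same machine; like the original, it still gives different results on machines with different endianness.

```cpp
#include <cstdint>
#include <cstring>

// MurmurHash64A with alignment-safe block loads (illustrative variant).
std::uint64_t MurmurHash64A_memcpy(const void* key, int len, unsigned int seed)
{
    const std::uint64_t m = 0xc6a4a7935bd1e995ULL;
    const int r = 47;

    std::uint64_t h = seed ^ (static_cast<std::uint64_t>(len) * m);

    const unsigned char* p = static_cast<const unsigned char*>(key);
    const unsigned char* end = p + (len / 8) * 8;

    for (; p != end; p += 8) {
        std::uint64_t k;
        std::memcpy(&k, p, sizeof k);  // well-defined for any alignment

        k *= m;
        k ^= k >> r;
        k *= m;

        h ^= k;
        h *= m;
    }

    // Remaining 1..7 tail bytes; the cases fall through on purpose.
    switch (len & 7) {
    case 7: h ^= std::uint64_t(p[6]) << 48;  // fall through
    case 6: h ^= std::uint64_t(p[5]) << 40;  // fall through
    case 5: h ^= std::uint64_t(p[4]) << 32;  // fall through
    case 4: h ^= std::uint64_t(p[3]) << 24;  // fall through
    case 3: h ^= std::uint64_t(p[2]) << 16;  // fall through
    case 2: h ^= std::uint64_t(p[1]) << 8;   // fall through
    case 1: h ^= std::uint64_t(p[0]);
            h *= m;
    }

    // Final avalanche, same as in MurmurHash64A.
    h ^= h >> r;
    h *= m;
    h ^= h >> r;

    return h;
}
```

    Because every step after the seed is mixed in is invertible in `h`, two different seeds always give two different hashes for the same key, which makes the seed a convenient way to derive the several hash functions a Bloom filter needs.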
  • Original article: https://www.cnblogs.com/chenhuan001/p/4885630.html