  • MinHash: algorithm and implementation

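    References:

    Slides on the underlying principle: http://wenku.baidu.com/view/089e85c42cc58bd63186bdfc.html

    Algorithm for computing the signatures: http://fuliang.iteye.com/blog/1025638 (the last part). Thanks to the original author. See that article for how the algorithm works and for the mathematical proof.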
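    The core property behind MinHash is stated below in standard notation. Here A and B are the sets of rows in which two columns contain 1, and π is a uniformly random permutation of the rows. These symbols are not taken from the original post.

```latex
J(A, B) = \frac{|A \cap B|}{|A \cup B|}, \qquad
\Pr_{\pi}\!\left[\min_{x \in A} \pi(x) = \min_{x \in B} \pi(x)\right] = J(A, B)
```

    The minimum of π over A ∪ B falls on a single, uniformly random element of A ∪ B. The two minima agree exactly when that element lies in A ∩ B, which happens with probability |A ∩ B| / |A ∪ B|.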

    A simple Python implementation is shown below:

    def hash_func_demo1(x):
        return x % 5


    def hash_func_demo2(x):
        return (2 * x + 1) % 5


    # data: [C1, C2, ..., CM]; each column Ci = [a1, a2, ..., an] is a 0/1 vector,
    #       so the characteristic matrix D is n x m (rows x sets)
    # hash_funcs: [h1, h2, ..., hr]
    # return: r x m signature matrix; rt[k][col] is the minimum of h_k
    #         over the rows in which column col contains 1
    def min_hash(data, hash_funcs):
        M, N, R = len(data), len(data[0]), len(hash_funcs)

        # One signature row per hash function, each entry starting at "infinity"
        rt = [[float("inf")] * M for _ in range(R)]

        for r in range(N):
            # Hash the row number (rows are numbered from 1) with every hash function
            hashes = [h(r + 1) for h in hash_funcs]
            for col in range(M):
                if data[col][r] == 0:
                    continue
                for k in range(R):
                    # Index as [hash function][column] so the result is r x m
                    rt[k][col] = min(rt[k][col], hashes[k])

        return rt


    if __name__ == "__main__":
        data = [[1, 0, 1, 1, 0],
                [0, 1, 1, 0, 1]]

        hash_funcs = [hash_func_demo1, hash_func_demo2]
        rt = min_hash(data, hash_funcs)
        print(rt)  # [[1, 0], [2, 0]]
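    The demo stops at the signature matrix. Signatures are useful because the fraction of signature rows on which two columns agree estimates the Jaccard similarity of those columns. The sketch below checks this on the same demo data. It is an illustration, not the original author's code. The helper names (`permutation_hashes`, `signatures`, `estimate_jaccard`, `exact_jaccard`) are hypothetical. The sketch also replaces the two fixed demo hash functions with random permutations of the row numbers, which is the textbook MinHash definition. Permutations are used instead of linear hashes such as `(a * x + b) % p` because linear hashes are only approximately min-wise independent.

```python
import random


# Hypothetical helper (assumption): random permutations stand in for the
# demo's fixed hash functions.
def permutation_hashes(n_rows, count, seed=0):
    """Return `count` hash functions, each mapping row r (1..n_rows) to its
    position in an independent random permutation of the rows."""
    rnd = random.Random(seed)
    funcs = []
    for _ in range(count):
        perm = list(range(n_rows))
        rnd.shuffle(perm)
        funcs.append(lambda r, perm=perm: perm[r - 1])  # rows numbered from 1
    return funcs


def signatures(data, hash_funcs):
    """R x M MinHash signature matrix; data is a list of M 0/1 columns."""
    sig = [[float("inf")] * len(data) for _ in hash_funcs]
    for r in range(len(data[0])):
        hashes = [h(r + 1) for h in hash_funcs]
        for col, column in enumerate(data):
            if column[r]:
                for k, hv in enumerate(hashes):
                    sig[k][col] = min(sig[k][col], hv)
    return sig


def estimate_jaccard(sig, a, b):
    """Fraction of signature rows on which columns a and b agree."""
    return sum(row[a] == row[b] for row in sig) / len(sig)


def exact_jaccard(data, a, b):
    """|A & B| / |A | B| computed directly from the 0/1 columns."""
    sa = {r for r, v in enumerate(data[a]) if v}
    sb = {r for r, v in enumerate(data[b]) if v}
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


if __name__ == "__main__":
    data = [[1, 0, 1, 1, 0],
            [0, 1, 1, 0, 1]]
    sig = signatures(data, permutation_hashes(len(data[0]), 1000))
    print(exact_jaccard(data, 0, 1))    # 0.2
    print(estimate_jaccard(sig, 0, 1))  # close to 0.2; depends on the seed
```

    With only the two demo hash functions, the signature is [[1, 0], [2, 0]]. The two columns never agree, so the estimate is 0.0, while the exact value is 0.2. With independent permutations, the standard error is sqrt(J(1 - J) / R), so it shrinks as more hash functions are added.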



  • Original article: https://www.cnblogs.com/foreveryl/p/2370490.html