  • python pybloom

    1. Installation:

    pip install pybloom

    or:

    https://pypi.python.org/pypi/pybloom/1.0.2

    2. Usage:

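    from pybloom import BloomFilter

    # capacity: up to 10000 entries; error_rate: 0.001 false-positive rate
    bl = BloomFilter(capacity=10000, error_rate=0.001)

    # datalist and newdata are any iterables of hashable items
    for i in datalist:
        bl.add(i)

    for i in newdata:
        if i in bl:
            print('has this data')
        else:
            bl.add(i)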
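
To make the two parameters above more concrete, here is a minimal sketch of how a Bloom filter can be sized from a capacity and an error rate. It is a toy written for illustration, not pybloom's implementation: the class name `TinyBloom`, the salted SHA-256 hashing, and the attribute names are all assumptions. It uses the textbook formulas m = -n·ln(p)/(ln 2)² for the number of bits and k = (m/n)·ln 2 for the number of hash functions.

```python
import hashlib
import math


class TinyBloom(object):
    """Toy Bloom filter for illustration; not pybloom's implementation."""

    def __init__(self, capacity, error_rate):
        # Textbook sizing: m = -n*ln(p)/(ln 2)^2 bits, k = (m/n)*ln 2 hashes.
        self.num_bits = int(math.ceil(
            -capacity * math.log(error_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, int(round(
            self.num_bits / float(capacity) * math.log(2))))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions by salting one hash with the index i.
        for i in range(self.num_hashes):
            data = ("%d:%s" % (i, key)).encode("utf-8")
            yield int(hashlib.sha256(data).hexdigest(), 16) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        # True only if every one of the k bits is set.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


bf = TinyBloom(capacity=10000, error_rate=0.001)
bf.add("example.com")
print("example.com" in bf)          # True: added keys are always found
print(bf.num_bits, bf.num_hashes)   # bits and hash count chosen by the sizing
```

A membership test can return a false positive (an item reported as present that was never added) but never a false negative, which is why the loop above only adds items that are not yet reported as present.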

     -----------

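    import os
    try:
        from urlparse import urlparse        # Python 2
    except ImportError:
        from urllib.parse import urlparse    # Python 3

    # `path` (a file or a directory of URL lists) is defined elsewhere
    try:
        bl = BloomFilter(capacity=1000, error_rate=0.001)
        with open('allfile', 'a+') as fd:
            # a+ may open positioned at the end of the file; rewind so the
            # hosts saved by earlier runs are loaded into the filter
            fd.seek(0)
            for x in fd:
                bl.add(x.strip())
            if os.path.isdir(path):
                for name in os.listdir(path):
                    with open(os.path.join(path, name), 'r') as fdd:
                        for c in fdd:
                            url = urlparse(c.strip())
                            print(url.netloc)
                            # record each host only the first time it is seen
                            if url.netloc not in bl:
                                bl.add(url.netloc)
                                fd.write(url.netloc + '\n')
                                fd.flush()
            elif os.path.isfile(path):
                print('file..')
    except Exception as e:
        print(str(e))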
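
The core of the snippet above, keeping only hostnames that have not been seen before, can be sketched on its own without any file handling. This is an illustration under assumptions: the helper name `new_hosts` is hypothetical, and a plain `set` stands in for the filter, since both support `in` and `add()`.

```python
try:
    from urlparse import urlparse        # Python 2
except ImportError:
    from urllib.parse import urlparse    # Python 3


def new_hosts(lines, seen):
    """Return hostnames from `lines` not yet in `seen`, recording each one."""
    fresh = []
    for line in lines:
        host = urlparse(line.strip()).netloc
        # Skip lines without a hostname and hosts already recorded.
        if host and host not in seen:
            seen.add(host)
            fresh.append(host)
    return fresh


seen = set()  # stand-in; a BloomFilter offers the same `in` / add()
urls = ["http://a.com/x\n", "http://a.com/y\n", "https://b.org/\n"]
print(new_hosts(urls, seen))  # ['a.com', 'b.org']
```

With a real Bloom filter in place of the set, a rare false positive would cause a genuinely new host to be skipped, which is the trade-off for the much smaller memory footprint.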
    

      

    ------------

  • Original article: https://www.cnblogs.com/spacepirate/p/8084049.html