  • An experience of using a Bloom filter

    A new requirement came up recently: determine whether a user is a new user.

    To keep the write-up easy to follow, all constants have already been replaced with their literal values.

    First, define the expected number of insertions and the acceptable false-positive rate:

        /**
         * Expected number of insertions (default 100,000,000)
         */
        private long expectedInsertions = 100000000L;
        /**
         * Acceptable false-positive rate
         */
        private double fpp = 0.001;

    The length of the bit array:

        private long numBits = optimalNumOfBits(expectedInsertions, fpp);

        private long optimalNumOfBits(long n, double p) {
            if (p == 0) {
                p = Double.MIN_VALUE;
            }
            return (long) (-n * Math.log(p) / (Math.log(2) * Math.log(2)));
        }

    The number of hash functions:

        private int numHashFunctions = optimalNumOfHashFunctions(expectedInsertions, numBits);

        private int optimalNumOfHashFunctions(long n, long m) {
            return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
        }

    The next step is adding data to the filter. Redis is used as the backing store here, and pipelining is used to cut down the round trips to Redis (a batch variant is sketched after the method):

        public void put(String bloomName, String key) {
            long[] indexs = getIndexs(key);
            redisTemplate.executePipelined((RedisCallback<Object>) connection -> {
                Arrays.stream(indexs)
                        .forEach(n -> connection.setBit(bloomName.getBytes(), n, true));
                return null;
            });
        }
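    If a large batch of old users has to be loaded at once, the same idea extends to a batch insert that pays for one round trip per batch instead of one per key. A minimal sketch (the putAll method is my own addition, not part of the original code):

        /**
         * Hypothetical batch variant: set the bits for many keys inside one pipeline.
         */
        public void putAll(String bloomName, List<String> keys) {
            redisTemplate.executePipelined((RedisCallback<Object>) connection -> {
                for (String key : keys) {
                    for (long index : getIndexs(key)) {
                        connection.setBit(bloomName.getBytes(), index, true);
                    }
                }
                return null;
            });
        }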

    Then, check whether a user is already in the filter. (One subtlety: on the write side, only old users are ever put into the filter, so any user not found in it is treated as a new user, and the judgment becomes more and more accurate over time as users are processed; a call-site sketch follows the method below.)

        public boolean isExist(String bloomName, String key) {
            long[] indexs = getIndexs(key);
            List list = redisTemplate.executePipelined((RedisCallback<Object>) connection -> {
                for (long index : indexs) {
                    connection.getBit(bloomName.getBytes(), index);
                }
                return null;
            });
            return !list.contains(false);
        }
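    With that convention, the "is this a new user?" decision is just the negation of the membership check. A hypothetical call site (the isNewUser method and the "oldUsers" key name are illustrative, not from the original code):

        /**
         * Hypothetical call site: only old users are ever put into the filter,
         * so "definitely not present" is interpreted as "new user".
         * bloomFilter here is an injected instance of the class shown in this post.
         */
        public boolean isNewUser(String userId) {
            // "oldUsers" is an assumed bitmap key name for the old-user filter
            return !bloomFilter.isExist("oldUsers", userId);
        }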

    Next comes the method that computes the bitmap indexes for a key:

        private long[] getIndexs(String key) {
            long hash1 = hash(key);
            long hash2 = hash1 >>> 16;
            long[] result = new long[numHashFunctions];
            for (int i = 0; i < numHashFunctions; i++) {
                long combinedHash = hash1 + i * hash2;
                if (combinedHash < 0) {
                    combinedHash = ~combinedHash;
                }
                result[i] = combinedHash % numBits;
            }
            return result;
        }
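    This is the standard double-hashing trick: all k probe positions are derived from two base hashes as hash1 + i * hash2. Here hash2 is simply hash1 shifted right by 16 bits; Guava's built-in BloomFilter, by comparison, derives the two base hashes from the lower and upper 64 bits of the full 128-bit murmur3 value. A hedged sketch of that variant, for comparison only (not the original code; it needs java.nio.ByteBuffer, java.nio.ByteOrder and java.nio.charset.StandardCharsets on top of Guava's Hashing):

        // Variant sketch: derive hash1/hash2 from the two 64-bit halves of the
        // 128-bit murmur3 hash. The byte order is an implementation detail; any
        // consistent choice works for a Bloom filter.
        private long[] baseHashes(String key) {
            byte[] bytes = Hashing.murmur3_128()
                    .hashString(key, StandardCharsets.UTF_8)
                    .asBytes();                               // 16 bytes = 128 bits
            ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
            long hash1 = buf.getLong(0);                      // first 8 bytes
            long hash2 = buf.getLong(8);                      // last 8 bytes
            return new long[]{hash1, hash2};
        }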

    Overall this was fairly straightforward. I also went through a lot of documentation before I understood Bloom filters; as long as you settle down and dig in, every problem can be worked out.

    The complete code is below:

    import com.google.common.hash.Funnels;
    import com.google.common.hash.Hashing;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.data.redis.core.RedisCallback;
    import org.springframework.data.redis.core.RedisTemplate;

    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;
    import java.util.List;

    // Note: the class must be registered as a Spring bean (e.g. via @Component
    // or an @Bean method) for the @Autowired RedisTemplate to be injected.
    public class BloomFilter {
        @Autowired
        private RedisTemplate redisTemplate;
        /**
         * Expected number of insertions (default 100,000,000)
         */
        private long expectedInsertions = 100000000L;
        /**
         * Acceptable false-positive rate
         */
        private double fpp = 0.001;
    
        /**
         * Length of the bit array
         */
        private long numBits = optimalNumOfBits(expectedInsertions, fpp);
        /**
         * Number of hash functions
         */
        private int numHashFunctions = optimalNumOfHashFunctions(expectedInsertions, numBits);
    
        /**
         * Compute the optimal number of hash functions: k = round(m / n * ln 2)
         *
         * @param n expected number of insertions
         * @param m length of the bit array
         * @return number of hash functions
         */
        private int optimalNumOfHashFunctions(long n, long m) {
            return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
        }
    
        /**
         * Compute the optimal bit-array length: m = -n * ln(p) / (ln 2)^2
         *
         * @param n expected number of insertions
         * @param p acceptable false-positive rate
         * @return length of the bit array in bits
         */
        private long optimalNumOfBits(long n, double p) {
            if (p == 0) {
                p = Double.MIN_VALUE;
            }
            return (long) (-n * Math.log(p) / (Math.log(2) * Math.log(2)));
        }
    
    
        /**
         * Check whether a key might exist in the filter named bloomName.
         *
         * @param bloomName name of the Redis bitmap key
         * @param key       the key to test
         * @return true if the key may exist (all probed bits set), false if it definitely does not
         */
        public boolean isExist(String bloomName, String key) {
            long[] indexs = getIndexs(key);

            // One GETBIT per index, sent in a single pipeline; the result list
            // holds one Boolean per bit that was read.
            List<Object> list = redisTemplate.executePipelined((RedisCallback<Object>) connection -> {
                for (long index : indexs) {
                    connection.getBit(bloomName.getBytes(), index);
                }
                return null;
            });

            // The key may exist only if every probed bit is set.
            return !list.contains(false);
        }
    
        /**
         * Add a key to the Redis bitmap backing the filter.
         */
        public void put(String bloomName, String key) {
            long[] indexs = getIndexs(key);

            // One SETBIT per index, sent in a single pipeline.
            redisTemplate.executePipelined((RedisCallback<Object>) connection -> {
                Arrays.stream(indexs)
                        .forEach(n -> connection.setBit(bloomName.getBytes(), n, true));
                return null;
            });
        }
    
        /**
         * Compute the bitmap indexes for a key using double hashing:
         * index_i = (hash1 + i * hash2) mod numBits.
         */
        private long[] getIndexs(String key) {
            long hash1 = hash(key);
            long hash2 = hash1 >>> 16;
            long[] result = new long[numHashFunctions];
            for (int i = 0; i < numHashFunctions; i++) {
                long combinedHash = hash1 + i * hash2;
                if (combinedHash < 0) {
                    // Flip negative values so the index stays non-negative.
                    combinedHash = ~combinedHash;
                }
                result[i] = combinedHash % numBits;
            }
            return result;
        }
    
        /**
         * Compute a single 64-bit hash of the key (murmur3 128-bit, narrowed to a long).
         */
        private long hash(String key) {
            return Hashing.murmur3_128().hashObject(key, Funnels.stringFunnel(StandardCharsets.UTF_8)).asLong();
        }
    
    }
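    One last wiring note: since the class relies on field injection, it has to be registered as a Spring bean before it can be used. A minimal sketch of one possible setup (the configuration class and bean name are assumptions for illustration, not part of the original post):

    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class BloomFilterConfig {

        // Beans created via @Bean still have their @Autowired fields processed,
        // so the RedisTemplate inside BloomFilter will be populated.
        @Bean
        public BloomFilter bloomFilter() {
            return new BloomFilter();
        }
    }

    After that, the write path is just bloomFilter.put("oldUsers", userId) whenever an old user is processed, and the read path is the isNewUser check sketched earlier.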
