zoukankan      html  css  js  c++  java
  • 使用bitset实现毫秒级查询

    前言

    因为业务要求api的一次请求响应时间在10ms以内,所以传统的数据库查询操作直接被排除(网络io和磁盘io)。通过调研,最终使用了bitset,目前已经正常运行了很久


    bitset介绍

    看JDK中的解释简直一头雾水,用我自己的理解概括一下

    1. bitset的内部实现是long数组
    2. set中每一个位的默认值为false(0)
    3. bitset长度按需增长
    4. bitset非线程安全

    bitset关键方法分析
        /**
         * Sets the bit at the specified index to {@code true}.
         *
         * @param  bitIndex a bit index
         * @throws IndexOutOfBoundsException if the specified index is negative
         * @since  JDK1.0
         */
        public void set(int bitIndex) {
            if (bitIndex < 0)
                throw new IndexOutOfBoundsException("bitIndex < 0: " + bitIndex);
    
            int wordIndex = wordIndex(bitIndex);
            expandTo(wordIndex);
    
            words[wordIndex] |= (1L << bitIndex); // Restores invariants
    
            checkInvariants();
        }
    

    设置指定“位”为true,bitIndex参数为非负整数。假设我们执行以下代码,观察上面代码中worIndex,words[wordIndex]值的变化

    BitSet bs = new BitSet()
    bs.set(0);
    bs.set(1);
    bs.set(2);
    bs.set(3);
    bs.set(4);
    
    bitIndex wordIndex words[wordIndex] words[wordIndex]二进制表示
    0 0 1 0001
    1 0 3 0011
    2 0 7 0111
    3 0 15 1111
    4 0 31 0001 1111

    通过上表,我们可以很清晰的根据bitIndex和words[wordIndex]二进制值的对应关系,得到一个结论,即:bitset中每一个long可以表示64个非负整数在bitSet中存在与否。例如:0001可以表示整数0在bitset中存在,1111可以表示整数3,2,1,0在bitset中存在。


    进入正题,实现bitset毫秒级查询

    想象一个场景,我们有一张user表,name唯一。

    CREATE TABLE `user` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `name` varchar(255) NOT NULL,
      `address` varchar(255) NOT NULL comment '地址',
      `gender` varchar(10) NOT NULL comment '性别',
      `age` varchar(10) NOT NULL,
      PRIMARY KEY (`id`),
      UNIQUE KEY `name` (`name`)
    ) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8;
    

    假设我们要查询“北京市18岁的女生”,那么对应的sql如下:

    select * from `user` where address='北京' and age='18' and gender='girl';
    

    如何使用bitset实现同样的查询呢?

    1. 将user表数据加载进内存中
    2. 为user表建立address,age,gender维度的bitset索引
    3. 根据索引查询数据
    1.将user表数据加载进内存中

    将user表从数据库读取出来存入List

    User实体类:

    public class User implements Cloneable {
    	private String name;
    	private String address;
    	private String gender;
    	private String age;
    
    	@Override
    	public String toString() {
    		return "User [name=" + name + ", address=" + address + ", gender=" + gender + ", age=" + age + "]";
    	}
    
    	@Override
    	public User clone() {
    		User user = null;
    		try {
    			user = (User) super.clone();
    		} catch (CloneNotSupportedException e) {
    			e.printStackTrace();
    		}
    		return user;
    	}
        //省略get set 方法。。。
    
    
    
    2.建立索引

    创建bitset索引模型类

    public class BitSetIndexModel {
    	private String type;//索引类型:address,age,gender
    	private ConcurrentHashMap<String, Integer> map;//索引类型和bitSet在bsList中下标的映射关系
    	private List<String> list;//索引类型的值集合,例如gender:girl,boy
    	private List<BitSet> bsList;//bitset集合
    
    	public BitSetIndex() {
    		
    	}
    
    	public BitSetIndexModel(String type) {
    		this.type = type;
    		map = new ConcurrentHashMap<String, Integer>();
    		list = new ArrayList<String>();
    		bsList = new ArrayList<BitSet>();
    	}
    	
    	public String getType() {
    		return type;
    	}
    
    	public void setType(String type) {
    		this.type = type;
    	}
    
    	public Map<String, Integer> getMap() {
    		return map;
    	}
    
    	public void setMap(ConcurrentHashMap<String, Integer> map) {
    		this.map = map;
    	}
    
    	public List<String> getList() {
    		return list;
    	}
    
    	public void setList(List<String> list) {
    		this.list = list;
    	}
    
    	public List<BitSet> getBsList() {
    		return bsList;
    	}
    
    	public void setBsList(List<BitSet> bsList) {
    		this.bsList = bsList;
    	}
    
    	/**
    	 *
    	 * @param str
    	 * @param i
    	 */
    	public void createIndex(String str, int i) {
    		BitSet bitset = null;
    		//获取‘str’对应的bitset在bsList中的下标
    		Integer index = this.getMap().get(str);
    		if (index != null) {
    			bitset = this.getBsList().get(index);
    			if (bitset == null) {
    				bitset = new BitSet();
    				this.getBsList().add(index, bitset);
    			}
    			bitset.set(i, true);// 将str对应的位置为true,true可省略
    		} else {
    			bitset = new BitSet();
    			List<String> list = this.getList();
    			list.add(str);
    			index = list.size() - 1;
    			bitset.set(i, true);
    			this.getBsList().add(bitset);
    			this.getMap().put(str, index);
    		}
    	}
    	
    	/**
    	 * 从entity里拿出符合条件的bitset
    	 *
    	 * @param str
    	 * @return
    	 */
    	public BitSet get(String str) {
    		BitSet bitset = null;
    		str = str.toLowerCase();
    		Integer index = this.getMap().get(str);
    
    		if (index != null) {
    			bitset = this.getBsList().get(index);
    		} else {
    			bitset = new BitSet();
    		}
    		return bitset;
    	}
    
    	/**
    	 * bitset的与运算
    	 *
    	 * @param str
    	 * @param bitset
    	 * @return
    	 */
    	public BitSet and(String str, BitSet bitset) {
    		if (str != null) {
    			str = str.toLowerCase();
    			if (bitset != null) {
    				bitset.and(get(str));
    			} else {
    				bitset = new BitSet();
    				bitset.or(get(str));
    			}
    		}
    		return bitset;
    	}
    
    	/**
    	 * bitset的或运算
    	 *
    	 * @param str
    	 * @param bitset
    	 * @return
    	 */
    	public BitSet or(String str, BitSet bitset) {
    		if (str != null) {
    			str = str.toLowerCase();
    			if (bitset != null) {
    				bitset.or(get(str));
    			} else {
    				bitset = new BitSet();
    				bitset.or(get(str));
    			}
    		}
    		return bitset;
    	}
    
    	/**
    	 * 获取bitset值为true的 即 把 bitset翻译为list的索引
    	 *
    	 * @param bitset
    	 * @return
    	 */
    	public static List<Integer> getRealIndexs(BitSet bitset) {
    		List<Integer> indexs = new ArrayList<Integer>();
    		if (bitset != null) {
    			int i = bitset.nextSetBit(0);
    			if (i != -1) {
    				indexs.add(i);
    				for (i = bitset.nextSetBit(i + 1); i >= 0; i = bitset.nextSetBit(i + 1)) {
    					int endOfRun = bitset.nextClearBit(i);
    					do {
    						indexs.add(i);
    					} while (++i < endOfRun);
    				}
    			}
    		}
    
    		return indexs;
    	}
    	
    }
    
    
    

    为每一个user对象创建address,gender,age维度索引

    public class UserIndexStore {
    
    	private static final String ADDRESS = "address";
    	private static final String GENDER = "gender";
    	private static final String AGE = "age";
    	private BitSetIndexModel address;
    	private BitSetIndexModel gender;
    	private BitSetIndexModel age;
    	private ConcurrentHashMap<Integer, User> userMap;//存储所有的user数据
            private ConcurrentHashMap<String, Integer> nameIndexMap;//name和index映射
    
    	public static final UserIndexStore INSTANCE = getInstance();
    
    	private UserIndexStore() {
    		init();
    	}
    
    	public static UserIndexStore getInstance() {
    		return UserIndexStoreHolder.instance;
    	}
    
    	private static class UserIndexStoreHolder {
    		private static UserIndexStore instance = new UserIndexStore();
    	}
    
    	private void init() {
    		this.address = new BitSetIndexModel(ADDRESS);
    		this.gender = new BitSetIndexModel(GENDER);
    		this.age = new BitSetIndexModel(AGE);
    		userMap = new ConcurrentHashMap<Integer, User>();
                    nameIndexMap =  new ConcurrentHashMap<String, Integer>();
    	}
    
    	/**
    	 * 构建索引
    	 * @param users 
    	 */
    	public void createIndex(List<User> users) {
    		if (users != null && users.size() > 0) {
    			for (int index = 0; index < users.size(); index++) {
    				User user = users.get(index);
    				createIndex(user, index);
    			}
    		}
    	}
    
    	private void createIndex(User user, int index) {
    		getAddress().update(user.getAddress(), index);
    		getGender().update(user.getGender(), index);
    		getAge().update(user.getAge(), index);
    		this.userMap.put(index, user);
    		this.nameIndexMap.put(user.getName(), index);
    	}
    
    	public BitSet query(String address, String gender, String age) {
    		BitSet bitset = null;
    		bitset = getAddress().and(address, bitset);
    		bitset = getGender().and(gender, bitset);
    		bitset = getAge().and(age, bitset);
    		return bitset;
    	}
    
    	public User findUser(Integer index) {
    		User user = this.userMap.get(index);
    		if (user != null) {
    			return user.clone();//可能会对user做修改操作,要保证内存原始数据不变
    		}
    		return null;
    	}
    
    	public BitSetIndexModel getAddress() {
    		return address;
    	}
    
    	public void setAddress(BitSetIndexModel address) {
    		this.address = address;
    	}
    
    	public BitSetIndexModel getGender() {
    		return gender;
    	}
    
    	public void setGender(BitSetIndexModel gender) {
    		this.gender = gender;
    	}
    
    	public BitSetIndexModel getAge() {
    		return age;
    	}
    
    	public void setAge(BitSetIndexModel age) {
    		this.age = age;
    	}
    
    }
    
    
    3.测试bitset
    public class BitSetTest {
    
    	public static void main(String[] args) {
    		List<User> users = buildData();
    		UserIndexStore.getInstance().createIndex(users);
    		ExecutorService executorService = Executors.newFixedThreadPool(20);
    		int num = 2000;
    		long begin1 = System.currentTimeMillis();
    		for (int i = 0; i < num; i++) {
    			Runnable syncRunnable = new Runnable() {
    				@Override
    				public void run() {
    					List<Integer> indexs = BitSetIndexModel.getRealIndexs(UserIndexStore.getInstance().query("北京", "girl", "18"));
    					for (Integer index : indexs) {
    						UserIndexStore.getInstance().findUser(index);
    					}
    				}
    			};
    			executorService.execute(syncRunnable);
    		}
    		executorService.shutdown();
    		while (true) {
    			try {
    				if (executorService.awaitTermination(1, TimeUnit.SECONDS)) {
    					System.out.println("单次查询时间为:" + (System.currentTimeMillis() - begin1) / num + "ms");
    					break;
    				}
    			} catch (InterruptedException e) {
    				e.printStackTrace();
    			}
    		}
    	}
    
    	private static List<User> buildData() {
    		String[] addresss = { "北京", "上海", "深圳" };
    		String[] ages = { "16", "17", "18" };
    		List<User> users = new ArrayList<>();
    		for (int i = 0; i < 200000; i++) {
    			User user = new User();
    			user.setName("name" + i);
    			int rand = ThreadLocalRandom.current().nextInt(3);
    			user.setAddress(addresss[ThreadLocalRandom.current().nextInt(3)]);
    			user.setGender((rand & 1) == 0 ? "girl" : "boy");
    			user.setAge(ages[ThreadLocalRandom.current().nextInt(3)]);
    			users.add(user);
    		}
    		return users;
    	}
    
    }
    
    

    测试结果(查询2w次):

    数据量(users.size()) 并发数 平均查询时间
    20w 10 1ms
    50w 20 3ms
    100w 50 9ms

    测试机为thinkpad x240 i5 8g内存

    4.总结

    优点
    通过测试发现随着数据量的增大和并发数的提高,平均耗时并没有明显升高,并且响应时间都在10ms以内

    缺点

    1. 不适合数据量太大的情况,因为需要把数据全部加载进内存
    2. 不适合复杂查询
    3. 不适合对name,id等唯一值做查询
    后记

    因为我们的查询业务比较简单,唯一的要求是速度,并且数据量也不大,每张表的数据量都不超过100w,所以使用这种方式比较合适。
    在本篇文章中只谈到了如何创建索引,以及最基本的查询,在下一篇中我会继续说明如何更新索引,以及一些复杂查询,比如<,>,between and。

    转载请注明出处

  • 相关阅读:
    shell脚本编程练习
    linux中()、[]、{}、(())、[[]]等各种括号的使用
    Linux Shell 变量自加
    while read line [linux] shell 学习
    踢出某正在访问的用户||永久禁止某IP访问
    linux设置 自定义脚本开机启动
    syntax error: unexpected end of file完美解决方案
    Linux利用nc命令脚本批量检测服务器指定端口是否开放
    41-贪心算法
    38-动态规划
  • 原文地址:https://www.cnblogs.com/ncbest/p/7703615.html
Copyright © 2011-2022 走看看