  • Notes on an HBase row key filter incident

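    Total data in the test table: 746 rows

    The data later has to go through an algorithmic synthesis step that Spark currently does not support well, so that step is written in plain code, and the HBase query is therefore done directly from Java.

    To speed the query up, I used filters wherever I could.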

    1: Initial HBase rowkey layout: time + "_" + order id

    Query approach:

    1: Retrieve quickly and reduce GC, so use filters

    2: Support queries over a time range

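    Based on these two points I filter by time, for example startTime=201904010000 and endTime=201904180000| (note the trailing "|" character), and then compare row keys with row key filters using

    CompareFilter.CompareOp.GREATER_OR_EQUAL and
    CompareFilter.CompareOp.LESS_OR_EQUAL.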
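
    Why the trailing "|" matters can be shown without HBase. The following is a minimal sketch in plain Java, assuming row keys are compared as unsigned bytes in lexicographic order (which is how a binary comparison behaves). The class name, helper names, and the sample key are hypothetical and only for illustration. Since "|" (0x7C) sorts after "_" (0x5F), an upper bound of endTime + "|" covers every key that starts with endTime + "_", while the bare endTime does not.

```java
import java.nio.charset.StandardCharsets;

class RowKeyRangeSketch {

    // Unsigned lexicographic byte comparison (assumed to mirror a binary row key comparison).
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // All shared bytes are equal: the shorter array sorts first.
        return a.length - b.length;
    }

    // True when start <= rowKey <= end in byte order (both bounds inclusive).
    static boolean inRange(String rowKey, String start, String end) {
        byte[] k = rowKey.getBytes(StandardCharsets.UTF_8);
        return compareBytes(k, start.getBytes(StandardCharsets.UTF_8)) >= 0
                && compareBytes(k, end.getBytes(StandardCharsets.UTF_8)) <= 0;
    }

    public static void main(String[] args) {
        String start = "201904010000";
        String key = "201904180000_1001"; // hypothetical row key: time + "_" + order id
        System.out.println(inRange(key, start, "201904180000"));  // false: the key is longer than the bound
        System.out.println(inRange(key, start, "201904180000|")); // true: '_' sorts before '|'
    }
}
```

    This only illustrates the ordering; it does not explain the missing rows reported later in this post.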

    In the query code I added the row key filters:

     FilterList filterList=new FilterList();
                //time+id
                if(startTime != null){
                    RowFilter rf = new RowFilter(CompareFilter.CompareOp.GREATER_OR_EQUAL,
                            new BinaryComparator(Bytes.toBytes(startTime)));
                    filterList.addFilter(rf);
                }
                if(endTime != null){
                    RowFilter rf = new RowFilter(CompareFilter.CompareOp.LESS_OR_EQUAL,
                            new BinaryComparator(Bytes.toBytes(endTime)));
                    filterList.addFilter(rf);
                }
                scan.setFilter(filterList);

    Full code (initial version):

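        // Imports this method needs. `config` is assumed to be a static
        // org.apache.hadoop.conf.Configuration field defined elsewhere in the class.
        import java.io.IOException;
        import java.util.ArrayList;
        import java.util.HashMap;
        import java.util.List;
        import java.util.Map;
        import org.apache.hadoop.hbase.Cell;
        import org.apache.hadoop.hbase.TableName;
        import org.apache.hadoop.hbase.client.Connection;
        import org.apache.hadoop.hbase.client.ConnectionFactory;
        import org.apache.hadoop.hbase.client.Result;
        import org.apache.hadoop.hbase.client.ResultScanner;
        import org.apache.hadoop.hbase.client.Scan;
        import org.apache.hadoop.hbase.client.Table;
        import org.apache.hadoop.hbase.filter.BinaryComparator;
        import org.apache.hadoop.hbase.filter.CompareFilter;
        import org.apache.hadoop.hbase.filter.FilterList;
        import org.apache.hadoop.hbase.filter.RowFilter;
        import org.apache.hadoop.hbase.util.Bytes;

        /**
         * Row key filter (rowkey = time + "_" + order id)
         * */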
        public static List<Map<String , String>> rowFilter(String tableName , String startTime , String endTime){
            Connection connection = null;
            Scan scan = new Scan();
            scan.setCacheBlocks(false);
            ResultScanner rs = null;
            Table table = null;
            List<Map<String , String>> list = new ArrayList<Map<String , String>>();
            try{
                connection = ConnectionFactory.createConnection(config);
                table = connection.getTable(TableName.valueOf(tableName));
                FilterList filterList=new FilterList();
                //time+id
                if(startTime != null){
                    RowFilter rf = new RowFilter(CompareFilter.CompareOp.GREATER_OR_EQUAL,
                            new BinaryComparator(Bytes.toBytes(startTime)));
                    filterList.addFilter(rf);
                }
                if(endTime != null){
                    RowFilter rf = new RowFilter(CompareFilter.CompareOp.LESS_OR_EQUAL,
                            new BinaryComparator(Bytes.toBytes(endTime)));
                    filterList.addFilter(rf);
                }
                scan.setFilter(filterList);
                rs = table.getScanner(scan);
                for (Result r : rs) {
                    Map<String , String> map = new HashMap<String , String>();
                    for (Cell cell : r.listCells()) {
                        map.put(Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength())
                                , Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
                    }
                    list.add(map);
                }
            }catch (Exception e){
                e.printStackTrace();
            }finally {
                if (null != rs) {
                    rs.close();
                }
                try {
                    if (null != table) {
                        table.close();
                    }
                    if (null != connection && !connection.isClosed()) {
                        System.out.println("scan Result is closed");
                        connection.close();
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            return list;
        }

    This approach returned 361 rows! The HBase test table actually holds 746 rows, so the problem is clearly in the row key filter (I will look into the exact cause later).

    Improvement:

    Change the rowkey layout to: order id + "_" + time

    The filter code then becomes:

     FilterList filterList=new FilterList();
                //id+time
                if(startTime != null){
                    RowFilter rf = new RowFilter(CompareFilter.CompareOp.GREATER_OR_EQUAL,
                            new RegexStringComparator(".*_"+startTime));
                    filterList.addFilter(rf);
                }
                if(endTime != null){
                    RowFilter rf = new RowFilter(CompareFilter.CompareOp.LESS_OR_EQUAL,
                            new RegexStringComparator(".*_"+endTime));
                    filterList.addFilter(rf);
                }
                scan.setFilter(filterList);

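    This uses a regular expression to recognize the suffix, so that I can filter on the time at the end of the key. One caveat: the HBase Javadoc for RegexStringComparator states that only EQUAL and NOT_EQUAL comparisons are valid with it, so GREATER_OR_EQUAL and LESS_OR_EQUAL should not be relied on here to perform an ordered time-range comparison.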

    Full code (improved version):

        /**
         * Row key filter (rowkey = order id + "_" + time)
         * */
        public static List<Map<String , String>> rowEndFilter(String tableName , String startTime , String endTime){
            Connection connection = null;
            Scan scan = new Scan();
            scan.setCacheBlocks(false);
            ResultScanner rs = null;
            Table table = null;
            List<Map<String , String>> list = new ArrayList<Map<String , String>>();
            try{
                connection = ConnectionFactory.createConnection(config);
                table = connection.getTable(TableName.valueOf(tableName));
                FilterList filterList=new FilterList();
                //id+time
                if(startTime != null){
                    RowFilter rf = new RowFilter(CompareFilter.CompareOp.GREATER_OR_EQUAL,
                            new RegexStringComparator(".*_"+startTime));
                    filterList.addFilter(rf);
                }
                if(endTime != null){
                    RowFilter rf = new RowFilter(CompareFilter.CompareOp.LESS_OR_EQUAL,
                            new RegexStringComparator(".*_"+endTime));
                    filterList.addFilter(rf);
                }
                scan.setFilter(filterList);
                rs = table.getScanner(scan);
                for (Result r : rs) {
                    Map<String , String> map = new HashMap<String , String>();
                    for (Cell cell : r.listCells()) {
                        map.put(Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength())
                                , Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
                    }
                    list.add(map);
                }
            }catch (Exception e){
                e.printStackTrace();
            }finally {
                if (null != rs) {
                    rs.close();
                }
                try {
                    if (null != table) {
                        table.close();
                    }
                    if (null != connection && !connection.isClosed()) {
                        System.out.println("scan Result is closed");
                        connection.close();
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            return list;
        }

    With this version, the query returned the complete data set.
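
    To make the difference between a pattern match and an ordered comparison on the time suffix concrete, here is a minimal sketch in plain Java with no HBase dependency. It assumes the regex comparison boils down to a java.util.regex match that answers only yes or no, and that the time part is a fixed-width digit string after the last "_". The class name, method names, and the sample key are hypothetical; the range check is written as a client-side test, not as an HBase filter.

```java
import java.util.regex.Pattern;

class SuffixTimeFilterSketch {

    // Yes/no answer only: does the key contain "_<time>"? No ordering is involved.
    static boolean regexMatches(String rowKey, String time) {
        return Pattern.compile(".*_" + time).matcher(rowKey).find();
    }

    // Extracts the time suffix after the last "_" in an "orderId_time" key.
    static String timeSuffix(String rowKey) {
        return rowKey.substring(rowKey.lastIndexOf('_') + 1);
    }

    // Ordered comparison on the suffix: start <= time <= end (inclusive).
    // Plain string comparison is enough because the times are fixed-width digits.
    static boolean timeInRange(String rowKey, String start, String end) {
        String t = timeSuffix(rowKey);
        return t.compareTo(start) >= 0 && t.compareTo(end) <= 0;
    }

    public static void main(String[] args) {
        String key = "1001_201904050000"; // hypothetical row key: order id + "_" + time
        // The key lies inside the time range but does not match the start-time pattern.
        System.out.println(regexMatches(key, "201904010000"));                // false
        System.out.println(timeInRange(key, "201904010000", "201904180000")); // true
    }
}
```

    A check like timeInRange could be applied to each Result's row key after the scan to verify what the filters let through against known data.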

  • Original article: https://www.cnblogs.com/niutao/p/10733272.html