  • HBase Filters: DependentColumnFilter in Detail

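    Preface: This article describes how to use the HBase DependentColumnFilter through the Java and Shell APIs, with sample code for reference. DependentColumnFilter, also called the reference column filter, lets you designate a reference (dependent) column and filter the other columns against it; the selection is based on the reference column's timestamp.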

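    The filter looks for the reference column in every row and returns all key-values in that row that share the reference column's timestamp; a row that does not contain the reference column returns nothing. The dropDependentColumn parameter decides whether the reference column itself is returned or dropped: true drops it from the result, and false (the default) returns it. DependentColumnFilter can be thought of as a combination of a ValueFilter and a timestamp filter. Consider it when you need the data that was written on the same timeline. For comparator details and internals, see the earlier article: HBase Filter 过滤器之比较器 Comparator 原理及源码学习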
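    The selection rule can be sketched as a small in-memory model. This is not HBase code: the Cell class, filter and sampleCells are hypothetical names, each column is assumed to hold a single version, and the sample cells are copied from the scan output of the test table shown later in this article.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified in-memory model of the filtering rule; this is not HBase code.
// Cell, filter and sampleCells are hypothetical names used only for illustration.
class DependentColumnModel {

    static final class Cell {
        final String row, family, qualifier, value;
        final long timestamp;

        Cell(String row, String family, String qualifier, long timestamp, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }

        boolean isColumn(String f, String q) {
            return family.equals(f) && qualifier.equals(q);
        }

        @Override
        public String toString() {
            return row + ":" + family + ":" + qualifier + ":" + value;
        }
    }

    // Returns the reference cell of the given row, or null if the row lacks that column.
    static Cell findReference(List<Cell> cells, String row, String family, String qualifier) {
        for (Cell c : cells) {
            if (c.row.equals(row) && c.isColumn(family, qualifier)) {
                return c;
            }
        }
        return null;
    }

    // Keeps the cells whose timestamp equals that of the reference column in the same row.
    static List<String> filter(List<Cell> cells, String family, String qualifier,
                               boolean dropDependentColumn) {
        List<String> kept = new ArrayList<>();
        for (Cell cell : cells) {
            Cell ref = findReference(cells, cell.row, family, qualifier);
            if (ref == null || ref.timestamp != cell.timestamp) {
                continue; // row has no reference column, or the timestamps differ
            }
            if (dropDependentColumn && cell.isColumn(family, qualifier)) {
                continue; // true means the reference column itself is dropped
            }
            kept.add(cell.toString());
        }
        return kept;
    }

    static List<Cell> sampleCells() {
        return Arrays.asList(
                new Cell("row-1", "f1", "c1", 1589794115268L, "abcdefg"),
                new Cell("row-1", "f2", "c2", 1589794115268L, "abc"),
                new Cell("row-1", "f2", "c3", 1589794115241L, "1234abc56"),
                new Cell("row-2", "f1", "c1", 1589794115268L, "abc123456"),
                new Cell("row-2", "f2", "c2", 1589794115268L, "1234abc567"),
                new Cell("row-3", "f1", "c3", 1589794115241L, "1234321"));
    }

    public static void main(String[] args) {
        System.out.println(filter(sampleCells(), "f1", "c1", false));
        System.out.println(filter(sampleCells(), "f1", "c1", true));
    }
}
```

    In this model row-3 disappears because it has no f1:c1 cell, and row-1's f2:c3 disappears because it was written in the earlier batch and carries a different timestamp.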

    I. Java API

    Header code

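    import java.io.IOException;
    import java.util.Iterator;
    import java.util.LinkedList;

    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.CellUtil;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.filter.BinaryComparator;
    import org.apache.hadoop.hbase.filter.BinaryPrefixComparator;
    import org.apache.hadoop.hbase.filter.CompareFilter;
    import org.apache.hadoop.hbase.filter.DependentColumnFilter;
    import org.apache.hadoop.hbase.filter.RegexStringComparator;
    import org.apache.hadoop.hbase.filter.SubstringComparator;
    import org.apache.hadoop.hbase.util.Bytes;

    // MyBase is the author's own helper class (see the GitHub link at the end); it is not part of HBase.
    public class DependentColumnFilterDemo {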
    
        private static boolean isok = false;
        private static String tableName = "test";
        private static String[] cfs = new String[]{"f1", "f2"};
        private static String[] data1 = new String[]{"row-1:f2:c3:1234abc56", "row-3:f1:c3:1234321"};
        private static String[] data2 = new String[]{
                "row-1:f1:c1:abcdefg", "row-1:f2:c2:abc", "row-2:f1:c1:abc123456", "row-2:f2:c2:1234abc567"
        };
    
        public static void main(String[] args) throws IOException, InterruptedException {
    
            MyBase myBase = new MyBase();
            Connection connection = myBase.createConnection();
            if (isok) {
                myBase.deleteTable(connection, tableName);
                myBase.createTable(connection, tableName, cfs);
                // Generate test data
                myBase.putRows(connection, tableName, data1);  // first batch
                Thread.sleep(10);
                myBase.putRows(connection, tableName, data2);  // second batch
            }
            Table table = connection.getTable(TableName.valueOf(tableName));
            Scan scan = new Scan();
    

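    Middle code
    The output of each variant is shown in the trailing comment on the same line. Every snippet declares the same filter variable, so keep only one of them active at a time.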

            // Constructor 1
            DependentColumnFilter filter = new DependentColumnFilter(Bytes.toBytes("f1"), Bytes.toBytes("c1"));  // [row-1:f1:c1:abcdefg, row-1:f2:c2:abc, row-2:f1:c1:abc123456, row-2:f2:c2:1234abc567]
    
            // Constructor 2, dropDependentColumn = true (the reference column is dropped)
            DependentColumnFilter filter = new DependentColumnFilter(Bytes.toBytes("f1"), Bytes.toBytes("c1"), true);  // [row-1:f2:c2:abc, row-2:f2:c2:1234abc567]
    
            // Constructor 2, dropDependentColumn = false (false is the default)
            DependentColumnFilter filter = new DependentColumnFilter(Bytes.toBytes("f1"), Bytes.toBytes("c1"), false); // [row-1:f1:c1:abcdefg, row-1:f2:c2:abc, row-2:f1:c1:abc123456, row-2:f2:c2:1234abc567]
    
            // Constructor 3 + BinaryComparator
            DependentColumnFilter filter = new DependentColumnFilter(Bytes.toBytes("f1"), Bytes.toBytes("c1"), false,
                    CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("abcdefg"))); // [row-1:f1:c1:abcdefg, row-1:f2:c2:abc]
    
            // Constructor 3 + BinaryPrefixComparator
            DependentColumnFilter filter = new DependentColumnFilter(Bytes.toBytes("f1"), Bytes.toBytes("c1"), false,
                    CompareFilter.CompareOp.EQUAL, new BinaryPrefixComparator(Bytes.toBytes("abc")));  // [row-1:f1:c1:abcdefg, row-1:f2:c2:abc, row-2:f1:c1:abc123456, row-2:f2:c2:1234abc567]
    
            // Constructor 3 + SubstringComparator
            DependentColumnFilter filter = new DependentColumnFilter(Bytes.toBytes("f1"), Bytes.toBytes("c1"), false,
                    CompareFilter.CompareOp.EQUAL, new SubstringComparator("1234"));  // [row-2:f1:c1:abc123456, row-2:f2:c2:1234abc567]
    
            // Constructor 3 + RegexStringComparator
            DependentColumnFilter filter = new DependentColumnFilter(Bytes.toBytes("f1"), Bytes.toBytes("c1"), false,
                    CompareFilter.CompareOp.EQUAL, new RegexStringComparator("[a-z]"));  // [row-1:f1:c1:abcdefg, row-1:f2:c2:abc, row-2:f1:c1:abc123456, row-2:f2:c2:1234abc567]
    
            // Constructor 3 + RegexStringComparator
            DependentColumnFilter filter = new DependentColumnFilter(Bytes.toBytes("f1"), Bytes.toBytes("c1"), false,
                    CompareFilter.CompareOp.EQUAL, new RegexStringComparator("1234[a-z]"));  // []  Exercise: compare with the previous example and think about why the result is empty.
    
    

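    The filter also supports the other comparison operators of each comparator, in the same way as the filters covered in earlier articles, so they are not demonstrated one by one here.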
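    One way to approach the exercise in the last snippet is to test the regular expressions against the stored values with plain java.util.regex. The sketch below assumes, as the outputs above suggest, that the comparator is evaluated only against the value of the reference column f1:c1. The class and method names are hypothetical, and no HBase classes are involved.

```java
import java.util.regex.Pattern;

// Plain-Java check of the regex examples; the names here are illustrative only.
class ReferenceColumnCheck {

    // Containment-style regex test: true if the pattern occurs anywhere in the value.
    static boolean regexFind(String value, String regex) {
        return Pattern.compile(regex).matcher(value).find();
    }

    public static void main(String[] args) {
        String row1Ref = "abcdefg";     // row-1, f1:c1 (reference column)
        String row2Ref = "abc123456";   // row-2, f1:c1 (reference column)
        String row2Dep = "1234abc567";  // row-2, f2:c2 (dependent column)

        // "[a-z]" occurs in both reference values, so both rows pass.
        System.out.println(regexFind(row1Ref, "[a-z]"));      // true
        System.out.println(regexFind(row2Ref, "[a-z]"));      // true

        // "1234[a-z]" occurs in neither reference value: in "abc123456" the
        // digits 1234 are followed by 5, not by a letter.
        System.out.println(regexFind(row1Ref, "1234[a-z]"));  // false
        System.out.println(regexFind(row2Ref, "1234[a-z]"));  // false

        // The dependent column's value would match, but under the assumption
        // above it is never compared.
        System.out.println(regexFind(row2Dep, "1234[a-z]"));  // true
    }
}
```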

    Tail code

            scan.setFilter(filter);
            ResultScanner scanner = table.getScanner(scan);
            Iterator<Result> iterator = scanner.iterator();
            LinkedList<String> keys = new LinkedList<>();
            while (iterator.hasNext()) {
                String key = "";
                Result result = iterator.next();
                for (Cell cell : result.rawCells()) {
                    byte[] rowkey = CellUtil.cloneRow(cell);
                    byte[] family = CellUtil.cloneFamily(cell);
                    byte[] column = CellUtil.cloneQualifier(cell);
                    byte[] value = CellUtil.cloneValue(cell);
                    key = Bytes.toString(rowkey) + ":" + Bytes.toString(family) + ":" + Bytes.toString(column) + ":" + Bytes.toString(value);
                    keys.add(key);
                }
            }
            System.out.println(keys);
            scanner.close();
            table.close();
            connection.close();
        }
    }
    

    II. Shell API

    Contents of the HBase test table:

    hbase(main):009:0> scan 'test'
    ROW                                              COLUMN+CELL
     row-1                                           column=f1:c1, timestamp=1589794115268, value=abcdefg
     row-1                                           column=f2:c2, timestamp=1589794115268, value=abc
     row-1                                           column=f2:c3, timestamp=1589794115241, value=1234abc56
     row-2                                           column=f1:c1, timestamp=1589794115268, value=abc123456
     row-2                                           column=f2:c2, timestamp=1589794115268, value=1234abc567
     row-3                                           column=f1:c3, timestamp=1589794115241, value=1234321
    3 row(s) in 0.0280 seconds
    

    0. Simple constructors

    hbase(main):006:0> scan 'test',{FILTER=>"DependentColumnFilter('f1','c1')"}
    ROW                                              COLUMN+CELL
     row-1                                           column=f1:c1, timestamp=1589794115268, value=abcdefg
     row-1                                           column=f2:c2, timestamp=1589794115268, value=abc
     row-2                                           column=f1:c1, timestamp=1589794115268, value=abc123456
     row-2                                           column=f2:c2, timestamp=1589794115268, value=1234abc567
    2 row(s) in 0.0450 seconds
    
    hbase(main):008:0> scan 'test',{FILTER=>"DependentColumnFilter('f1','c1',false)"}
    ROW                                              COLUMN+CELL
     row-1                                           column=f1:c1, timestamp=1589794115268, value=abcdefg
     row-1                                           column=f2:c2, timestamp=1589794115268, value=abc
     row-2                                           column=f1:c1, timestamp=1589794115268, value=abc123456
     row-2                                           column=f2:c2, timestamp=1589794115268, value=1234abc567
    2 row(s) in 0.0310 seconds
    
    hbase(main):007:0> scan 'test',{FILTER=>"DependentColumnFilter('f1','c1',true)"}
    ROW                                              COLUMN+CELL
     row-1                                           column=f2:c2, timestamp=1589794115268, value=abc
     row-2                                           column=f2:c2, timestamp=1589794115268, value=1234abc567
    2 row(s) in 0.0250 seconds
    

    1. Building the filter with BinaryComparator

    Method 1:

    hbase(main):004:0> scan 'test',{FILTER=>"DependentColumnFilter('f1','c1',false,=,'binary:abcdefg')"}
    ROW                                              COLUMN+CELL
     row-1                                           column=f1:c1, timestamp=1589794115268, value=abcdefg
     row-1                                           column=f2:c2, timestamp=1589794115268, value=abc
    1 row(s) in 0.0330 seconds
    
    hbase(main):005:0> scan 'test',{FILTER=>"DependentColumnFilter('f1','c1',true,=,'binary:abcdefg')"}
    ROW                                              COLUMN+CELL
     row-1                                           column=f2:c2, timestamp=1589794115268, value=abc
    1 row(s) in 0.0120 seconds
    

    Supported comparison operators: = != > >= < <=. They are not demonstrated one by one.

    Method 2:

    import org.apache.hadoop.hbase.filter.CompareFilter
    import org.apache.hadoop.hbase.filter.BinaryComparator
    import org.apache.hadoop.hbase.filter.DependentColumnFilter
    
    hbase(main):016:0> scan 'test',{FILTER => DependentColumnFilter.new(Bytes.toBytes("f1"), Bytes.toBytes("c1"), false,CompareFilter::CompareOp.valueOf('EQUAL'), BinaryComparator.new(Bytes.toBytes('abcdefg')))}
    ROW                                              COLUMN+CELL
     row-1                                           column=f1:c1, timestamp=1589794115268, value=abcdefg
     row-1                                           column=f2:c2, timestamp=1589794115268, value=abc
    1 row(s) in 0.0170 seconds
    
    hbase(main):017:0> scan 'test',{FILTER => DependentColumnFilter.new(Bytes.toBytes("f1"), Bytes.toBytes("c1"), true,CompareFilter::CompareOp.valueOf('EQUAL'), BinaryComparator.new(Bytes.toBytes('abcdefg')))}
    ROW                                              COLUMN+CELL
     row-1                                           column=f2:c2, timestamp=1589794115268, value=abc
    1 row(s) in 0.0140 seconds
    

    Supported comparison operators: LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER, GREATER_OR_EQUAL. They are not demonstrated one by one.

    Method 1 is recommended because it is more concise.
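    The two operator lists above name the same six comparisons, once as filter-string symbols (Method 1) and once as CompareOp constants (Method 2). A small lookup table makes the pairing explicit. The class and method names below are hypothetical, and the constant names are kept as plain strings so that the sketch runs without HBase on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Pairs each filter-string symbol with the CompareOp constant name listed above.
// CompareOperatorNames and nameOf are illustrative names, not HBase API.
class CompareOperatorNames {

    static final Map<String, String> SYMBOL_TO_NAME = new LinkedHashMap<>();

    static {
        SYMBOL_TO_NAME.put("=", "EQUAL");
        SYMBOL_TO_NAME.put("!=", "NOT_EQUAL");
        SYMBOL_TO_NAME.put(">", "GREATER");
        SYMBOL_TO_NAME.put(">=", "GREATER_OR_EQUAL");
        SYMBOL_TO_NAME.put("<", "LESS");
        SYMBOL_TO_NAME.put("<=", "LESS_OR_EQUAL");
    }

    // Returns the constant name for a symbol, or fails loudly for an unknown symbol.
    static String nameOf(String symbol) {
        String name = SYMBOL_TO_NAME.get(symbol);
        if (name == null) {
            throw new IllegalArgumentException("Unknown comparison symbol: " + symbol);
        }
        return name;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : SYMBOL_TO_NAME.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```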

    2. Building the filter with BinaryPrefixComparator

    Method 1:

    hbase(main):019:0> scan 'test',{FILTER=>"DependentColumnFilter('f1','c1',false,=,'binaryprefix:abc')"}
    ROW                                              COLUMN+CELL
     row-1                                           column=f1:c1, timestamp=1589794115268, value=abcdefg
     row-1                                           column=f2:c2, timestamp=1589794115268, value=abc
     row-2                                           column=f1:c1, timestamp=1589794115268, value=abc123456
     row-2                                           column=f2:c2, timestamp=1589794115268, value=1234abc567
    2 row(s) in 0.0330 seconds
    
    hbase(main):020:0> scan 'test',{FILTER=>"DependentColumnFilter('f1','c1',true,=,'binaryprefix:abc')"}
    ROW                                              COLUMN+CELL
     row-1                                           column=f2:c2, timestamp=1589794115268, value=abc
     row-2                                           column=f2:c2, timestamp=1589794115268, value=1234abc567
    2 row(s) in 0.0600 seconds
    

    Method 2:

    import org.apache.hadoop.hbase.filter.CompareFilter
    import org.apache.hadoop.hbase.filter.BinaryPrefixComparator
    import org.apache.hadoop.hbase.filter.DependentColumnFilter
    
    hbase(main):023:0> scan 'test',{FILTER => DependentColumnFilter.new(Bytes.toBytes("f1"), Bytes.toBytes("c1"), false,CompareFilter::CompareOp.valueOf('EQUAL'), BinaryPrefixComparator.new(Bytes.toBytes('abc')))}
    ROW                                              COLUMN+CELL
     row-1                                           column=f1:c1, timestamp=1589794115268, value=abcdefg
     row-1                                           column=f2:c2, timestamp=1589794115268, value=abc
     row-2                                           column=f1:c1, timestamp=1589794115268, value=abc123456
     row-2                                           column=f2:c2, timestamp=1589794115268, value=1234abc567
    2 row(s) in 0.0180 seconds
    
    hbase(main):022:0> scan 'test',{FILTER => DependentColumnFilter.new(Bytes.toBytes("f1"), Bytes.toBytes("c1"), true,CompareFilter::CompareOp.valueOf('EQUAL'), BinaryPrefixComparator.new(Bytes.toBytes('abc')))}
    ROW                                              COLUMN+CELL
     row-1                                           column=f2:c2, timestamp=1589794115268, value=abc
     row-2                                           column=f2:c2, timestamp=1589794115268, value=1234abc567
    2 row(s) in 0.0190 seconds
    

    Everything else is the same as above.

    3. Building the filter with SubstringComparator

    Method 1:

    hbase(main):025:0> scan 'test',{FILTER=>"DependentColumnFilter('f1','c1',false,=,'substring:abc')"}
    ROW                                              COLUMN+CELL
     row-1                                           column=f1:c1, timestamp=1589794115268, value=abcdefg
     row-1                                           column=f2:c2, timestamp=1589794115268, value=abc
     row-2                                           column=f1:c1, timestamp=1589794115268, value=abc123456
     row-2                                           column=f2:c2, timestamp=1589794115268, value=1234abc567
    2 row(s) in 0.0340 seconds
    
    hbase(main):024:0> scan 'test',{FILTER=>"DependentColumnFilter('f1','c1',true,=,'substring:abc')"}
    ROW                                              COLUMN+CELL
     row-1                                           column=f2:c2, timestamp=1589794115268, value=abc
     row-2                                           column=f2:c2, timestamp=1589794115268, value=1234abc567
    2 row(s) in 0.0160 seconds
    

    Method 2:

    import org.apache.hadoop.hbase.filter.CompareFilter
    import org.apache.hadoop.hbase.filter.SubstringComparator
    import org.apache.hadoop.hbase.filter.DependentColumnFilter
    
    hbase(main):028:0> scan 'test',{FILTER => DependentColumnFilter.new(Bytes.toBytes("f1"), Bytes.toBytes("c1"), false,CompareFilter::CompareOp.valueOf('EQUAL'), SubstringComparator.new('abc'))}
    ROW                                              COLUMN+CELL
     row-1                                           column=f1:c1, timestamp=1589794115268, value=abcdefg
     row-1                                           column=f2:c2, timestamp=1589794115268, value=abc
     row-2                                           column=f1:c1, timestamp=1589794115268, value=abc123456
     row-2                                           column=f2:c2, timestamp=1589794115268, value=1234abc567
    2 row(s) in 0.0150 seconds
    
    hbase(main):029:0> scan 'test',{FILTER => DependentColumnFilter.new(Bytes.toBytes("f1"), Bytes.toBytes("c1"), true,CompareFilter::CompareOp.valueOf('EQUAL'), SubstringComparator.new('abc'))}
    ROW                                              COLUMN+CELL
     row-1                                           column=f2:c2, timestamp=1589794115268, value=abc
     row-2                                           column=f2:c2, timestamp=1589794115268, value=1234abc567
    2 row(s) in 0.0170 seconds
    

    Unlike the comparators above, a plain string is passed in directly here, and only the EQUAL and NOT_EQUAL operators are supported.
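    A rough way to see why only EQUAL and NOT_EQUAL make sense: a containment test has two outcomes, contained or not contained, so there is no ordering for LESS or GREATER to act on. The sketch below models that with a compareTo that returns 0 or 1. It is a hypothetical stand-in rather than HBase's class, and the case-insensitive matching is an assumption about SubstringComparator that should be verified against your HBase version.

```java
import java.util.Locale;

// Hypothetical stand-in for a substring comparison; not the HBase class.
class SubstringCompareModel {

    // Returns 0 when the value contains the substring, otherwise 1.
    // Case-insensitive matching is an assumption made for this sketch.
    static int compareTo(String substr, String value) {
        String needle = substr.toLowerCase(Locale.ROOT);
        return value.toLowerCase(Locale.ROOT).contains(needle) ? 0 : 1;
    }

    public static void main(String[] args) {
        // Reference values of f1:c1 in row-1 and row-2.
        System.out.println(compareTo("abc", "abcdefg"));     // 0: contained
        System.out.println(compareTo("abc", "abc123456"));   // 0: contained
        System.out.println(compareTo("1234", "abcdefg"));    // 1: not contained
        System.out.println(compareTo("1234", "abc123456"));  // 0: contained
    }
}
```

    With only 0 and 1 as possible results, EQUAL selects the values that contain the substring and NOT_EQUAL selects the rest.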

    4. Building the filter with RegexStringComparator

    import org.apache.hadoop.hbase.filter.CompareFilter
    import org.apache.hadoop.hbase.filter.RegexStringComparator
    import org.apache.hadoop.hbase.filter.DependentColumnFilter
    
    hbase(main):035:0> scan 'test',{FILTER => DependentColumnFilter.new(Bytes.toBytes("f1"), Bytes.toBytes("c1"), false,CompareFilter::CompareOp.valueOf('EQUAL'), RegexStringComparator.new('[a-z]'))}
    ROW                                              COLUMN+CELL
     row-1                                           column=f1:c1, timestamp=1589794115268, value=abcdefg
     row-1                                           column=f2:c2, timestamp=1589794115268, value=abc
     row-2                                           column=f1:c1, timestamp=1589794115268, value=abc123456
     row-2                                           column=f2:c2, timestamp=1589794115268, value=1234abc567
    2 row(s) in 0.0170 seconds
    
    hbase(main):034:0* scan 'test',{FILTER => DependentColumnFilter.new(Bytes.toBytes("f1"), Bytes.toBytes("c1"), true,CompareFilter::CompareOp.valueOf('EQUAL'), RegexStringComparator.new('[a-z]'))}
    ROW                                              COLUMN+CELL
     row-1                                           column=f2:c2, timestamp=1589794115268, value=abc
     row-2                                           column=f2:c2, timestamp=1589794115268, value=1234abc567
    2 row(s) in 0.0150 seconds
    

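    This comparator also takes a plain string, and it supports only the EQUAL and NOT_EQUAL operators. To use Method 1, you could try the regexstring prefix; my HBase version is too old to support it, so it is not demonstrated here.

    Note that regex matching here means containment, corresponding to the underlying find() method.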
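    The difference between containment and a full match can be seen with java.util.regex alone. In this minimal sketch (FindVersusMatches is a hypothetical name), find() succeeds when the pattern occurs anywhere in the value, while matches() requires the pattern to cover the entire value.

```java
import java.util.regex.Pattern;

// Contrasts Matcher.find() (containment) with Matcher.matches() (whole-input match).
class FindVersusMatches {

    static boolean find(String regex, String value) {
        return Pattern.compile(regex).matcher(value).find();
    }

    static boolean matches(String regex, String value) {
        return Pattern.compile(regex).matcher(value).matches();
    }

    public static void main(String[] args) {
        String value = "abc123456"; // value of row-2, f1:c1

        System.out.println(find("[a-z]", value));             // true: one letter anywhere is enough
        System.out.println(matches("[a-z]", value));          // false: the value is longer than one letter
        System.out.println(matches("[a-z]+[0-9]+", value));   // true: the pattern covers the whole value
    }
}
```

    This is why the single-character pattern '[a-z]' in the examples above selects both rows: each reference value contains at least one lowercase letter.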

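    DependentColumnFilter does not support LongComparator, and BitComparator and NullComparator are rarely used, so they are not covered here.

    This concludes the summary of all the comparison filters.

    The full source code for this article is available on GitHub: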

    https://github.com/zhoupengbo/demos-bigdata/blob/master/hbase/hbase-filters-demos/src/main/java/com/zpb/demos/DependentColumnFilterDemo.java
    

  • Original article: https://www.cnblogs.com/zpb2016/p/12921448.html