  • HBase shell: scan with filters

    Reposted from: https://blog.csdn.net/liuxiao723846/article/details/73823056

    (A bit rushed; coursework has been tight lately.)

    Create the table

      create 'test1', 'lf', 'sf'

    lf: column family for LONG (binary) values
    sf: column family for STRING values

    Insert data

      put 'test1', 'user1|ts1', 'sf:c1', 'sku1'
      put 'test1', 'user1|ts2', 'sf:c1', 'sku188'
      put 'test1', 'user1|ts3', 'sf:s1', 'sku123'

      put 'test1', 'user2|ts4', 'sf:c1', 'sku2'
      put 'test1', 'user2|ts5', 'sf:c2', 'sku288'
      put 'test1', 'user2|ts6', 'sf:s1', 'sku222'


    The rowkey records which user (userX) acted at what time (tsX).
    The value (skuXXX) is the product acted on, and the column name is the action: c1 = click from homepage; c2 = click from ad; s1 = search from homepage; b1 = buy.
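The rowkey scheme above can be sketched in plain Python (illustrative helpers, not part of any HBase API):

```python
def make_rowkey(user, ts):
    # Rowkey = user id and event time joined by '|'
    return f"{user}|{ts}"

def parse_rowkey(rowkey):
    # Split back into (user, ts); '|' appears once in this scheme
    user, ts = rowkey.split("|", 1)
    return user, ts

print(make_rowkey("user1", "ts2"))  # user1|ts2
print(parse_rowkey("user2|ts5"))    # ('user2', 'ts5')
```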

     

    Query examples


    1. Whose value equals sku188?

      scan 'test1', FILTER=>"ValueFilter(=,'binary:sku188')"

      ROW              COLUMN+CELL
      user1|ts2        column=sf:c1, timestamp=1409122354918, value=sku188

    2. Whose value contains 88?

      scan 'test1', FILTER=>"ValueFilter(=,'substring:88')"

      ROW              COLUMN+CELL
      user1|ts2        column=sf:c1, timestamp=1409122354918, value=sku188
      user2|ts5        column=sf:c2, timestamp=1409122355030, value=sku288


    3. Users who arrived via an ad click (column c2) with a value containing 88

      scan 'test1', FILTER=>"ColumnPrefixFilter('c2') AND ValueFilter(=,'substring:88')"

      ROW              COLUMN+CELL
      user2|ts5        column=sf:c2, timestamp=1409122355030, value=sku288


    4. Users who arrived via search (column prefix s) with a value containing 123 or 222

      scan 'test1', FILTER=>"ColumnPrefixFilter('s') AND ( ValueFilter(=,'substring:123') OR ValueFilter(=,'substring:222') )"

      ROW              COLUMN+CELL
      user1|ts3        column=sf:s1, timestamp=1409122354954, value=sku123
      user2|ts6        column=sf:s1, timestamp=1409122355970, value=sku222
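The way this filter string composes predicates can be sketched in plain Python (not HBase code; the cell tuples mirror the table above):

```python
# Cells as (rowkey, column, value), mirroring the rows inserted earlier
cells = [
    ("user1|ts3", "sf:s1", "sku123"),
    ("user2|ts5", "sf:c2", "sku288"),
    ("user2|ts6", "sf:s1", "sku222"),
]

def column_prefix(prefix):
    # Matches the column qualifier (the part after 'family:')
    return lambda cell: cell[1].split(":", 1)[1].startswith(prefix)

def value_substring(sub):
    return lambda cell: sub in cell[2]

# ColumnPrefixFilter('s') AND (substring:123 OR substring:222)
matches = [c for c in cells
           if column_prefix("s")(c)
           and (value_substring("123")(c) or value_substring("222")(c))]
print([c[0] for c in matches])  # ['user1|ts3', 'user2|ts6']
```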

    5. Rowkeys starting with user1

      scan 'test1', FILTER => "PrefixFilter('user1')"

      ROW              COLUMN+CELL
      user1|ts1        column=sf:c1, timestamp=1409122354868, value=sku1
      user1|ts2        column=sf:c1, timestamp=1409122354918, value=sku188
      user1|ts3        column=sf:s1, timestamp=1409122354954, value=sku123


    6. FirstKeyOnlyFilter: a rowkey can have multiple versions, and the same column of the same rowkey can hold multiple values; this filter returns only the first version of the first column of each row.
    KeyOnlyFilter: return only the keys, without the values.

      scan 'test1', FILTER=>"FirstKeyOnlyFilter() AND ValueFilter(=,'binary:sku188') AND KeyOnlyFilter()"

      ROW              COLUMN+CELL
      user1|ts2        column=sf:c1, timestamp=1409122354918, value=
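The semantics of these two filters can be sketched in plain Python (not the HBase filters themselves; the second cell for user1|ts2 is hypothetical, added to show the effect):

```python
from itertools import groupby

# Cells as (rowkey, column, value), sorted by rowkey then column
cells = [
    ("user1|ts2", "sf:c1", "sku188"),
    ("user1|ts2", "sf:s1", "sku123"),  # hypothetical extra cell in the same row
    ("user2|ts5", "sf:c2", "sku288"),
]

def first_key_only(cells):
    # FirstKeyOnlyFilter: keep only the first cell of each row
    return [next(g) for _, g in groupby(cells, key=lambda c: c[0])]

def key_only(cells):
    # KeyOnlyFilter: keep the key, blank out the value
    return [(row, col, "") for row, col, _ in cells]

print(key_only(first_key_only(cells)))
# [('user1|ts2', 'sf:c1', ''), ('user2|ts5', 'sf:c2', '')]
```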


    7. Starting from user1|ts2, find all rowkeys beginning with user1

      scan 'test1', {STARTROW=>'user1|ts2', FILTER => "PrefixFilter('user1')"}

      ROW              COLUMN+CELL
      user1|ts2        column=sf:c1, timestamp=1409122354918, value=sku188
      user1|ts3        column=sf:s1, timestamp=1409122354954, value=sku123


    8. Starting from user1|ts2, scan up to (but not including) rowkeys beginning with user2

      scan 'test1', {STARTROW=>'user1|ts2', STOPROW=>'user2'}

      ROW              COLUMN+CELL
      user1|ts2        column=sf:c1, timestamp=1409122354918, value=sku188
      user1|ts3        column=sf:s1, timestamp=1409122354954, value=sku123
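The range semantics can be sketched in plain Python: HBase keeps rows in lexicographic order, STARTROW is inclusive, and STOPROW is exclusive.

```python
rows = ["user1|ts1", "user1|ts2", "user1|ts3",
        "user2|ts4", "user2|ts5", "user2|ts6"]

def range_scan(rows, startrow, stoprow):
    # Inclusive start, exclusive stop, over lexicographically sorted keys
    return [r for r in sorted(rows) if startrow <= r < stoprow]

print(range_scan(rows, "user1|ts2", "user2"))
# ['user1|ts2', 'user1|ts3'] -- every 'user2|...' key sorts after 'user2',
# so the exclusive stop row cuts them all off
```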


    9. Rowkeys containing ts3

      import org.apache.hadoop.hbase.filter.CompareFilter
      import org.apache.hadoop.hbase.filter.SubstringComparator
      import org.apache.hadoop.hbase.filter.RowFilter
      scan 'test1', {FILTER => RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'), SubstringComparator.new('ts3'))}

      ROW              COLUMN+CELL
      user1|ts3        column=sf:s1, timestamp=1409122354954, value=sku123

    10. Rowkeys containing ts

      import org.apache.hadoop.hbase.filter.CompareFilter
      import org.apache.hadoop.hbase.filter.SubstringComparator
      import org.apache.hadoop.hbase.filter.RowFilter
      scan 'test1', {FILTER => RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'), SubstringComparator.new('ts'))}

      ROW              COLUMN+CELL
      user1|ts1        column=sf:c1, timestamp=1409122354868, value=sku1
      user1|ts2        column=sf:c1, timestamp=1409122354918, value=sku188
      user1|ts3        column=sf:s1, timestamp=1409122354954, value=sku123
      user2|ts4        column=sf:c1, timestamp=1409122354998, value=sku2
      user2|ts5        column=sf:c2, timestamp=1409122355030, value=sku288
      user2|ts6        column=sf:s1, timestamp=1409122355970, value=sku222


    Add a test row

      put 'test1', 'user2|err', 'sf:s1', 'sku999'


    11. Rowkeys starting with user and matching the user<digits>|ts<digits> pattern; the newly added test row does not match the regular expression, so it is not returned.

      import org.apache.hadoop.hbase.filter.RegexStringComparator
      import org.apache.hadoop.hbase.filter.CompareFilter
      import org.apache.hadoop.hbase.filter.SubstringComparator
      import org.apache.hadoop.hbase.filter.RowFilter
      scan 'test1', {FILTER => RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'), RegexStringComparator.new('^user\d+\|ts\d+$'))}

      ROW              COLUMN+CELL
      user1|ts1        column=sf:c1, timestamp=1409122354868, value=sku1
      user1|ts2        column=sf:c1, timestamp=1409122354918, value=sku188
      user1|ts3        column=sf:s1, timestamp=1409122354954, value=sku123
      user2|ts4        column=sf:c1, timestamp=1409122354998, value=sku2
      user2|ts5        column=sf:c2, timestamp=1409122355030, value=sku288
      user2|ts6        column=sf:s1, timestamp=1409122355970, value=sku222
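The pattern's behavior can be checked in plain Python (the `re` module, not RegexStringComparator itself): `\d+` requires digits after user and ts, and the `|` must be escaped so it matches a literal pipe rather than acting as alternation.

```python
import re

# Same pattern as the scan above: literal '|' (escaped), digits required
pattern = re.compile(r"^user\d+\|ts\d+$")

keys = ["user1|ts1", "user2|ts6", "user2|err"]
print([k for k in keys if pattern.match(k)])
# ['user1|ts1', 'user2|ts6'] -- 'user2|err' does not match
```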


    Add another test row

      put 'test1', 'user1|ts9', 'sf:b1', 'sku1'


    12. Columns starting with b1 whose value is sku1:

      scan 'test1', FILTER=>"ColumnPrefixFilter('b1') AND ValueFilter(=,'binary:sku1')"

      ROW              COLUMN+CELL
      user1|ts9        column=sf:b1, timestamp=1409124908668, value=sku1


    13. Using SingleColumnValueFilter: rows whose sf:b1 column has value sku1

      import org.apache.hadoop.hbase.filter.CompareFilter
      import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
      import org.apache.hadoop.hbase.filter.SubstringComparator
      scan 'test1', {COLUMNS => 'sf:b1', FILTER => SingleColumnValueFilter.new(Bytes.toBytes('sf'), Bytes.toBytes('b1'), CompareFilter::CompareOp.valueOf('EQUAL'), Bytes.toBytes('sku1'))}

      ROW              COLUMN+CELL
      user1|ts9        column=sf:b1, timestamp=1409124908668, value=sku1


    Using hbase zkcli

      hbase zkcli
      ls /
      [hbase, zookeeper]

      [zk: hadoop000:2181(CONNECTED) 1] ls /hbase
      [meta-region-server, backup-masters, table, draining, region-in-transition, running, table-lock, master, namespace, hbaseid, online-snapshot, replication, splitWAL, recovering-regions, rs]

      [zk: hadoop000:2181(CONNECTED) 2] ls /hbase/table
      [member, test1, hbase:meta, hbase:namespace]

      [zk: hadoop000:2181(CONNECTED) 3] ls /hbase/table/test1
      []

      [zk: hadoop000:2181(CONNECTED) 4] get /hbase/table/test1
      ?master:60000}l$??lPBUF
      cZxid = 0x107
      ctime = Wed Aug 27 14:52:21 HKT 2014
      mZxid = 0x10b
      mtime = Wed Aug 27 14:52:22 HKT 2014
      pZxid = 0x107
      cversion = 0
      dataVersion = 2
      aclVersion = 0
      ephemeralOwner = 0x0
      dataLength = 31
      numChildren = 0
  • Original post: https://www.cnblogs.com/LEPENGYANG/p/14088125.html