  • Using HBase

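    HBase
    Tall tables (many rows, few columns) generally perform better than wide tables (few rows, many columns), by 50% or more.
    Concepts:
    **cell** The storage unit identified by a row and a column is called a cell.
    **timestamp** Each cell can hold multiple versions of the same data, indexed by timestamp. A timestamp is a 64-bit integer (by default, milliseconds since the Unix epoch).
    **Family** Column families must be defined in the table schema (at creation time or via alter) before they are used; columns (qualifiers) can be added dynamically on write.
    **rowKey** Rowkeys are sorted lexicographically by byte value, which for plain ASCII keys is ASCII order.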
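The concepts above are often summarized as a sorted, multi-level map: rowkey → column (family:qualifier) → timestamp → value. The dependency-free Java sketch below models that layout in memory to show two of the listed properties: rowkeys compare as unsigned bytes (so "row10" sorts before "row2"), and a cell's versions are ordered newest-first by timestamp. It is an illustration only; `CellModel` and its methods are hypothetical and not part of the HBase client API, and unlike a real column family it keeps every version instead of only as many as the family's VERSIONS setting allows.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of HBase's logical layout:
// rowkey -> "family:qualifier" -> timestamp (newest first) -> value.
class CellModel {
    // Rowkeys compare byte by byte as unsigned values; a key that is a
    // prefix of a longer key sorts first.
    static final Comparator<byte[]> ROW_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) return cmp;
        }
        return Integer.compare(a.length, b.length);
    };

    private final NavigableMap<byte[], NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>(ROW_ORDER);

    // Store one version of a cell; a put with another timestamp adds a version.
    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row.getBytes(StandardCharsets.UTF_8), k -> new TreeMap<>())
            .computeIfAbsent(column, k -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Newest version of a cell, or null if the row or column is missing
    // (comparable to get with VERSIONS => 1).
    String getLatest(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns =
                rows.get(row.getBytes(StandardCharsets.UTF_8));
        if (columns == null) return null;
        NavigableMap<Long, String> versions = columns.get(column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    // Rowkeys in storage (scan) order.
    List<String> rowKeys() {
        List<String> keys = new ArrayList<>();
        for (byte[] key : rows.keySet()) keys.add(new String(key, StandardCharsets.UTF_8));
        return keys;
    }

    public static void main(String[] args) {
        CellModel table = new CellModel();
        table.put("row2", "info:name", 1L, "old");
        table.put("row10", "info:name", 1L, "x");
        table.put("row2", "info:name", 2L, "new");
        System.out.println(table.rowKeys());                      // prints [row10, row2]
        System.out.println(table.getLatest("row2", "info:name")); // prints new
    }
}
```

Because the order is byte-wise, numeric parts of rowkeys are usually zero-padded (for example "row02") when numeric order matters.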
     
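    # Create tables
    create 'table',{NAME=>'dataset',DATA_BLOCK_ENCODING=>'PREFIX'} # table name, column family, and data block encoding (an encoding, not compression; compression is set with COMPRESSION)
    create 'table',{NAME=>'1'},{NAME=>'2'} # a table with two column families, '1' and '2'
    alter 'table','family' # add a column family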
    # Delete
    disable 'table' # to drop a table, disable it first, then drop it
    drop 'table'
    alter 'table',{NAME=>'1',METHOD=>'delete'} # delete a column family
    delete 'table','row','family:column' # delete a column: delete <table>,<rowkey>,<family:column>
    deleteall 'table','row' # delete a row: deleteall <table>,<rowkey>[,<family:column>]
    # e.g.:
    deleteall 'annotation_task','oilT2My9Asrsi85CV0M.6.xj8upd8kbypm7vIQsoE'
    deleteall 'annotation_task',"oilT2My9Asrsi85CV0M.x5Cx00x5Cx00x5Cx00x5Cx06.xj8upd8kbypm7vIQsoE" # double quotes: required when the rowkey contains \xNN escape sequences
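The shell displays rowkeys that contain non-printable bytes with `\xNN` escapes, and such keys must be typed back inside double quotes, because in the (JRuby) shell only double-quoted strings interpret escape sequences; a single-quoted `'\x00'` stays four literal characters. The sketch below mimics that display rule. It is modeled on HBase's `Bytes.toStringBinary`, but the class `BinaryKeyFormat` is hypothetical and the exact set of punctuation kept verbatim is an assumption.

```java
// Hypothetical helper modeled on how the HBase shell prints binary rowkeys:
// letters, digits and selected punctuation are kept; every other byte,
// including the backslash itself, becomes \xNN with two uppercase hex digits.
class BinaryKeyFormat {
    // Assumption: punctuation printed as-is (backslash is deliberately absent).
    private static final String KEEP = " `~!@#$%^&*()-_=+[]{}|;:'\",.<>/?";

    static String toStringBinary(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int ch = b & 0xff; // treat the byte as unsigned
            boolean alnum = (ch >= '0' && ch <= '9') || (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
            if (alnum || KEEP.indexOf(ch) >= 0) {
                sb.append((char) ch);
            } else {
                sb.append(String.format("\\x%02X", ch));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] key = {'a', 0x00, '\\', 'b'};
        System.out.println(toStringBinary(key)); // prints a\x00\x5Cb
    }
}
```

This is why a key containing a literal backslash shows up with `\x5C` in scan output.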
    # Insert
    put <table>,<rowkey>,<family:column>,<value>[,<timestamp>]
    put 'table','sfsfsf','id:lisi','1993' # the column is created on the fly; family and qualifier are separated by ':' (the family 'id' must already exist)
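    # Query
    count 'table',{INTERVAL => 100, CACHE => 500} # number of rows in the table; report progress every 100 rows, fetch 500 rows per scanner call
    get 'table','row','family:column'
    scan 'table',{COLUMNS=>'info'} # scan the info column family
    scan 'table',{COLUMNS=>'info:birthday'} # scan a specific column
    # Besides COLUMNS, scan also supports LIMIT (maximum number of rows returned), STARTROW (first rowkey, inclusive: the scan first locates the region containing this key, then reads forward), STOPROW (end rowkey, exclusive), TIMERANGE (timestamp range), VERSIONS (number of versions), FILTER (filter by conditions), and more. For example, to fetch the latest version of the first row at or after rowkey Sariel:
    scan 'table', {STARTROW => 'Sariel', LIMIT=>1, VERSIONS=>1}
    scan 'table', {STARTROW => 'rowKey', LIMIT=>1, VERSIONS=>1} # general form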
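Because rows are stored sorted by rowkey, a scan with STARTROW/STOPROW is a range read over that order: STARTROW is inclusive, STOPROW is exclusive, and LIMIT caps the number of rows returned. The Java sketch below reproduces those semantics over an in-memory `TreeMap`. The class `ScanModel` is hypothetical, and for simplicity it compares `String` keys, which matches HBase's byte order only for ASCII rowkeys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of: scan 'table', {STARTROW => ..., STOPROW => ..., LIMIT => ...}
class ScanModel {
    // Returns up to `limit` rowkeys in [startRow, stopRow); a null stopRow means "to the end".
    static List<String> scan(NavigableMap<String, String> table, String startRow, String stopRow, int limit) {
        NavigableMap<String, String> range = (stopRow == null)
                ? table.tailMap(startRow, true)
                : table.subMap(startRow, true, stopRow, false);
        List<String> result = new ArrayList<>();
        for (String row : range.keySet()) {
            if (result.size() >= limit) break;
            result.add(row);
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (String row : new String[] {"Alice", "Bob", "Sariel", "Tom", "Zed"}) {
            table.put(row, "v");
        }
        System.out.println(scan(table, "Sariel", null, 1)); // prints [Sariel]
        System.out.println(scan(table, "B", "T", 10));      // prints [Bob, Sariel]
    }
}
```

Note that with an existing rowkey as STARTROW, that row itself is the first result; a STARTROW that does not exist simply positions the scan at the next larger key.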
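    # FILTER is a very powerful option that lets you set a series of conditions. For example, to keep only cells whose value equals 26 (ValueFilter checks every column and returns only the matching cells):
    scan 'table', FILTER=>"ValueFilter(=,'binary:26')"
    scan 'member', FILTER=>"ValueFilter(=,'substring:6')" # value contains 6
    scan 'member', FILTER=>"ColumnPrefixFilter('birth')" # column (qualifier) name starts with birth
    scan 'table', FILTER=>"PrefixFilter('rowPrefix')" # rowkey starts with rowPrefix
    scan 'member', FILTER=>"ColumnPrefixFilter('birth') AND ValueFilter(=,'substring:1988')" # combine multiple conditions
    scan 'hbase:meta', FILTER=>"PrefixFilter('table')" # region information of the given table (meta rowkeys start with '<table>,'; use 'table,' to exclude other tables whose names share the prefix)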
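The filters above are evaluated per cell, not per row, so a scan returns only the matching cells of a row. The Java sketch below models that behavior with plain predicates: row prefix, qualifier prefix, exact value match (`binary:`), substring match (`substring:`, which HBase's SubstringComparator performs case-insensitively), and `AND` as predicate conjunction. The class `FilterModel` and its methods are hypothetical illustrations, not the HBase filter API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

// Hypothetical cell-level model of the shell filters: each predicate is tested per cell.
class FilterModel {
    static final class Cell {
        final String row, qualifier, value;
        Cell(String row, String qualifier, String value) {
            this.row = row; this.qualifier = qualifier; this.value = value;
        }
        @Override public String toString() { return row + "/" + qualifier + "=" + value; }
    }

    // PrefixFilter('p'): rowkey starts with p
    static Predicate<Cell> prefixFilter(String p) { return c -> c.row.startsWith(p); }
    // ColumnPrefixFilter('p'): qualifier starts with p
    static Predicate<Cell> columnPrefixFilter(String p) { return c -> c.qualifier.startsWith(p); }
    // ValueFilter(=,'binary:v'): value equals v exactly
    static Predicate<Cell> valueEquals(String v) { return c -> c.value.equals(v); }
    // ValueFilter(=,'substring:s'): value contains s, ignoring case
    static Predicate<Cell> valueContains(String s) {
        String needle = s.toLowerCase(Locale.ROOT);
        return c -> c.value.toLowerCase(Locale.ROOT).contains(needle);
    }

    static List<Cell> scan(List<Cell> cells, Predicate<Cell> filter) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : cells) if (filter.test(c)) out.add(c);
        return out;
    }

    public static void main(String[] args) {
        List<Cell> member = List.of(
                new Cell("m1", "birthday", "1988-01-02"),
                new Cell("m1", "age", "26"),
                new Cell("m2", "birthday", "1990-06-06"));
        // ColumnPrefixFilter('birth') AND ValueFilter(=,'substring:1988')
        System.out.println(scan(member, columnPrefixFilter("birth").and(valueContains("1988"))));
        // prints [m1/birthday=1988-01-02]
    }
}
```

To return whole rows based on one column's value, HBase offers SingleColumnValueFilter instead of ValueFilter.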
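    # Other
    exists 'table' # check whether a table exists
    disable 'table' # to change the table schema, disable the table first, then enable it again
    alter 'table',{NAME=>'1',TTL=>18888} # set the TTL of column family '1' to 18888 seconds
    enable 'table'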
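TTL is set per column family in seconds, while cell timestamps are, by default, milliseconds since the epoch. A cell counts as expired once its age exceeds the TTL; expired cells are hidden from reads and later physically removed by compactions. A minimal Java sketch of that expiry check, assuming default millisecond timestamps (the `TtlCheck` class and `isExpired` helper are hypothetical):

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper illustrating column-family TTL semantics:
// TTL is configured in seconds, cell timestamps are epoch milliseconds.
class TtlCheck {
    static boolean isExpired(long cellTimestampMs, long ttlSeconds, long nowMs) {
        long oldestAllowedMs = nowMs - TimeUnit.SECONDS.toMillis(ttlSeconds);
        return cellTimestampMs < oldestAllowedMs;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long ttl = 18888; // seconds, roughly 5.2 hours
        System.out.println(isExpired(now - 1_000L, ttl, now));      // prints false
        System.out.println(isExpired(now - 20_000_000L, ttl, now)); // prints true
    }
}
```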
    # Create a table with lemon index metadata
    create 'sample_set_lemon',
    {NAME => 's', DATA_BLOCK_ENCODING => 'FAST_DIFF'},
    {NAME => 'l', DATA_BLOCK_ENCODING => 'FAST_DIFF'},
    METADATA => {
        'lemon.autoindex.enabled' => 'true',
        'lemon.index.enabled' => 'true',
        'lemon.index.regions' => '1',
        'lemon.update.enabled' => 'true',
        'lemon.index.meta' => '{"indexes":[
            {"nameType":"E","family":"s","column":"sample_id","termExtractor":"com.huaweicloud.gaia.annotation.lemon.DatasetQualifierValueExtractor"},
            {"nameType":"E","family":"s","column":"sample_name","termExtractor":"com.huaweicloud.gaia.annotation.lemon.DatasetQualifierValueMd5Extractor"},
            {"nameType":"E","family":"s","column":"sample_dir","termExtractor":"com.huaweicloud.gaia.annotation.lemon.DatasetQualifierValueMd5Extractor"},
            {"nameType":"E","family":"s","column":"sample_size","termExtractor":"com.huaweicloud.gaia.annotation.lemon.SampleSizeExtractor"},
            {"nameType":"E","family":"s","column":"sample_time","termExtractor":"com.huaweicloud.gaia.annotation.lemon.SampleDateExtractor"},
            {"nameType":"E","family":"s","column":"sample_status","termExtractor":"com.huaweicloud.gaia.annotation.lemon.DatasetQualifierValueExtractor"},
            {"nameType":"E","family":"s","column":"annotation_status","termExtractor":"com.huaweicloud.gaia.annotation.lemon.DatasetQualifierValueExtractor"},
            {"nameType":"E","family":"s","column":"annotated_by","termExtractor":"com.huaweicloud.gaia.annotation.lemon.DatasetQualifierValueMappingExtractor"},
            {"nameType":"E","family":"s","column":"reviewer","termExtractor":"com.huaweicloud.gaia.annotation.lemon.DatasetQualifierValueMappingExtractor"},
            {"nameType":"E","family":"s","column":"review_score","termExtractor":"com.huaweicloud.gaia.annotation.lemon.DatasetQualifierValueExtractor"},
            {"nameType":"E","family":"s","column":"create_time","termExtractor":"com.huaweicloud.gaia.annotation.lemon.SampleDateExtractor"},
            {"nameType":"E","family":"s","column":"update_time","termExtractor":"com.huaweicloud.gaia.annotation.lemon.SampleDateExtractor"},
            {"nameType":"E","family":"s","column":"metadata","termExtractor":"com.huaweicloud.gaia.annotation.lemon.SampleMetadataExtractor"},
            {"nameType":"F","family":"l","termExtractor":"com.huaweicloud.gaia.annotation.lemon.SampleLabelsExtractor"}
        ]}'
    }
     
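    Operating on HBase tables with the Java API
     
    The right way to manage HBase connections:

    An application (process) should use a single Connection. Threads in the application call the connection's getTable method to obtain Table instances, which are lightweight and share the resources the connection maintains (such as its thread pool and cached cluster metadata). Close each Table as soon as its reads or writes are done, and close the Connection only when the whole process exits. Because creating a connection is expensive, create it up front when multiple threads need access (older HBase versions used HConnection for this). Per the official documentation, Connection is thread-safe, whereas Table and Admin are not; the correct approach is therefore to share one Connection per process and use a separate Table or Admin object in each thread.

    For details, see: https://www.jianshu.com/p/fd0cddb43222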
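The sharing rule above can be sketched without a cluster. In the dependency-free Java sketch below, `SharedConnection` and `TableHandle` are hypothetical stand-ins for HBase's `Connection` (in real code created once, e.g. with `ConnectionFactory.createConnection(conf)`) and `Table` (obtained with `connection.getTable(TableName.valueOf("table"))`). The sketch shows only the lifecycle: one shared connection per process, a short-lived handle per thread and operation closed with try-with-resources, and the connection closed once, after all work has finished.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Dependency-free sketch of the HBase connection lifecycle.
// SharedConnection and TableHandle are hypothetical stand-ins for Connection and Table.
class ConnectionPattern {
    // Heavyweight and thread-safe: one instance per process.
    static final class SharedConnection implements AutoCloseable {
        final AtomicInteger openHandles = new AtomicInteger();
        private volatile boolean closed;

        TableHandle getTable(String name) {
            if (closed) throw new IllegalStateException("connection is closed");
            openHandles.incrementAndGet();
            return new TableHandle(this, name);
        }

        @Override
        public void close() { closed = true; }
    }

    // Lightweight and not thread-safe: one per thread/operation, closed after use.
    static final class TableHandle implements AutoCloseable {
        private final SharedConnection conn;
        private final String name;

        TableHandle(SharedConnection conn, String name) {
            this.conn = conn;
            this.name = name;
        }

        String get(String row) { return name + "/" + row; } // placeholder for a real read

        @Override
        public void close() { conn.openHandles.decrementAndGet(); }
    }

    // Runs one read per worker thread and returns how many reads completed.
    static int runWorkers(int threads) {
        AtomicInteger reads = new AtomicInteger();
        try (SharedConnection conn = new SharedConnection()) { // created once, up front
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            for (int i = 0; i < threads; i++) {
                String row = "row" + i;
                pool.submit(() -> {
                    // each task uses its own handle and closes it right after the read
                    try (TableHandle table = conn.getTable("table")) {
                        table.get(row);
                        reads.incrementAndGet();
                    }
                });
            }
            pool.shutdown();
            try {
                pool.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            if (conn.openHandles.get() != 0) throw new IllegalStateException("table handles leaked");
        } // the connection is closed only after all work has finished
        return reads.get();
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(4)); // prints 4
    }
}
```

In real client code the same shape applies: the `Connection` lives for the whole process, and each `Table` is opened and closed inside the task that uses it.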

     
  • Original article: https://www.cnblogs.com/luckyboylch/p/12327298.html