HBase
Tall tables perform better than wide tables (by 50% or more).
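Concepts:
**cell** The storage unit identified by a row key and a column (family:qualifier) is called a cell.
**timestamp** Each cell keeps multiple versions of the same data. Versions are indexed by timestamp, which is a 64-bit integer.
**Family** Column families must be defined up front, before any data is written to them; columns within a family can be added dynamically at write time.
**rowKey** Rows are sorted by row key in byte-wise lexicographic order, which is ASCII order for ASCII keys.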
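The ordering and versioning rules above can be tried out in plain Java, without a cluster. The class below is an illustration only: `RowKeyOrderSketch`, `sortRowKeys`, `newCell`, and `latest` are hypothetical names, and the collections merely imitate how row keys sort byte by byte and how the versions of one cell are indexed by a 64-bit timestamp. It assumes Java 9 or later for `Arrays.compareUnsigned`.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Illustration only: plain-Java stand-ins, not the HBase client API.
final class RowKeyOrderSketch {

    // Sorts row keys byte by byte, treating each byte as unsigned.
    static List<String> sortRowKeys(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort((a, b) -> Arrays.compareUnsigned(
                a.getBytes(StandardCharsets.UTF_8),
                b.getBytes(StandardCharsets.UTF_8)));
        return sorted;
    }

    // Versions of one cell, newest first: timestamp (64-bit long) -> value.
    static TreeMap<Long, String> newCell() {
        return new TreeMap<>(Comparator.<Long>reverseOrder());
    }

    // Returns the value stored under the highest timestamp.
    static String latest(TreeMap<Long, String> versions) {
        return versions.firstEntry().getValue();
    }

    public static void main(String[] args) {
        // "row10" sorts before "row2" because '1' (0x31) < '2' (0x32).
        System.out.println(sortRowKeys(List.of("row2", "row10", "row1")));
        // Zero-padding keeps numeric order and byte order aligned.
        System.out.println(sortRowKeys(List.of("row02", "row10", "row01")));

        TreeMap<Long, String> cell = newCell();
        cell.put(1000L, "old");
        cell.put(2000L, "new");
        System.out.println(latest(cell)); // prints new
    }
}
```

The first call prints `[row1, row10, row2]`, which is why numeric parts of a row key are usually zero-padded to a fixed width.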
# Creating tables

    create 'table', {NAME => 'dataset', DATA_BLOCK_ENCODING => 'PREFIX'}  # table name, column family, data block encoding
    create 'table', {NAME => '1'}, {NAME => '2'}                          # two column families
    alter 'table', 'family'                                               # add a column family
# Deleting

    disable 'table'                                   # a table must be disabled before it can be dropped
    drop 'table'                                      # delete the table

    alter 'table', {NAME => '1', METHOD => 'delete'}  # delete a column family
    delete 'table', 'row', 'family:column'            # delete a column: delete <table>,<rowkey>,<family:column>
    deleteall 'table', 'row'                          # delete a row: deleteall <table>,<rowkey>,<family:column>
    # e.g.
    deleteall 'annotation_task','oilT2My9Asrsi85CV0M.6.xj8upd8kbypm7vIQsoE'
    deleteall 'annotation_task',"oilT2My9Asrsi85CV0M.x5Cx00x5Cx00x5Cx00x5Cx06.xj8upd8kbypm7vIQsoE"  # note the double quotes: Ruby interprets escape sequences only in double-quoted strings
# Inserting

    put <table>,<rowkey>,<family:column>,<value>,<timestamp>
    put 'table','sfsfsf','id:lisi','1993'   # the column can be created on the fly; name it after the colon
# Querying

    count 'table', {INTERVAL => 100, CACHE => 500}   # count rows; report progress every 100 rows, scanner cache of 500
    get 'table', 'row', 'family:column'

    scan 'table', {COLUMNS => 'info'}                # scan the info column family
    scan 'table', {COLUMNS => 'info:birthday'}       # scan a specific column
    scan 'table', {STARTROW => 'Sariel', LIMIT => 1, VERSIONS => 1}
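Besides the COLUMNS modifier, HBase also supports LIMIT (caps the number of rows returned), STARTROW (the row key to start from; the scan first uses this key to locate the region and then scans forward), STOPROW (the row key to stop at), TIMERANGE (restricts the timestamp range), VERSIONS (number of versions), and FILTER (filters rows by condition). For example, to start at a given row key (such as Sariel) and return one row with only its latest version:

    scan 'table', {STARTROW => 'rowKey', LIMIT => 1, VERSIONS => 1}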
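How STARTROW, STOPROW, and LIMIT interact can be pictured with a sorted map: seek to the start key, walk forward in key order, and stop at the stop key or once the limit is reached. The sketch below is plain Java, not the HBase client; `ScanSketch` and `scan` are hypothetical names. It assumes the usual scan semantics of an inclusive start row and an exclusive stop row, and it relies on `String` ordering, which matches byte order for ASCII keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustration only: a sorted map standing in for a table's row keys.
final class ScanSketch {

    // Returns up to `limit` row keys in [startRow, stopRow), in key order.
    // A null stopRow means "scan to the end of the table".
    static List<String> scan(NavigableMap<String, String> table,
                             String startRow, String stopRow, int limit) {
        NavigableMap<String, String> range = (stopRow == null)
                ? table.tailMap(startRow, true)
                : table.subMap(startRow, true, stopRow, false);
        List<String> rows = new ArrayList<>();
        for (String rowKey : range.keySet()) {
            if (rows.size() >= limit) {
                break;
            }
            rows.add(rowKey);
        }
        return rows;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put("Adam", "1990");
        table.put("Sariel", "1988");
        table.put("Tom", "1993");
        table.put("Zoe", "1985");

        // Like: scan 'table', {STARTROW => 'Sariel', LIMIT => 1}
        System.out.println(scan(table, "Sariel", null, 1)); // prints [Sariel]
        // Like: scan 'table', {STARTROW => 'Sariel', STOPROW => 'Zoe'}
        System.out.println(scan(table, "Sariel", "Zoe", Integer.MAX_VALUE)); // prints [Sariel, Tom]
    }
}
```

If the start key itself does not exist, the scan simply begins at the next key in sort order, which is why STARTROW also works with a key prefix.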
FILTER is a very powerful modifier: it lets you set a series of conditions to filter on. For example, to keep only cells whose value equals 26:

    scan 'table', FILTER => "ValueFilter(=,'binary:26')"
    scan 'member', FILTER => "ValueFilter(=,'substring:6')"     # value contains 6
    scan 'member', FILTER => "ColumnPrefixFilter('birth')"      # column name starts with birth
    scan 'table', FILTER => "PrefixFilter('rowPrefix')"         # filter by row key prefix
    scan 'member', FILTER => "ColumnPrefixFilter('birth') AND ValueFilter(=,'substring:1988')"  # combine several conditions
    scan 'hbase:meta', FILTER => "PrefixFilter('table')"        # region info for the given table
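The way these filters act on individual cells and combine with AND can be imitated with plain Java predicates. This is a rough sketch, not the HBase filter classes: `FilterSketch`, `Cell`, `columnPrefix`, `valueContains`, and `scan` are hypothetical names, and the substring match is written as case-insensitive on the assumption that it should behave like HBase's substring comparator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

// Illustration only: plain-Java predicates imitating per-cell filters.
final class FilterSketch {

    // A minimal cell: row key, column qualifier, value.
    static final class Cell {
        final String row;
        final String qualifier;
        final String value;

        Cell(String row, String qualifier, String value) {
            this.row = row;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    // Imitates ColumnPrefixFilter('prefix'): qualifier starts with prefix.
    static Predicate<Cell> columnPrefix(String prefix) {
        return cell -> cell.qualifier.startsWith(prefix);
    }

    // Imitates ValueFilter(=,'substring:text'): value contains text.
    static Predicate<Cell> valueContains(String text) {
        String needle = text.toLowerCase(Locale.ROOT);
        return cell -> cell.value.toLowerCase(Locale.ROOT).contains(needle);
    }

    // Tests every cell and returns "row/qualifier=value" for the survivors.
    static List<String> scan(List<Cell> cells, Predicate<Cell> filter) {
        List<String> kept = new ArrayList<>();
        for (Cell cell : cells) {
            if (filter.test(cell)) {
                kept.add(cell.row + "/" + cell.qualifier + "=" + cell.value);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
                new Cell("r1", "birthday", "1988-03-01"),
                new Cell("r1", "city", "1988 Main St"),
                new Cell("r2", "birthday", "1993-07-15"));
        // Like: FILTER => "ColumnPrefixFilter('birth') AND ValueFilter(=,'substring:1988')"
        System.out.println(scan(cells, columnPrefix("birth").and(valueContains("1988"))));
        // prints [r1/birthday=1988-03-01]
    }
}
```

Note that the value filter alone would also keep the `city` cell of `r1`; only the AND with the column prefix narrows the result to the birthday column.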
# Other

    exists 'table'                                 # check whether the table exists
    disable 'table'                                # to change the table schema: disable first, then enable
    alter 'table', {NAME => '1', TTL => '18888'}
    enable 'table'
# Creating a Lemon table

    create 'sample_set_lemon',
      {NAME => 's', DATA_BLOCK_ENCODING => 'FAST_DIFF'},
      {NAME => 'l', DATA_BLOCK_ENCODING => 'FAST_DIFF'},
      METADATA => {
        'lemon.autoindex.enabled' => 'true',
        'lemon.index.enabled' => 'true',
        'lemon.index.regions' => '1',
        'lemon.update.enabled' => 'true',
        'lemon.index.meta' => '{"indexes":[
          {"nameType":"E","family":"s","column":"sample_id","termExtractor":"com.huaweicloud.gaia.annotation.lemon.DatasetQualifierValueExtractor"},
          {"nameType":"E","family":"s","column":"sample_name","termExtractor":"com.huaweicloud.gaia.annotation.lemon.DatasetQualifierValueMd5Extractor"},
          {"nameType":"E","family":"s","column":"sample_dir","termExtractor":"com.huaweicloud.gaia.annotation.lemon.DatasetQualifierValueMd5Extractor"},
          {"nameType":"E","family":"s","column":"sample_size","termExtractor":"com.huaweicloud.gaia.annotation.lemon.SampleSizeExtractor"},
          {"nameType":"E","family":"s","column":"sample_time","termExtractor":"com.huaweicloud.gaia.annotation.lemon.SampleDateExtractor"},
          {"nameType":"E","family":"s","column":"sample_status","termExtractor":"com.huaweicloud.gaia.annotation.lemon.DatasetQualifierValueExtractor"},
          {"nameType":"E","family":"s","column":"annotation_status","termExtractor":"com.huaweicloud.gaia.annotation.lemon.DatasetQualifierValueExtractor"},
          {"nameType":"E","family":"s","column":"annotated_by","termExtractor":"com.huaweicloud.gaia.annotation.lemon.DatasetQualifierValueMappingExtractor"},
          {"nameType":"E","family":"s","column":"reviewer","termExtractor":"com.huaweicloud.gaia.annotation.lemon.DatasetQualifierValueMappingExtractor"},
          {"nameType":"E","family":"s","column":"review_score","termExtractor":"com.huaweicloud.gaia.annotation.lemon.DatasetQualifierValueExtractor"},
          {"nameType":"E","family":"s","column":"create_time","termExtractor":"com.huaweicloud.gaia.annotation.lemon.SampleDateExtractor"},
          {"nameType":"E","family":"s","column":"update_time","termExtractor":"com.huaweicloud.gaia.annotation.lemon.SampleDateExtractor"},
          {"nameType":"E","family":"s","column":"metadata","termExtractor":"com.huaweicloud.gaia.annotation.lemon.SampleMetadataExtractor"},
          {"nameType":"F","family":"l","termExtractor":"com.huaweicloud.gaia.annotation.lemon.SampleLabelsExtractor"}
        ]}'
      }
Working with HBase tables through the Java API:
The right way to handle HBase connections:
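One application (process) should hold exactly one Connection. Each thread in the application obtains its own Table instance by calling the Connection's getTable method; this is a lightweight call, because the heavyweight shared resources (such as the thread pool) are maintained by the Connection. Close the Table after each round of reads or writes, and close the Connection only when the whole process exits. When multiple threads need access, create the Connection (HConnection in older client versions) up front to avoid the cost of setting it up repeatedly. Connection is thread-safe, whereas Table and Admin are not, so the correct pattern is for the whole process to share one Connection object while each thread uses its own Table and Admin objects.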
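The lifecycle described above can be sketched in plain Java. The classes below are hypothetical stand-ins written for this illustration, not the HBase client: `SharedConnection` plays the role of the thread-safe Connection, and `TableHandle` plays the role of the per-thread Table. In the real client the corresponding calls are `ConnectionFactory.createConnection(conf)` and `connection.getTable(TableName.valueOf("table"))`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustration only: stand-ins mirroring the roles of Connection and Table.
final class ConnectionSharingSketch {

    // Stand-in for the shared, thread-safe connection (one per process).
    static final class SharedConnection implements AutoCloseable {
        private final Map<String, String> store = new ConcurrentHashMap<>();
        final AtomicInteger tablesOpened = new AtomicInteger();
        volatile boolean closed;

        // Cheap to call: hands out a fresh handle every time.
        TableHandle getTable(String name) {
            tablesOpened.incrementAndGet();
            return new TableHandle(name, store);
        }

        int rowCount() {
            return store.size();
        }

        @Override
        public void close() {
            closed = true; // shared resources are released once, at process exit
        }
    }

    // Stand-in for a table handle: used by one thread, closed after use.
    static final class TableHandle implements AutoCloseable {
        private final String name;
        private final Map<String, String> store;

        TableHandle(String name, Map<String, String> store) {
            this.name = name;
            this.store = store;
        }

        void put(String rowKey, String value) {
            store.put(name + "/" + rowKey, value);
        }

        @Override
        public void close() {
            // Nothing heavy to release in this sketch.
        }
    }

    // Starts `threads` workers that share one connection; each worker opens
    // and closes its own table handle. Returns the number of rows written.
    static int runWorkers(SharedConnection connection, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            final int id = i;
            pool.execute(() -> {
                try (TableHandle table = connection.getTable("table")) {
                    table.put("row-" + id, "value-" + id);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return connection.rowCount();
    }

    public static void main(String[] args) {
        // One connection for the whole process, closed only at the end.
        try (SharedConnection connection = new SharedConnection()) {
            System.out.println(runWorkers(connection, 4)); // prints 4
        }
    }
}
```

The try-with-resources blocks make the two lifetimes explicit: the table handle lives for a single unit of work inside one thread, while the connection outlives all of the workers.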