
    HBase Storage: The HFile Format

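    The actual storage files are implemented by the HFile class, which was created for a single purpose: to store HBase's data efficiently. HFiles are based on Hadoop's TFile class and mimic the SSTable format used in Google's BigTable architecture.
    The details of the file format are shown in the figure below.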


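    The files are variable in length; the only fixed blocks are the File Info block and the Trailer block. The trailer holds pointers to the other blocks. It is written last, once the data has been persisted to the file, and writing it finalizes the file as an immutable data store. The Index blocks record the offsets of the Data and Meta blocks. Both Data and Meta blocks are technically optional, but given how HBase uses its data files, you will almost always find Data blocks in a store file.
    The block size is configured through HColumnDescriptor, which in turn is either specified by the user when the table is created or left at a reasonable default.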
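To make the role of the trailer concrete, here is a minimal sketch of a file that is written front to back and read back to front. It is a toy layout and not the real HFile on-disk format: the names `write_toy_file` and `read_block`, and the field widths of the index entries and trailer, are assumptions made up for this illustration.

```python
import io
import struct

# Toy layout, NOT the real HFile format: [data blocks][index][trailer].
# The fixed-size trailer stores the index offset and the number of index entries.
TRAILER = struct.Struct(">QQ")
# Each index entry stores a data block's offset and its length.
INDEX_ENTRY = struct.Struct(">QI")


def write_toy_file(blocks):
    """Write the data blocks, then an index of their offsets, then the trailer."""
    buf = io.BytesIO()
    index = []
    for block in blocks:
        index.append((buf.tell(), len(block)))
        buf.write(block)
    index_offset = buf.tell()
    for offset, length in index:
        buf.write(INDEX_ENTRY.pack(offset, length))
    # The trailer goes last; once it is written the file is treated as immutable.
    buf.write(TRAILER.pack(index_offset, len(index)))
    return buf.getvalue()


def read_block(data, block_number):
    """Find a block by reading the trailer first, then the matching index entry."""
    index_offset, count = TRAILER.unpack_from(data, len(data) - TRAILER.size)
    if not 0 <= block_number < count:
        raise IndexError(block_number)
    entry_pos = index_offset + block_number * INDEX_ENTRY.size
    offset, length = INDEX_ENTRY.unpack_from(data, entry_pos)
    return data[offset:offset + length]


toy = write_toy_file([b"row-a", b"row-bb", b"row-ccc"])
print(read_block(toy, 1))  # b'row-bb'
```

Because the trailer has a fixed size, a reader can always find it at the end of the file, no matter how long the variable-length part in front of it is.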

    hbase(main):002:0> desc 'test_table_mr'
    Table test_table_mr is ENABLED 
    test_table_mr 
    COLUMN FAMILIES DESCRIPTION 
    {NAME => 'data', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
    

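    The default here is 64 KB (65,536 bytes).
    The HFile JavaDoc explains the setting as follows:
    Minimum block size. For general usage, a minimum block size between 8 KB and 1 MB is recommended. A larger block size is preferable if files are accessed mostly sequentially. However, it makes random reads less efficient, because more data has to be decompressed. Smaller blocks are better for random access, but they need more memory to hold the block index and may be slower to create, because the compressor stream must be flushed at the end of every data block, which results in an FS I/O flush. In addition, because of internal caching in the compression codec, the smallest possible block size is around 20 KB to 30 KB.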
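The memory cost mentioned in the JavaDoc comes from the block index needing one entry per data block. The following back-of-the-envelope sketch counts blocks for a hypothetical 1 GB store file; the helper name `data_block_count` is an assumption, and the calculation ignores compression and the fact that real blocks run slightly over the limit.

```python
def data_block_count(file_size, block_size):
    """Number of blocks needed to hold file_size bytes (ceiling division)."""
    return -(-file_size // block_size)


ONE_GB = 1024 ** 3

# Smaller blocks mean more blocks, and therefore more block index entries.
for block_kb in (8, 64, 1024):
    blocks = data_block_count(ONE_GB, block_kb * 1024)
    print(f"{block_kb:>5} KB blocks -> {blocks} index entries")
```

Going from 64 KB down to 8 KB blocks multiplies the number of index entries by eight, which is the memory trade-off the JavaDoc warns about.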
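    Each block contains a magic header and a number of serialized KeyValue instances. Without compression, each block is roughly as large as the configured block size. The writer has to adapt to whatever the user gives it: if you store a KeyValue that is larger than the block size, HBase has to accept it. Even with smaller values, the block size is only checked after the last value has been written, so in practice most blocks end up slightly larger than the configured size.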
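The effect of checking the limit only after a value has been written can be sketched in a few lines. This is a simplified model, not HBase's writer code; `split_into_blocks` is a hypothetical helper that works on cell sizes only.

```python
def split_into_blocks(cell_sizes, block_size):
    """Group cells into blocks, checking the size limit only after each append."""
    blocks = []
    current = 0
    for size in cell_sizes:
        current += size            # the cell is always accepted first
        if current >= block_size:  # only then is the limit checked
            blocks.append(current)
            current = 0
    if current:
        blocks.append(current)     # the last, partially filled block
    return blocks


# Ten 50-byte cells with a 120-byte limit: every full block overshoots to 150.
print(split_into_blocks([50] * 10, 120))  # [150, 150, 150, 50]

# A single cell larger than the limit still has to be accepted.
print(split_into_blocks([500], 120))      # [500]
```

In this model a full block is never smaller than the limit, and it exceeds it by at most the size of the last cell, which matches the observation that most blocks are slightly larger than configured.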
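    With compression enabled, you have even less control over the block size. Compression codecs achieve their best compression ratio when they can decide for themselves how much data to take in. For example, if the block size is set to 256 KB and LZO compression is used, the system writes smaller blocks to fit LZO's internal buffer size.
    The HBase writer is not aware of whether a compression algorithm has been selected: it writes raw data according to the block size limit and tries to keep the raw size close to that limit. If compression is enabled, less data ends up on disk. The final store file therefore consists of the same number of blocks, but because each block is smaller, the total size is smaller too.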
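The "same number of blocks, smaller total size" behaviour can be reproduced with a small sketch. It cuts blocks by raw size and only then compresses each block. `zlib` stands in for a real HBase codec such as LZO purely because it ships with Python, and `write_blocks` and the synthetic cells are assumptions made for this illustration.

```python
import zlib


def write_blocks(cells, block_size, compress=False):
    """Cut blocks based on the raw size, then optionally compress each block."""
    blocks = []
    current = bytearray()
    for cell in cells:
        current += cell
        if len(current) >= block_size:
            blocks.append(bytes(current))
            current = bytearray()
    if current:
        blocks.append(bytes(current))
    if compress:
        # Compression happens per block, after the block boundary was decided.
        blocks = [zlib.compress(block) for block in blocks]
    return blocks


# Synthetic, highly repetitive cells of 30 bytes each (made-up data).
cells = [b"row%05d/tag:basicTag/Put/1400" % i for i in range(5000)]

raw = write_blocks(cells, 64 * 1024)
packed = write_blocks(cells, 64 * 1024, compress=True)

print("blocks:", len(raw), "vs", len(packed))
print("bytes: ", sum(map(len, raw)), "vs", sum(map(len, packed)))
```

Both runs produce the same number of blocks because the boundaries are chosen before compression is applied; only the bytes stored per block differ.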
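    In HDFS, the default block size for files is 128 MB, which is 2,048 times the default HFile block size. HBase storage file blocks therefore do not map onto Hadoop blocks. In fact, there is no correlation between the two block types at all. HBase stores its files transparently in a filesystem, and the fact that HDFS also uses blocks to split up files is a coincidence. HDFS has no idea what HBase stores; it only sees binary files.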
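The factor of 2,048 and the lack of alignment between the two kinds of blocks can be checked with a short calculation. The sketch assumes, purely for illustration, that every HFile block occupies 65,593 bytes (slightly more than 64 KB); `straddles_boundary` is a hypothetical helper.

```python
HDFS_BLOCK = 128 * 1024 * 1024   # default HDFS block size in bytes
HFILE_BLOCK = 64 * 1024          # default HFile block size in bytes

print(HDFS_BLOCK // HFILE_BLOCK)  # 2048


def straddles_boundary(offset, length, boundary=HDFS_BLOCK):
    """True if the byte range [offset, offset + length) crosses an HDFS block boundary."""
    return offset // boundary != (offset + length - 1) // boundary


# Assumed on-disk size of each HFile block: a bit over the 64 KB limit.
ACTUAL_BLOCK = 65593

# Find the first HFile block that spans two HDFS blocks.
n = 0
while not straddles_boundary(n * ACTUAL_BLOCK, ACTUAL_BLOCK):
    n += 1
print("first straddling HFile block:", n)
```

Because HFile blocks are not exactly 64 KB, their boundaries drift relative to the HDFS block boundaries, and some HFile block ends up split across two HDFS blocks. HDFS handles this transparently since it only sees a stream of bytes.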

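    Sometimes it is necessary to bypass HBase and access an HFile directly, for example to check its health or to dump its contents. The HFile.main() method provides a tool for this: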

    [root@node-231 ~]# hbase org.apache.hadoop.hbase.io.hfile.HFile
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/hdp/2.6.1.0-129/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/hdp/2.6.1.0-129/hadoop-yarn/ProcessLog-0.0.1-SNAPSHOT-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    usage: HFile [-a] [-b] [-e] [-f <arg> | -r <arg>] [-h] [-i] [-k] [-m] [-p]
           [-s] [-v] [-w <arg>]
     -a,--checkfamily         Enable family check
     -b,--printblocks         Print block index meta data
     -e,--printkey            Print keys
     -f,--file <arg>          File to scan. Pass full-path; e.g.
                              hdfs://a:9000/hbase/hbase:meta/12/34
     -h,--printblockheaders   Print block headers for each block.
     -i,--checkMobIntegrity   Print all cells whose mob files are missing
     -k,--checkrow            Enable row order check; looks for out-of-order
                              keys
     -m,--printmeta           Print meta data of file
     -p,--printkv             Print key/value pairs
     -r,--region <arg>        Region to scan. Pass region name; e.g.
                              'hbase:meta,,1'
     -s,--stats               Print statistics
     -v,--verbose             Verbose output; emits file and meta data
                              delimiters
     -w,--seekToRow <arg>     Seek to this row and print all the kvs for this
                              row only
    

      

    Listing the directory

    [root@node-231 ~]# hadoop fs -lsr /apps/hbase
    lsr: DEPRECATED: Please use 'ls -R' instead.
    drwxr-xr-x - hbase hdfs 0 2018-09-19 18:26 /apps/hbase/data
    drwxr-xr-x - hbase hdfs 0 2018-09-27 17:49 /apps/hbase/data/.tmp
    drwxr-xr-x - hbase hdfs 0 2018-09-27 17:49 /apps/hbase/data/.tmp/data
    drwxr-xr-x - hbase hdfs 0 2018-09-28 17:54 /apps/hbase/data/.tmp/data/default
    drwxr-xr-x - hbase hdfs 0 2018-10-09 17:59 /apps/hbase/data/MasterProcWALs
    -rw-r--r-- 3 hbase hdfs 0 2018-10-09 17:59 /apps/hbase/data/MasterProcWALs/state-00000000000000001877.log
    drwxr-xr-x - hbase hdfs 0 2018-09-19 18:27 /apps/hbase/data/WALs
    drwxr-xr-x - hbase hdfs 0 2018-07-10 10:31 /apps/hbase/data/WALs/node231,16020,1531189330072
    drwxr-xr-x - hbase hdfs 0 2018-07-10 10:54 /apps/hbase/data/WALs/node231,16020,1531189883651
    drwxr-xr-x - hbase hdfs 0 2018-07-10 11:02 /apps/hbase/data/WALs/node231,16020,1531191257857
    drwxr-xr-x - hbase hdfs 0 2018-07-10 11:14 /apps/hbase/data/WALs/node231,16020,1531191741322
    drwxr-xr-x - hbase hdfs 0 2018-07-10 18:14 /apps/hbase/data/WALs/node231,16020,1531192949461
    drwxr-xr-x - hbase hdfs 0 2018-07-11 17:06 /apps/hbase/data/WALs/node231,16020,1531219308266-splitting
    -rw-r--r-- 3 hbase hdfs 91 2018-07-11 17:06 /apps/hbase/data/WALs/node231,16020,1531219308266-splitting/node231%2C16020%2C1531219308266..meta.1531298682095.meta
    drwxr-xr-x - hbase hdfs 0 2018-10-09 17:28 /apps/hbase/data/WALs/node231,16020,1537352815235
    -rw-r--r-- 3 hbase hdfs 83 2018-10-09 17:28 /apps/hbase/data/WALs/node231,16020,1537352815235/node231%2C16020%2C1537352815235..meta.1539077336609.meta
    -rw-r--r-- 3 hbase hdfs 83 2018-10-09 17:28 /apps/hbase/data/WALs/node231,16020,1537352815235/node231%2C16020%2C1537352815235.default.1539077320915
    drwxr-xr-x - hbase hdfs 0 2018-07-09 17:31 /apps/hbase/data/WALs/node232,16020,1531128455707-splitting
    -rw-r--r-- 3 hbase hdfs 814 2018-07-09 17:29 /apps/hbase/data/WALs/node232,16020,1531128455707-splitting/node232%2C16020%2C1531128455707..meta.1531128485582.meta
    drwxr-xr-x - hbase hdfs 0 2018-07-17 11:59 /apps/hbase/data/WALs/node233,16020,1531794848784-splitting
    -rw-r--r-- 3 hbase hdfs 1867 2018-07-17 12:59 /apps/hbase/data/WALs/node233,16020,1531794848784-splitting/node233%2C16020%2C1531794848784..meta.1531794867296.meta
    -rw-r--r-- 3 hbase hdfs 3490 2018-07-17 12:59 /apps/hbase/data/WALs/node233,16020,1531794848784-splitting/node233%2C16020%2C1531794848784..meta.1531796373363.meta
    -rw-r--r-- 3 hbase hdfs 83 2018-07-17 11:59 /apps/hbase/data/WALs/node233,16020,1531794848784-splitting/node233%2C16020%2C1531794848784..meta.1531799973563.meta
    drwxr-xr-x - hbase hdfs 0 2018-07-25 15:11 /apps/hbase/data/WALs/node233,16020,1531878086352
    drwxr-xr-x - hbase hdfs 0 2018-10-09 17:26 /apps/hbase/data/WALs/node233,16020,1537352818756
    -rw-r--r-- 3 hbase hdfs 83 2018-10-09 17:26 /apps/hbase/data/WALs/node233,16020,1537352818756/node233%2C16020%2C1537352818756.default.1539077334363
    drwxr-xr-x - hbase hdfs 0 2018-07-10 10:22 /apps/hbase/data/WALs/node234,16020,1531188485576
    drwxr-xr-x - hbase hdfs 0 2018-07-10 11:02 /apps/hbase/data/WALs/node234,16020,1531191251113
    drwxr-xr-x - hbase hdfs 0 2018-07-10 11:14 /apps/hbase/data/WALs/node234,16020,1531191744628
    drwxr-xr-x - hbase hdfs 0 2018-07-10 11:22 /apps/hbase/data/WALs/node234,16020,1531192469368
    drwxr-xr-x - hbase hdfs 0 2018-07-10 18:14 /apps/hbase/data/WALs/node234,16020,1531192953492
    drwxr-xr-x - hbase hdfs 0 2018-07-10 18:41 /apps/hbase/data/WALs/node234,16020,1531218644614
    drwxr-xr-x - hbase hdfs 0 2018-07-17 09:56 /apps/hbase/data/WALs/node234,16020,1531736897611
    drwxr-xr-x - hbase hdfs 0 2018-10-09 17:30 /apps/hbase/data/WALs/node234,16020,1537352822378
    -rw-r--r-- 3 hbase hdfs 83 2018-10-09 17:30 /apps/hbase/data/WALs/node234,16020,1537352822378/node234%2C16020%2C1537352822378..meta.1539077644098.meta
    -rw-r--r-- 3 hbase hdfs 83 2018-10-09 17:25 /apps/hbase/data/WALs/node234,16020,1537352822378/node234%2C16020%2C1537352822378.default.1539077313538
    drwxr-xr-x - hbase hdfs 0 2018-07-10 18:41 /apps/hbase/data/WALs/node235,16020,1531218644231
    drwxr-xr-x - hbase hdfs 0 2018-07-17 10:33 /apps/hbase/data/WALs/node235,16020,1531792606380
    drwxr-xr-x - hbase hdfs 0 2018-07-25 15:11 /apps/hbase/data/WALs/node235,16020,1531878078376
    drwxr-xr-x - hbase hdfs 0 2018-07-09 17:27 /apps/hbase/data/WALs/hregion-32519348
    drwxr-xr-x - hbase hdfs 0 2018-10-09 02:25 /apps/hbase/data/archive
    drwxr-xr-x - hbase hdfs 0 2018-07-09 17:31 /apps/hbase/data/corrupt
    drwxr-xr-x - hbase hdfs 0 2018-07-09 17:28 /apps/hbase/data/data
    drwxr-xr-x - hbase hdfs 0 2018-09-28 17:54 /apps/hbase/data/data/default
    drwxr-xr-x - hbase hdfs 0 2018-07-09 17:32 /apps/hbase/data/data/default/socialSecurity
    drwxr-xr-x - hbase hdfs 0 2018-07-17 18:09 /apps/hbase/data/data/default/socialSecurity/.tabledesc
    -rw-r--r-- 3 hbase hdfs 673 2018-07-17 18:09 /apps/hbase/data/data/default/socialSecurity/.tabledesc/.tableinfo.0000000006
    drwxr-xr-x - hbase hdfs 0 2018-07-17 18:09 /apps/hbase/data/data/default/socialSecurity/.tmp
    drwxr-xr-x - hbase hdfs 0 2018-09-24 15:06 /apps/hbase/data/data/default/socialSecurity/d31a708f6a7b307c9bb2aa6818b790f8
    -rw-r--r-- 3 hbase hdfs 49 2018-07-09 17:32 /apps/hbase/data/data/default/socialSecurity/d31a708f6a7b307c9bb2aa6818b790f8/.regioninfo
    drwxr-xr-x - hbase hdfs 0 2018-10-09 02:17 /apps/hbase/data/data/default/socialSecurity/d31a708f6a7b307c9bb2aa6818b790f8/.tmp
    drwxr-xr-x - hbase hdfs 0 2018-09-19 18:32 /apps/hbase/data/data/default/socialSecurity/d31a708f6a7b307c9bb2aa6818b790f8/recovered.edits
    -rw-r--r-- 3 hbase hdfs 0 2018-09-19 18:32 /apps/hbase/data/data/default/socialSecurity/d31a708f6a7b307c9bb2aa6818b790f8/recovered.edits/228.seqid
    drwxr-xr-x - hbase hdfs 0 2018-10-07 00:17 /apps/hbase/data/data/default/socialSecurity/d31a708f6a7b307c9bb2aa6818b790f8/tag
    -rw-r--r-- 3 hbase hdfs 102271 2018-10-07 00:17 /apps/hbase/data/data/default/socialSecurity/d31a708f6a7b307c9bb2aa6818b790f8/tag/306820d94f9d463bba02b6949032ff3d
    drwxr-xr-x - hbase hdfs 0 2018-10-09 02:17 /apps/hbase/data/data/default/socialSecurity/d31a708f6a7b307c9bb2aa6818b790f8/userInfo
    -rw-r--r-- 3 hbase hdfs 101512 2018-10-09 02:17 /apps/hbase/data/data/default/socialSecurity/d31a708f6a7b307c9bb2aa6818b790f8/userInfo/549a3822eb484e32a12842287293435a
    

      

    Inspecting an HFile

    [root@node-231 ~]# hbase org.apache.hadoop.hbase.io.hfile.HFile -f /apps/hbase/data/data/default/socialSecurity/d31a708f6a7b307c9bb2aa6818b790f8/tag/306820d94f9d463bba02b6949032ff3d -v -m -p
    (part of the output omitted ...)
    K: 653128197810268592/tag:basicTag_1400/1531132350695/Put/vlen=4/seqid=0 V: 1400
    K: 653224199208208529/tag:basicTag_1364/1531129064927/Put/vlen=4/seqid=0 V: 1364
    K: 653224199208208529/tag:basicTag_1367/1531130762508/Put/vlen=4/seqid=0 V: 1367
    K: 653224199208208529/tag:basicTag_1374/1531130561522/Put/vlen=4/seqid=0 V: 1374
    K: 653224199208208529/tag:basicTag_1399/1531132350695/Put/vlen=4/seqid=0 V: 1399
    K: 654324197605195204/tag:basicTag_1364/1531129064915/Put/vlen=4/seqid=0 V: 1364
    K: 654324197605195204/tag:basicTag_1368/1531130762525/Put/vlen=4/seqid=0 V: 1368
    K: 654324197605195204/tag:basicTag_1373/1531130561519/Put/vlen=4/seqid=0 V: 1373
    K: 654324197605195204/tag:basicTag_1400/1531132350695/Put/vlen=4/seqid=0 V: 1400
    K: 659000198306113231/tag:basicTag_1363/1531129064927/Put/vlen=4/seqid=0 V: 1363
    K: 659000198306113231/tag:basicTag_1367/1531130762508/Put/vlen=4/seqid=0 V: 1367
    K: 659000198306113231/tag:basicTag_1371/1531130561522/Put/vlen=4/seqid=0 V: 1371
    K: 659000198306113231/tag:basicTag_1399/1531132350695/Put/vlen=4/seqid=0 V: 1399
    Block index size as per heapsize: 480
    reader=/apps/hbase/data/data/default/socialSecurity/d31a708f6a7b307c9bb2aa6818b790f8/tag/306820d94f9d463bba02b6949032ff3d,
    compression=none,
    cacheConf=CacheConfig:disabled,
    firstKey=110115199402265244/tag:basicTag_1364/1531129064927/Put,
    lastKey=659000198306113231/tag:basicTag_1399/1531132350695/Put,
    avgKeyLen=46,
    avgValueLen=4,
    entries=1682,
    length=102271
    Trailer:
    fileinfoOffset=97813,
    loadOnOpenDataOffset=97650,
    dataIndexCount=2,
    metaIndexCount=0,
    totalUncomressedBytes=102109,
    entryCount=1682,
    compressionCodec=NONE,
    uncompressedDataIndexSize=89,
    numDataIndexLevels=1,
    firstDataBlockOffset=0,
    lastDataBlockOffset=65593,
    comparatorClassName=org.apache.hadoop.hbase.KeyValue$KeyComparator,
    encryptionKey=NONE,
    majorVersion=3,
    minorVersion=0
    Fileinfo:
    DELETE_FAMILY_COUNT = \x00\x00\x00\x00\x00\x00\x00\x00
    EARLIEST_PUT_TS = \x00\x00\x01d~gm\xD3
    MAJOR_COMPACTION_KEY = \xFF
    MAX_SEQ_ID_KEY = 29
    TIMERANGE = 1531129064915....1531132350695
    hfile.AVG_KEY_LEN = 46
    hfile.AVG_VALUE_LEN = 4
    hfile.CREATE_TIME_TS = \x00\x00\x01fJ.8!
    hfile.LASTKEY = \x00\x12659000198306113231\x03tagbasicTag_1399\x00\x00\x01d~\x99\x90\xE7\x04
    Mid-key: \x00\x0544023\x00\x7F\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF
    Bloom filter:
    Not present
    Delete Family Bloom filter:
    Not present
    Scanned kv count -> 1682

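    The first part of the output is the actual data, stored as serialized KeyValue instances. The second part dumps the internal HFile.Reader properties and the details of the trailer block. The last part, starting with Fileinfo, lists the values of the file info block.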
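The dump is plain text, so it is easy to post-process. The sketch below parses one `K: ... V: ...` line into its fields and decodes the EARLIEST_PUT_TS value from the Fileinfo section, assuming it is an 8-byte big-endian long with the bytes 00 00 01 64 7E 67 6D D3. The regular expression is an assumption based only on the lines shown above (it would not cope with row keys that contain a slash), and `parse_kv` and `decode_long` are hypothetical helper names.

```python
import re
import struct
from datetime import datetime, timezone

# Pattern derived from the sample output above; not an official format definition.
KV_LINE = re.compile(
    r"K: (?P<row>[^/]+)/(?P<family>[^:]+):(?P<qualifier>[^/]*)/"
    r"(?P<ts>\d+)/(?P<type>\w+)/vlen=(?P<vlen>\d+)/seqid=(?P<seqid>\d+)"
    r" V: (?P<value>.*)"
)


def parse_kv(line):
    """Split one 'K: ... V: ...' line of the dump into a dict of fields."""
    match = KV_LINE.search(line)
    if match is None:
        return None
    cell = match.groupdict()
    for name in ("ts", "vlen", "seqid"):
        cell[name] = int(cell[name])
    return cell


def decode_long(raw):
    """Interpret 8 bytes as a big-endian signed 64-bit integer."""
    return struct.unpack(">q", raw)[0]


line = "K: 659000198306113231/tag:basicTag_1399/1531132350695/Put/vlen=4/seqid=0 V: 1399"
cell = parse_kv(line)
print(cell["row"], cell["family"], cell["qualifier"], cell["ts"], cell["value"])

# EARLIEST_PUT_TS from the Fileinfo section, written out as raw bytes.
earliest = decode_long(b"\x00\x00\x01d~gm\xd3")
print(earliest)  # 1531129064915
print(datetime.fromtimestamp(earliest / 1000, tz=timezone.utc))
```

The decoded value equals the lower bound of the TIMERANGE entry in the same section, which is a quick sanity check that the bytes were read correctly.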

  • Original article: https://www.cnblogs.com/EnzoDin/p/9766290.html