1. Compression test tool
hbase org.apache.hadoop.hbase.util.CompressionTest
Sizes produced from 1 GB of data by different compression algorithms and data block encodings:
+--------------------+--------------+
| MODIFIER | SIZE (bytes) |
+--------------------+--------------+
| none | 1108553612 |
+--------------------+--------------+
| compression:SNAPPY | 427335534 |
+--------------------+--------------+
| compression:LZO | 270422088 |
+--------------------+--------------+
| compression:GZ | 152899297 |
+--------------------+--------------+
| codec:PREFIX | 1993910969 |
+--------------------+--------------+
| codec:DIFF | 1960970083 |
+--------------------+--------------+
| codec:FAST_DIFF | 1061374722 |
+--------------------+--------------+
| codec:PREFIX_TREE | 1066586604 |
+--------------------+--------------+
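As a quick sanity check on the table above, the compression ratios implied by those byte counts can be computed directly (the sizes are taken verbatim from the table; only the compression algorithms are included, since the codec rows measure data block encoding rather than compression):

```python
# Sizes (bytes) from the CompressionTest results table above.
sizes = {
    "none": 1108553612,
    "SNAPPY": 427335534,
    "LZO": 270422088,
    "GZ": 152899297,
}

baseline = sizes["none"]
for name, size in sizes.items():
    ratio = baseline / size          # compression ratio vs. uncompressed
    saved = 1 - size / baseline      # fraction of space saved
    print(f"{name:>6}: {ratio:5.2f}x  ({saved:6.1%} saved)")
```

GZ compresses hardest (about 7.25x) but costs the most CPU; SNAPPY compresses least (about 2.59x) but is the cheapest to run, which is why it is a common default for hot data.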
(1) Installing Snappy compression
export HBASE_LIBRARY_PATH=/pathtoyourhadoop/lib/native/Linux-amd64-64
Test Snappy compression:
hbase org.apache.hadoop.hbase.util.CompressionTest hdfs://host/path/to/hbase snappy
(2) Configuring compression
In hbase-site.xml, set hbase.regionserver.codecs; possible values are LZO, Snappy, and GZIP.
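A minimal hbase-site.xml fragment, as a sketch (the codec list here is illustrative; region servers check the listed codecs at startup and refuse to start if one cannot be loaded):

```xml
<!-- In hbase-site.xml: region servers verify these codecs at startup
     and fail fast if any of them is unavailable on the node. -->
<property>
  <name>hbase.regionserver.codecs</name>
  <value>snappy,lzo</value>
</property>
```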
2. HFile tool
View an HFile:
hbase org.apache.hadoop.hbase.io.hfile.HFile -v -f hdfs://10.81.47.41:8020/hbase/TEST/1418428042/DSMP/4759508618286845475
3. WAL tool
View a WAL file (an FSHLog file):
hbase org.apache.hadoop.hbase.regionserver.wal.FSHLog --dump hdfs://example.org:8020/hbase/.logs/example.org,60020,1283516293161/10.10.21.10%3A60020.1283973724012
Force a split of WAL files:
hbase org.apache.hadoop.hbase.regionserver.wal.FSHLog --split hdfs://example.org:8020/hbase/.logs/example.org,60020,1283516293161/
HLogPrettyPrinter prints the contents of an HLog.
4. Table copy tool
Copies a table from one cluster to another; the prerequisite is that the same table already exists in the target cluster.
hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=1265875194289 --endtime=1265878794289 --peer.adr=server1,server2,server3:2181:/hbase TestTable
Other options:
starttime: Beginning of the time range. Without endtime, it means from starttime to forever.
endtime: End of the time range.
versions: Number of cell versions to copy.
new.name: New table's name.
peer.adr: Address of the peer cluster, given in the format hbase.zookeeper.quorum:hbase.zookeeper.client.port:zookeeper.znode.parent
families: Comma-separated list of ColumnFamilies to copy.
all.cells: Also copy delete markers and uncollected deleted cells (advanced option).
hbase.client.scanner.caching: Scanner caching to use for the copy's scans.
Online data backup with CopyTable: http://blog.cloudera.com/blog/2012/06/online-hbase-backups-with-copytable-2/
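The starttime/endtime values in the CopyTable example above are epoch timestamps in milliseconds. As a quick sanity check using only the numbers from that example, the command copies exactly a one-hour window:

```python
from datetime import datetime, timezone

starttime = 1265875194289  # --starttime from the CopyTable example (epoch millis)
endtime = 1265878794289    # --endtime from the same example

# The window length in milliseconds, and as hours.
window_ms = endtime - starttime
print(window_ms)                     # 3600000 ms
print(window_ms / (1000 * 60 * 60))  # 1.0 hour

# Human-readable form of the range boundaries (UTC).
for ts in (starttime, endtime):
    print(datetime.fromtimestamp(ts / 1000, tz=timezone.utc).isoformat())
```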
5. Exporting table data
hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
6. Importing table data
hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>
Importing table data exported from a different HBase version:
hbase -Dhbase.import.version=0.94 org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>
7. WALPlayer
Replays WAL entries: it first generates HFiles, then bulk loads them.
hbase org.apache.hadoop.hbase.mapreduce.WALPlayer /backuplogdir oldTable1,oldTable2 newTable1,newTable2
By default it runs as a distributed MapReduce job; it can be switched to local mode with -Dmapred.job.tracker=local.
8. RowCounter and CellCounter
RowCounter is a MapReduce job that counts the number of rows in a table.
hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename> [<column1> <column2>...]
CellCounter reports:
- Total number of rows in the table.
- Total number of CFs across all rows.
- Total qualifiers across all rows.
- Total occurrence of each CF.
- Total occurrence of each qualifier.
- Total number of versions of each qualifier.
hbase org.apache.hadoop.hbase.mapreduce.CellCounter <tablename> <outputDir> [regex or prefix]
9. mlockall
export HBASE_REGIONSERVER_OPTS="-agentpath:./libmlockall_agent.so=user=hbase"
hbase --mlock user=hbase regionserver start
The JDK must have been installed by the root user.
10. Offline compaction tool
hbase org.apache.hadoop.hbase.regionserver.CompactionTool
11. Region merge tool
hbase org.apache.hadoop.hbase.util.Merge <tablename> <region1> <region2>