Choosing a Method for Importing Data into HBase

Choosing the Right Import Method

If the data is already in an HBase table:
- To move the data from one HBase cluster to another, take a snapshot and use either the clone_snapshot command or the ExportSnapshot utility; alternatively, use the CopyTable utility.
- To move the data from one HBase cluster to another without downtime on either cluster, use replication.
- To migrate data between HBase versions that are not wire compatible, such as from CDH 4 to CDH 5, see Importing HBase Data From CDH 4 to CDH 5.
If the data currently exists outside HBase:
- If possible, write the data to HFile format, and use a BulkLoad to import it into HBase. The data is immediately available to HBase and you can bypass the normal write path, increasing efficiency.
- If you prefer not to use bulk loads, and you are using a tool such as Pig, you can use it to import your data.
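
As a rough illustration of the bulk-load route, the sketch below assembles the two commands such a load usually involves: generating HFiles, then handing them to HBase. It assumes the source data is delimited text and uses the ImportTsv utility to produce the HFiles, which the text above does not prescribe. The class names are those of the HBase 1.x line shipped with CDH 5 (newer releases moved LoadIncrementalHFiles to another package), and the table name, column mapping, and HDFS paths are hypothetical.

```python
import shlex


def import_tsv_cmd(table, columns, input_dir, hfile_dir):
    """Command that writes HFiles to hfile_dir instead of writing to the table."""
    return [
        "hbase", "org.apache.hadoop.hbase.mapreduce.ImportTsv",
        "-Dimporttsv.columns=" + ",".join(columns),   # maps TSV fields to columns
        "-Dimporttsv.bulk.output=" + hfile_dir,       # switches ImportTsv to HFile output
        table, input_dir,
    ]


def complete_bulk_load_cmd(hfile_dir, table):
    """Command that moves the generated HFiles into the table's regions."""
    return [
        "hbase", "org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles",
        hfile_dir, table,
    ]


# Hypothetical table, column mapping, and paths, for illustration only.
steps = [
    import_tsv_cmd("mytable", ["HBASE_ROW_KEY", "cf:a", "cf:b"],
                   "/user/me/input", "/user/me/hfiles"),
    complete_bulk_load_cmd("/user/me/hfiles", "mytable"),
]
for cmd in steps:
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The script only prints the commands; run them on a cluster node, in order, once the target table exists.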
If you need to stream live data to HBase instead of importing it in bulk:
- Write a Java client using the Java API, or use the Apache Thrift Proxy API to write a client in a language supported by Thrift.
- Stream data directly into HBase using the REST Proxy API in conjunction with an HTTP client such as wget or curl.
- Use Flume or Spark.
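
To make the REST option more concrete, here is a minimal sketch of a single-row write using only the Python standard library in place of wget or curl. It relies on the REST server's JSON representation, in which row keys, column names, and values are Base64-encoded. The server address, table name, and column are hypothetical, and the REST port depends on how the server was started.

```python
import base64
import json
import urllib.parse
import urllib.request


def b64(text):
    """Base64-encode a UTF-8 string, as the REST JSON format expects."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def build_put_request(base_url, table, row_key, cells):
    """Build (but do not send) a PUT request that writes one row.

    cells maps "family:qualifier" to a string value.
    """
    body = {
        "Row": [{
            "key": b64(row_key),
            "Cell": [{"column": b64(col), "$": b64(val)}
                     for col, val in cells.items()],
        }]
    }
    url = "{}/{}/{}".format(
        base_url,
        urllib.parse.quote(table, safe=""),
        urllib.parse.quote(row_key, safe=""),
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
    )


# Hypothetical server address, table, and column, for illustration only.
req = build_put_request("http://localhost:20550", "users", "row1",
                        {"cf:name": "Alice"})
print(req.get_method(), req.full_url)
# To actually send it against a running REST server:
# urllib.request.urlopen(req)
```

Building the request separately from sending it keeps the encoding logic easy to check without a live cluster.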
Most likely, at least one of these methods works in your situation. If not, you can use MapReduce directly. Test the most feasible methods with a subset of your data to determine which one is optimal.

Excerpted from: http://www.cloudera.com/documentation/enterprise/5-4-x/topics/admin_hbase_import.html

Original article: https://www.cnblogs.com/admln/p/5381774.html