  • Choosing a Method for Importing Data into HBase

    Choosing the Right Import Method

    If the data is already in an HBase table:

    • To move the data from one HBase cluster to another, take a snapshot and use either the clone_snapshot or the ExportSnapshot utility; alternatively, use the CopyTable utility.

    • To move the data from one HBase cluster to another without downtime on either cluster, use replication.

    • To migrate data between HBase versions that are not wire compatible, such as from CDH 4 to CDH 5, see Importing HBase Data From CDH 4 to CDH 5.
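
As a rough illustration of the snapshot and CopyTable options above, the following Python sketch only builds the command lines for the ExportSnapshot and CopyTable utilities; it does not run them. The snapshot name, table name, destination HDFS URL, and ZooKeeper quorum string are hypothetical placeholders, and the sketch assumes the `hbase` launcher script is on the PATH of the host where the commands would eventually be executed.

```python
import shlex


def export_snapshot_cmd(snapshot, dest_hbase_root, mappers=16):
    """Build the argv that copies an existing snapshot to another cluster.

    dest_hbase_root is the HBase root directory on the destination
    cluster's HDFS (hypothetical value in the example below).
    """
    return [
        "hbase", "org.apache.hadoop.hbase.snapshot.ExportSnapshot",
        "-snapshot", snapshot,
        "-copy-to", dest_hbase_root,
        "-mappers", str(mappers),
    ]


def copy_table_cmd(table, peer_zk, new_name=None):
    """Build the argv that copies a table to a peer cluster with CopyTable.

    peer_zk has the form "quorum:port:znode-parent" for the destination
    cluster; new_name optionally renames the table on the destination.
    """
    argv = [
        "hbase", "org.apache.hadoop.hbase.mapreduce.CopyTable",
        "--peer.adr=" + peer_zk,
    ]
    if new_name:
        argv.append("--new.name=" + new_name)
    argv.append(table)
    return argv


if __name__ == "__main__":
    # Hypothetical names; print the commands instead of executing them.
    print(shlex.join(export_snapshot_cmd("orders_snap", "hdfs://dest-nn:8020/hbase")))
    print(shlex.join(copy_table_cmd("orders", "zk1,zk2,zk3:2181:/hbase")))
```

Printing the commands first makes it easy to review them before running them on a real cluster, for example through `subprocess.run`.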

    If the data currently exists outside HBase:

    • If possible, write the data to HFile format, and use a BulkLoad to import it into HBase. The data is immediately available to HBase and you can bypass the normal write path, increasing efficiency.

    • If you prefer not to use bulk loads, and you are using a tool such as Pig, you can use it to import your data.
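
The bulk-load path above has two stages: produce HFiles, then hand them to HBase. The excerpt does not say how the HFiles are produced, so this sketch assumes one common route: the ImportTsv job writing HFiles via `importtsv.bulk.output`, followed by LoadIncrementalHFiles. The table name, column mapping, and HDFS paths are hypothetical, and the code only assembles the command lines.

```python
def import_tsv_cmd(table, columns, input_path, hfile_dir):
    """Build the argv for an ImportTsv job that writes HFiles instead of
    issuing puts. `columns` maps TSV fields to HBase columns, with
    HBASE_ROW_KEY marking the field used as the row key."""
    return [
        "hbase", "org.apache.hadoop.hbase.mapreduce.ImportTsv",
        "-Dimporttsv.columns=" + ",".join(columns),
        "-Dimporttsv.bulk.output=" + hfile_dir,
        table, input_path,
    ]


def complete_bulk_load_cmd(hfile_dir, table):
    """Build the argv that moves the generated HFiles into the table's regions."""
    return [
        "hbase", "org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles",
        hfile_dir, table,
    ]


if __name__ == "__main__":
    # Hypothetical table, columns, and paths.
    step1 = import_tsv_cmd(
        "orders",
        ["HBASE_ROW_KEY", "cf:customer", "cf:total"],
        "hdfs:///staging/orders.tsv",
        "hdfs:///staging/orders_hfiles",
    )
    step2 = complete_bulk_load_cmd("hdfs:///staging/orders_hfiles", "orders")
    print(" ".join(step1))
    print(" ".join(step2))
```

Only the second step touches the live table, which is why the write path is bypassed: the region servers adopt finished files rather than processing individual writes.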

    If you need to stream live data to HBase instead of importing it in bulk:

    • Write a Java client using the Java API, or use the Apache Thrift Proxy API to write a client in a language supported by Thrift.

    • Stream data directly into HBase using the REST Proxy API in conjunction with an HTTP client such as wget or curl.

    • Use Flume or Spark.
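
For the REST option above, the main detail that tends to surprise people is that the JSON representation carries row keys, column names, and values base64-encoded. The sketch below prepares such a request without sending it. The base URL, port, table, and column names are hypothetical, and it assumes an HBase REST server is running and reachable at that address.

```python
import base64
import json
from urllib.parse import quote


def b64(text):
    """Base64-encode a string, as the REST JSON representation expects."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def rest_put_request(base_url, table, row_key, cells):
    """Return (url, headers, body) for storing one row through the REST proxy.

    cells maps "family:qualifier" to a string value.
    """
    body = {
        "Row": [{
            "key": b64(row_key),
            "Cell": [
                {"column": b64(column), "$": b64(value)}
                for column, value in cells.items()
            ],
        }]
    }
    url = "{}/{}/{}".format(base_url.rstrip("/"), quote(table, safe=""), quote(row_key, safe=""))
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    return url, headers, json.dumps(body)


if __name__ == "__main__":
    # Hypothetical host, port, table, and cell.
    url, headers, body = rest_put_request(
        "http://rest-host:8080", "orders", "row1", {"cf:a": "value1"}
    )
    print(url)
    print(body)
```

The resulting URL, headers, and body can be sent as a PUT request with curl or any HTTP client library.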

    Most likely, at least one of these methods works in your situation. If not, you can use MapReduce directly. Test the most feasible methods with a subset of your data to determine which one is optimal.


    Excerpted from: http://www.cloudera.com/documentation/enterprise/5-4-x/topics/admin_hbase_import.html

  • Original post: https://www.cnblogs.com/admln/p/5381774.html