zoukankan      html  css  js  c++  java
  • SQL for HBase

    本文转自:http://www.orzota.com/sql-for-hbase/

    HBase provides random, read-write access to Big Data stored in very large tables as a distributed columnar store. HBase thus gained immediately popularity as a Big Data technology that unlike Hadoop which was primarily used in the backend data warehousing infrastructure could be deployed to service online transactions as well. Just as Hive brought SQL to Hadoop (and now to HBase as described in my earlier blog post), there are many alternate projects providing SQL for HBase which is the subject of this blog article.

    Phoenix

    Phoenix is a technology developed by Salesforce.com to put a SQL skin over HBase. This is not adding a new layer, rather it exposes HBase functionality through SQL using an embedded JDBC Driver that allows clients to run at native HBase speed. The JDBC driver compiles SQL into native HBase calls.

    The claim is that performance of Phoenix is much better than that of Hive over HBase – upto 10x at 100 million rows. Performance is also more constant as the number of rows increases. Optimizations are transparent to the user.

    Impala

    Cloudera has been promoting Impala heavily since its announcement in Strata last year. Impala is a SQL engine that can run on HDFS or HBase or both. They too claim that Impala performance is much better than that of Hive – over 45 times! However this comparison seems to be based on native Hive, not Hive over HBase. The performance gains are achieved by not writing intermediate results to disk (like MapReduce), no spin up/down times, optimized code as Impala is written in C++ rather than Java. This means that it needs to use JNI to make calls into Java for HBase integrations.

    Impala also includes authentication via Kerberos support, JDBC/ODBC drivers (similar to Phoenix), an interactive shell and support for popular hadoop file formats such as parquet, avro, thrift and sequence files. Parquet was specifically engineered for Impala to get the best performance.

    The use cases for HBase and Impala include the following:

    • Data is being streamed in
    • Need to run queries on data as it is being created
    • Need real time edits of data in-place
    • Need random access to computed results (results can be placed in HBase)

    Drill

    Apache Drill (the only in this series which is an Apache project) was inspired by Google Dremel. It is the most interesting in terms of its emphasis on interactive analysis of large scale datasets. It is similar to Impala in many ways but is community driven. Although Hive is also a community project, the need for Drill rose because of the tight coupling of Hive to MapReduce which is based on a pessimistic execution principle i.e. all jobs will be long running and process lots of data. This causes a lot of overhead for short-running jobs which Drill addresses.

    Drill does great when execution times of 100ms to 10 seconds are required. Phoenix provides even better response times – below 10ms.

    Drill is unique in its focus on analytical queries, so a lot of emphasis is placed on bulk reads and aggregations. Phoenix is more focused on write optimizations as in the RDBMS world.

    Drill also provides extensions to SQL for nested data types similar to Big Query. Many systems nest data structures inside of HBase cells (for example a cell can be a whole log record which in turn has many nested fields). It is much better to treat these nested cells and process them in the query language for optimal performance.

    Drill leverages recent research approaches (late record materialization, vectorized operators, etc.) As such it definitely seems the most promising for analytical work.

    Summary

    With all of these different options and rich technologies that provide SQL for HBase, HBase’s popularity is bound to grow. Of course with choice comes decisions. Which technology is right for you? Orzota can help analyze your use case and recommend the right set of technologies based not just on technical merits alone, but also taking into consideration its maturity, cost and vendor support. Please contact usfor more information.

  • 相关阅读:
    linux 查看磁盘空间大小
    linux查看防火墙状态及开启关闭命令
    运行安装mysql 报错 [root@localhost mysql-mult]# ./scripts/mysql_install_db  --defaults-file=conf/3306my.cnf FATAL ERROR: please install the following Perl modules before executing ./scripts/mysql_install_
    linux lsof命令详解
    centos6下无法使用lsof命令"-bash: lsof: command not found"
    服务器创建好后怎样使用远程连接工具链接的一些问题
    mysql在linux下的安装
    jmeter-linux下运行
    【WPF】一组CheckBox的全选/全不选功能
    【WPF】TabControl垂直分页栏/选项卡
  • 原文地址:https://www.cnblogs.com/kivi/p/3214097.html
Copyright © 2011-2022 走看看