  • Integrating Hive with HBase

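    Benefits of integrating Hive with HBase:

    Hive can load data into HBase; the source can be a file or an existing Hive table.

    Data stored in HBase can be queried with SQL constructs such as JOIN and GROUP BY, which HBase does not offer on its own.

    HBase keeps serving real-time lookups, while Hive can query the same HBase data for complex analysis.

    Working with HBase tables through Hive is only a convenience. The HiveQL engine used here runs on MapReduce, so performance is poor; in practice, decide case by case whether it suits the scenario.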

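    Configuration

    Hive and HBase are integrated by having the two systems talk to each other through their public APIs. The actual work is done by the classes in hive-hbase-handler-<version>.jar, found in Hive's lib directory, so all that is needed is to copy Hive's hive-hbase-handler-<version>.jar into hbase/lib.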

     [root@host lib]# cp hive-hbase-handler-2.1.1.jar $HBASE_HOME/lib

    Testing

    Create an HBase table from Hive:

    hive> CREATE TABLE t_name (id INT, NAME string)
        >      stored BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
        >     WITH serdeproperties (
        >     "hbase.columns.mapping" = ":key,st1:name")
        >    tblproperties ("hbase.table.name" = "t_name","hbase.mapred.output.outputtable" = "t_name");
    OK
    Time taken: 1.625 seconds
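
    To make the mapping in the statement above concrete: each comma-separated entry of hbase.columns.mapping lines up by position with one Hive column, :key stands for the HBase row key, and every other entry here has the form family:qualifier. The Python sketch below models only that pairing. pair_columns is a hypothetical helper written for illustration; it is not part of Hive or HBase, and the real storage handler supports more mapping forms and does more validation than shown.

```python
ROW_KEY = ":key"


def pair_columns(hive_columns, mapping):
    """Pair each Hive column with its entry in an hbase.columns.mapping string.

    Illustrative model only: entries are matched to columns by position,
    and the entry text is kept exactly as written (no trimming, no case
    folding).
    """
    entries = mapping.split(",")
    if len(entries) != len(hive_columns):
        raise ValueError(
            "mapping has %d entries but the table has %d columns"
            % (len(entries), len(hive_columns))
        )

    pairs = []
    for column, entry in zip(hive_columns, entries):
        if entry == ROW_KEY:
            # This Hive column is backed by the HBase row key.
            pairs.append((column, ROW_KEY))
        else:
            # Split "family:qualifier" at the first colon.
            family, _, qualifier = entry.partition(":")
            pairs.append((column, (family, qualifier)))
    return pairs


if __name__ == "__main__":
    # The table from the example above: id -> row key, name -> st1:name.
    for column, target in pair_columns(["id", "name"], ":key,st1:name"):
        print(column, "->", target)
```

    The helper deliberately leaves each entry untouched, so any difference in spelling or case between the mapping and the data stored in HBase carries through unchanged.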

    Check the table in Hive:

    hive> show tables;
    OK
    cust_copy
    t_name
    Time taken: 0.127 seconds, Fetched: 2 row(s)

    hive> show create table t_name;
    OK
    CREATE TABLE `t_name`(
      `id` int COMMENT '',
      `name` string COMMENT '')
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.hbase.HBaseSerDe'
    STORED BY
      'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES (
      'hbase.columns.mapping'=':key,st1:name',
      'serialization.format'='1')
    TBLPROPERTIES (
      'COLUMN_STATS_ACCURATE'='{"BASIC_STATS":"true"}',
      'hbase.mapred.output.outputtable'='t_name',
      'hbase.table.name'='t_name',
      'numFiles'='0',
      'numRows'='0',
      'rawDataSize'='0',
      'totalSize'='0',
      'transient_lastDdlTime'='1526546542')
    Time taken: 0.308 seconds, Fetched: 19 row(s)

    Check the table in HBase:

    hbase(main):004:0> list 't_name'
    TABLE                                                                                                                                        
    t_name                                                                                                                                       
    1 row(s)
    Took 0.0092 seconds                                                                                                                          
    => ["t_name"]

    Insert data in HBase and view it:

    hbase(main):006:0> put 't_name','1','st1:name','xiaoma'
    Took 0.3709 seconds                                                                                                                          
    hbase(main):007:0> put 't_name','2','st1:name','xiaozhang'
    Took 0.0038 seconds                                                                                                                          
    hbase(main):008:0> put 't_name','3','st1:name','tianyongtao'
    Took 0.0051 seconds

    hbase(main):009:0> scan 't_name'
    ROW                                  COLUMN+CELL                                                                                             
     1                                   column=st1:name, timestamp=1526547097913, value=xiaoma                                                  
     2                                   column=st1:name, timestamp=1526547115702, value=xiaozhang                                               
     3                                   column=st1:name, timestamp=1526547130241, value=tianyongtao                                             
    3 row(s)
    Took 0.0327 seconds 

    Query it from Hive:

    hive> select * from t_name;
    OK
    t_name.id       t_name.name
    1       xiaoma
    2       xiaozhang
    3       tianyongtao
    Time taken: 0.414 seconds, Fetched: 3 row(s)
    hive> select * from t_name where id=1;
    OK
    t_name.id       t_name.name
    1       xiaoma
    Time taken: 1.246 seconds, Fetched: 1 row(s)
    hive> select * from t_name where id>1;
    OK
    t_name.id       t_name.name
    2       xiaozhang
    3       tianyongtao
    Time taken: 0.383 seconds, Fetched: 2 row(s)

    Drop-table test:

    hive> drop table t_name;
    OK
    Time taken: 1.851 seconds

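    Checking HBase afterwards shows that the t_name table there was dropped as well. This is expected for a table created without the EXTERNAL keyword, because Hive then manages the underlying HBase table.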

    Multiple column families

    hive> CREATE TABLE t_role (id INT, NAME string,sex int,platid int)
        >      stored BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
        >     WITH serdeproperties (
        >     "hbase.columns.mapping" = ":key,info:NAME,info:sex,plat:platid")
        >    tblproperties ("hbase.table.name" = "t_role","hbase.mapred.output.outputtable" = "t_role");
    OK
    Time taken: 3.179 seconds

    A row was then written in HBase, with the name stored under the lowercase qualifier info:name:

    hbase(main):039:0> scan 't_role'
    ROW                                           COLUMN+CELL                                                                                                                       
     1                                            column=info:name, timestamp=1526549089030, value=feige                                                                            
     1                                            column=info:sex, timestamp=1526549206235, value=0                                                                                 
     1                                            column=plat:platid, timestamp=1526549241774, value=785                                                                            
    1 row(s)
    Took 0.0287 seconds

    hive> select * from t_role;
    OK
    t_role.id       t_role.name     t_role.sex      t_role.platid
    1       NULL    0       785
    Time taken: 0.417 seconds, Fetched: 1 row(s)

    The name column comes back as NULL.

    hbase(main):040:0> put 't_role','1','info:NAME','feige'
    Took 0.0033 seconds

    hive> select * from t_role;
    OK
    t_role.id       t_role.name     t_role.sex      t_role.platid
    1       feige   0       785
    Time taken: 0.422 seconds, Fetched: 1 row(s)

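    Now the name column is populated, so pay attention to case in the column mapping. Hive lowercases its own column names (NAME became name), but the mapping entry info:NAME is matched against HBase exactly, and HBase qualifiers are case-sensitive: info:name and info:NAME are two different columns.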
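
    The NULL-then-filled behaviour seen with t_role can be reproduced with a toy model in Python. This is only a sketch under simplifying assumptions: project_row is a hypothetical function, a single HBase row is represented as a plain dict keyed by "family:qualifier", and all values stay as strings with no type conversion. It is not how Hive is implemented; it only illustrates the exact, case-sensitive lookup.

```python
def project_row(cells, hive_columns, mapping, row_key):
    """Build one Hive-style record from the cells of a single HBase row.

    cells maps "family:qualifier" strings to values. A mapping entry
    that has no exactly matching cell yields None (shown as NULL in Hive).
    """
    record = {}
    for column, entry in zip(hive_columns, mapping.split(",")):
        if entry == ":key":
            record[column] = row_key
        else:
            # Plain dict lookup: "info:NAME" does not match "info:name".
            record[column] = cells.get(entry)
    return record


hive_columns = ["id", "name", "sex", "platid"]
mapping = ":key,info:NAME,info:sex,plat:platid"

# State of row '1' after the first puts: the qualifier is lowercase.
cells = {"info:name": "feige", "info:sex": "0", "plat:platid": "785"}
before = project_row(cells, hive_columns, mapping, "1")

# Equivalent of: put 't_role','1','info:NAME','feige'
cells["info:NAME"] = "feige"
after = project_row(cells, hive_columns, mapping, "1")

print(before["name"], after["name"])
```

    In this model the row ends up holding both info:name and info:NAME, which matches what HBase would store after the second put: the lowercase cell is still there, it is just never read through this mapping.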

    -----------------------

    To access Hive tables backed by HBase from Spark, the handler JAR has to be passed explicitly, as follows:

    spark-shell --master local-cluster[3,2,1024] --num-executors 3 --executor-memory 1g --jars /root/hive/apache-hive-2.1.1/lib/hive-hbase-handler-2.1.1.jar

  • Original article: https://www.cnblogs.com/playforever/p/9051796.html