Hive/HBase integration is built on the public APIs that each system exposes; the two sides communicate mainly through the hive-hbase-handler.jar utility class.
1. First, make sure the jar versions are consistent
cd /home/hadoop/hive-1.1.0-cdh5.5.2/lib
Check that the versions of these jars match: hbase-server-1.0.0-cdh5.5.2.jar, zookeeper-3.4.5-cdh5.5.2.jar, hive-hbase-handler-1.1.0-cdh5.5.2.jar
If they do not match, delete the old jars and cp the corresponding jars from the hbase and zookeeper installations into Hive's lib directory.
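The version check above can be scripted. A minimal sketch, assuming the jar names from this walkthrough (the `cdh_of` helper is hypothetical; it just pulls the CDH suffix out of a filename):

```shell
# Jar names taken from this walkthrough; substitute what `ls` shows in your lib dirs.
handler_jar="hive-hbase-handler-1.1.0-cdh5.5.2.jar"
hbase_jar="hbase-server-1.0.0-cdh5.5.2.jar"
zk_jar="zookeeper-3.4.5-cdh5.5.2.jar"

# Extract the CDH suffix (e.g. cdh5.5.2) from a jar filename.
cdh_of() { echo "$1" | sed 's/.*-\(cdh[0-9.]*\)\.jar/\1/'; }

if [ "$(cdh_of "$handler_jar")" = "$(cdh_of "$hbase_jar")" ] \
   && [ "$(cdh_of "$hbase_jar")" = "$(cdh_of "$zk_jar")" ]; then
  echo "CDH versions match"
else
  echo "CDH version mismatch - copy jars from hbase/zookeeper into hive/lib"
fi
```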
2. Edit the hive-site.xml file under hive/conf
[hadoop@h91 hive-0.7.1-cdh3u5]$ mkdir tmp
[hadoop@h91 hive-0.7.1-cdh3u5]$ mkdir logs
[hadoop@h91 conf]$ vi hive-site.xml
Add the following to the bottom of the file, just above the closing </configuration> tag:
<!--
<property>
<name>hive.exec.scratchdir</name>
<value>/home/hadoop/hive-1.1.0-cdh5.5.2/tmp/</value>
</property>
-->
<property>
<name>hive.querylog.location</name>
<value>/home/hadoop/hive-1.1.0-cdh5.5.2/logs/</value>
</property>
<property>
<name>hive.aux.jars.path</name>
<value>
file:///home/hadoop/hive-1.1.0-cdh5.5.2/lib/hive-hbase-handler-1.1.0-cdh5.5.2.jar,
file:///home/hadoop/hive-1.1.0-cdh5.5.2/lib/hbase-server-1.0.0-cdh5.5.2.jar,
file:///home/hadoop/hive-1.1.0-cdh5.5.2/lib/zookeeper-3.4.5-cdh5.5.2.jar
</value>
</property>
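hive.aux.jars.path is just a comma-separated list of file:// URLs. A quick sketch (using two of the jar URLs from the config above) of how you might strip the scheme so each path can be sanity-checked with ls on the Hive host:

```shell
# Two of the jar URLs from the hive.aux.jars.path value above.
aux="file:///home/hadoop/hive-1.1.0-cdh5.5.2/lib/hive-hbase-handler-1.1.0-cdh5.5.2.jar,file:///home/hadoop/hive-1.1.0-cdh5.5.2/lib/hbase-server-1.0.0-cdh5.5.2.jar"

# Split on commas and strip the file:// scheme from each URL.
for url in $(echo "$aux" | tr ',' ' '); do
  path=${url#file://}
  echo "would check: $path"   # on the real host: ls -l "$path"
done
```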
3. Copy hbase-server-1.0.0-cdh5.5.2.jar into the lib directory on every Hadoop node (including the master node)
[hadoop@h91 hbase-0.90.6-cdh3u5]$ cp hbase-server-1.0.0-cdh5.5.2.jar /home/hadoop/hadoop-2.6.0-cdh5.5.2/lib
[hadoop@h91 hbase-0.90.6-cdh3u5]$ scp hbase-server-1.0.0-cdh5.5.2.jar hadoop@h202:/home/hadoop/hadoop-2.6.0-cdh5.5.2/lib
[hadoop@h91 hbase-0.90.6-cdh3u5]$ scp hbase-server-1.0.0-cdh5.5.2.jar hadoop@h203:/home/hadoop/hadoop-2.6.0-cdh5.5.2/lib
4. Copy the hbase-site.xml file from hbase/conf into hadoop/conf on every Hadoop node (including the master)
[hadoop@h91 conf]$ cp hbase-site.xml /home/hadoop/hadoop-2.6.0-cdh5.5.2/conf/
[hadoop@h91 conf]$ scp hbase-site.xml hadoop@h202:/home/hadoop/hadoop-2.6.0-cdh5.5.2/conf/
[hadoop@h91 conf]$ scp hbase-site.xml hadoop@h203:/home/hadoop/hadoop-2.6.0-cdh5.5.2/conf/
5. Start Hive
[hadoop@h91 hive-0.7.1-cdh3u5]$ bin/hive -hiveconf hbase.zookeeper.quorum=h201,h202,h203
----------------------------------------------------------------
Examples
1. Create a table that HBase can recognize
hive>
CREATE TABLE hbase_table_1(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "sq");
(hbase.table.name sets the name of the table in HBase; hbase.columns.mapping sets the HBase column-family/column mapping.)
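The mapping string is simply a comma-separated list of family:qualifier specs, paired positionally with the Hive columns. A toy sketch of how ":key,cf1:val" lines up with the (key, value) columns of hbase_table_1:

```shell
# The mapping and the Hive column list from the CREATE TABLE above.
mapping=":key,cf1:val"
hive_cols="key value"

# Split the mapping on commas and pair each spec with a Hive column by position.
i=1
for spec in $(echo "$mapping" | tr ',' ' '); do
  col=$(echo $hive_cols | cut -d' ' -f$i)
  echo "hive column '$col' -> hbase '$spec'"
  i=$((i + 1))
done
```

The special spec `:key` binds its Hive column to the HBase row key rather than to a real column family.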
2. Create a regular Hive table
hive> create table ha1(id int,name string)
row format delimited
fields terminated by ' '
stored as textfile;
[hadoop@h91 ~]$ vi ha1.txt
11 zs
22 ls
33 ww
hive> load data local inpath '/home/hadoop/ha1.txt' into table ha1;
hive> insert into table hbase_table_1 select * from ha1;
hive> select * from hbase_table_1;
3. [hadoop@h91 hbase-0.90.6-cdh3u5]$ bin/hbase shell
hbase(main):002:0> scan 'sq'
(If the scan shows the rows, Hive has successfully written the data into HBase.)
Method 2: edit the hive-site.xml file under hive/conf directly
<!--
<property>
<name>hive.exec.scratchdir</name>
<value>/home/hadoop/hive-1.1.0-cdh5.5.2/tmp/</value>
</property>
-->
<property>
<name>hive.querylog.location</name>
<value>/home/hadoop/hive-1.1.0-cdh5.5.2/logs/</value>
</property>
<property>
<name>hive.aux.jars.path</name>
<value>
file:///home/hadoop/hive-1.1.0-cdh5.5.2/lib/hive-hbase-handler-1.1.0-cdh5.5.2.jar,
file:///home/hadoop/hive-1.1.0-cdh5.5.2/lib/guava-14.0.1.jar,
file:///home/hadoop/hbase-1.0.0-cdh5.5.2/lib/hbase-common-1.0.0-cdh5.5.2.jar,
file:///home/hadoop/hbase-1.0.0-cdh5.5.2/lib/hbase-client-1.0.0-cdh5.5.2.jar,
file:///home/hadoop/hbase-1.0.0-cdh5.5.2/lib/hbase-server-1.0.0-cdh5.5.2.jar,
file:///home/hadoop/hbase-1.0.0-cdh5.5.2/lib/netty-all-4.0.23.Final.jar,
file:///home/hadoop/hbase-1.0.0-cdh5.5.2/lib/hbase-hadoop2-compat-1.0.0-cdh5.5.2.jar,
file:///home/hadoop/zookeeper-3.4.5-cdh5.5.2/zookeeper-3.4.5-cdh5.5.2.jar
</value>
</property>
Then restart Hive.
Mapping an existing HBase table from Hive
CREATE external TABLE hive_hbase_offset(key string, value map<string,string>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" =":key,d:")
TBLPROPERTIES ("hbase.table.name" = "ecpcibb:feed.meta.kafka_offset");
Hive inserts data into this table with the OVERWRITE keyword. Dropping the Hive table has no effect on HBase, but if the HBase table is dropped first, Hive queries throw TableNotFoundException; dropping only the Hive table does not trigger that error. In map<string,string>, the first string is the column qualifier name and the second is the corresponding value stored in HBase.
WITH SERDEPROPERTIES ("hbase.columns.mapping" =":key,F:")
To explain this clause: it tells Hive how its columns map onto the HBase data. By default the first field of the Hive table is mapped to the HBase row key, which is the :key set here (this appears to be a fixed convention); F is the column-family name, and since no column qualifier is given, nothing follows the colon.
External tables do not support the LOAD DATA operation, so data must be inserted with INSERT OVERWRITE.
Table-creation examples
CREATE external TABLE CY_relationship(rowkey string, relationshipPermId string, relationshipTypeName string,subjectPermId string,objectPermId string,RPPT string,RPPV string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" =":key,d:rpid,d:type,d:sbjtpid,d:objt,d:propt[0],d:propv[0]")
TBLPROPERTIES ("hbase.table.name" = "ecpcibb:master.data.relationship");
CREATE external TABLE CY_master_instrument(rowkey string,isin string, instrumentPermid string,dataSource string,relationshipPermId map<string,string>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" =":key,d:id[0]/val,d:pid,d:src,d:rel.*")
TBLPROPERTIES ("hbase.table.name" = "ecpcibb:master.data.instrument");
d:rel.* puts every column in family d whose qualifier begins with rel into the map collection.
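Conceptually, d:rel.* is a prefix filter over the row's columns: every d-family cell whose qualifier starts with rel becomes one entry of the Hive map. A toy sketch of that filtering (the cell names and values here are made up):

```shell
# Made-up HBase cells for one row, written as "family:qualifier=value".
cells="d:pid=P1 d:src=S1 d:rel.a=R100 d:rel.b=R200"

# Keep only the d:rel* columns - these become the entries of the Hive map;
# the other d-family cells are bound to their own scalar Hive columns.
for cell in $cells; do
  case $cell in
    d:rel*) echo "map entry: ${cell#d:}" ;;
  esac
done
```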
Query syntax for a field that is a map collection:
select * from cy_aggr_quote where concat_ws(',',map_values(relationshippermid)) like '%200764004179%'
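The WHERE clause above just flattens the map's values into one comma-joined string and substring-matches it, which is what concat_ws over map_values plus LIKE amounts to. The same filter expressed in plain shell terms (the values are made up):

```shell
# Made-up map values for one row's relationshippermid column.
vals="100000000001 200764004179 300000000003"

# concat_ws(',', map_values(...)) -> "v1,v2,v3", then LIKE '%200764004179%'.
joined=$(echo "$vals" | tr ' ' ',')
case $joined in
  *200764004179*) echo "row matches" ;;
  *)              echo "row filtered out" ;;
esac
```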