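1. Hive overview:
--------
Hive removes the tedious work of analyzing, designing, and decomposing a job into Map and Reduce steps, and then coding and compiling it: queries are written in HiveQL, a SQL-like language, and Hive translates them into MapReduce jobs.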
2. Hive architecture:
---------------
3. Setting up the Hive server:
-----------------
A. Install Hive 1.2.2 on the client
B. Configure the Hive environment
---[hadoop@CloudDeskTop bin]$ vi hive-config.sh
export JAVA_HOME=/software/jdk1.7.0_79
export HADOOP_HOME=/software/hadoop-2.7.3
export HIVE_HOME=/software/hive-1.2.2
[hadoop@CloudDeskTop conf]$ cp hive-default.xml.template hive-site.xml
[hadoop@CloudDeskTop conf]$ vi hive-site.xml
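Replace the default values of the following properties, which reference ${system:java.io.tmpdir}, with concrete paths (the leading numbers are line numbers in hive-site.xml):
2911 <name>hive.server2.logging.operation.log.location</name>
2912 <value>/tmp/hive/operation_logs</value>
51 <name>hive.exec.local.scratchdir</name>
52 <value>/tmp/hive</value>
56 <name>hive.downloaded.resources.dir</name>
57 <value>/tmp/hive/resources</value>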
[hadoop@CloudDeskTop conf]$ cp -a hive-log4j.properties.template hive-log4j.properties
[hadoop@CloudDeskTop conf]$ vi hive-log4j.properties +72
#log4j.appender.EventCounter=org.apache.hadoop.hive.shims.HiveEventCounter
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
4. Start the HDFS and YARN clusters, then start Hive on the client
A. First steps:
----Run SQL statements interactively
[hadoop@CloudDeskTop bin]$ ./hive --hiveconf hive.root.logger=ERROR,console
hive> show databases;
hive> create database bunfly;
hive> use bunfly;
hive> create table t_user(uid int,uname string,password string);
hive> show tables;
----Run SQL statements non-interactively
[hadoop@CloudDeskTop bin]$ ./hive -S -e "show databases;"
[hadoop@CloudDeskTop src]$ vi test.sql
[hadoop@CloudDeskTop bin]$ ./hive -S -f /home/hadoop/test/hive/src/test.sql
[hadoop@CloudDeskTop bin]$ echo -e "use bnyw;
select * from myuser;">>/install/myuser.sql
[hadoop@CloudDeskTop bin]$ ./hive -S -e "select * from bunfly.t_user">>/install/result.data
[hadoop@CloudDeskTop bin]$ ./hive -S<<EOF
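The here-document command above is cut off in these notes. As a minimal sketch of what such an invocation can look like, the script below feeds two statements to the CLI through a here-document. The HIVE_BIN variable and the fallback to cat are assumptions added only so the script can be dry-run on a machine without Hive; the statements reuse the bunfly.t_user table created earlier.

```shell
#!/bin/sh
# Sketch: feed several statements to the Hive CLI through a here-document.
# HIVE_BIN is a hypothetical variable; when it does not point to an
# executable, the statements are only echoed back through cat (a dry run).
if [ -n "${HIVE_BIN:-}" ] && [ -x "$HIVE_BIN" ]; then
    run() { "$HIVE_BIN" -S; }   # -S: silent mode, as in the commands above
else
    run() { cat; }
fi

# The quoted delimiter ('EOF') stops the shell from expanding $ or
# backticks inside the SQL text.
heredoc_output=$(run <<'EOF'
use bunfly;
select * from t_user;
EOF
)
printf '%s\n' "$heredoc_output"
```

With HIVE_BIN set to the path of the hive launcher (for example the ./hive script used above), the same statements are executed instead of echoed.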
B. Going further:
Row format: row format delimited (rows are separated by \n by default)
Field delimiter: fields terminated by ' '
hive> create table bunfly.t_user(uid int,uname string,uage int,uhight double) row format delimited fields terminated by ' ';
----Importing data:
[hadoop@CloudDeskTop src]$ hdfs dfs -put myuser02 /user/hive/warehouse/bunfly.db/t_user
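Load data from HDFS into a Hive table
------
Note:
Uploading a file with HDFS commands and importing it with load data are both, in essence, file transfers.
If the file comes from the local file system, the data is copied; if it comes from HDFS, the data is moved.
With the overwrite keyword, load data first clears the data in the Hive table and then transfers the file, so the table is overwritten; without overwrite, the data file is appended to the table.
Hive does not support the delete and update DML operations on ordinary tables; since Hive 0.14 they are available only on tables with ACID transactions enabled.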
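The note describes load data but the notes do not show a statement. The sketch below prepares a small sample file and writes a script containing both forms. The sample rows, the temporary directory, the HDFS path /data/myuser_sample, and the Tab delimiter are all assumptions for illustration; the script is only generated and printed, not executed.

```shell
#!/bin/sh
# Sketch: prepare a sample file and a script showing both LOAD DATA forms.
# All paths and sample rows are hypothetical; Tab as delimiter is an assumption.
work_dir=$(mktemp -d)
data_file="$work_dir/myuser_sample"
sql_file="$work_dir/load_t_user.sql"

# Two sample rows in the column order of t_user: uid, uname, uage, uhight.
printf '1\talice\t25\t1.75\n2\tbob\t30\t1.68\n' > "$data_file"

# The unquoted EOF lets the shell substitute $data_file into the script.
cat > "$sql_file" <<EOF
use bunfly;
-- Local source: the file is copied into the table directory, and
-- overwrite clears the existing table data first.
load data local inpath '$data_file' overwrite into table t_user;
-- HDFS source (hypothetical path): the file is moved, and without
-- overwrite it is added next to the files already in the table.
load data inpath '/data/myuser_sample' into table t_user;
EOF

# Print the generated script for review.
cat "$sql_file"
```

Once the paths match real files, the generated script can be run the same way as test.sql above, with ./hive -S -f followed by the script path.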
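Export the required data to a local directory (the path is treated as a directory, and its existing contents are overwritten):
hive> insert overwrite local directory '/home/hadoop/test/hive/dst/out.data' select * from t_user;
Export the required data to a directory on the cluster (HDFS):
hive> insert overwrite directory '/data/out.data' select * from t_user;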
Table operations:
Migrating data between tables:
----------A.
insert into bunfly.myuser select * from bunfly.t_user where uid=1;
----------B. Check that the copied table and the existing table use the same format (Tab--> Ctrl+v+a)
hdfs dfs -cp /user/hive/warehouse/bunfly.db/t_user/myuser03 /user/hive/warehouse/bunfly.db/myuser
Copying data between tables:
---------------
insert into bunfly.myuser select * from bunfly.t_user where uid=1;
Creating a table with Tab as the field delimiter:
--------------------
create table if not exists bunfly.t_user(uid int,uname string,uage int,uhight double) row format delimited fields terminated by '\t';
Advanced Hive operations:
Preparation:
Drop the bnyw database
1. Create the employee table:
create table if not exists emp(eno int,ename string,eage int,birthday date,sal double,com double,gender string,dno int) row format delimited fields terminated by ' ';
2. Create the department table:
create table dept(dno int,dname string,loc string) row format delimited fields terminated by ' ';
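a. Count employees grouped by department id and gender
hive> select dno,gender,count(1) from emp group by dno,gender;
b. Group by department id and gender, then sort by headcount in descending order
hive> select dno,gender,count(1) renshu from emp group by dno,gender order by renshu desc;
c. Sorting on multiple columns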
hive> select eno,ename,sal,com from emp order by sal desc,com desc;
d. Multi-table joins and subqueries
hive> select e.*,d.* from emp e ,dept d where e.dno=d.dno; (SQL-92 syntax)
hive> select e.*,d.* from emp e inner join dept d on e.dno=d.dno; (SQL-99 syntax)
hive> select d.dno,avg(sal) avgsal from emp e inner join dept d on e.dno=d.dno where eage>20 group by d.dno order by avgsal;
Subquery:
select d.dname,avgsal from (select d.dno,avg(sal) avgsal from emp e inner join dept d on e.dno=d.dno where eage>20 group by d.dno order by avgsal) mid,dept d where mid.dno=d.dno;
Paged queries:
hive> select row_number() over(),e.* from emp e;
hive> select row_number() over(order by sal desc),e.* from emp e;
select * from (select row_number() over(order by sal desc) seq,e.* from emp e) mid where mid.seq>5 and mid.seq<11;
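The seq bounds in the last query correspond to the second page when each page holds 5 rows. As a rough sketch, the bounds for any page can be computed from a page number and a page size; page_query below is a hypothetical helper written for these notes, not part of Hive.

```shell
#!/bin/sh
# Sketch: build the paged query for a given page number and page size.
# page_query is a hypothetical helper, not part of Hive.
page_query() {
    page=$1        # 1-based page number
    size=$2        # rows per page
    first=$(( (page - 1) * size + 1 ))   # seq of the first row on the page
    last=$(( page * size ))              # seq of the last row on the page
    printf 'select * from (select row_number() over(order by sal desc) seq,e.* from emp e) mid where mid.seq>=%d and mid.seq<=%d;\n' "$first" "$last"
}

# Page 2 with 5 rows per page covers seq 6 to 10,
# the same rows as mid.seq>5 and mid.seq<11.
page_query 2 5
```

The printed statement can be passed to the CLI in the same way as the earlier non-interactive examples, for instance through ./hive -S -e.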