  • Hive-1.2.1_02_Basic operations and access methods

    1. Showing the current database in the Hive prompt

     1. Set it on the fly in the CLI when needed. This takes effect immediately, but only for the current session; the setting is lost once the CLI is reopened:
     hive> set hive.cli.print.current.db=true;
     
     2. Configure once, effective permanently for the user: create a .hiverc file in the current user's HOME directory and put the set command in it; the file is executed every time the CLI starts.
     [yun@mini01 ~]$ pwd
     /app 
     [yun@mini01 ~]$ cat .hiverc 
     set hive.cli.print.current.db=true;
     
     3. Add the property to the Hive configuration file [recommended]. This entry is already present in the configuration from the previous article:
       <!-- Show the current database in the prompt -->
       <property>
         <name>hive.cli.print.current.db</name>
         <value>true</value>
         <description>Whether to include the current database in the Hive prompt.</description>
       </property>

    2. Creating a database

     # The current database is not shown in the prompt yet
     [yun@mini01 ~]$ hive
     
     Logging initialized using configuration in jar:file:/app/hive-1.2.1/lib/hive-common-1.2.1.jar!/hive-log4j.properties
     hive> show databases;  # the default database is "default"
     OK
     default
     Time taken: 0.774 seconds, Fetched: 1 row(s)
     # create a database
     hive> create database zhang;
     OK
     Time taken: 0.168 seconds
     hive> show databases;  
     OK
     default
     zhang
     Time taken: 0.02 seconds, Fetched: 2 row(s)

    Browser view (screenshot not reproduced here)

    3. Creating a table

     # The current database is now shown in the prompt
     hive (default)> show databases;
     OK
     default
     zhang
     Time taken: 0.729 seconds, Fetched: 2 row(s)
     hive (default)> use zhang;
     OK
     Time taken: 0.036 seconds
     hive (zhang)> create table t_sz01(id int, name string)
                 > row format delimited
                 > fields terminated by ',';
     OK
     Time taken: 0.187 seconds
     hive (zhang)> show tables;
     OK
     t_sz01
     Time taken: 0.031 seconds, Fetched: 1 row(s)

    Browser view (screenshot not reproduced here)

    4. Creating data and uploading it to HDFS

     [yun@mini01 hive]$ cat sz.dat 
     1,zhangsan
     5,李四
     3,wangwu
     2,赵六
     4,zhouqi
     6,孙八
     [yun@mini01 hive]$ hadoop fs -put sz.dat /user/hive/warehouse/zhang.db/t_sz01   # upload 
     [yun@mini01 hive]$ hadoop fs -ls /user/hive/warehouse/zhang.db/t_sz01/
     Found 1 items
     -rw-r--r--   2 yun supergroup         56 2018-07-03 21:27 /user/hive/warehouse/zhang.db/t_sz01/sz.dat
     [yun@mini01 hive]$ hadoop fs -cat /user/hive/warehouse/zhang.db/t_sz01/sz.dat
     1,zhangsan
     5,李四
     3,wangwu
     2,赵六
     4,zhouqi
     6,孙八
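
    Note that `hadoop fs -put` copies the file as-is; Hive does no parsing at load time, and the table's `fields terminated by ','` clause only takes effect when the data is read. As a rough illustration of that read-time split (plain Python, not Hive internals; the `lines` list simply mirrors sz.dat above):

    ```python
    # Sketch of how "fields terminated by ','" maps each line of sz.dat
    # onto the (id int, name string) columns. Illustration only.
    lines = [
        "1,zhangsan",
        "5,李四",
        "3,wangwu",
        "2,赵六",
        "4,zhouqi",
        "6,孙八",
    ]

    rows = []
    for line in lines:
        id_str, name = line.split(",", 1)  # split on the first delimiter
        rows.append((int(id_str), name))   # id is cast to int, name stays a string

    print(rows[:2])  # [(1, 'zhangsan'), (5, '李四')]
    ```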

    5. Querying data in Hive

     hive (zhang)> show tables;
     OK
     t_sz01
     Time taken: 0.028 seconds, Fetched: 1 row(s)
     hive (zhang)> select * from t_sz01;  # full-table query 
     OK
     1    zhangsan
     5    李四
     3    wangwu
     2    赵六
     4    zhouqi
     6    孙八
     Time taken: 0.264 seconds, Fetched: 6 row(s)
     hive (zhang)> select count(1) from t_sz01;  # row count 
     Query ID = yun_20180703213443_ebca743c-2025-405a-8832-59359e1566c2
     Total jobs = 1
     Launching Job 1 out of 1
     Number of reduce tasks determined at compile time: 1
     In order to change the average load for a reducer (in bytes):
       set hive.exec.reducers.bytes.per.reducer=<number>
     In order to limit the maximum number of reducers:
       set hive.exec.reducers.max=<number>
     In order to set a constant number of reducers:
       set mapreduce.job.reduces=<number>
     Starting Job = job_1530619476293_0001, Tracking URL = http://mini02:8088/proxy/application_1530619476293_0001/
     Kill Command = /app/hadoop/bin/hadoop job  -kill job_1530619476293_0001
     Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
     2018-07-03 21:34:53,810 Stage-1 map = 0%,  reduce = 0%
     2018-07-03 21:35:00,224 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.5 sec
     2018-07-03 21:35:07,882 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.37 sec
     MapReduce Total cumulative CPU time: 6 seconds 370 msec
     Ended Job = job_1530619476293_0001
     MapReduce Jobs Launched: 
     Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 6.37 sec   HDFS Read: 6513 HDFS Write: 2 SUCCESS
     Total MapReduce CPU Time Spent: 6 seconds 370 msec
     OK
     6
     Time taken: 25.312 seconds, Fetched: 1 row(s)
     hive (zhang)> select id,name from t_sz01 where id >3;  # rows where id > 3 
     OK
     5    李四
     4    zhouqi
     6    孙八
     Time taken: 0.126 seconds, Fetched: 3 row(s)
     hive (zhang)> select id,name from t_sz01 where id >3 limit 2;  # LIMIT m,n is not supported; only LIMIT n 
     OK
     5    李四
     4    zhouqi
     Time taken: 0.072 seconds, Fetched: 2 row(s)
     hive (zhang)> select id, name from t_sz01 order by name;  # sort with ORDER BY
     Query ID = yun_20180703214314_db222afe-3287-4c8e-8077-73aa4fec62ef
     Total jobs = 1
     Launching Job 1 out of 1
     Number of reduce tasks determined at compile time: 1
     In order to change the average load for a reducer (in bytes):
       set hive.exec.reducers.bytes.per.reducer=<number>
     In order to limit the maximum number of reducers:
       set hive.exec.reducers.max=<number>
     In order to set a constant number of reducers:
       set mapreduce.job.reduces=<number>
     Starting Job = job_1530619476293_0002, Tracking URL = http://mini02:8088/proxy/application_1530619476293_0002/
     Kill Command = /app/hadoop/bin/hadoop job  -kill job_1530619476293_0002
     Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
     2018-07-03 21:43:25,676 Stage-1 map = 0%,  reduce = 0%
     2018-07-03 21:43:34,166 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.64 sec
     2018-07-03 21:43:40,606 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.85 sec
     MapReduce Total cumulative CPU time: 4 seconds 850 msec
     Ended Job = job_1530619476293_0002
     MapReduce Jobs Launched: 
     Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 4.85 sec   HDFS Read: 5789 HDFS Write: 74 SUCCESS
     Total MapReduce CPU Time Spent: 4 seconds 850 msec
     OK
     3    wangwu
     1    zhangsan
     4    zhouqi
     6    孙八
     5    李四
     2    赵六
     Time taken: 26.768 seconds, Fetched: 6 row(s)
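
    Note the order of the ORDER BY result: the ASCII names come first, then the Chinese names. Hive compares STRING values character by character, which for this data matches a plain code-point sort; a quick Python check (an illustration, not Hive code) reproduces the same sequence:

    ```python
    # ORDER BY on a STRING column compares characters, so ASCII names
    # sort before CJK names. Python's default code-point sort gives the
    # same order as the Hive output above.
    names = ["zhangsan", "李四", "wangwu", "赵六", "zhouqi", "孙八"]
    ordered = sorted(names)
    print(ordered)
    # ['wangwu', 'zhangsan', 'zhouqi', '孙八', '李四', '赵六']
    ```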

    MapReduce job information can be viewed in the ResourceManager web UI:

    http://mini02:8088    

    6. Ways to access Hive

    6.1. Hive interactive shell

    # Environment variables were configured earlier
    [yun@mini01 ~]$ hive

    6.2. Hive Thrift service (HiveServer2)

      How to start it (for example, on mini01):

    # Environment variables were configured earlier
    Start in the foreground: hiveserver2
    Start in the background: nohup hiveserver2 1>/app/hive/logs/hiveserver.log 2>/app/hive/logs/hiveserver.err &
    # Create the /app/hive/logs directory first if it does not exist

      Once it has started, you can connect from another node with beeline.

    Method 1

     # beeline is not installed on any other machine, so connect from the local machine
     [yun@mini01 bin]$ beeline 
     Beeline version 1.2.1 by Apache Hive
     beeline> !connect jdbc:hive2://mini01:10000   # JDBC connection; the host can be mini01, 127.0.0.1, 10.0.0.11, or 172.16.1.11 
     Connecting to jdbc:hive2://mini01:10000
     Enter username for jdbc:hive2://mini01:10000: yun
     Enter password for jdbc:hive2://mini01:10000: 
     Connected to: Apache Hive (version 1.2.1)
     Driver: Hive JDBC (version 1.2.1)
     Transaction isolation: TRANSACTION_REPEATABLE_READ
     0: jdbc:hive2://mini01:10000>

    Method 2

    # Or connect in one step at launch:
    [yun@mini01 ~]$ beeline -u jdbc:hive2://mini01:10000 -n yun 

    From here you can run normal SQL queries.

      For example:

     0: jdbc:hive2://mini01:10000> show databases;
     +----------------+--+
     | database_name  |
     +----------------+--+
     | default        |
     | zhang          |
     +----------------+--+
     2 rows selected (0.437 seconds)
     0: jdbc:hive2://mini01:10000> use zhang;
     No rows affected (0.058 seconds)
     0: jdbc:hive2://mini01:10000> show tables;
     +-----------+--+
     | tab_name  |
     +-----------+--+
     | t_sz01    |
     +-----------+--+
     1 row selected (0.054 seconds)
     0: jdbc:hive2://mini01:10000> select * from t_sz01;
     +------------+--------------+--+
     | t_sz01.id  | t_sz01.name  |
     +------------+--------------+--+
     | 1          | zhangsan     |
     | 5          | 李四         |
     | 3          | wangwu       |
     | 2          | 赵六         |
     | 4          | zhouqi       |
     | 6          | 孙八         |
     +------------+--------------+--+
     6 rows selected (0.641 seconds)
     0: jdbc:hive2://10.0.0.11:10000> select count(1) from t_sz01;  # row count 
     INFO  : Number of reduce tasks determined at compile time: 1
     INFO  : In order to change the average load for a reducer (in bytes):
     INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
     INFO  : In order to limit the maximum number of reducers:
     INFO  :   set hive.exec.reducers.max=<number>
     INFO  : In order to set a constant number of reducers:
     INFO  :   set mapreduce.job.reduces=<number>
     INFO  : number of splits:1
     INFO  : Submitting tokens for job: job_1530619476293_0003
     INFO  : The url to track the job: http://mini02:8088/proxy/application_1530619476293_0003/
     INFO  : Starting Job = job_1530619476293_0003, Tracking URL = http://mini02:8088/proxy/application_1530619476293_0003/
     INFO  : Kill Command = /app/hadoop/bin/hadoop job  -kill job_1530619476293_0003
     INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
     INFO  : 2018-07-03 22:58:43,405 Stage-1 map = 0%,  reduce = 0%
     INFO  : 2018-07-03 22:58:49,882 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.56 sec
     INFO  : 2018-07-03 22:58:57,815 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.28 sec
     INFO  : MapReduce Total cumulative CPU time: 5 seconds 280 msec
     INFO  : Ended Job = job_1530619476293_0003
     +------+--+
     | _c0  |
     +------+--+
     | 6    |
     +------+--+
     1 row selected (25.433 seconds)

    6.3. hive -e "HiveQL"

          Suitable for use in scripts.

     [yun@mini01 ~]$ hive -e "use exercise; select * from student;"
     
     Logging initialized using configuration in jar:file:/app/hive-1.2.1/lib/hive-common-1.2.1.jar!/hive-log4j.properties
     OK
     Time taken: 1.109 seconds
     OK
     95001    李勇    男    20    CS
     95002    刘晨    女    19    IS
     95003    王敏    女    22    MA
     95004    张立    男    19    IS
     95005    刘刚    男    18    MA
     95006    孙庆    男    23    CS
     95007    易思玲    女    19    MA
     95008    李娜    女    18    CS
     95009    梦圆圆    女    18    MA
     95010    孔小涛    男    19    CS
     Time taken: 0.786 seconds, Fetched: 10 row(s)

    6.4. hive -f 'test.sql'

      Suitable for having Hive execute a script file that consists entirely of HiveQL statements.

    # aztest.sql  (script name)
    CREATE DATABASE IF NOT EXISTS azkaban;
    use azkaban;
    DROP TABLE IF EXISTS aztest;
    DROP TABLE IF EXISTS azres;
    create table aztest(id int,name string) row format delimited fields terminated by ',';
    load data inpath '/aztest/hiveinput/azkb.dat' into table aztest;
    create table azres row format delimited fields terminated by '#' as select * from aztest;
    insert overwrite directory '/aztest/hiveoutput' select count(1) from aztest; 
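
    Such a script is then run in one shot with `hive -f aztest.sql`. As a small sketch (plain Python; the `hive` command itself needs a configured cluster, so it is only shown as a string), generating the script file programmatically might look like:

    ```python
    from pathlib import Path

    # Write a HiveQL batch script to disk; it would then be executed with
    #   hive -f aztest.sql
    # on a node where Hive is installed and configured. The statements are
    # a subset of the aztest.sql script above.
    statements = [
        "CREATE DATABASE IF NOT EXISTS azkaban;",
        "use azkaban;",
        "DROP TABLE IF EXISTS aztest;",
        "create table aztest(id int, name string) "
        "row format delimited fields terminated by ',';",
    ]
    script = Path("aztest.sql")
    script.write_text("\n".join(statements) + "\n", encoding="utf-8")

    print("hive -f " + script.name)  # the command to run against the cluster
    ```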

    7. References

    1. Hive table creation fails with "Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask"

    2. Hands-on: showing the current database in the Hive CLI by default

  • Original article: https://www.cnblogs.com/zhanglianghhh/p/9348958.html