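Reference blog post: "A small Hive example: word count with Hive, storing the result in MySQL"
Problem: count how many customers there are at each age
- Customer table information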
hive> desc customer_info;
OK
id int
name string
age int
Time taken: 0.213 seconds, Fetched: 3 row(s)
hive> select * from customer_info limit 10;
OK
1 tom1 11
2 tom2 12
3 tom3 13
4 tom4 14
5 tom5 15
6 tom6 16
7 tom7 17
8 tom8 18
9 tom9 19
- Split the age column on commas, store the pieces of each row in an array, and name the new column AGE
hive> select split(age,',') as AGE from customer_info limit 10;
OK
["11"]
["12"]
["13"]
["14"]
["15"]
["16"]
["17"]
["18"]
["19"]
["20"]
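Because age holds a single int per row, every array above contains exactly one element. To see split actually divide a value, here is a minimal sketch on a literal string. The literal '11,12,13' and the alias ages are illustrative assumptions, not data from customer_info, and a SELECT without a FROM clause assumes Hive 0.13 or later.

```sql
-- Illustrative literal only; not taken from customer_info.
-- split(str, pattern) returns array<string>; the second argument is a regular expression.
SELECT split('11,12,13', ',') AS ages;
-- returns ["11","12","13"]
```

Note that the elements are strings, which is why the values in the output above are quoted.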
- Explode each array into one row per element
hive> select explode(split(age,',')) as AGE from customer_info limit 10;
OK
11
12
13
14
15
16
17
18
19
20
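When explode is used directly in the SELECT list, as in the query above, no other columns can be selected alongside it. If the id and name columns are also needed, one possible approach is LATERAL VIEW. This is a sketch only: the aliases c, t, and age_part are hypothetical names, and the explicit cast simply makes the int-to-string conversion visible.

```sql
-- Sketch: keep id and name next to each exploded element.
-- c, t, and age_part are hypothetical aliases chosen for illustration.
SELECT c.id, c.name, t.age_part
FROM customer_info c
LATERAL VIEW explode(split(cast(c.age AS string), ',')) t AS age_part
LIMIT 10;
```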
- Count the number of customers at each age
hive> select t.AGE,count(*) from ( select explode(split(age,',')) as AGE from customer_info ) as t group by t.AGE;
11 1
12 1
13 1
14 1
15 1
16 1
17 1
18 1
19 1
20 2
21 2
22 2
23 2
24 2
25 2
26 2
27 2
28 3
29 3
30 3
31 3
32 3
33 3
34 3
35 3
36 3
37 3
38 2
39 2
40 1
41 1
42 1
43 1
44 1
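The split/explode pattern comes from the word-count example, where one row holds many words. Here each row holds exactly one age, so the same counts should be obtainable by grouping on the column directly. A minimal sketch follows; the alias cnt is a hypothetical name, and ORDER BY is added only to make the output order explicit.

```sql
-- Sketch: count customers per age without split/explode.
-- age stays an int here instead of being turned into a string.
SELECT age, count(*) AS cnt
FROM customer_info
GROUP BY age
ORDER BY age;
```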
- Write the result to a new table
hive> create table customer_age_count as select t.AGE,count(*) from ( select explode(split(age,',')) as AGE from customer_info ) as t group by t.AGE;
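In the statement above the count(*) expression has no alias, so Hive assigns the column an auto-generated name (typically something like _c1), which is awkward to reference later. A hedged variant that names both columns is sketched below. The table name customer_age_count_named and the alias cnt are hypothetical, chosen so the sketch does not collide with the table created above.

```sql
-- Sketch: CTAS with explicit column names.
-- customer_age_count_named and cnt are hypothetical names.
CREATE TABLE customer_age_count_named AS
SELECT t.age AS age, count(*) AS cnt
FROM (
  SELECT explode(split(cast(age AS string), ',')) AS age
  FROM customer_info
) t
GROUP BY t.age;

-- Check the resulting column names.
DESCRIBE customer_age_count_named;
```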
- Export the table to an HDFS directory
hive> export table customer_age_count to '/hive_export';
[xiaoqiu@s150 /home/xiaoqiu]$ hadoop fs -lsr /hive_export
lsr: DEPRECATED: Please use 'ls -R' instead.
-rwxr-xr-x 3 xiaoqiu supergroup 1303 2018-08-13 23:28 /hive_export/_metadata
drwxr-xr-x - xiaoqiu supergroup 0 2018-08-13 23:28 /hive_export/data
-rwxr-xr-x 3 xiaoqiu supergroup 170 2018-08-13 23:28 /hive_export/data/000000_0
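The export directory holds a _metadata file describing the table and a data directory with the table files. The counterpart of EXPORT is IMPORT, which can recreate a table from that directory. The sketch below is not part of the original walkthrough; customer_age_count_copy is a hypothetical table name, used so that the import does not clash with the existing table.

```sql
-- Sketch: load the exported directory back in as a new table.
-- customer_age_count_copy is a hypothetical name.
IMPORT TABLE customer_age_count_copy FROM '/hive_export';

-- Spot-check the imported rows.
SELECT * FROM customer_age_count_copy LIMIT 10;
```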