zoukankan      html  css  js  c++  java
  • hive实现单词统计

    参考博客:一个hive小案例:使用HIVE进行单词统计, 并把结果存入mysql

    问题:统计客户某个年龄有多少人

    • 客户表信息
    hive> desc customer_info;
    OK
    id                      int
    name                    string
    age                     int
    Time taken: 0.213 seconds, Fetched: 3 row(s)
    hive> select * from customer_info limit 10;
    OK
    1       tom1    11
    2       tom2    12
    3       tom3    13
    4       tom4    14
    5       tom5    15
    6       tom6    16
    7       tom7    17
    8       tom8    18
    9       tom9    19
    
    
    • 切割出age列,并保存到每个数组里面,并命名新的列名为AGE
    hive> select split(age,',') as AGE from customer_info limit 10;
    OK
    ["11"]
    ["12"]
    ["13"]
    ["14"]
    ["15"]
    ["16"]
    ["17"]
    ["18"]
    ["19"]
    ["20"]
    
    • 将每个数组炸开
    hive> select explode(split(age,',')) as AGE from customer_info limit 10;
    OK
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    
    • 统计每个年龄的数量
    hive> select t.AGE,count(*) from ( select explode(split(age,',')) as AGE from customer_info ) as t group by t.AGE;
    11      1
    12      1
    13      1
    14      1
    15      1
    16      1
    17      1
    18      1
    19      1
    20      2
    21      2
    22      2
    23      2
    24      2
    25      2
    26      2
    27      2
    28      3
    29      3
    30      3
    31      3
    32      3
    33      3
    34      3
    35      3
    36      3
    37      3
    38      2
    39      2
    40      1
    41      1
    42      1
    43      1
    44      1
    
    • 将结果写入新的表中
     create table customer_age_count as select t.AGE,count(*) from ( select explode(split(age,',')) as AGE from customer_info ) as t group by t.AGE;
    • 将表导出到HDFS目录下
    hive> export table customer_age_count to '/hive_export';
    [xiaoqiu@s150 /home/xiaoqiu]$ hadoop fs -lsr /hive_export
    lsr: DEPRECATED: Please use 'ls -R' instead.
    -rwxr-xr-x   3 xiaoqiu supergroup       1303 2018-08-13 23:28 /hive_export/_metadata
    drwxr-xr-x   - xiaoqiu supergroup          0 2018-08-13 23:28 /hive_export/data
    -rwxr-xr-x   3 xiaoqiu supergroup        170 2018-08-13 23:28 /hive_export/data/000000_0
    
    •  
    欢迎关注我的公众号:小秋的博客 CSDN博客:https://blog.csdn.net/xiaoqiu_cr github:https://github.com/crr121 联系邮箱:rongchen633@gmail.com 有什么问题可以给我留言噢~
  • 相关阅读:
    python计算时间差的方法
    Apache Prefork、Worker和Event三種MPM分析
    nagios监控mysql
    linux安装nagios客户端
    简单易懂的snmpd.conf配置文件说明
    Linux基本监控项目
    python 2.0 s12 day5 常用模块介绍
    python2.0 s12 day4
    ubuntu常用系统命令
    ubuntu编译安装libimobiledevice,查看ios 日志
  • 原文地址:https://www.cnblogs.com/flyingcr/p/10326857.html
Copyright © 2011-2022 走看看