zoukankan      html  css  js  c++  java
  • Hive函数:SUM,AVG,MIN,MAX

    转自:http://lxw1234.com/archives/2015/04/176.htm,Hive分析窗口函数(一) SUM,AVG,MIN,MAX

    之前看到大数据田地有关于max()over(partition by)的用法,今天恰好工作中用到了它,但是使用中遇到了一个问题:在max(rsrp)over(partition by buildingid,height) as max_rsrp返回的结果不是分组中的最大值。最中找到了问题的原因:max_rsrp数据类型为string而不是double类型,导致的一个bug问题。

    再处理的过程中也再次把大数据田地的中关于sum,avg,max,min的函数用法做了demo,因此有了该参考后的文章。

    数据准备:

    echo ''>data_file.txt
    vim data_file.txt 
    cookie1,2015-04-10,1
    cookie1,2015-04-11,5
    cookie1,2015-04-12,7
    cookie1,2015-04-13,3
    cookie1,2015-04-14,2
    cookie1,2015-04-15,4
    cookie1,2015-04-16,4
    cookie2,2015-04-10,6
    cookie2,2015-04-11,5
    cookie2,2015-04-12,7
    cookie2,2015-04-13,4
    cookie2,2015-04-14,3
    cookie2,2015-04-15,5
    cookie2,2015-04-16,5
    hadoop fs -rm -r /user/jrf/test_data
    hadoop fs -mkdir /user/jrf/test_data
    hadoop fs -copyFromLocal data_file.txt /user/jrf/test_data/
    drop table if exists test_data;
    create EXTERNAL TABLE test_data ( cookieid string, createtime string, --day pv INT ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' stored as textfile location '/user/jrf/test_data/';
    select * from test_data; +---------------------+-----------------------+---------------+--+ | test_data.cookieid | test_data.createtime | test_data.pv | +---------------------+-----------------------+---------------+--+ | cookie1 | 2015-04-10 | 1 | | cookie1 | 2015-04-11 | 5 | | cookie1 | 2015-04-12 | 7 | | cookie1 | 2015-04-13 | 3 | | cookie1 | 2015-04-14 | 2 | | cookie1 | 2015-04-15 | 4 | | cookie1 | 2015-04-16 | 4 | | cookie2 | 2015-04-10 | 6 | | cookie2 | 2015-04-11 | 5 | | cookie2 | 2015-04-12 | 7 | | cookie2 | 2015-04-13 | 4 | | cookie2 | 2015-04-14 | 3 | | cookie2 | 2015-04-15 | 5 | | cookie2 | 2015-04-16 | 5 | +---------------------+-----------------------+---------------+--+

    SUM — 注意,结果和ORDER BY相关,默认为升序

    SELECT cookieid,createtime,pv,
    SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
    SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1 
    SUM(pv) OVER(PARTITION BY cookieid) AS pv3,--分组内所有行
    SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4,--当前行+往前3行
    SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5,--当前行+往前3行+往后1行
    SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 --当前行+往后所有行  
    FROM test_data order by cookieid,createtime;
    +-----------+-------------+-----+------+------+------+------+------+------+--+
    | cookieid  | createtime  | pv  | pv1  | pv2  | pv3  | pv4  | pv5  | pv6  |
    +-----------+-------------+-----+------+------+------+------+------+------+--+
    | cookie1   | 2015-04-10  | 1   | 1    | 1    | 26   | 1    | 6    | 26   |
    | cookie1   | 2015-04-11  | 5   | 6    | 6    | 26   | 6    | 13   | 25   |
    | cookie1   | 2015-04-12  | 7   | 13   | 13   | 26   | 13   | 16   | 20   |
    | cookie1   | 2015-04-13  | 3   | 16   | 16   | 26   | 16   | 18   | 13   |
    | cookie1   | 2015-04-14  | 2   | 18   | 18   | 26   | 17   | 21   | 10   |
    | cookie1   | 2015-04-15  | 4   | 22   | 22   | 26   | 16   | 20   | 8    |
    | cookie1   | 2015-04-16  | 4   | 26   | 26   | 26   | 13   | 13   | 4    |
    | cookie2   | 2015-04-10  | 6   | 6    | 6    | 35   | 6    | 11   | 35   |
    | cookie2   | 2015-04-11  | 5   | 11   | 11   | 35   | 11   | 18   | 29   |
    | cookie2   | 2015-04-12  | 7   | 18   | 18   | 35   | 18   | 22   | 24   |
    | cookie2   | 2015-04-13  | 4   | 22   | 22   | 35   | 22   | 25   | 17   |
    | cookie2   | 2015-04-14  | 3   | 25   | 25   | 35   | 19   | 24   | 13   |
    | cookie2   | 2015-04-15  | 5   | 30   | 30   | 35   | 19   | 24   | 10   |
    | cookie2   | 2015-04-16  | 5   | 35   | 35   | 35   | 17   | 17   | 5    |
    +-----------+-------------+-----+------+------+------+------+------+------+--+
    pv1: 分组内从起点到当前行的pv累积,如,11号的pv1=10号的pv+11号的pv, 12号=10号+11号+12号
    pv2: 同pv1
    pv3: 分组内(cookie1)所有的pv累加
    pv4: 分组内当前行+往前3行,如,11号=10号+11号, 12号=10号+11号+12号, 13号=10号+11号+12号+13号, 14号=11号+12号+13号+14号
    pv5: 分组内当前行+往前3行+往后1行,如,14号=11号+12号+13号+14号+15号=5+7+3+2+4=21
    pv6: 分组内当前行+往后所有行,如,13号=13号+14号+15号+16号=3+2+4+4=13,14号=14号+15号+16号=2+4+4=10

    如果不指定ROWS BETWEEN,默认为从起点到当前行;
    如果不指定ORDER BY,则将分组内所有值累加;
    关键是理解ROWS BETWEEN含义,也叫做WINDOW子句:
    PRECEDING:往前
    FOLLOWING:往后
    CURRENT ROW:当前行
    UNBOUNDED:起点,UNBOUNDED PRECEDING 表示从前面的起点, UNBOUNDED FOLLOWING:表示到后面的终点

    –其他AVG,MIN,MAX,和SUM用法一样。

    --AVG
    SELECT cookieid,createtime,pv,
    AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
    AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1 
    AVG(pv) OVER(PARTITION BY cookieid) AS pv3,--分组内所有行
    AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4,--当前行+往前3行
    AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5,--当前行+往前3行+往后1行
    AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 --当前行+往后所有行  
    FROM test_data  order by cookieid,createtime;
    +-----------+-------------+-----+---------------------+---------------------+---------------------+--------------------+--------------------+---------------------+--+
    | cookieid  | createtime  | pv  |         pv1         |         pv2         |         pv3         |        pv4         |        pv5         |         pv6         |
    +-----------+-------------+-----+---------------------+---------------------+---------------------+--------------------+--------------------+---------------------+--+
    | cookie1   | 2015-04-10  | 1   | 1.0                 | 1.0                 | 3.7142857142857144  | 1.0                | 3.0                | 3.7142857142857144  |
    | cookie1   | 2015-04-11  | 5   | 3.0                 | 3.0                 | 3.7142857142857144  | 3.0                | 4.333333333333333  | 4.166666666666667   |
    | cookie1   | 2015-04-12  | 7   | 4.333333333333333   | 4.333333333333333   | 3.7142857142857144  | 4.333333333333333  | 4.0                | 4.0                 |
    | cookie1   | 2015-04-13  | 3   | 4.0                 | 4.0                 | 3.7142857142857144  | 4.0                | 3.6                | 3.25                |
    | cookie1   | 2015-04-14  | 2   | 3.6                 | 3.6                 | 3.7142857142857144  | 4.25               | 4.2                | 3.3333333333333335  |
    | cookie1   | 2015-04-15  | 4   | 3.6666666666666665  | 3.6666666666666665  | 3.7142857142857144  | 4.0                | 4.0                | 4.0                 |
    | cookie1   | 2015-04-16  | 4   | 3.7142857142857144  | 3.7142857142857144  | 3.7142857142857144  | 3.25               | 3.25               | 4.0                 |
    | cookie2   | 2015-04-10  | 6   | 6.0                 | 6.0                 | 5.0                 | 6.0                | 5.5                | 5.0                 |
    | cookie2   | 2015-04-11  | 5   | 5.5                 | 5.5                 | 5.0                 | 5.5                | 6.0                | 4.833333333333333   |
    | cookie2   | 2015-04-12  | 7   | 6.0                 | 6.0                 | 5.0                 | 6.0                | 5.5                | 4.8                 |
    | cookie2   | 2015-04-13  | 4   | 5.5                 | 5.5                 | 5.0                 | 5.5                | 5.0                | 4.25                |
    | cookie2   | 2015-04-14  | 3   | 5.0                 | 5.0                 | 5.0                 | 4.75               | 4.8                | 4.333333333333333   |
    | cookie2   | 2015-04-15  | 5   | 5.0                 | 5.0                 | 5.0                 | 4.75               | 4.8                | 5.0                 |
    | cookie2   | 2015-04-16  | 5   | 5.0                 | 5.0                 | 5.0                 | 4.25               | 4.25               | 5.0                 |
    +-----------+-------------+-----+---------------------+---------------------+---------------------+--------------------+--------------------+---------------------+--+
    --MIN
    SELECT cookieid,createtime,pv,
    MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
    MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2,--从起点到当前行,结果同pv1 
    MIN(pv) OVER(PARTITION BY cookieid) AS pv3,--分组内所有行
    MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4,--当前行+往前3行
    MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5,--当前行+往前3行+往后1行
    MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 --当前行+往后所有行  
    FROM test_data order by cookieid,createtime;
    +-----------+-------------+-----+------+------+------+------+------+------+--+
    | cookieid  | createtime  | pv  | pv1  | pv2  | pv3  | pv4  | pv5  | pv6  |
    +-----------+-------------+-----+------+------+------+------+------+------+--+
    | cookie1   | 2015-04-10  | 1   | 1    | 1    | 1    | 1    | 1    | 1    |
    | cookie1   | 2015-04-11  | 5   | 1    | 1    | 1    | 1    | 1    | 2    |
    | cookie1   | 2015-04-12  | 7   | 1    | 1    | 1    | 1    | 1    | 2    |
    | cookie1   | 2015-04-13  | 3   | 1    | 1    | 1    | 1    | 1    | 2    |
    | cookie1   | 2015-04-14  | 2   | 1    | 1    | 1    | 2    | 2    | 2    |
    | cookie1   | 2015-04-15  | 4   | 1    | 1    | 1    | 2    | 2    | 4    |
    | cookie1   | 2015-04-16  | 4   | 1    | 1    | 1    | 2    | 2    | 4    |
    | cookie2   | 2015-04-10  | 6   | 6    | 6    | 3    | 6    | 5    | 3    |
    | cookie2   | 2015-04-11  | 5   | 5    | 5    | 3    | 5    | 5    | 3    |
    | cookie2   | 2015-04-12  | 7   | 5    | 5    | 3    | 5    | 4    | 3    |
    | cookie2   | 2015-04-13  | 4   | 4    | 4    | 3    | 4    | 3    | 3    |
    | cookie2   | 2015-04-14  | 3   | 3    | 3    | 3    | 3    | 3    | 3    |
    | cookie2   | 2015-04-15  | 5   | 3    | 3    | 3    | 3    | 3    | 5    |
    | cookie2   | 2015-04-16  | 5   | 3    | 3    | 3    | 3    | 3    | 5    |
    +-----------+-------------+-----+------+------+------+------+------+------+--+
    --MAX
    SELECT cookieid,createtime,pv,
    MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
    MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1 
    MAX(pv) OVER(PARTITION BY cookieid) AS pv3, --分组内所有行
    MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, --当前行+往前3行
    MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, --当前行+往前3行+往后1行
    MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 --当前行+往后所有行  
    FROM test_data order by cookieid,createtime;
    +-----------+-------------+-----+------+------+------+------+------+------+--+
    | cookieid  | createtime  | pv  | pv1  | pv2  | pv3  | pv4  | pv5  | pv6  |
    +-----------+-------------+-----+------+------+------+------+------+------+--+
    | cookie1   | 2015-04-10  | 1   | 1    | 1    | 7    | 1    | 5    | 7    |
    | cookie1   | 2015-04-11  | 5   | 5    | 5    | 7    | 5    | 7    | 7    |
    | cookie1   | 2015-04-12  | 7   | 7    | 7    | 7    | 7    | 7    | 7    |
    | cookie1   | 2015-04-13  | 3   | 7    | 7    | 7    | 7    | 7    | 4    |
    | cookie1   | 2015-04-14  | 2   | 7    | 7    | 7    | 7    | 7    | 4    |
    | cookie1   | 2015-04-15  | 4   | 7    | 7    | 7    | 7    | 7    | 4    |
    | cookie1   | 2015-04-16  | 4   | 7    | 7    | 7    | 4    | 4    | 4    |
    | cookie2   | 2015-04-10  | 6   | 6    | 6    | 7    | 6    | 6    | 7    |
    | cookie2   | 2015-04-11  | 5   | 6    | 6    | 7    | 6    | 7    | 7    |
    | cookie2   | 2015-04-12  | 7   | 7    | 7    | 7    | 7    | 7    | 7    |
    | cookie2   | 2015-04-13  | 4   | 7    | 7    | 7    | 7    | 7    | 5    |
    | cookie2   | 2015-04-14  | 3   | 7    | 7    | 7    | 7    | 7    | 5    |
    | cookie2   | 2015-04-15  | 5   | 7    | 7    | 7    | 7    | 7    | 5    |
    | cookie2   | 2015-04-16  | 5   | 7    | 7    | 7    | 5    | 5    | 5    |
    +-----------+-------------+-----+------+------+------+------+------+------+--+
    
    
    SELECT cookieid,
    createtime,
    pv,
    min(pv) OVER(PARTITION BY cookieid) AS min_pv,
    max(pv) OVER(PARTITION BY cookieid) AS max_pv
    FROM test_data;
    +-----------+-------------+-----+---------+---------+--+
    | cookieid  | createtime  | pv  | min_pv  | max_pv  |
    +-----------+-------------+-----+---------+---------+--+
    | cookie1   | 2015-04-10  | 1   | 1       | 7       |
    | cookie1   | 2015-04-16  | 4   | 1       | 7       |
    | cookie1   | 2015-04-15  | 4   | 1       | 7       |
    | cookie1   | 2015-04-14  | 2   | 1       | 7       |
    | cookie1   | 2015-04-13  | 3   | 1       | 7       |
    | cookie1   | 2015-04-12  | 7   | 1       | 7       |
    | cookie1   | 2015-04-11  | 5   | 1       | 7       |
    | cookie2   | 2015-04-16  | 5   | 3       | 7       |
    | cookie2   | 2015-04-15  | 5   | 3       | 7       |
    | cookie2   | 2015-04-14  | 3   | 3       | 7       |
    | cookie2   | 2015-04-13  | 4   | 3       | 7       |
    | cookie2   | 2015-04-12  | 7   | 3       | 7       |
    | cookie2   | 2015-04-11  | 5   | 3       | 7       |
    | cookie2   | 2015-04-10  | 6   | 3       | 7       |
    +-----------+-------------+-----+---------+---------+--+
  • 相关阅读:
    dubbo的超时机制
    今天又遇到之前的问题,后端返回数据long到前端失真
    如何在一台机子上配置两个github
    当sum函数返回null时处理
    Linux中zookeeper安装
    Linux常用指令
    sql执行顺序
    固定时间刷新某个固定值 java
    docker基础
    python之CSS
  • 原文地址:https://www.cnblogs.com/yy3b2007com/p/8666927.html
Copyright © 2011-2022 走看看