zoukankan      html  css  js  c++  java
  • Hive分析窗体函数之SUM,AVG,MIN和MAX

    Hive中提供了非常多的分析函数,用于完毕负责的统计分析。

    本文先介绍SUMAVGMINMAX这四个函数


    环境信息:

    Hive版本号为apache-hive-0.14.0-bin

    Hadoop版本号为hadoop-2.6.0

    Tez版本号为tez-0.7.0


    构造数据:

    P088888888888,2016-02-10,1

    P088888888888,2016-02-11,3

    P088888888888,2016-02-12,1

    P088888888888,2016-02-13,9

    P088888888888,2016-02-14,3

    P088888888888,2016-02-15,12

    P088888888888,2016-02-16,3

    创建表:

    hive (hiveinaction)> create table windows_func

                       >(

                      >     polno string,

                      >     createtime string,

                      >     pnum int   

                       >)

                       >ROW FORMAT DELIMITED

                       >FIELDS TERMINATED BY ','

                       >stored as textfile;

    导入数据到表中:

    load data local inpath '/home/hadoop/testhivedata/windows_func.txt' into table windows_func;

    測试:

    SELECT polno,

           createtime,

           pnum,

           SUM(pnum) OVER(PARTITION BY polno ORDERBY createtime) AS pnum1, --默觉得从起点到当前行

           SUM(pnum) OVER(PARTITION BY polno ORDERBY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pnum2,  --从起点到当前行

           SUM(pnum) OVER(PARTITION BY polno) ASpnum3, --分组内全部行

           SUM(pnum) OVER(PARTITION BY polno ORDERBY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pnum4,  --当前行+往前3(当前行的值+前面三行的值)

           SUM(pnum) OVER(PARTITION BY polno ORDERBY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pnum5,  --当前行+往前3+往后1

           SUM(pnum) OVER(PARTITION BY polno ORDERBY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pnum6 ---当前行+往后全部行

    FROM windows_func;

    结果:

    polno

            createtime

    pnum

    pnum1

    pnum2

    pnum3

    pnum4

    pnum5

    pnum6

    P088888888888

    2016/2/10

    1

    1

    1

    32

    1

    4

    32

    P088888888888

    2016/2/11

    3

    4

    4

    32

    4

    5

    31

    P088888888888

    2016/2/12

    1

    5

    5

    32

    5

    14

    28

    P088888888888

    2016/2/13

    9

    14

    14

    32

    14

    17

    27

    P088888888888

    2016/2/14

    3

    17

    17

    32

    16

    28

    18

    P088888888888

    2016/2/15

    12

    29

    29

    32

    25

    28

    15

    P088888888888

    2016/2/16

    3

    32

    32

    32

    27

    27

    3

    凝视:

    1.   假设不指定ROWS BETWEEN,默觉得从起点到当前行;

    2.   假设不指定ORDER BY,则将分组内全部值累加;

    理解ROWS BETWEEN含义,也叫做WINDOW子句:
    PRECEDING
    :往前
    FOLLOWING
    :往后
    CURRENT ROW
    :当前行
    UNBOUNDED
    :起点,UNBOUNDED PRECEDING表示从前面的起点, UNBOUNDED FOLLOWING:表示到后面的终点
    其它AVGMINMAX。和SUM使用方法一样。

     

    演示AVG环境:

    SELECT polno,

           createtime,

           pnum,

           AVG(pnum) OVER(PARTITION BY polno ORDER BY createtime) AS pnum1, --默觉得从起点到当前行

           AVG(pnum) OVER(PARTITION BY polno ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pnum2,  --从起点到当前行

           AVG(pnum) OVER(PARTITION BY polno) AS pnum3, --分组内全部行

           AVG(pnum) OVER(PARTITION BY polno ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pnum4,  --当前行+往前3(当前行的值+前面三行的值)

           AVG(pnum) OVER(PARTITION BYpolno ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pnum5,  --当前行+往前3+往后1

           AVG(pnum) OVER(PARTITION BYpolno ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pnum6 ---当前行+往后全部行

    FROM windows_func;

    结果:

    polno

    createtime

    pnum

    pnum1

    pnum2

    pnum3

    pnum4

    pnum5

    pnum6

    P088888888888

    2016/2/10

    1

    1

    1

    4.57142857

    1

    2

    4.5714286

    P088888888888

    2016/2/11

    3

    2

    2

    4.57142857

    2

    1.666667

    5.1666667

    P088888888888

    2016/2/12

    1

    1.66667

    1.6667

    4.57142857

    1.666667

    3.5

    5.6

    P088888888888

    2016/2/13

    9

    3.5

    3.5

    4.57142857

    3.5

    3.4

    6.75

    P088888888888

    2016/2/14

    3

    3.4

    3.4

    4.57142857

    4

    5.6

    6

    P088888888888

    2016/2/15

    12

    4.83333

    4.8333

    4.57142857

    6.25

    5.6

    7.5

    P088888888888

    2016/2/16

    3

    4.57143

    4.5714

    4.57142857

    6.75

    6.75

    3

     

    其它相似的函数就不举例了。

     

  • 相关阅读:
    中文字,文本框,button按钮垂直居中对齐方法
    Denny_Yang的语录
    为SELECT结果集添加伪序号列
    考勤系统问题记录一:事务日志太大
    在DateTable中进行数据统计(不用循环)
    今天去游泳了
    关于HbSrv.exe
    有感于--特洛伊
    windows下DNS的问题
    Google新功能
  • 原文地址:https://www.cnblogs.com/wzjhoutai/p/7284443.html
Copyright © 2011-2022 走看看