zoukankan      html  css  js  c++  java
  • hive 窗口函数(一)

    窗口函数可以对数据进行分组,将数据按组分成一个个窗口。利用窗口函数可以对窗口内的数据进行排序,聚合操作,还可以通过window子句让窗口上下滑动,非常灵活。

    数据准备

    cookie1,2015-04-10,1
    cookie1,2015-04-11,5
    cookie1,2015-04-12,7
    cookie1,2015-04-13,3
    cookie1,2015-04-14,2
    cookie1,2015-04-15,4
    cookie1,2015-04-16,4
    
    create table  cookie1_sum
    (
        cookieid   string,
        createtime string,
        pv         int
    ) row format delimited fields terminated by ',';
    
    select * from   cookie1_sum;

     一、sum

    select cookieid,
           createtime,
           pv,
           sum(pv) over (partition by cookieid order by createtime rows between unbounded preceding and current row ) p1,
           sum(pv) over (partition by cookieid order by createtime )                                                  p2,
           sum(pv) over (partition by cookieid )                                                                      p3,
           sum(pv) over (partition by cookieid order by createtime rows between 3 preceding and 1 following )         p4,
           sum(pv) over (partition by cookieid order by createtime rows between 3 preceding and current row )         p5,
           sum(pv) over (partition by cookieid order by createtime rows between current row and unbounded following)  p6
    from cookie1_sum;

     

     说明:        

    pv1--rows between unbounded preceding and current row :
     分组内从起点到当前行的pv累积,如,11号的pv1=10号的pv+11号的pv, 12号=10号+11号+12号
    pv2: 分组内从起点到当前行的pv累积  同上
    pv3: 分组内(cookie1)所有的pv累加
    pv4 rows between 3 preceding and current row : 
    分组内当前行+往前3行,如,11号=10号+11号, 12号=10号+11号+12号, 13号=10号+11号+12号+13号, 14号=11号+12号+13号+14号
    pv5 rows between 3 preceding and 1 following: 
    分组内当前行+往前3行+往后1行,如,14号=11号+12号+13号+14号+15号=5+7+3+2+4=21
    pv6 rows between current row and unbounded following:
     分组内当前行+往后所有行,如,13号=13号+14号+15号+16号=3+2+4+4=13,14号=14号+15号+16号=2+4+4=10

    如果不指定ROWS BETWEEN,默认为从起点到当前行;如果不指定ORDER BY,则将分组内所有值累加;关键是理解ROWS BETWEEN含义,也叫做WINDOW子句:
    PRECEDING:往前
    FOLLOWING:往后
    CURRENT ROW:当前行
    UNBOUNDED:起点, 

    UNBOUNDED PRECEDING 表示从前面的起点,

    UNBOUNDED FOLLOWING:表示到后面的终点

    –其他AVG,MIN,MAX,和SUM用法一样。

    二、avg

    set mapreduce.map.memory.mb=8192;
    set mapreduce.reduce.memory.mb=8192;
    select
       cookieid,
       createtime,
       pv,
       avg(pv) over (partition by cookieid order by createtime rows between unbounded preceding and current row) as pv1, -- 默认为从起点到当前行
       avg(pv) over (partition by cookieid order by createtime) as pv2, --从起点到当前行,结果同pv1
       avg(pv) over (partition by cookieid) as pv3, --分组内所有行
       avg(pv) over (partition by cookieid order by createtime rows between 3 preceding and current row) as pv4, --当前行+往前3行
       avg(pv) over (partition by cookieid order by createtime rows between 3 preceding and 1 following) as pv5, --当前行+往前3行+往后1行
       avg(pv) over (partition by cookieid order by createtime rows between current row and unbounded following) as pv6  --当前行+往后所有行
     from hive_zb_serv_hq.cookie1_sum;

      三、max

    set mapreduce.map.memory.mb=8192;
    set mapreduce.reduce.memory.mb=8192;
    select
       cookieid,
       createtime,
       pv,
       max(pv) over (partition by cookieid order by createtime rows between unbounded preceding and current row) as pv1, -- 默认为从起点到当前行
       max(pv) over (partition by cookieid order by createtime) as pv2, --从起点到当前行,结果同pv1
       max(pv) over (partition by cookieid) as pv3, --分组内所有行
       max(pv) over (partition by cookieid order by createtime rows between 3 preceding and current row) as pv4, --当前行+往前3行
       max(pv) over (partition by cookieid order by createtime rows between 3 preceding and 1 following) as pv5, --当前行+往前3行+往后1行
       max(pv) over (partition by cookieid order by createtime rows between current row and unbounded following) as pv6  --当前行+往后所有行
     from cookie1_sum;

      四、min

    set mapreduce.map.memory.mb=8192;
    set mapreduce.reduce.memory.mb=8192;
    select
       cookieid,
       createtime,
       pv,
       min(pv) over (partition by cookieid order by createtime rows between unbounded preceding and current row) as pv1, -- 默认为从起点到当前行
       min(pv) over (partition by cookieid order by createtime) as pv2, --从起点到当前行,结果同pv1
       min(pv) over (partition by cookieid) as pv3, --分组内所有行
       min(pv) over (partition by cookieid order by createtime rows between 3 preceding and current row) as pv4, --当前行+往前3行
       min(pv) over (partition by cookieid order by createtime rows between 3 preceding and 1 following) as pv5, --当前行+往前3行+往后1行
       min(pv) over (partition by cookieid order by createtime rows between current row and unbounded following) as pv6  --当前行+往后所有行
     from cookie1_sum;

  • 相关阅读:
    Python3.6全栈开发实例[022]
    Python3.6全栈开发实例[021]
    Python3.6全栈开发实例[020]
    Python3.6全栈开发实例[019]
    Python3.6全栈开发实例[018]
    单选按钮QRadioButton
    QToolButton按钮
    文本编辑框QTextEdit
    信号-事件汇总
    QMessageBox消息框
  • 原文地址:https://www.cnblogs.com/wdh01/p/14783407.html
Copyright © 2011-2022 走看看