zoukankan      html  css  js  c++  java
  • Hive学习之路 (十三)Hive分析窗口函数(一) SUM,AVG,MIN,MAX

    数据准备

    数据格式

    cookie1,2015-04-10,1
    cookie1,2015-04-11,5
    cookie1,2015-04-12,7
    cookie1,2015-04-13,3
    cookie1,2015-04-14,2
    cookie1,2015-04-15,4
    cookie1,2015-04-16,4

    创建数据库及表

    create database if not exists cookie;
    use cookie;
    drop table if exists cookie1;
    create table cookie1(cookieid string, createtime string, pv int) row format delimited fields terminated by ',';
    load data local inpath "/home/hadoop/cookie1.txt" into table cookie1;
    select * from cookie1;

    玩一玩SUM

    查询语句

    select 
       cookieid, 
       createtime, 
       pv, 
       sum(pv) over (partition by cookieid order by createtime rows between unbounded preceding and current row) as pv1, 
       sum(pv) over (partition by cookieid order by createtime) as pv2, 
       sum(pv) over (partition by cookieid) as pv3, 
       sum(pv) over (partition by cookieid order by createtime rows between 3 preceding and current row) as pv4, 
       sum(pv) over (partition by cookieid order by createtime rows between 3 preceding and 1 following) as pv5, 
       sum(pv) over (partition by cookieid order by createtime rows between current row and unbounded following) as pv6 
    from cookie1;

    查询结果

    说明

    pv1: 分组内从起点到当前行的pv累积,如,11号的pv1=10号的pv+11号的pv, 12号=10号+11号+12号
    pv2: 同pv1
    pv3: 分组内(cookie1)所有的pv累加
    pv4: 分组内当前行+往前3行,如,11号=10号+11号, 12号=10号+11号+12号, 13号=10号+11号+12号+13号, 14号=11号+12号+13号+14号
    pv5: 分组内当前行+往前3行+往后1行,如,14号=11号+12号+13号+14号+15号=5+7+3+2+4=21
    pv6: 分组内当前行+往后所有行,如,13号=13号+14号+15号+16号=3+2+4+4=13,14号=14号+15号+16号=2+4+4=10

    如果不指定ROWS BETWEEN,默认为从起点到当前行;
    如果不指定ORDER BY,则将分组内所有值累加;
    关键是理解ROWS BETWEEN含义,也叫做WINDOW子句:
    PRECEDING:往前
    FOLLOWING:往后
    CURRENT ROW:当前行
    UNBOUNDED:起点,

      UNBOUNDED PRECEDING 表示从前面的起点,

      UNBOUNDED FOLLOWING:表示到后面的终点
    –其他AVG,MIN,MAX,和SUM用法一样。

    玩一玩AVG

    查询语句

    select 
       cookieid, 
       createtime, 
       pv, 
       avg(pv) over (partition by cookieid order by createtime rows between unbounded preceding and current row) as pv1, -- 默认为从起点到当前行
       avg(pv) over (partition by cookieid order by createtime) as pv2, --从起点到当前行,结果同pv1
       avg(pv) over (partition by cookieid) as pv3, --分组内所有行
       avg(pv) over (partition by cookieid order by createtime rows between 3 preceding and current row) as pv4, --当前行+往前3行
       avg(pv) over (partition by cookieid order by createtime rows between 3 preceding and 1 following) as pv5, --当前行+往前3行+往后1行
       avg(pv) over (partition by cookieid order by createtime rows between current row and unbounded following) as pv6  --当前行+往后所有行
    from cookie1;

    查询结果

    玩一玩MIN

    查询语句

    select 
       cookieid, 
       createtime, 
       pv, 
       min(pv) over (partition by cookieid order by createtime rows between unbounded preceding and current row) as pv1, -- 默认为从起点到当前行
       min(pv) over (partition by cookieid order by createtime) as pv2, --从起点到当前行,结果同pv1
       min(pv) over (partition by cookieid) as pv3, --分组内所有行
       min(pv) over (partition by cookieid order by createtime rows between 3 preceding and current row) as pv4, --当前行+往前3行
       min(pv) over (partition by cookieid order by createtime rows between 3 preceding and 1 following) as pv5, --当前行+往前3行+往后1行
       min(pv) over (partition by cookieid order by createtime rows between current row and unbounded following) as pv6  --当前行+往后所有行
    from cookie1;

    查询结果 

    玩一玩MAX

    查询语句

    select 
       cookieid, 
       createtime, 
       pv, 
       max(pv) over (partition by cookieid order by createtime rows between unbounded preceding and current row) as pv1, -- 默认为从起点到当前行
       max(pv) over (partition by cookieid order by createtime) as pv2, --从起点到当前行,结果同pv1
       max(pv) over (partition by cookieid) as pv3, --分组内所有行
       max(pv) over (partition by cookieid order by createtime rows between 3 preceding and current row) as pv4, --当前行+往前3行
       max(pv) over (partition by cookieid order by createtime rows between 3 preceding and 1 following) as pv5, --当前行+往前3行+往后1行
       max(pv) over (partition by cookieid order by createtime rows between current row and unbounded following) as pv6  --当前行+往后所有行
    from cookie1;

    查询结果

  • 相关阅读:
    其实说起来要国庆节回去,心里有点担心再最后买不到票
    现在快要中秋节,之后又是国庆节
    天气开始降温,今天会下雨
    昨天晚上控制家里的电脑,与家里人视频
    有时候手机打开微信或者其它应用真的是非常慢
    手机有时候卡的不行,还是自己照顾着使用吧
    这几天晚上都是再看小别离,还是有点意思
    函数初识
    Python文件操作
    Python字符编码
  • 原文地址:https://www.cnblogs.com/qingyunzong/p/8782794.html
Copyright © 2011-2022 走看看