zoukankan      html  css  js  c++  java
  • Hive 窗口函数

    表数据如下

    select * from business;
    
    business.name	business.orderdate	business.cost
    jack	2017-01-01	10
    tony	2017-01-02	15
    jack	2017-02-03	23
    tony	2017-01-04	29
    jack	2017-01-05	46
    jack	2017-04-06	42
    tony	2017-01-07	50
    jack	2017-01-08	55
    mart	2017-04-08	62
    mart	2017-04-09	68
    neil	2017-05-10	12
    mart	2017-04-11	75
    neil	2017-06-12	80
    mart	2017-04-13	94
    

      

    一、聚合函数开窗

      1.全局范围

      查询顾客姓名及总人数(多次购买只算一人)

    select name,count(*) over() from business group by name;
    
    name	count_window_0
    jack	4
    mart	4
    neil	4
    tony	4
    

      2.排序后的范围

      对所有人的消费明细,将 cost 按照日期进行累加

    select *,sum(cost) over(order by orderdate) from business;
    
    business.name	business.orderdate	business.cost	sum_window_0
    jack	2017-01-01	10	10
    tony	2017-01-02	15	25
    tony	2017-01-04	29	54
    jack	2017-01-05	46	100
    tony	2017-01-07	50	150
    jack	2017-01-08	55	205
    jack	2017-02-03	23	228
    jack	2017-04-06	42	270
    mart	2017-04-08	62	332
    mart	2017-04-09	68	400
    mart	2017-04-11	75	475
    mart	2017-04-13	94	569
    neil	2017-05-10	12	581
    neil	2017-06-12	80	661
    

      3.分区加排序后的范围。分区排序搭配:distribute by + sort by或者partition by + order by。

      将不同人的消费明细,按日期累加

    select *,sum(cost) over(distribute by name sort by orderdate) from business;
    
    business.name	business.orderdate	business.cost	sum_window_0
    jack	2017-01-01	10	10
    jack	2017-01-05	46	56
    jack	2017-01-08	55	111
    jack	2017-02-03	23	134
    jack	2017-04-06	42	176
    mart	2017-04-08	62	62
    mart	2017-04-09	68	130
    mart	2017-04-11	75	205
    mart	2017-04-13	94	299
    neil	2017-05-10	12	12
    neil	2017-06-12	80	92
    tony	2017-01-02	15	15
    tony	2017-01-04	29	44
    tony	2017-01-07	50	94
    

      4.其他范围

      ①开始到当前行

    over(partition by name order by orderdate rows between UNBOUNDED PRECEDING and current row )
    

      ②当前行到最后

    over(partition by name order by orderdate rows between current row and UNBOUNDED FOLLOWING)
    

      ③前一行、当前行和下一行

    over(partition by name order by orderdate rows between 1 PRECEDING AND 1 FOLLOWING)
    

      

    二、其他函数开窗

      1.lag()函数

      查询每个顾客的购买明细及上一次购买时间

    select *,lag(orderdate,1,'1970-01-01') over(distribute by name sort by orderdate) from business;
    
    business.name	business.orderdate	business.cost	lag_window_0
    jack	2017-01-01	10	1970-01-01
    jack	2017-01-05	46	2017-01-01
    jack	2017-01-08	55	2017-01-05
    jack	2017-02-03	23	2017-01-08
    jack	2017-04-06	42	2017-02-03
    mart	2017-04-08	62	1970-01-01
    mart	2017-04-09	68	2017-04-08
    mart	2017-04-11	75	2017-04-09
    mart	2017-04-13	94	2017-04-11
    neil	2017-05-10	12	1970-01-01
    neil	2017-06-12	80	2017-05-10
    tony	2017-01-02	15	1970-01-01
    tony	2017-01-04	29	2017-01-02
    tony	2017-01-07	50	2017-01-04
    

      2.lead()函数

      查询每个顾客的购买明细及下一次购买时间

    select *,lead(orderdate,1,'9999-99-99') over(distribute by name sort by orderdate) from business;
    
    jack	2017-01-01	10	2017-01-05
    jack	2017-01-05	46	2017-01-08
    jack	2017-01-08	55	2017-02-03
    jack	2017-02-03	23	2017-04-06
    jack	2017-04-06	42	9999-99-99
    mart	2017-04-08	62	2017-04-09
    mart	2017-04-09	68	2017-04-11
    mart	2017-04-11	75	2017-04-13
    mart	2017-04-13	94	9999-99-99
    neil	2017-05-10	12	2017-06-12
    neil	2017-06-12	80	9999-99-99
    tony	2017-01-02	15	2017-01-04
    tony	2017-01-04	29	2017-01-07
    tony	2017-01-07	50	9999-99-99
    

      3.ntile()函数

    把有序分区中的行分发到指定数据的组中,各个组有编号,编号从1开始,对于每一行,NTILE返回此行所属的组的编号。注意:n必须为int类型。即把数据分成n份,n为编号。

      查询前20%时间的订单信息

    select * from (select *,ntile(5) over(order by orderdate) nt from business) t1 where t1.nt = 1;
    
    t1.name	t1.orderdate	t1.cost	t1.nt
    jack	2017-01-01	10	1
    tony	2017-01-02	15	1
    tony	2017-01-04	29	1
    

    总结:所谓开窗,就是把数据先查出来,然后增加一个窗口列,对已经查出的数据进行再加工。

  • 相关阅读:
    HDU 2089 不要62
    HDU 5038 Grade(分级)
    FZU 2105 Digits Count(位数计算)
    FZU 2218 Simple String Problem(简单字符串问题)
    FZU 2221 RunningMan(跑男)
    FZU 2216 The Longest Straight(最长直道)
    FZU 2212 Super Mobile Charger(超级充电宝)
    FZU 2219 StarCraft(星际争霸)
    FZU 2213 Common Tangents(公切线)
    FZU 2215 Simple Polynomial Problem(简单多项式问题)
  • 原文地址:https://www.cnblogs.com/noyouth/p/12733030.html
Copyright © 2011-2022 走看看