zoukankan      html  css  js  c++  java
  • Hive 窗口函数

    表数据如下

    select * from business;
    
    business.name	business.orderdate	business.cost
    jack	2017-01-01	10
    tony	2017-01-02	15
    jack	2017-02-03	23
    tony	2017-01-04	29
    jack	2017-01-05	46
    jack	2017-04-06	42
    tony	2017-01-07	50
    jack	2017-01-08	55
    mart	2017-04-08	62
    mart	2017-04-09	68
    neil	2017-05-10	12
    mart	2017-04-11	75
    neil	2017-06-12	80
    mart	2017-04-13	94
    

      

    一、聚合函数开窗

      1.全局范围

      查询顾客姓名及总人数(多次购买只算一人)

    select name,count(*) over() from business group by name;
    
    name	count_window_0
    jack	4
    mart	4
    neil	4
    tony	4
    

      2.排序后的范围

      对所有人的消费明细,将 cost 按照日期进行累加

    select *,sum(cost) over(order by orderdate) from business;
    
    business.name	business.orderdate	business.cost	sum_window_0
    jack	2017-01-01	10	10
    tony	2017-01-02	15	25
    tony	2017-01-04	29	54
    jack	2017-01-05	46	100
    tony	2017-01-07	50	150
    jack	2017-01-08	55	205
    jack	2017-02-03	23	228
    jack	2017-04-06	42	270
    mart	2017-04-08	62	332
    mart	2017-04-09	68	400
    mart	2017-04-11	75	475
    mart	2017-04-13	94	569
    neil	2017-05-10	12	581
    neil	2017-06-12	80	661
    

      3.分区加排序后的范围。分区排序搭配:distribute by + sort by或者partition by + order by。

      将不同人的消费明细,按日期累加

    select *,sum(cost) over(distribute by name sort by orderdate) from business;
    
    business.name	business.orderdate	business.cost	sum_window_0
    jack	2017-01-01	10	10
    jack	2017-01-05	46	56
    jack	2017-01-08	55	111
    jack	2017-02-03	23	134
    jack	2017-04-06	42	176
    mart	2017-04-08	62	62
    mart	2017-04-09	68	130
    mart	2017-04-11	75	205
    mart	2017-04-13	94	299
    neil	2017-05-10	12	12
    neil	2017-06-12	80	92
    tony	2017-01-02	15	15
    tony	2017-01-04	29	44
    tony	2017-01-07	50	94
    

      4.其他范围

      ①开始到当前行

    over(partition by name order by orderdate rows between UNBOUNDED PRECEDING and current row )
    

      ②当前行到最后

    over(partition by name order by orderdate rows between current row and UNBOUNDED FOLLOWING)
    

      ③前一行、当前行和下一行

    over(partition by name order by orderdate rows between 1 PRECEDING AND 1 FOLLOWING)
    

      

    二、其他函数开窗

      1.lag()函数

      查询每个顾客的购买明细及上一次购买时间

    select *,lag(orderdate,1,'1970-01-01') over(distribute by name sort by orderdate) from business;
    
    business.name	business.orderdate	business.cost	lag_window_0
    jack	2017-01-01	10	1970-01-01
    jack	2017-01-05	46	2017-01-01
    jack	2017-01-08	55	2017-01-05
    jack	2017-02-03	23	2017-01-08
    jack	2017-04-06	42	2017-02-03
    mart	2017-04-08	62	1970-01-01
    mart	2017-04-09	68	2017-04-08
    mart	2017-04-11	75	2017-04-09
    mart	2017-04-13	94	2017-04-11
    neil	2017-05-10	12	1970-01-01
    neil	2017-06-12	80	2017-05-10
    tony	2017-01-02	15	1970-01-01
    tony	2017-01-04	29	2017-01-02
    tony	2017-01-07	50	2017-01-04
    

      2.lead()函数

      查询每个顾客的购买明细及下一次购买时间

    select *,lead(orderdate,1,'9999-99-99') over(distribute by name sort by orderdate) from business;
    
    jack	2017-01-01	10	2017-01-05
    jack	2017-01-05	46	2017-01-08
    jack	2017-01-08	55	2017-02-03
    jack	2017-02-03	23	2017-04-06
    jack	2017-04-06	42	9999-99-99
    mart	2017-04-08	62	2017-04-09
    mart	2017-04-09	68	2017-04-11
    mart	2017-04-11	75	2017-04-13
    mart	2017-04-13	94	9999-99-99
    neil	2017-05-10	12	2017-06-12
    neil	2017-06-12	80	9999-99-99
    tony	2017-01-02	15	2017-01-04
    tony	2017-01-04	29	2017-01-07
    tony	2017-01-07	50	9999-99-99
    

      3.ntile()函数

    把有序分区中的行分发到指定数据的组中,各个组有编号,编号从1开始,对于每一行,NTILE返回此行所属的组的编号。注意:n必须为int类型。即把数据分成n份,n为编号。

      查询前20%时间的订单信息

    select * from (select *,ntile(5) over(order by orderdate) nt from business) t1 where t1.nt = 1;
    
    t1.name	t1.orderdate	t1.cost	t1.nt
    jack	2017-01-01	10	1
    tony	2017-01-02	15	1
    tony	2017-01-04	29	1
    

    总结:所谓开窗,就是把数据先查出来,然后增加一个窗口列,对已经查出的数据进行再加工。

  • 相关阅读:
    mysql用户
    mysql字符集
    tidb之一致性算法raft学习
    更新港资股票数据
    php中的时区设置
    PHP 中的注释
    python下如何处理windows的路径名
    安装第三方模块
    偏函数
    装饰器没学明白,记录一下,以后再学
  • 原文地址:https://www.cnblogs.com/noyouth/p/12733030.html
Copyright © 2011-2022 走看看