zoukankan      html  css  js  c++  java
  • Hive 窗口函数

    表数据如下

    select * from business;
    
    business.name	business.orderdate	business.cost
    jack	2017-01-01	10
    tony	2017-01-02	15
    jack	2017-02-03	23
    tony	2017-01-04	29
    jack	2017-01-05	46
    jack	2017-04-06	42
    tony	2017-01-07	50
    jack	2017-01-08	55
    mart	2017-04-08	62
    mart	2017-04-09	68
    neil	2017-05-10	12
    mart	2017-04-11	75
    neil	2017-06-12	80
    mart	2017-04-13	94
    

      

    一、聚合函数开窗

      1.全局范围

      查询顾客姓名及总人数(多次购买只算一人)

    select name,count(*) over() from business group by name;
    
    name	count_window_0
    jack	4
    mart	4
    neil	4
    tony	4
    

      2.排序后的范围

      对所有人的消费明细,将 cost 按照日期进行累加

    select *,sum(cost) over(order by orderdate) from business;
    
    business.name	business.orderdate	business.cost	sum_window_0
    jack	2017-01-01	10	10
    tony	2017-01-02	15	25
    tony	2017-01-04	29	54
    jack	2017-01-05	46	100
    tony	2017-01-07	50	150
    jack	2017-01-08	55	205
    jack	2017-02-03	23	228
    jack	2017-04-06	42	270
    mart	2017-04-08	62	332
    mart	2017-04-09	68	400
    mart	2017-04-11	75	475
    mart	2017-04-13	94	569
    neil	2017-05-10	12	581
    neil	2017-06-12	80	661
    

      3.分区加排序后的范围。分区排序搭配:distribute by + sort by或者partition by + order by。

      将不同人的消费明细,按日期累加

    select *,sum(cost) over(distribute by name sort by orderdate) from business;
    
    business.name	business.orderdate	business.cost	sum_window_0
    jack	2017-01-01	10	10
    jack	2017-01-05	46	56
    jack	2017-01-08	55	111
    jack	2017-02-03	23	134
    jack	2017-04-06	42	176
    mart	2017-04-08	62	62
    mart	2017-04-09	68	130
    mart	2017-04-11	75	205
    mart	2017-04-13	94	299
    neil	2017-05-10	12	12
    neil	2017-06-12	80	92
    tony	2017-01-02	15	15
    tony	2017-01-04	29	44
    tony	2017-01-07	50	94
    

      4.其他范围

      ①开始到当前行

    over(partition by name order by orderdate rows between UNBOUNDED PRECEDING and current row )
    

      ②当前行到最后

    over(partition by name order by orderdate rows between current row and UNBOUNDED FOLLOWING)
    

      ③前一行、当前行和下一行

    over(partition by name order by orderdate rows between 1 PRECEDING AND 1 FOLLOWING)
    

      

    二、其他函数开窗

      1.lag()函数

      查询每个顾客的购买明细及上一次购买时间

    select *,lag(orderdate,1,'1970-01-01') over(distribute by name sort by orderdate) from business;
    
    business.name	business.orderdate	business.cost	lag_window_0
    jack	2017-01-01	10	1970-01-01
    jack	2017-01-05	46	2017-01-01
    jack	2017-01-08	55	2017-01-05
    jack	2017-02-03	23	2017-01-08
    jack	2017-04-06	42	2017-02-03
    mart	2017-04-08	62	1970-01-01
    mart	2017-04-09	68	2017-04-08
    mart	2017-04-11	75	2017-04-09
    mart	2017-04-13	94	2017-04-11
    neil	2017-05-10	12	1970-01-01
    neil	2017-06-12	80	2017-05-10
    tony	2017-01-02	15	1970-01-01
    tony	2017-01-04	29	2017-01-02
    tony	2017-01-07	50	2017-01-04
    

      2.lead()函数

      查询每个顾客的购买明细及下一次购买时间

    select *,lead(orderdate,1,'9999-99-99') over(distribute by name sort by orderdate) from business;
    
    jack	2017-01-01	10	2017-01-05
    jack	2017-01-05	46	2017-01-08
    jack	2017-01-08	55	2017-02-03
    jack	2017-02-03	23	2017-04-06
    jack	2017-04-06	42	9999-99-99
    mart	2017-04-08	62	2017-04-09
    mart	2017-04-09	68	2017-04-11
    mart	2017-04-11	75	2017-04-13
    mart	2017-04-13	94	9999-99-99
    neil	2017-05-10	12	2017-06-12
    neil	2017-06-12	80	9999-99-99
    tony	2017-01-02	15	2017-01-04
    tony	2017-01-04	29	2017-01-07
    tony	2017-01-07	50	9999-99-99
    

      3.ntile()函数

    把有序分区中的行分发到指定数据的组中,各个组有编号,编号从1开始,对于每一行,NTILE返回此行所属的组的编号。注意:n必须为int类型。即把数据分成n份,n为编号。

      查询前20%时间的订单信息

    select * from (select *,ntile(5) over(order by orderdate) nt from business) t1 where t1.nt = 1;
    
    t1.name	t1.orderdate	t1.cost	t1.nt
    jack	2017-01-01	10	1
    tony	2017-01-02	15	1
    tony	2017-01-04	29	1
    

    总结:所谓开窗,就是把数据先查出来,然后增加一个窗口列,对已经查出的数据进行再加工。

  • 相关阅读:
    哦!Python爬虫的这4种方法优缺点你知道吗?
    当你认为Python程序慢时!几个方法你使用了吗!
    Python数据可视化你了解多少!看一下这些是你了解的吗 ?
    王者英雄你喜欢那个!利用Python网络爬虫教你抓取喜欢的英雄图片!
    使用Python时多少有人走过的坑!你是否也遇到过呢!
    Python的68个内置函数你真正的了解多少!建议合理的运用起来!
    使用Python 进行字串格式化的几种方式!你平时是否使用过呢!
    在找工作吗?今天教你使用Python网络爬虫获取招聘信息!来体验一下
    Azure上A/D系列虚拟机到DS系列迁移(1)
    超大文件上传到Azure Linux虚拟机最佳实践
  • 原文地址:https://www.cnblogs.com/noyouth/p/12733030.html
Copyright © 2011-2022 走看看