  • Hive window functions

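    1. Function reference

    OVER(): specifies the data window that the analytic function operates on; the window may change from row to row.

    CURRENT ROW: the current row

    n PRECEDING: the n rows before the current row

    n FOLLOWING: the n rows after the current row

    UNBOUNDED: the partition boundary; UNBOUNDED PRECEDING means starting from the first row, and UNBOUNDED FOLLOWING means up to the last row

    LAG(col,n): the value of col from the row n rows before the current row

    LEAD(col,n): the value of col from the row n rows after the current row

    NTILE(n): distributes the rows of an ordered partition into the specified number of groups. The groups are numbered starting from 1, and for each row NTILE returns the number of the group that row belongs to. Note: n must be of type int.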

    2. Data preparation (columns: name, orderdate, cost)

    jack,2017-01-01,10
    tony,2017-01-02,15
    jack,2017-02-03,23
    tony,2017-01-04,29
    jack,2017-01-05,46
    jack,2017-04-06,42
    tony,2017-01-07,50
    jack,2017-01-08,55
    mart,2017-04-08,62
    mart,2017-04-09,68
    neil,2017-05-10,12
    mart,2017-04-11,75
    neil,2017-06-12,80
    mart,2017-04-13,94

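    3. Requirements

    (1) Find the customers who made a purchase in April 2017, and the total number of such customers

    (2) List each customer's purchase details together with the monthly purchase total

    (3) For the scenario above, accumulate cost in date order

    (4) Find each customer's previous purchase date

    (5) Find the orders in the earliest 20% by order date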

    4. Create the table and load the data from the file

    -- fields in the data file are comma-separated
    create table business(
        name string,
        orderdate string,
        cost int)
    row format delimited fields terminated by ',';
    
    load data local inpath "/home/hadoop/file/business" into table business;
    

    5. Implementation

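    (1) Find the customers who made a purchase in April 2017, and the total number of such customers

    -- group by runs before the window function, so count(*) over ()
    -- counts the grouped rows, i.e. the number of distinct customers
    select name,count(*) over ()
    from business
    where substring(orderdate,1,7) = '2017-04'
    group by name;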
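
    The query above depends on evaluation order: GROUP BY first collapses the filtered rows to one row per name, and only then does count(*) over () count the rows of that grouped result. Below is a minimal Python sketch that emulates this order on the sample data. The variable names are illustrative only, and Hive does not guarantee the output row order that sorted() imposes here.

```python
# Sample rows of the business table: (name, orderdate, cost)
rows = [
    ("jack", "2017-01-01", 10), ("tony", "2017-01-02", 15),
    ("jack", "2017-02-03", 23), ("tony", "2017-01-04", 29),
    ("jack", "2017-01-05", 46), ("jack", "2017-04-06", 42),
    ("tony", "2017-01-07", 50), ("jack", "2017-01-08", 55),
    ("mart", "2017-04-08", 62), ("mart", "2017-04-09", 68),
    ("neil", "2017-05-10", 12), ("mart", "2017-04-11", 75),
    ("neil", "2017-06-12", 80), ("mart", "2017-04-13", 94),
]

# WHERE substring(orderdate,1,7) = '2017-04'
april = [r for r in rows if r[1][:7] == "2017-04"]

# GROUP BY name: one output row per distinct name
grouped = sorted({name for name, _, _ in april})

# count(*) over (): the window spans every row of the grouped result
result = [(name, len(grouped)) for name in grouped]
print(result)  # [('jack', 2), ('mart', 2)]
```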
    

     

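    (2) List each customer's purchase details together with the monthly purchase total

    -- month() ignores the year; that is fine here because all rows are from 2017
    select name,orderdate,cost,sum(cost) over(partition by month(orderdate)) from business;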
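
    Unlike GROUP BY, a partitioned window keeps every detail row and attaches the aggregate of its partition to it. The following Python sketch emulates sum(cost) over(partition by month(orderdate)) on the sample data; the names month_total and result are hypothetical helpers, not Hive objects.

```python
from collections import defaultdict

# Sample rows of the business table: (name, orderdate, cost)
rows = [
    ("jack", "2017-01-01", 10), ("tony", "2017-01-02", 15),
    ("jack", "2017-02-03", 23), ("tony", "2017-01-04", 29),
    ("jack", "2017-01-05", 46), ("jack", "2017-04-06", 42),
    ("tony", "2017-01-07", 50), ("jack", "2017-01-08", 55),
    ("mart", "2017-04-08", 62), ("mart", "2017-04-09", 68),
    ("neil", "2017-05-10", 12), ("mart", "2017-04-11", 75),
    ("neil", "2017-06-12", 80), ("mart", "2017-04-13", 94),
]

# PARTITION BY month(orderdate): total cost per month number
month_total = defaultdict(int)
for _, orderdate, cost in rows:
    month_total[int(orderdate[5:7])] += cost

# Each detail row is kept and annotated with the total of its partition
result = [(name, d, c, month_total[int(d[5:7])]) for name, d, c in rows]
for r in result:
    print(r)
```

    Every January row carries 205 and every April row carries 341, while the row count stays at 14.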
    

     

    (3) For the scenario above, accumulate cost in date order

    select name,orderdate,cost, 
    sum(cost) over() as sample1,--sum over all rows 
    sum(cost) over(partition by name) as sample2,--partition by name, sum within each partition 
    sum(cost) over(partition by name order by orderdate) as sample3,--partition by name, running total within each partition 
    sum(cost) over(partition by name order by orderdate rows between UNBOUNDED PRECEDING and current row ) as sample4 ,--same result as sample3 for this data: aggregate from the partition start to the current row 
    sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING and current row) as sample5, --current row plus the one row before it 
    sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING AND 1 FOLLOWING ) as sample6,--current row plus one row before and one row after 
    sum(cost) over(partition by name order by orderdate rows between current row and UNBOUNDED FOLLOWING ) as sample7 --current row plus all rows after it 
    from business;
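
    The ROWS BETWEEN frames in samples 4 to 7 can be checked by hand on a single partition. The sketch below is a Python emulation, not Hive code: window_sum is a hypothetical helper that takes frame offsets relative to the current row, with None standing in for UNBOUNDED. It is applied to jack's cost values ordered by orderdate.

```python
def window_sum(values, start, end):
    """Sum the frame [i+start, i+end] for every row i; None means unbounded."""
    last = len(values) - 1
    out = []
    for i in range(len(values)):
        lo = 0 if start is None else max(0, i + start)
        hi = last if end is None else min(last, i + end)
        out.append(sum(values[lo:hi + 1]))
    return out

# jack's cost values, ordered by orderdate
jack = [10, 46, 55, 23, 42]

sample4 = window_sum(jack, None, 0)   # UNBOUNDED PRECEDING .. CURRENT ROW
sample5 = window_sum(jack, -1, 0)     # 1 PRECEDING .. CURRENT ROW
sample6 = window_sum(jack, -1, 1)     # 1 PRECEDING .. 1 FOLLOWING
sample7 = window_sum(jack, 0, None)   # CURRENT ROW .. UNBOUNDED FOLLOWING

print(sample4)  # [10, 56, 111, 134, 176]
print(sample5)  # [10, 56, 101, 78, 65]
print(sample6)  # [56, 111, 124, 120, 65]
print(sample7)  # [176, 166, 120, 65, 42]
```

    Frames that would reach past the edge of the partition are simply clipped, which is why the first row of sample5 and the last row of sample6 sum fewer values.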

    (4) Find each customer's previous purchase date

    select name,orderdate,cost,
    lag(orderdate,1,'1900-01-01') over(partition by name order by orderdate) as time1,
    lag(orderdate,2) over (partition by name order by orderdate) as time2
    from business;
    

     

    (5) Find the orders in the earliest 20% by order date

    select * from(
        select name,orderdate,cost,ntile(5) over (order by orderdate) sorted from business)t
    where sorted=1;
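
    To see which rows ntile(5) places in group 1, here is a Python sketch of the bucketing. It assumes the usual SQL rule that when the row count is not divisible by n, the leftover rows go to the lowest-numbered groups; the function ntile below is an illustrative re-implementation, not Hive's own code.

```python
def ntile(n_rows, n_buckets):
    """Return the 1-based bucket number for each of n_rows ordered rows."""
    base, extra = divmod(n_rows, n_buckets)
    buckets = []
    for b in range(1, n_buckets + 1):
        # Assumption: the first `extra` buckets each receive one additional row
        size = base + (1 if b <= extra else 0)
        buckets.extend([b] * size)
    return buckets

# Sample rows of the business table: (name, orderdate, cost)
rows = [
    ("jack", "2017-01-01", 10), ("tony", "2017-01-02", 15),
    ("jack", "2017-02-03", 23), ("tony", "2017-01-04", 29),
    ("jack", "2017-01-05", 46), ("jack", "2017-04-06", 42),
    ("tony", "2017-01-07", 50), ("jack", "2017-01-08", 55),
    ("mart", "2017-04-08", 62), ("mart", "2017-04-09", 68),
    ("neil", "2017-05-10", 12), ("mart", "2017-04-11", 75),
    ("neil", "2017-06-12", 80), ("mart", "2017-04-13", 94),
]

# ORDER BY orderdate, then attach the bucket number to each row
ordered = sorted(rows, key=lambda r: r[1])
numbered = list(zip(ordered, ntile(len(ordered), 5)))

# WHERE sorted = 1
first_bucket = [row for row, bucket in numbered if bucket == 1]
print(first_bucket)
```

    With 14 rows and 5 groups the group sizes are 3, 3, 3, 3 and 2, so "the first 20%" is really the first 3 of 14 rows.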
    

     

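    Notes:

    lag and lead return the value of a column from a row that lies a given offset before or after the current row, within the result set in its specified order (no self-join of the result set is needed);
    lag looks backward and lead looks forward;
    lag and lead take three arguments: the first is the column name, the second is the offset, and the third is the default value returned when the offset falls outside the window.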
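
    The three-argument form described in these notes can be emulated in a few lines of Python. This is only a sketch of the semantics on one partition (jack's order dates, already sorted); the functions lag and lead below are illustrative stand-ins that assume a non-negative offset, and None plays the role of SQL NULL.

```python
def lag(values, offset=1, default=None):
    """Value `offset` rows before each row, or `default` if there is none."""
    return [values[i - offset] if i - offset >= 0 else default
            for i in range(len(values))]

def lead(values, offset=1, default=None):
    """Value `offset` rows after each row, or `default` if there is none."""
    return [values[i + offset] if i + offset < len(values) else default
            for i in range(len(values))]

# jack's order dates, ordered by orderdate (one partition)
jack_dates = ["2017-01-01", "2017-01-05", "2017-01-08",
              "2017-02-03", "2017-04-06"]

time1 = lag(jack_dates, 1, "1900-01-01")  # lag(orderdate,1,'1900-01-01')
time2 = lag(jack_dates, 2)                # lag(orderdate,2), no default
next_date = lead(jack_dates, 1)           # lead(orderdate,1)

print(time1)      # ['1900-01-01', '2017-01-01', '2017-01-05', '2017-01-08', '2017-02-03']
print(time2)      # [None, None, '2017-01-01', '2017-01-05', '2017-01-08']
print(next_date)  # ['2017-01-05', '2017-01-08', '2017-02-03', '2017-04-06', None]
```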

  • Original article: https://www.cnblogs.com/837634902why/p/11468773.html