zoukankan      html  css  js  c++  java
  • hive之窗口函数

    窗口函数

    1.相关函数说明

    COVER():指定分析函数工作的数据窗口大小,这个数据窗口大小可能会随着行的变而变化
    
    CURRENT ROW:当前行
    
    n PRECEDING:往前n行数据
    
    n FOLLOWING:往后n行数据
    
    UNBOUNDED:起点,UNBOUNDED PRECEDING 表示从前面的起点, UNBOUNDED FOLLOWING表示到后面的终点
    
    LAG(col,n):往前第n行数据
    
    LEAD(col,n):往后第n行数据

    NTILE(n):把有序分区中的行分发到指定数据的组中,各个组有编号,编号从1开始,对于每一行,NTILE返回此行所属的组的编号。注意:n必须为int类型。

    2.数据准备:name,orderdate,cost

    jack,2017-01-01,10
    
    tony,2017-01-02,15
    
    jack,2017-02-03,23
    
    tony,2017-01-04,29
    
    jack,2017-01-05,46
    
    jack,2017-04-06,42
    
    tony,2017-01-07,50
    
    jack,2017-01-08,55
    
    mart,2017-04-08,62
    
    mart,2017-04-09,68
    
    neil,2017-05-10,12
    
    mart,2017-04-11,75
    
    neil,2017-06-12,80
    
    mart,2017-04-13,94

    3.需求

    (1)查询在20174月份购买过的顾客及总人数

    (2)查询顾客的购买明细及月购买总额

    (3)上述的场景,要将cost按照日期进行累加

    (4)查询顾客上次的购买时间

    (5)查询前20%时间的订单信息

    4.创建本地business.txt,导入数据

    [atguigu@hadoop102 datas]$ vi business.txt

    5.创建hive表并导入数据

    create table business(

    name string,

    orderdate string,

    cost int

    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

     

    load data local inpath "/opt/module/datas/business.txt" into table business;

    6.按需求查询数据

    (1)查询在20174月份购买过的顾客及总人数

    select name,count(*) over ()

    from business 

    where substring(orderdate,1,7) = '2017-04'

    group by name;

    (2)查询顾客的购买明细及月购买总额

    select name,orderdate,cost,sum(cost) over(partition by month(orderdate)) from

     business;

    (3)上述的场景,要将cost按照日期进行累加

    select name,orderdate,cost,

    sum(cost) over() as sample1,--所有行相加

    sum(cost) over(partition by name) as sample2,--name分组,组内数据相加

    sum(cost) over(partition by name order by orderdate) as sample3,--name分组,组内数据累加

    sum(cost) over(partition by name order by orderdate rows between UNBOUNDED PRECEDING and current row ) as sample4 ,--sample3一样,由起点到当前行的聚合

    sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING and current row) as sample5, --当前行和前面一行做聚合

    sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING AND 1 FOLLOWING ) as sample6,--当前行和前边一行及后面一行

    sum(cost) over(partition by name order by orderdate rows between current row and UNBOUNDED FOLLOWING ) as sample7 --当前行及后面所有行

    from business;

    (4)查看顾客上次的购买时间

    select name,orderdate,cost,

    lag(orderdate,1,'1900-01-01') over(partition by name order by orderdate ) as time1, lag(orderdate,2) over (partition by name order by orderdate) as time2

    from business;

    (5)查询前20%时间的订单信息

    select * from (

        select name,orderdate,cost, ntile(5) over(order by orderdate) sorted

        from business

    ) t

    where sorted = 1;

  • 相关阅读:
    ViewController生命周期
    Core Data 基本数据操作 增删改查 排序
    Core data 如何查看ObjectId
    NSArary自定义对象排序 NSComparator, compare
    tcp/ip协议学习笔记一
    常用mac/unix/linux命令
    IOS pin约束问题 存在间隙
    IOS 常用View属性设置
    ubuntu 16.04菜单栏不显示
    linux实用命令备忘
  • 原文地址:https://www.cnblogs.com/alexzhang92/p/10650990.html
Copyright © 2011-2022 走看看