zoukankan      html  css  js  c++  java
  • hive的窗口函数ntile、row_number、rank

    一、ntile

    序列函数不支持window子句

     数据准备:

    cookie1,2015-04-10,1
    cookie1,2015-04-11,5
    cookie1,2015-04-12,7
    cookie1,2015-04-13,3
    cookie1,2015-04-14,2
    cookie1,2015-04-15,4
    cookie1,2015-04-16,4
    cookie2,2015-04-10,2
    cookie2,2015-04-11,3
    cookie2,2015-04-12,5
    cookie2,2015-04-13,6
    cookie2,2015-04-14,3
    cookie2,2015-04-15,9
    cookie2,2015-04-16,7

    把数据load到表中

    hive (default)> select * from cookie;
    OK
    cookie.cookieid    cookie.createtime    cookie.pv
    cookie1    2015-04-10    1
    cookie1    2015-04-11    5
    cookie1    2015-04-12    7
    cookie1    2015-04-13    3
    cookie1    2015-04-14    2
    cookie1    2015-04-15    4
    cookie1    2015-04-16    4
    cookie2    2015-04-10    2
    cookie2    2015-04-11    3
    cookie2    2015-04-12    5
    cookie2    2015-04-13    6
    cookie2    2015-04-14    3
    cookie2    2015-04-15    9
    cookie2    2015-04-16    7
    Time taken: 0.086 seconds, Fetched: 14 row(s)

    NTILE(n),用于将分组数据按照顺序切分成n片,返回当前切片值
    NTILE不支持ROWS BETWEEN,比如 NTILE(2) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
    如果切片不均匀,默认增加第一个切片的分布

    select cookieid,createtime,pv,
    ntile(2) over(partition by cookieid order by createtime) as rn1,--分组内将数据分为2片
    ntile(3) over(partition by cookieid order by createtime) as rn2,--分组内将数据分为3片
    ntile(4) over(order by createtime) as rn3
    from cookie order by cookieid,createtime;

    结果是:

    OK
    cookieid    createtime    pv    rn1    rn2    rn3
    cookie1        2015-04-10    1    1    1    1
    cookie1        2015-04-11    5    1    1    1
    cookie1        2015-04-12    7    1    1    2
    cookie1        2015-04-13    3    1    2    2
    cookie1        2015-04-14    2    2    2    3
    cookie1        2015-04-15    4    2    3    4
    cookie1        2015-04-16    4    2    3    4
    cookie2        2015-04-10    2    1    1    1
    cookie2        2015-04-11    3    1    1    1
    cookie2        2015-04-12    5    1    1    2
    cookie2        2015-04-13    6    1    2    2
    cookie2        2015-04-14    3    2    2    3
    cookie2        2015-04-15    9    2    3    3
    cookie2        2015-04-16    7    2    3    4
    Time taken: 76.302 seconds, Fetched: 14 row(s)

    比如,统计一个cookie,pv数最多的前1/3的天

    select cookieid,createtime,pv,
    ntile(3) over(partition by cookieid order by pv desc) as rn
    from cookie;

    结果是:

    OK
    cookieid    createtime    pv    rn
    cookie1        2015-04-12    7    1
    cookie1        2015-04-11    5    1
    cookie1        2015-04-16    4    1
    cookie1        2015-04-15    4    2
    cookie1        2015-04-13    3    2
    cookie1        2015-04-14    2    3
    cookie1        2015-04-10    1    3
    cookie2        2015-04-15    9    1
    cookie2        2015-04-16    7    1
    cookie2        2015-04-13    6    1
    cookie2        2015-04-12    5    2
    cookie2        2015-04-11    3    2
    cookie2        2015-04-14    3    3
    cookie2        2015-04-10    2    3
    Time taken: 22.654 seconds, Fetched: 14 row(s)

     rn=1就是我们想要的结果

     二、ROW_NUMBER

    ROW_NUMBER()–从1开始,按照顺序,生成分组内记录的序列
    –比如,按照pv降序排列,生成分组内每天的pv名次
    ROW_NUMBER() 的应用场景非常多,再比如,获取分组内排序第一的记录;获取一个session中的第一条refer等。

    select cookieid,createtime,pv,
    row_number() over(partition by cookieid order by pv desc) as rn
    from cookie;

    结果:

    OK
    cookieid    createtime    pv    rn
    cookie1    2015-04-12    7    1
    cookie1    2015-04-11    5    2
    cookie1    2015-04-16    4    3
    cookie1    2015-04-15    4    4
    cookie1    2015-04-13    3    5
    cookie1    2015-04-14    2    6
    cookie1    2015-04-10    1    7
    cookie2    2015-04-15    9    1
    cookie2    2015-04-16    7    2
    cookie2    2015-04-13    6    3
    cookie2    2015-04-12    5    4
    cookie2    2015-04-11    3    5
    cookie2    2015-04-14    3    6
    cookie2    2015-04-10    2    7
    Time taken: 22.657 seconds, Fetched: 14 row(s)

    三、rank和dense_rank

    —RANK() 生成数据项在分组中的排名,排名相等会在名次中留下空位
    —DENSE_RANK() 生成数据项在分组中的排名,排名相等会在名次中不会留下空位

    select cookieid,createtime,pv,
    rank() over(partition by cookieid order by pv desc) as rn1,
    dense_rank() over(partition by cookieid order by pv desc) as rn2,
    row_number() over(partition by cookieid order by pv desc) as rn3
    from cookie
    where cookieid='cookie1';

    结果:

    OK
    cookieid    createtime    pv    rn1    rn2    rn3
    cookie1    2015-04-12    7    1    1    1
    cookie1    2015-04-11    5    2    2    2
    cookie1    2015-04-16    4    3    3    3
    cookie1    2015-04-15    4    3    3    4
    cookie1    2015-04-13    3    5    4    5
    cookie1    2015-04-14    2    6    5    6
    cookie1    2015-04-10    1    7    6    7
    Time taken: 23.841 seconds, Fetched: 7 row(s)

    rn1: 15号和16号并列第3, 13号排第5
    rn2: 15号和16号并列第3, 13号排第4
    rn3: 如果相等,则按记录值排序,生成唯一的次序,如果所有记录值都相等,或许会随机排吧。

  • 相关阅读:
    Minimum Depth of Binary Tree leetcode java
    Maximum Depth of Binary Tree leetcode java
    Symmetric Tree leetcode java
    Same Tree leetcode java
    Binary Tree Postorder Traversal leetcode java
    Binary Tree Preorder Traversal leetcode java
    Binary Tree Inorder Traversal leetcode java
    Combinations leetcode java
    一键清除Centos iptables 防火墙所有规则
    阿里云centos7.7x64安装open,并配置ip转发和nat伪装
  • 原文地址:https://www.cnblogs.com/dongdone/p/5737617.html
Copyright © 2011-2022 走看看