  • Hive: simple window analytic functions

    Hive window analytic functions
    
    0: jdbc:hive2://localhost:10000> select * from t_access;
    +----------------+---------------------------------+-----------------------+--------------+--+
    |  t_access.ip   |          t_access.url           | t_access.access_time  | t_access.dt  |
    +----------------+---------------------------------+-----------------------+--------------+--+
    | 192.168.33.3   | http://www.xxx.ccc.aa/stu        | 2017-08-04 15:30:20   | 20170804     |
    | 192.168.33.3   | http://www.xxx.ccc.aa/teach      | 2017-08-04 15:35:20   | 20170804     |
    | 192.168.33.4   | http://www.xxx.ccc.aa/stu        | 2017-08-04 15:30:20   | 20170804     |
    | 192.168.33.4   | http://www.xxx.ccc.aa/job        | 2017-08-04 16:30:20   | 20170804     |
    | 192.168.33.5   | http://www.xxx.ccc.aa/job        | 2017-08-04 15:40:20   | 20170804     |
    | 192.168.33.3   | http://www.xxx.ccc.aa/stu        | 2017-08-05 15:30:20   | 20170805     |
    | 192.168.44.3   | http://www.xxx.ccc.aa/teach      | 2017-08-05 15:35:20   | 20170805     |
    | 192.168.33.44  | http://www.xxx.ccc.aa/stu        | 2017-08-05 15:30:20   | 20170805     |
    | 192.168.33.46  | http://www.xxx.ccc.aa/job        | 2017-08-05 16:30:20   | 20170805     |
    | 192.168.33.55  | http://www.xxx.ccc.aa/job        | 2017-08-05 15:40:20   | 20170805     |
    | 192.168.133.3  | http://www.xxx.ccc.aa/register   | 2017-08-06 15:30:20   | 20170806     |
    | 192.168.111.3  | http://www.xxx.ccc.aa/register   | 2017-08-06 15:35:20   | 20170806     |
    | 192.168.34.44  | http://www.xxx.ccc.aa/pay        | 2017-08-06 15:30:20   | 20170806     |
    | 192.168.33.46  | http://www.xxx.ccc.aa/excersize  | 2017-08-06 16:30:20   | 20170806     |
    | 192.168.33.55  | http://www.xxx.ccc.aa/job        | 2017-08-06 15:40:20   | 20170806     |
    | 192.168.33.46  | http://www.xxx.ccc.aa/excersize  | 2017-08-06 16:30:20   | 20170806     |
    | 192.168.33.25  | http://www.xxx.ccc.aa/job        | 2017-08-06 15:40:20   | 20170806     |
    | 192.168.33.36  | http://www.xxx.ccc.aa/excersize  | 2017-08-06 16:30:20   | 20170806     |
    | 192.168.33.55  | http://www.xxx.ccc.aa/job        | 2017-08-06 15:40:20   | 20170806     |
    +----------------+---------------------------------+-----------------------+--------------+--+
    
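    ## LAG function
    -- lag(col, n, default) returns col from the row n rows before the current row
    -- within the partition, or default when no such row exists.
    -- Example: each visit together with the same IP's previous visit time.
    select ip, url, access_time,
           row_number() over (partition by ip order by access_time) as rn,
           lag(access_time, 1, 0) over (partition by ip order by access_time) as last_access_time
    from t_access;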
    
    +----------------+---------------------------------+----------------------+-----+----------------------+--+
    |       ip       |               url               |     access_time      | rn  |   last_access_time   |
    +----------------+---------------------------------+----------------------+-----+----------------------+--+
    | 192.168.111.3  | http://www.xxx.ccc.aa/register   | 2017-08-06 15:35:20  | 1   | 0                    |
    | 192.168.133.3  | http://www.xxx.ccc.aa/register   | 2017-08-06 15:30:20  | 1   | 0                    |
    | 192.168.33.25  | http://www.xxx.ccc.aa/job        | 2017-08-06 15:40:20  | 1   | 0                    |
    | 192.168.33.3   | http://www.xxx.ccc.aa/stu        | 2017-08-04 15:30:20  | 1   | 0                    |
    | 192.168.33.3   | http://www.xxx.ccc.aa/teach      | 2017-08-04 15:35:20  | 2   | 2017-08-04 15:30:20  |
    | 192.168.33.3   | http://www.xxx.ccc.aa/stu        | 2017-08-05 15:30:20  | 3   | 2017-08-04 15:35:20  |
    | 192.168.33.36  | http://www.xxx.ccc.aa/excersize  | 2017-08-06 16:30:20  | 1   | 0                    |
    | 192.168.33.4   | http://www.xxx.ccc.aa/stu        | 2017-08-04 15:30:20  | 1   | 0                    |
    | 192.168.33.4   | http://www.xxx.ccc.aa/job        | 2017-08-04 16:30:20  | 2   | 2017-08-04 15:30:20  |
    | 192.168.33.44  | http://www.xxx.ccc.aa/stu        | 2017-08-05 15:30:20  | 1   | 0                    |
    | 192.168.33.46  | http://www.xxx.ccc.aa/job        | 2017-08-05 16:30:20  | 1   | 0                    |
    | 192.168.33.46  | http://www.xxx.ccc.aa/excersize  | 2017-08-06 16:30:20  | 2   | 2017-08-05 16:30:20  |
    | 192.168.33.46  | http://www.xxx.ccc.aa/excersize  | 2017-08-06 16:30:20  | 3   | 2017-08-06 16:30:20  |
    | 192.168.33.5   | http://www.xxx.ccc.aa/job        | 2017-08-04 15:40:20  | 1   | 0                    |
    | 192.168.33.55  | http://www.xxx.ccc.aa/job        | 2017-08-05 15:40:20  | 1   | 0                    |
    | 192.168.33.55  | http://www.xxx.ccc.aa/job        | 2017-08-06 15:40:20  | 2   | 2017-08-05 15:40:20  |
    | 192.168.33.55  | http://www.xxx.ccc.aa/job        | 2017-08-06 15:40:20  | 3   | 2017-08-06 15:40:20  |
    | 192.168.34.44  | http://www.xxx.ccc.aa/pay        | 2017-08-06 15:30:20  | 1   | 0                    |
    | 192.168.44.3   | http://www.xxx.ccc.aa/teach      | 2017-08-05 15:35:20  | 1   | 0                    |
    +----------------+---------------------------------+----------------------+-----+----------------------+--+
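    
    -- A sketch that is not in the original post: subtracting the lagged value gives
    -- the gap between consecutive visits from the same IP. It assumes access_time is
    -- a string in 'yyyy-MM-dd HH:mm:ss' format, so unix_timestamp() can parse it; the
    -- names prev_access_time and seconds_since_prev are made up for illustration.
    -- The default argument of lag() is omitted here, so the first visit of each IP
    -- yields NULL instead of 0.
    select ip, url, access_time,
           unix_timestamp(access_time) - unix_timestamp(prev_access_time) as seconds_since_prev
    from (
        select ip, url, access_time,
               lag(access_time, 1) over (partition by ip order by access_time) as prev_access_time
        from t_access
    ) t;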
    
    
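    ## LEAD function
    -- lead(col, n, default) is the mirror image of lag: it returns col from the row
    -- n rows after the current row within the partition, or default when no such row exists.
    -- Example: each visit together with the same IP's next visit time.
    select ip, url, access_time,
           row_number() over (partition by ip order by access_time) as rn,
           lead(access_time, 1, 0) over (partition by ip order by access_time) as next_access_time
    from t_access;
    +----------------+---------------------------------+----------------------+-----+----------------------+--+
    |       ip       |               url               |     access_time      | rn  |   next_access_time   |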
    +----------------+---------------------------------+----------------------+-----+----------------------+--+
    | 192.168.111.3  | http://www.xxx.ccc.aa/register   | 2017-08-06 15:35:20  | 1   | 0                    |
    | 192.168.133.3  | http://www.xxx.ccc.aa/register   | 2017-08-06 15:30:20  | 1   | 0                    |
    | 192.168.33.25  | http://www.xxx.ccc.aa/job        | 2017-08-06 15:40:20  | 1   | 0                    |
    | 192.168.33.3   | http://www.xxx.ccc.aa/stu        | 2017-08-04 15:30:20  | 1   | 2017-08-04 15:35:20  |
    | 192.168.33.3   | http://www.xxx.ccc.aa/teach      | 2017-08-04 15:35:20  | 2   | 2017-08-05 15:30:20  |
    | 192.168.33.3   | http://www.xxx.ccc.aa/stu        | 2017-08-05 15:30:20  | 3   | 0                    |
    | 192.168.33.36  | http://www.xxx.ccc.aa/excersize  | 2017-08-06 16:30:20  | 1   | 0                    |
    | 192.168.33.4   | http://www.xxx.ccc.aa/stu        | 2017-08-04 15:30:20  | 1   | 2017-08-04 16:30:20  |
    | 192.168.33.4   | http://www.xxx.ccc.aa/job        | 2017-08-04 16:30:20  | 2   | 0                    |
    | 192.168.33.44  | http://www.xxx.ccc.aa/stu        | 2017-08-05 15:30:20  | 1   | 0                    |
    | 192.168.33.46  | http://www.xxx.ccc.aa/job        | 2017-08-05 16:30:20  | 1   | 2017-08-06 16:30:20  |
    | 192.168.33.46  | http://www.xxx.ccc.aa/excersize  | 2017-08-06 16:30:20  | 2   | 2017-08-06 16:30:20  |
    | 192.168.33.46  | http://www.xxx.ccc.aa/excersize  | 2017-08-06 16:30:20  | 3   | 0                    |
    | 192.168.33.5   | http://www.xxx.ccc.aa/job        | 2017-08-04 15:40:20  | 1   | 0                    |
    | 192.168.33.55  | http://www.xxx.ccc.aa/job        | 2017-08-05 15:40:20  | 1   | 2017-08-06 15:40:20  |
    | 192.168.33.55  | http://www.xxx.ccc.aa/job        | 2017-08-06 15:40:20  | 2   | 2017-08-06 15:40:20  |
    | 192.168.33.55  | http://www.xxx.ccc.aa/job        | 2017-08-06 15:40:20  | 3   | 0                    |
    | 192.168.34.44  | http://www.xxx.ccc.aa/pay        | 2017-08-06 15:30:20  | 1   | 0                    |
    | 192.168.44.3   | http://www.xxx.ccc.aa/teach      | 2017-08-05 15:35:20  | 1   | 0                    |
    +----------------+---------------------------------+----------------------+-----+----------------------+--+
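    
    -- Hedged sketch, not from the original post: lead() works on any column, not only
    -- on timestamps. Here it pairs each page view with the page the same IP opened
    -- next (the alias next_url is hypothetical). The default argument is omitted, so
    -- the last visit of each IP gets NULL.
    select ip, access_time, url,
           lead(url, 1) over (partition by ip order by access_time) as next_url
    from t_access;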
    
    
    ## FIRST_VALUE function
    -- Example: get the first page each user (IP) visited
    select ip, url, access_time,
           row_number() over (partition by ip order by access_time) as rn,
           first_value(url) over (partition by ip order by access_time rows between unbounded preceding and unbounded following) as first_url
    from t_access;
    +----------------+---------------------------------+----------------------+-----+---------------------------------+--+
    |       ip       |               url               |     access_time      | rn  |            first_url            |
    +----------------+---------------------------------+----------------------+-----+---------------------------------+--+
    | 192.168.111.3  | http://www.xxx.ccc.aa/register   | 2017-08-06 15:35:20  | 1   | http://www.xxx.ccc.aa/register   |
    | 192.168.133.3  | http://www.xxx.ccc.aa/register   | 2017-08-06 15:30:20  | 1   | http://www.xxx.ccc.aa/register   |
    | 192.168.33.25  | http://www.xxx.ccc.aa/job        | 2017-08-06 15:40:20  | 1   | http://www.xxx.ccc.aa/job        |
    | 192.168.33.3   | http://www.xxx.ccc.aa/stu        | 2017-08-04 15:30:20  | 1   | http://www.xxx.ccc.aa/stu        |
    | 192.168.33.3   | http://www.xxx.ccc.aa/teach      | 2017-08-04 15:35:20  | 2   | http://www.xxx.ccc.aa/stu        |
    | 192.168.33.3   | http://www.xxx.ccc.aa/stu        | 2017-08-05 15:30:20  | 3   | http://www.xxx.ccc.aa/stu        |
    | 192.168.33.36  | http://www.xxx.ccc.aa/excersize  | 2017-08-06 16:30:20  | 1   | http://www.xxx.ccc.aa/excersize  |
    | 192.168.33.4   | http://www.xxx.ccc.aa/stu        | 2017-08-04 15:30:20  | 1   | http://www.xxx.ccc.aa/stu        |
    | 192.168.33.4   | http://www.xxx.ccc.aa/job        | 2017-08-04 16:30:20  | 2   | http://www.xxx.ccc.aa/stu        |
    | 192.168.33.44  | http://www.xxx.ccc.aa/stu        | 2017-08-05 15:30:20  | 1   | http://www.xxx.ccc.aa/stu        |
    | 192.168.33.46  | http://www.xxx.ccc.aa/job        | 2017-08-05 16:30:20  | 1   | http://www.xxx.ccc.aa/job        |
    | 192.168.33.46  | http://www.xxx.ccc.aa/excersize  | 2017-08-06 16:30:20  | 2   | http://www.xxx.ccc.aa/job        |
    | 192.168.33.46  | http://www.xxx.ccc.aa/excersize  | 2017-08-06 16:30:20  | 3   | http://www.xxx.ccc.aa/job        |
    | 192.168.33.5   | http://www.xxx.ccc.aa/job        | 2017-08-04 15:40:20  | 1   | http://www.xxx.ccc.aa/job        |
    | 192.168.33.55  | http://www.xxx.ccc.aa/job        | 2017-08-05 15:40:20  | 1   | http://www.xxx.ccc.aa/job        |
    | 192.168.33.55  | http://www.xxx.ccc.aa/job        | 2017-08-06 15:40:20  | 2   | http://www.xxx.ccc.aa/job        |
    | 192.168.33.55  | http://www.xxx.ccc.aa/job        | 2017-08-06 15:40:20  | 3   | http://www.xxx.ccc.aa/job        |
    | 192.168.34.44  | http://www.xxx.ccc.aa/pay        | 2017-08-06 15:30:20  | 1   | http://www.xxx.ccc.aa/pay        |
    | 192.168.44.3   | http://www.xxx.ccc.aa/teach      | 2017-08-05 15:35:20  | 1   | http://www.xxx.ccc.aa/teach      |
    +----------------+---------------------------------+----------------------+-----+---------------------------------+--+
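    
    -- A possible variation (an assumption, not the author's query): when only one row
    -- per IP is wanted, rather than the first URL repeated on every row, filtering on
    -- row_number() = 1 in a subquery is a common pattern. If two rows of an IP share
    -- the earliest access_time, which of them gets rn = 1 is not deterministic.
    select ip, url as first_url, access_time
    from (
        select ip, url, access_time,
               row_number() over (partition by ip order by access_time) as rn
        from t_access
    ) t
    where rn = 1;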
    
    ## LAST_VALUE function
    -- Example: get the last page each user (IP) visited
    select ip, url, access_time,
           row_number() over (partition by ip order by access_time) as rn,
           last_value(url) over (partition by ip order by access_time rows between unbounded preceding and unbounded following) as last_url
    from t_access;
    +----------------+---------------------------------+----------------------+-----+---------------------------------+--+
    |       ip       |               url               |     access_time      | rn  |            last_url             |
    +----------------+---------------------------------+----------------------+-----+---------------------------------+--+
    | 192.168.111.3  | http://www.xxx.ccc.aa/register   | 2017-08-06 15:35:20  | 1   | http://www.xxx.ccc.aa/register   |
    | 192.168.133.3  | http://www.xxx.ccc.aa/register   | 2017-08-06 15:30:20  | 1   | http://www.xxx.ccc.aa/register   |
    | 192.168.33.25  | http://www.xxx.ccc.aa/job        | 2017-08-06 15:40:20  | 1   | http://www.xxx.ccc.aa/job        |
    | 192.168.33.3   | http://www.xxx.ccc.aa/stu        | 2017-08-04 15:30:20  | 1   | http://www.xxx.ccc.aa/stu        |
    | 192.168.33.3   | http://www.xxx.ccc.aa/teach      | 2017-08-04 15:35:20  | 2   | http://www.xxx.ccc.aa/stu        |
    | 192.168.33.3   | http://www.xxx.ccc.aa/stu        | 2017-08-05 15:30:20  | 3   | http://www.xxx.ccc.aa/stu        |
    | 192.168.33.36  | http://www.xxx.ccc.aa/excersize  | 2017-08-06 16:30:20  | 1   | http://www.xxx.ccc.aa/excersize  |
    | 192.168.33.4   | http://www.xxx.ccc.aa/stu        | 2017-08-04 15:30:20  | 1   | http://www.xxx.ccc.aa/job        |
    | 192.168.33.4   | http://www.xxx.ccc.aa/job        | 2017-08-04 16:30:20  | 2   | http://www.xxx.ccc.aa/job        |
    | 192.168.33.44  | http://www.xxx.ccc.aa/stu        | 2017-08-05 15:30:20  | 1   | http://www.xxx.ccc.aa/stu        |
    | 192.168.33.46  | http://www.xxx.ccc.aa/job        | 2017-08-05 16:30:20  | 1   | http://www.xxx.ccc.aa/excersize  |
    | 192.168.33.46  | http://www.xxx.ccc.aa/excersize  | 2017-08-06 16:30:20  | 2   | http://www.xxx.ccc.aa/excersize  |
    | 192.168.33.46  | http://www.xxx.ccc.aa/excersize  | 2017-08-06 16:30:20  | 3   | http://www.xxx.ccc.aa/excersize  |
    | 192.168.33.5   | http://www.xxx.ccc.aa/job        | 2017-08-04 15:40:20  | 1   | http://www.xxx.ccc.aa/job        |
    | 192.168.33.55  | http://www.xxx.ccc.aa/job        | 2017-08-05 15:40:20  | 1   | http://www.xxx.ccc.aa/job        |
    | 192.168.33.55  | http://www.xxx.ccc.aa/job        | 2017-08-06 15:40:20  | 2   | http://www.xxx.ccc.aa/job        |
    | 192.168.33.55  | http://www.xxx.ccc.aa/job        | 2017-08-06 15:40:20  | 3   | http://www.xxx.ccc.aa/job        |
    | 192.168.34.44  | http://www.xxx.ccc.aa/pay        | 2017-08-06 15:30:20  | 1   | http://www.xxx.ccc.aa/pay        |
    | 192.168.44.3   | http://www.xxx.ccc.aa/teach      | 2017-08-05 15:35:20  | 1   | http://www.xxx.ccc.aa/teach      |
    +----------------+---------------------------------+----------------------+-----+---------------------------------+--+
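    
    -- Sketch to illustrate why the explicit frame in the query above matters (both
    -- aliases are made up). With ORDER BY and no frame clause, Hive defaults to
    -- range between unbounded preceding and current row, so last_value() only sees
    -- the rows up to the current one (plus its peers with the same access_time) and
    -- does not reach the end of the partition.
    select ip, url, access_time,
           last_value(url) over (partition by ip order by access_time) as last_url_default_frame,
           last_value(url) over (partition by ip order by access_time
                                 rows between unbounded preceding and unbounded following) as last_url_full_partition
    from t_access;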
    
    
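    /*
        Cumulative report, implemented with an analytic function
    */
    -- sum() over() function: running total of the monthly amount for each id
    select id,
           month,
           sum(amount) over (partition by id order by month rows between unbounded preceding and current row) as cumulative_amount
    from (
        select id, month, sum(fee) as amount
        from t_test
        group by id, month
    ) tmp;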
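    
    -- A further sketch under the same assumptions (t_test has id, month, fee):
    -- changing only the frame turns the running total into a moving sum over the
    -- current row and the two rows before it. The alias amount_last_3_rows is
    -- hypothetical. The frame counts rows, not calendar months, so a missing month
    -- for an id is not accounted for.
    select id,
           month,
           sum(amount) over (partition by id order by month rows between 2 preceding and current row) as amount_last_3_rows
    from (
        select id, month, sum(fee) as amount
        from t_test
        group by id, month
    ) tmp;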
  • Original article: https://www.cnblogs.com/arjenlee/p/9724035.html