  • Generating a cumulative report in Hive

    1. Raw data (userId, visitDate, visitCount, separated by single spaces)

    u01 2019/1/21 5
    u02 2019/1/23 6
    u03 2019/1/22 8
    u04 2019/1/20 3
    u01 2019/1/23 6
    u01 2019/2/21 8
    u02 2019/1/23 6
    u01 2019/2/22 4

    2. Create a table that maps the data above

    create table action (userId string, visitDate string, visitCount int) row format delimited fields terminated by " ";
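    To make the `row format delimited fields terminated by " "` clause concrete, here is a plain-Python sketch (not Hive itself) of how each line of the file is split into the three columns. `parse_action` and `to_int_or_none` are hypothetical helper names; the sketch assumes every line has exactly three space-separated fields and uses None where Hive's text SerDe would store NULL for a count that is not a valid int.

```python
# Sample lines from step 1, exactly as they appear in the text file.
RAW_LINES = [
    "u01 2019/1/21 5",
    "u02 2019/1/23 6",
    "u03 2019/1/22 8",
    "u04 2019/1/20 3",
    "u01 2019/1/23 6",
    "u01 2019/2/21 8",
    "u02 2019/1/23 6",
    "u01 2019/2/22 4",
]


def to_int_or_none(text):
    """Hypothetical helper: cast a field to int, returning None where Hive's
    text SerDe would store NULL for a value that is not a valid int."""
    try:
        return int(text)
    except ValueError:
        return None


def parse_action(lines):
    """Hypothetical helper: split each line on a single space, mirroring
    `fields terminated by " "`, into (userId, visitDate, visitCount).
    Assumes every line has exactly three fields."""
    rows = []
    for line in lines:
        user_id, visit_date, visit_count = line.split(" ")
        rows.append((user_id, visit_date, to_int_or_none(visit_count)))
    return rows


action = parse_action(RAW_LINES)
print(len(action))  # 8
print(action[0])    # ('u01', '2019/1/21', 5)
```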

    3. Group by user and month to get each user's total number of visits per month

    -- 'MM' is the month pattern letter; lowercase 'mm' would be parsed as minutes
    create table action_amount as
    select tmp.userid, tmp.month, sum(tmp.visitcount) as amount
    from (select userid, from_unixtime(unix_timestamp(visitdate, 'yyyy/MM/dd'), 'yyyy-MM') as month, visitcount
          from action) tmp
    group by tmp.userid, tmp.month;
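    The grouping in step 3 can be modelled in plain Python to check what action_amount should contain for the sample data. This is a sketch, assuming the eight rows from step 1; `month_of` and `monthly_amount` are hypothetical helper names, and Python's `%m` directive plays the role of Hive's `MM` pattern letter.

```python
from collections import defaultdict
from datetime import datetime

# (userId, visitDate, visitCount) rows as mapped by the action table in step 2.
ACTION = [
    ("u01", "2019/1/21", 5),
    ("u02", "2019/1/23", 6),
    ("u03", "2019/1/22", 8),
    ("u04", "2019/1/20", 3),
    ("u01", "2019/1/23", 6),
    ("u01", "2019/2/21", 8),
    ("u02", "2019/1/23", 6),
    ("u01", "2019/2/22", 4),
]


def month_of(visit_date):
    """Hypothetical helper: parse 'yyyy/M/d' and re-format it as 'yyyy-MM'."""
    return datetime.strptime(visit_date, "%Y/%m/%d").strftime("%Y-%m")


def monthly_amount(rows):
    """Hypothetical helper: GROUP BY userid, month with SUM(visitcount)."""
    totals = defaultdict(int)
    for user_id, visit_date, visit_count in rows:
        totals[(user_id, month_of(visit_date))] += visit_count
    # Sorted only for a stable listing; Hive gives no order without ORDER BY.
    return sorted((u, m, amt) for (u, m), amt in totals.items())


action_amount = monthly_amount(ACTION)
for row in action_amount:
    print(row)
```

    On the sample data it yields five rows: u01 has 11 visits in 2019-01 and 12 in 2019-02; u02 has 12, u03 has 8 and u04 has 3, all in 2019-01.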

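    4. Self-join action_amount to build an intermediate table: each month b of a user is paired with every month a of the same user that is not later than b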

    create table action_tmp
    as
    select a.amount as a_amount,b.*
    from action_amount a join action_amount b on a.userid=b.userid
    where a.month <= b.month;
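    What the self-join produces is easier to see on the step-3 totals. The sketch below is a plain-Python model of the join plus the `a.month <= b.month` filter, assuming action_amount holds the five rows computed from the sample data; `self_join` is a hypothetical helper name.

```python
# Monthly totals from step 3 on the sample data: (userid, month, amount).
ACTION_AMOUNT = [
    ("u01", "2019-01", 11),
    ("u01", "2019-02", 12),
    ("u02", "2019-01", 12),
    ("u03", "2019-01", 8),
    ("u04", "2019-01", 3),
]


def self_join(rows):
    """Hypothetical helper: pair rows a and b of the same user where
    a.month <= b.month, emitting (a_amount, userid, month, amount) like
    `select a.amount as a_amount, b.*`."""
    result = []
    for a_user, a_month, a_amount in rows:
        for b_user, b_month, b_amount in rows:
            # Join condition (same user) plus the WHERE filter on months.
            if a_user == b_user and a_month <= b_month:
                result.append((a_amount, b_user, b_month, b_amount))
    return result


action_tmp = self_join(ACTION_AMOUNT)
for row in action_tmp:
    print(row)
```

    Because u01 has two months, it contributes three pairs (2019-01 with itself, 2019-01 with 2019-02, and 2019-02 with itself); every other user contributes one, giving six rows. Comparing months as strings is safe because the 'yyyy-MM' format is zero-padded, so string order matches chronological order.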

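    5. Group the table above by userid and month: max(amount) recovers that month's own total (it is the same on every row of the group), and sum(a_amount) adds up the totals of all months up to and including it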

    select userid,month,max(amount) as amount,sum(a_amount) as accumulate
    from action_tmp
    group by userid,month;
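    The aggregation in step 5 can be checked the same way. A plain-Python sketch, assuming action_tmp holds the six rows that step 4 yields on the sample data; `accumulate` is a hypothetical helper name.

```python
from collections import defaultdict

# action_tmp rows from step 4 on the sample data: (a_amount, userid, month, amount).
ACTION_TMP = [
    (11, "u01", "2019-01", 11),
    (11, "u01", "2019-02", 12),
    (12, "u01", "2019-02", 12),
    (12, "u02", "2019-01", 12),
    (8, "u03", "2019-01", 8),
    (3, "u04", "2019-01", 3),
]


def accumulate(rows):
    """Hypothetical helper: GROUP BY userid, month with MAX(amount) and
    SUM(a_amount), returning (userid, month, amount, accumulate)."""
    amount = {}
    acc = defaultdict(int)
    for a_amount, user_id, month, b_amount in rows:
        key = (user_id, month)
        # amount is constant within a group, so MAX just picks it up.
        amount[key] = max(amount.get(key, b_amount), b_amount)
        # Summing a_amount adds every month up to and including this one.
        acc[key] += a_amount
    return sorted((u, m, amount[(u, m)], acc[(u, m)]) for (u, m) in amount)


for row in accumulate(ACTION_TMP):
    print(row)
```

    The result shows u01 reaching an accumulated 23 visits in 2019-02 (11 + 12); the other users have a single month, so their accumulated value equals their monthly amount.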

    6. Generate the cumulative report with a window function instead

    select userid, month,amount,
    sum(amount) over(partition by userid order by month rows between unbounded preceding and current row) as accumulate
    from action_amount;
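    The frame `rows between unbounded preceding and current row` amounts to a running total that restarts for each userid. The following plain-Python sketch models that, assuming the five action_amount rows from step 3; `running_total` is a hypothetical helper name.

```python
from itertools import groupby

# Monthly totals from step 3 on the sample data: (userid, month, amount).
ACTION_AMOUNT = [
    ("u01", "2019-01", 11),
    ("u01", "2019-02", 12),
    ("u02", "2019-01", 12),
    ("u03", "2019-01", 8),
    ("u04", "2019-01", 3),
]


def running_total(rows):
    """Hypothetical helper modelling
    sum(amount) over (partition by userid order by month
                      rows between unbounded preceding and current row)."""
    result = []
    # Sorting by (userid, month) puts each partition together, ordered by month.
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for user_id, partition in groupby(ordered, key=lambda r: r[0]):
        total = 0  # the frame starts over at the first row of each partition
        for _, month, amount in partition:
            total += amount  # frame = all earlier rows plus the current row
            result.append((user_id, month, amount, total))
    return result


for row in running_total(ACTION_AMOUNT):
    print(row)
```

    It returns the same rows as steps 4 and 5, but each user's months are scanned once, whereas the self-join materialises one row per pair of months, which grows quadratically with the number of months per user.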

  • Original article: https://www.cnblogs.com/zhangchenchuan/p/11973764.html