  • Hive Exercise 1: Cumulative Report

    The test data is as follows (user name, month, visit count):

    A,2015-01,5
    A,2015-01,15
    B,2015-01,5
    A,2015-01,8
    B,2015-01,25
    A,2015-01,5
    C,2015-01,10
    C,2015-01,20
    A,2015-02,4
    A,2015-02,6
    C,2015-02,30
    C,2015-02,10
    B,2015-02,10
    B,2015-02,5
    A,2015-03,14
    A,2015-03,6
    B,2015-03,20
    B,2015-03,25
    C,2015-03,10
    C,2015-03,20

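    Requirement:

    For each user and each month, compute the visit count for that month, the cumulative visit count up to and including that month, and the largest monthly visit count seen up to that month.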

     

    Create the table

    create table t_access(name string, month string, time int) row format delimited fields terminated by ',';

    Load the data

    load data local inpath "/home/access_time.txt" into table t_access;

    Query the loaded data

    select * from t_access;
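
    As a quick sanity check after loading, the row count and raw total per user can be compared with the sample file. This is a minimal sketch that assumes /home/access_time.txt holds exactly the 20 rows listed above; the aliases row_cnt and total_times are illustrative names, not part of the original.

    ```sql
    -- Rows and total visits per user in the raw table.
    -- With the 20 sample rows, A has 8 rows and B and C have 6 rows each.
    select name, count(*) as row_cnt, sum(time) as total_times
    from t_access
    group by name;

    -- Expected for the sample data (row order is not guaranteed):
    -- A  8   63
    -- B  6   90
    -- C  6  100
    ```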

    Traditional approach

    First compute each user's total visit count per month and store the result in a temporary table:

    create table t_access_tmp as
    select name, month, sum(time) as month_times from t_access group by name,month;
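
    To see what the later steps work from, the temporary table can be inspected directly. The sketch below assumes the sample data above and that Hive is not running in strict mode (which rejects order by without limit); the expected rows are the per-month sums of the sample values.

    ```sql
    -- One row per (name, month) with that month's total visits.
    select name, month, month_times
    from t_access_tmp
    order by name, month;

    -- Expected rows for the sample data:
    -- A  2015-01  33
    -- A  2015-02  10
    -- A  2015-03  20
    -- B  2015-01  30
    -- B  2015-02  15
    -- B  2015-03  45
    -- C  2015-01  30
    -- C  2015-02  40
    -- C  2015-03  30
    ```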

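    Self-join the temporary table on name, keeping only the pairs where a.month <= b.month, so that each row of b is matched with every month of the same user up to and including b.month:

    select a.*, b.*
    from t_access_tmp a join t_access_tmp b on a.name = b.name
    where a.month<=b.month;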
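
    To make the effect of the join condition concrete, the same self-join can be restricted to a single user. This is an illustrative sketch only: the filter on 'A' and the aliases a_month, a_times and b_month are assumptions added for the example, and the order by assumes Hive is not in strict mode.

    ```sql
    -- For each target month b_month, list the months of user A that
    -- contribute to its running total (all months <= b_month).
    select a.month as a_month, a.month_times as a_times, b.month as b_month
    from t_access_tmp a
    join t_access_tmp b on a.name = b.name
    where a.month <= b.month
      and a.name = 'A'
    order by b_month, a_month;

    -- Expected rows for the sample data:
    -- a_month  a_times  b_month
    -- 2015-01  33       2015-01
    -- 2015-01  33       2015-02
    -- 2015-02  10       2015-02
    -- 2015-01  33       2015-03
    -- 2015-02  10       2015-03
    -- 2015-03  20       2015-03
    ```

    Grouping these rows by b_month and summing a_times gives 33, 43 and 63, which is the running total for user A.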

    Then group by b.name and b.month, and take the sum and the maximum of a.month_times:

    select
        t.bname,
        t.bmonth,
        sum(t.month_times) as accu_times,
        max(t.month_times) as max_times
    from (
        select b.name as bname, b.month as bmonth, a.month_times as month_times
        from t_access_tmp a
        join t_access_tmp b on a.name = b.name
        where a.month <= b.month
    ) as t
    group by t.bname, t.bmonth;

    Or, more compactly, without the temporary table (this version computes only the cumulative total, not the maximum):

    select a.name, a.month, sum(b.times) as accumulate
    from (select name, month, sum(time) as times from t_access group by name, month) a
    join (select name, month, sum(time) as times from t_access group by name, month) b
    on a.name = b.name
    where a.month >= b.month
    group by a.name, a.month;
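
    The repeated subquery in the self-join can also be factored out with a common table expression, which Hive supports from version 0.13. The sketch below is one possible rewrite, not the original author's query: the CTE name monthly is an assumption, and max(b.times) is added so that the result carries both the cumulative total and the running maximum.

    ```sql
    -- Monthly totals are defined once and joined to themselves.
    with monthly as (
        select name, month, sum(time) as times
        from t_access
        group by name, month
    )
    select a.name,
           a.month,
           sum(b.times) as accu_times,  -- total of all months up to a.month
           max(b.times) as max_times    -- largest monthly total up to a.month
    from monthly a
    join monthly b on a.name = b.name
    where a.month >= b.month
    group by a.name, a.month;

    -- Expected for the sample data (row order is not guaranteed):
    -- A  2015-01   33  33
    -- A  2015-02   43  33
    -- A  2015-03   63  33
    -- B  2015-01   30  30
    -- B  2015-02   45  30
    -- B  2015-03   90  45
    -- C  2015-01   30  30
    -- C  2015-02   70  40
    -- C  2015-03  100  40
    ```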

     


     

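    Window-function approach. This is more efficient because it needs no self-join: after the monthly aggregation, the running values are computed within each user's partition.

    curent_month_times is the current user's visit count for the current month

    accu_times is the current user's cumulative visit count up to and including the current month

    max_times is the current user's largest monthly visit count up to and including the current month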

     

    select
    t.name,
    t.month,
    t.month_times as curent_month_times,
    sum(t.month_times)over(partition by t.name order by t.month rows between unbounded preceding and current row ) as accu_times,
    max(t.month_times)over(partition by t.name order by t.month rows between unbounded preceding and current row ) as max_times
    from(
    select name, month, sum(time) as month_times from t_access group by name,month
    ) t;
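    The raw data is first aggregated by name and month; the window functions then compute the running values over the aggregated rows.
    The frame clause rows between unbounded preceding and current row means that each aggregate is computed from the first row of the partition through the current row. When order by is present and the frame is omitted, Hive defaults to range between unbounded preceding and current row, which gives the same result here because each (name, month) pair appears only once after aggregation. Writing the frame explicitly makes the intent clear.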
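
    Because both aggregates use the same window, the specification can be written once with Hive's window clause. The sketch below is an equivalent formulation under that assumption; the window names w and w2 and the column last_two_months are illustrative. The second window shows how changing the frame bounds changes the result: it covers only the previous and the current month.

    ```sql
    select
        t.name,
        t.month,
        t.month_times as curent_month_times,
        sum(t.month_times) over w  as accu_times,      -- running total
        max(t.month_times) over w  as max_times,       -- running maximum
        sum(t.month_times) over w2 as last_two_months  -- previous + current month
    from (
        select name, month, sum(time) as month_times
        from t_access
        group by name, month
    ) t
    window
        w  as (partition by t.name order by t.month
               rows between unbounded preceding and current row),
        w2 as (partition by t.name order by t.month
               rows between 1 preceding and current row);

    -- For user A in the sample data (monthly totals 33, 10, 20):
    -- accu_times      = 33, 43, 63
    -- max_times       = 33, 33, 33
    -- last_two_months = 33, 43, 30
    ```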

     

     

     

     

     

  • Original article: https://www.cnblogs.com/lucas-zhao/p/11905278.html