  • [Hive_11] Advanced Aggregation in Hive


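    0. Overview

      Hive's advanced aggregation features: union all | grouping sets | cube | rollup

      pv // page views: the number of page requests
      uv // unique visitors: the number of distinct users who visited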


    1. union all

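      Combines the result sets of several queries into one.

      1.0 Prepare the data

      pv.txt (tab-separated columns: month, day, cookie id)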

    2015-03    2015-03-10    cookie1
    2015-03    2015-03-10    cookie5
    2015-03    2015-03-12    cookie7
    2015-04    2015-04-12    cookie3
    2015-04    2015-04-13    cookie2
    2015-04    2015-04-13    cookie4
    2015-04    2015-04-16    cookie4
    2015-03    2015-03-10    cookie2
    2015-03    2015-03-10    cookie3
    2015-04    2015-04-12    cookie5
    2015-04    2015-04-13    cookie6
    2015-04    2015-04-15    cookie3
    2015-04    2015-04-15    cookie2
    2015-04    2015-04-16    cookie1
    2015-02    2015-02-16    cookie2
    2015-02    2015-02-16    cookie3

      1.1 Create the table

    create table uv(month string, day string, id string) row format delimited fields terminated by '\t';

      1.2 Load the data

    load data local inpath '/home/centos/files/pv.txt' into table uv;

      1.3 Enable automatic local mode (Hive may then run small jobs locally instead of on the cluster)

        SET hive.exec.mode.local.auto=true;

      1.4 Count unique visitors per month

    select month ,count(distinct id) from uv group by month;

      1.5 Count unique visitors per day

    select day ,count(distinct id) from uv group by day;

      1.6 Combine the monthly and daily counts with union all

    select month ,count(distinct id) from uv group by month union all select day ,count(distinct id) from uv group by day;
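
      To check what 1.4 to 1.6 should return without a cluster, here is a rough Python emulation run on the rows of pv.txt. It is a sketch, not Hive itself: distinct_count is a hypothetical helper, and the sorting is only there to make the output stable, since Hive does not guarantee row order for these queries.

```python
from collections import defaultdict

# Rows of pv.txt from section 1.0: month, day, cookie id.
PV_TXT = """
2015-03 2015-03-10 cookie1
2015-03 2015-03-10 cookie5
2015-03 2015-03-12 cookie7
2015-04 2015-04-12 cookie3
2015-04 2015-04-13 cookie2
2015-04 2015-04-13 cookie4
2015-04 2015-04-16 cookie4
2015-03 2015-03-10 cookie2
2015-03 2015-03-10 cookie3
2015-04 2015-04-12 cookie5
2015-04 2015-04-13 cookie6
2015-04 2015-04-15 cookie3
2015-04 2015-04-15 cookie2
2015-04 2015-04-16 cookie1
2015-02 2015-02-16 cookie2
2015-02 2015-02-16 cookie3
"""
ROWS = [tuple(line.split()) for line in PV_TXT.strip().splitlines()]


def distinct_count(rows, key_index):
    """Emulate: select <key>, count(distinct id) from uv group by <key>."""
    ids_per_key = defaultdict(set)
    for row in rows:
        ids_per_key[row[key_index]].add(row[2])
    return sorted((key, len(ids)) for key, ids in ids_per_key.items())


monthly = distinct_count(ROWS, 0)  # 1.4: group by month
daily = distinct_count(ROWS, 1)    # 1.5: group by day
combined = monthly + daily         # 1.6: union all appends one result to the other

for key, uv in combined:
    print(key, uv)
```

      The monthly part comes out as 2015-02 2, 2015-03 5, 2015-04 6, followed by the seven daily rows, 10 rows in total. Note that union all keeps every row from both queries; it does not deduplicate.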

       

      1.7 The same query with grouping sets

    select month, day,count(distinct id), grouping__id from uv group by month,day grouping sets(month,day);

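      grouping__id // identifies the grouping set that produced each row (note the two underscores); its numeric encoding depends on the Hive version

      The query aggregates over two grouping sets, and the column that is not part of a row's grouping set comes back as NULL:

      month

      day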
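
      The shape of the grouping sets result can be sketched in Python as follows. This is an emulation under assumptions, not Hive's implementation: grouping_sets is a hypothetical helper, None stands in for NULL, and the grouping__id column is left out because its values depend on the Hive version.

```python
from collections import defaultdict

# Rows of pv.txt from section 1.0: month, day, cookie id.
PV_TXT = """
2015-03 2015-03-10 cookie1
2015-03 2015-03-10 cookie5
2015-03 2015-03-12 cookie7
2015-04 2015-04-12 cookie3
2015-04 2015-04-13 cookie2
2015-04 2015-04-13 cookie4
2015-04 2015-04-16 cookie4
2015-03 2015-03-10 cookie2
2015-03 2015-03-10 cookie3
2015-04 2015-04-12 cookie5
2015-04 2015-04-13 cookie6
2015-04 2015-04-15 cookie3
2015-04 2015-04-15 cookie2
2015-04 2015-04-16 cookie1
2015-02 2015-02-16 cookie2
2015-02 2015-02-16 cookie3
"""
ROWS = [tuple(line.split()) for line in PV_TXT.strip().splitlines()]

COLUMNS = ("month", "day")


def grouping_sets(rows, sets):
    """Run one group-by per grouping set and stack the results."""
    result = []
    for cols in sets:
        ids_per_group = defaultdict(set)
        for month, day, cookie in rows:
            values = {"month": month, "day": day}
            # Columns outside the current grouping set become None (NULL in Hive).
            key = tuple(values[c] if c in cols else None for c in COLUMNS)
            ids_per_group[key].add(cookie)
        # Sort within each grouping set only to make the output stable.
        for key in sorted(ids_per_group, key=lambda k: tuple(v or "" for v in k)):
            result.append(key + (len(ids_per_group[key]),))
    return result


result = grouping_sets(ROWS, [("month",), ("day",)])
for row in result:
    print(row)
```

      The first rows are ('2015-02', None, 2), ('2015-03', None, 5), ('2015-04', None, 6), then one row per day with None in the month position. The counts match the union all query in 1.6, but month and day stay in separate columns.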

      1.8 Query with cube

    select month, day,count(distinct id), grouping__id from uv group by month,day with cube order by grouping__id;

       

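      cube aggregates over every subset of the group-by columns. For the month, day query above that is 4 grouping sets:

      null (grand total)
      month
      day
      month,day

      With three columns, for example year, month, day, there are 8:

      null
      year
      month
      day
      year,month
      year,day
      month,day
      year,month,day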
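
      The set of groupings that cube expands to can be enumerated with a few lines of Python. This is only a sketch of the expansion rule (every subset of the columns); cube_sets is a hypothetical helper, and the order shown is not the order Hive returns rows in.

```python
from itertools import combinations


def cube_sets(columns):
    """Return every subset of the group-by columns, smallest first."""
    return [
        combo
        for size in range(len(columns) + 1)
        for combo in combinations(columns, size)
    ]


two = cube_sets(("month", "day"))
three = cube_sets(("year", "month", "day"))

print(two)         # [(), ('month',), ('day',), ('month', 'day')]
print(len(three))  # 8
```

      The empty tuple is the grand total over all rows. n columns always give 2**n grouping sets, so cube gets expensive quickly as columns are added.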

      1.9 Query with rollup

    select month, day,count(distinct id), grouping__id from uv group by month,day with rollup order by grouping__id;

      

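      rollup aggregates over each leading prefix of the group-by columns, from none to all. For the month, day query above that is 3 grouping sets:

      null (grand total)
      month
      month,day

      With three columns, for example year, month, day, there are 4:

      null
      year
      year,month
      year,month,day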
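
      The rollup expansion can be sketched the same way. Again this only illustrates the rule (leading prefixes of the column list); rollup_sets is a hypothetical helper, not a Hive function.

```python
def rollup_sets(columns):
    """Return each leading prefix of the group-by columns, shortest first."""
    return [tuple(columns[:n]) for n in range(len(columns) + 1)]


two = rollup_sets(("month", "day"))
three = rollup_sets(("year", "month", "day"))

print(two)    # [(), ('month',), ('month', 'day')]
print(three)  # [(), ('year',), ('year', 'month'), ('year', 'month', 'day')]
```

      Unlike cube, the column order matters here: rollup over month, day never produces a day-only grouping. n columns give n + 1 grouping sets.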


  • Original post: https://www.cnblogs.com/share23/p/10319921.html