zoukankan      html  css  js  c++  java
  • Hive函数:GROUPING SETS,GROUPING__ID,CUBE,ROLLUP

    参考:lxw大数据田地:http://lxw1234.com/archives/2015/04/193.htm

    数据准备:

    CREATE EXTERNAL TABLE test_data (
    month STRING,
    day STRING, 
    cookieid STRING 
    ) ROW FORMAT DELIMITED 
    FIELDS TERMINATED BY ',' 
    stored as textfile location '/user/jc_rc_ftp/test_data';
    
    select * from test_data l;
    +----------+-------------+-------------+--+
    | l.month  |    l.day    | l.cookieid  |
    +----------+-------------+-------------+--+
    | 2015-03  | 2015-03-10  | cookie1     |
    | 2015-03  | 2015-03-10  | cookie5     |
    | 2015-03  | 2015-03-12  | cookie7     |
    | 2015-04  | 2015-04-12  | cookie3     |
    | 2015-04  | 2015-04-13  | cookie2     |
    | 2015-04  | 2015-04-13  | cookie4     |
    | 2015-04  | 2015-04-16  | cookie4     |
    | 2015-03  | 2015-03-10  | cookie2     |
    | 2015-03  | 2015-03-10  | cookie3     |
    | 2015-04  | 2015-04-12  | cookie5     |
    | 2015-04  | 2015-04-13  | cookie6     |
    | 2015-04  | 2015-04-15  | cookie3     |
    | 2015-04  | 2015-04-15  | cookie2     |
    | 2015-04  | 2015-04-16  | cookie1     |
    +----------+-------------+-------------+--+
    14 rows selected (0.249 seconds)

    GROUPING SETS

    在一个GROUP BY查询中,根据不同的维度组合进行聚合,等价于将不同维度的GROUP BY结果集进行UNION ALL

    SELECT 
    month,
    day,
    COUNT(DISTINCT cookieid) AS uv,
    GROUPING__ID 
    FROM test_data 
    GROUP BY month,day 
    GROUPING SETS (month,day) 
    ORDER BY GROUPING__ID;
    
    等价于 
    SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM test_data GROUP BY month 
    UNION ALL 
    SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM test_data GROUP BY day
    
    +----------+-------------+-----+---------------+--+
    |  month   |     day     | uv  | grouping__id  |
    +----------+-------------+-----+---------------+--+
    | 2015-04  | NULL        | 6   | 1             |
    | 2015-03  | NULL        | 5   | 1             |
    | NULL     | 2015-04-16  | 2   | 2             |
    | NULL     | 2015-04-15  | 2   | 2             |
    | NULL     | 2015-04-13  | 3   | 2             |
    | NULL     | 2015-04-12  | 2   | 2             |
    | NULL     | 2015-03-12  | 1   | 2             |
    | NULL     | 2015-03-10  | 4   | 2             |
    +----------+-------------+-----+---------------+--+
    8 rows selected (177.299 seconds)
    
    SELECT 
    month,
    day,
    COUNT(DISTINCT cookieid) AS uv,
    GROUPING__ID 
    FROM test_data 
    GROUP BY month,day 
    GROUPING SETS (month,day,(month,day)) 
    ORDER BY GROUPING__ID;
    
    等价于
    SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM test_data GROUP BY month 
    UNION ALL 
    SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM test_data GROUP BY day
    UNION ALL 
    SELECT month,day,COUNT(DISTINCT cookieid) AS uv,3 AS GROUPING__ID FROM test_data GROUP BY month,day
    +----------+-------------+-----+---------------+--+
    |  month   |     day     | uv  | grouping__id  |
    +----------+-------------+-----+---------------+--+
    | 2015-04  | NULL        | 6   | 1             |
    | 2015-03  | NULL        | 5   | 1             |
    | NULL     | 2015-03-10  | 4   | 2             |
    | NULL     | 2015-04-16  | 2   | 2             |
    | NULL     | 2015-04-15  | 2   | 2             |
    | NULL     | 2015-04-13  | 3   | 2             |
    | NULL     | 2015-04-12  | 2   | 2             |
    | NULL     | 2015-03-12  | 1   | 2             |
    | 2015-04  | 2015-04-16  | 2   | 3             |
    | 2015-04  | 2015-04-12  | 2   | 3             |
    | 2015-04  | 2015-04-13  | 3   | 3             |
    | 2015-03  | 2015-03-12  | 1   | 3             |
    | 2015-03  | 2015-03-10  | 4   | 3             |
    | 2015-04  | 2015-04-15  | 2   | 3             |
    +----------+-------------+-----+---------------+--+

    备注:其中的 GROUPING__ID,表示结果属于哪一个分组集合。

    CUBE

    根据GROUP BY的维度的所有组合进行聚合。

    SELECT 
    month,
    day,
    COUNT(DISTINCT cookieid) AS uv,
    GROUPING__ID 
    FROM test_data 
    GROUP BY month,day 
    WITH CUBE 
    ORDER BY GROUPING__ID;
    
    等价于
    SELECT NULL,NULL,COUNT(DISTINCT cookieid) AS uv,0 AS GROUPING__ID FROM test_data
    UNION ALL 
    SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM test_data GROUP BY month 
    UNION ALL 
    SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM test_data GROUP BY day
    UNION ALL 
    SELECT month,day,COUNT(DISTINCT cookieid) AS uv,3 AS GROUPING__ID FROM test_data GROUP BY month,day
    +----------+-------------+-----+---------------+--+
    |  month   |     day     | uv  | grouping__id  |
    +----------+-------------+-----+---------------+--+
    | NULL     | NULL        | 7   | 0             |
    | 2015-03  | NULL        | 5   | 1             |
    | 2015-04  | NULL        | 6   | 1             |
    | NULL     | 2015-04-16  | 2   | 2             |
    | NULL     | 2015-04-15  | 2   | 2             |
    | NULL     | 2015-04-13  | 3   | 2             |
    | NULL     | 2015-04-12  | 2   | 2             |
    | NULL     | 2015-03-12  | 1   | 2             |
    | NULL     | 2015-03-10  | 4   | 2             |
    | 2015-04  | 2015-04-12  | 2   | 3             |
    | 2015-04  | 2015-04-16  | 2   | 3             |
    | 2015-03  | 2015-03-12  | 1   | 3             |
    | 2015-03  | 2015-03-10  | 4   | 3             |
    | 2015-04  | 2015-04-15  | 2   | 3             |
    | 2015-04  | 2015-04-13  | 3   | 3             |
    +----------+-------------+-----+---------------+--+

    ROLLUP

    是CUBE的子集,以最左侧的维度为主,从该维度进行层级聚合。

    比如,以month维度进行层级聚合:
    SELECT 
    month,
    day,
    COUNT(DISTINCT cookieid) AS uv,
    GROUPING__ID  
    FROM test_data 
    GROUP BY month,day
    WITH ROLLUP 
    ORDER BY GROUPING__ID;
    可以实现这样的上钻过程:月天的UV->月的UV->总UV
    +----------+-------------+-----+---------------+--+
    |  month   |     day     | uv  | grouping__id  |
    +----------+-------------+-----+---------------+--+
    | NULL     | NULL        | 7   | 0             |
    | 2015-04  | NULL        | 6   | 1             |
    | 2015-03  | NULL        | 5   | 1             |
    | 2015-04  | 2015-04-16  | 2   | 3             |
    | 2015-04  | 2015-04-15  | 2   | 3             |
    | 2015-04  | 2015-04-13  | 3   | 3             |
    | 2015-04  | 2015-04-12  | 2   | 3             |
    | 2015-03  | 2015-03-12  | 1   | 3             |
    | 2015-03  | 2015-03-10  | 4   | 3             |
    +----------+-------------+-----+---------------+--+
     
    --把month和day调换顺序,则以day维度进行层级聚合: 
    SELECT 
    day,
    month,
    COUNT(DISTINCT cookieid) AS uv,
    GROUPING__ID  
    FROM test_data 
    GROUP BY day,month 
    WITH ROLLUP 
    ORDER BY GROUPING__ID;
    +-------------+----------+-----+---------------+--+
    |     day     |  month   | uv  | grouping__id  |
    +-------------+----------+-----+---------------+--+
    | NULL        | NULL     | 7   | 0             |
    | 2015-04-12  | NULL     | 2   | 1             |
    | 2015-04-15  | NULL     | 2   | 1             |
    | 2015-03-12  | NULL     | 1   | 1             |
    | 2015-04-16  | NULL     | 2   | 1             |
    | 2015-03-10  | NULL     | 4   | 1             |
    | 2015-04-13  | NULL     | 3   | 1             |
    | 2015-04-16  | 2015-04  | 2   | 3             |
    | 2015-04-15  | 2015-04  | 2   | 3             |
    | 2015-04-13  | 2015-04  | 3   | 3             |
    | 2015-03-12  | 2015-03  | 1   | 3             |
    | 2015-03-10  | 2015-03  | 4   | 3             |
    | 2015-04-12  | 2015-04  | 2   | 3             |
    +-------------+----------+-----+---------------+--+

    可以实现这样的上钻过程:
    天月的UV->天的UV->总UV
    (这里,根据天和月进行聚合,和根据天聚合结果一样,因为有父子关系,如果是其他维度组合的话,就会不一样)

  • 相关阅读:
    Excel—TIME函数简介与用法
    Excel—LEFT、RIGHT、MID函数提取不同位置的字段
    $scope.triggerSave $scope.createForm.dayType.$dirty = false;
    SVN clean up的作用
    js 获取当年到今日的时间区间
    jersey
    vector
    AngularJS 2
    URL 字符介绍
    JS factory
  • 原文地址:https://www.cnblogs.com/yy3b2007com/p/8583181.html
Copyright © 2011-2022 走看看