zoukankan      html  css  js  c++  java
  • Hive之cube和roolup

     

    Cube

    hive (hdata)> select * from test;

     

    test.f1     test.f2     test.f3     test.cnt

    A            A            B            1

    B            B            A            1

    A            A            A            2

     

    hive (hdata)> SELECT f1,

                >        f2,

                >        f3,

                >        sum(cnt)

                > FROM test

                > GROUP BY f1,

                >          f2,

                >          f3;

    f1    f2    f3    _c3

    A     A     A     2

    A     A     B     1

    B     B     A     1

     

    hive (hdata)> SELECT f1,

                >        f2,

                >        f3,

                >        sum(cnt),

                >        GROUPING__ID,

                >        rpad(reverse(bin(cast(GROUPING__ID AS bigint))),3,'0')

                > FROM test

                > GROUP BY f1,

                >          f2,

                >          f3 WITH CUBE;

     

    f1           f2           f3           _c3         grouping__id        _c5

    NULL      NULL      NULL      4            7                          000

    NULL      NULL      A            3            6                          001

    NULL      NULL      B            1            6                          001

    NULL      A            NULL      3            5                          010

    NULL      A            A            2            4                          011

    NULL      A            B            1            4                          011

    NULL      B            NULL      1            5                          010

    NULL      B            A            1            4                          011

    A            NULL      NULL      3            3                          100

    A            NULL      A            2            2                          101

    A            NULL      B            1            2                          101

    A            A            NULL      3            1                          110

    A            A            A            2            0                          111

    A            A            B            1            0                          111

    B            NULL      NULL      1            3                          100

    B            NULL      A            1            2                          101

    B            B            NULL      1            1                          110

    B            B            A            1            0                          111

     

    cube简称数据魔方,可以实现hive多个任意维度的查询,cube(a,b,c)则首先会对(a,b,c)进行group by,然后依次是(a,b),(a,c),(a),(b,c),(b),(c),最后在对全表进行group by,他会统计所选列中值的所有组合的聚合。

     

    GROUPING SETS

    GROUPING SETS作为GROUP BY的子句,允许开发人员在GROUP BY语句后面指定多个统计选项,可以简单理解为多条group by语句通过union all把查询结果聚合起来结合起来,下面是几个实例可以帮助我们了解,

    select device_id,os_id,app_id,count(user_id) from  test_xinyan_reg group by device_id,os_id,app_id grouping sets((device_id,os_id))

    等价于

    SELECT device_id,os_id,null,count(user_id) FROM test_xinyan_reg group by device_id,os_id

     

    select device_id,os_id,app_id,count(user_id) from  test_xinyan_reg group by device_id,os_id,app_id grouping sets((device_id,os_id),(device_id))

    等价于

    SELECT device_id,os_id,null,count(user_id) FROM test_xinyan_reg group by device_id,os_id 
    UNION ALL 
    SELECT device_id,null,null,count(user_id) FROM test_xinyan_reg group by device_id

     

    Rollup

    可以实现从右到左递减多级的统计,显示统计某一层次结构的聚合。

    SELECT device_id,os_id,null,count(user_id) FROM test_xinyan_reg group by device_id,os_id 
    UNION ALL 
    SELECT device_id,null,null,count(user_id) FROM test_xinyan_reg group by device_id

    等价于

    select device_id,os_id,app_id,client_version,from_id,count(user_id) 

    from test_xinyan_reg 

    group by device_id,os_id,app_id,client_version,from_id 

    grouping sets ((device_id,os_id,app_id,client_version,from_id),(device_id,os_id,app_id,client_version),(device_id,os_id,app_id),(device_id,os_id),(device_id),());

     

    Grouping_ID函数

    当我们没有统计某一列时,它的值显示为null,这可能与列本身就有null值冲突,这就需要一种方法区分是没有统计还是值本来就是null。(写一个排列组合的算法,就马上理解了,grouping_id其实就是所统计各列二进制和)

     

    https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup

  • 相关阅读:
    基本HAL库操作函数整理
    oled(iic协议)
    Uart串口中断收发
    博主回来啦
    博主的冒泡1
    AFO

    起床困难综合症
    费解的开关
    数独
  • 原文地址:https://www.cnblogs.com/BetterThanEver_Victor/p/9364081.html
Copyright © 2011-2022 走看看