  • Hive Advanced Aggregation: GROUPING SETS, ROLLUP, and CUBE


    scala> import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.hive.HiveContext

    scala> val hcon=new HiveContext(sc)
    warning: there was one deprecation warning; re-run with -deprecation for details
    hcon: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@dd102ea

    scala> hcon.sql("select age,sex,count(1) from gamedw.customers group by age,sex").show
    +---+---+--------+
    |age|sex|count(1)|
    +---+---+--------+
    | 56|  0|       7|
    | 32|  1|       7|
    | 20|  1|       7|
    | 50|  1|       7|
    |  5|  1|       4|
    | 47|  0|       7|
    | 85|  1|       7|
    |100|  0|       5|
    +---+---+--------+

    scala> hcon.sql("select age,sex,count(1) from gamedw.customers group by age,sex grouping sets((age,sex),sex,())").show
    +----+----+--------+
    | age| sex|count(1)|
    +----+----+--------+
    |  56|   0|       7|
    |null|   1|      32|
    |  20|   1|       7|
    |null|null|      51|
    |  32|   1|       7|
    |   5|   1|       4|
    |  85|   1|       7|
    |  47|   0|       7|
    | 100|   0|       5|
    |null|   0|      19|
    |  50|   1|       7|
    +----+----+--------+

    GROUPING SETS

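    Within a single GROUP BY query, GROUPING SETS aggregates over several different combinations of dimensions. The result is equivalent to running a separate GROUP BY for each combination and combining the result sets with UNION ALL, with NULL filling any column that is not part of a given combination. If the GROUPING SETS clause includes the empty set (), that set produces the overall (grand-total) aggregate.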

    scala> hcon.sql("select age,sex,count(1) from gamedw.customers group by age,sex grouping sets((age,sex),sex,()) order by age,sex").show
    +----+----+--------+
    | age| sex|count(1)|
    +----+----+--------+
    |null|null|      51|
    |null|   0|      19|
    |null|   1|      32|
    |   5|   1|       4|
    |  20|   1|       7|
    |  32|   1|       7|
    |  47|   0|       7|
    |  50|   1|       7|
    |  56|   0|       7|
    |  85|   1|       7|
    | 100|   0|       5|
    +----+----+--------+

    scala> hcon.sql("select age,sex,count(1) from gamedw.customers group by age,sex grouping sets((age,sex),sex,age,()) order by age,sex").show
    +----+----+--------+
    | age| sex|count(1)|
    +----+----+--------+
    |null|null|      51|
    |null|   0|      19|
    |null|   1|      32|
    |   5|null|       4|
    |   5|   1|       4|
    |  20|null|       7|
    |  20|   1|       7|
    |  32|null|       7|
    |  32|   1|       7|
    |  47|null|       7|
    |  47|   0|       7|
    |  50|null|       7|
    |  50|   1|       7|
    |  56|null|       7|
    |  56|   0|       7|
    |  85|null|       7|
    |  85|   1|       7|
    | 100|null|       5|
    | 100|   0|       5|
    +----+----+--------+
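The UNION ALL equivalence described in this section can be sketched with plain Scala collections, without Spark. This is only an illustration under stated assumptions: `Customer`, `AggRow`, `keep`, and `groupByPass` are hypothetical names, the sample rows are invented rather than taken from gamedw.customers, and `None` stands in for SQL NULL.

```scala
// Hypothetical in-memory stand-in for a few customer rows (invented values).
case class Customer(age: Int, sex: Int)

// One aggregated output row; None plays the role of SQL NULL
// for a column that is not part of the current grouping set.
case class AggRow(age: Option[Int], sex: Option[Int], count: Int)

val customers = Seq(
  Customer(20, 1), Customer(20, 1),
  Customer(47, 0), Customer(47, 0), Customer(47, 0),
  Customer(5, 1)
)

// Keep a column value only if the column belongs to the grouping set.
def keep(value: Int, wanted: Boolean): Option[Int] =
  if (wanted) Some(value) else None

// A single GROUP BY pass over one grouping set.
def groupByPass(rows: Seq[Customer], keepAge: Boolean, keepSex: Boolean): Seq[AggRow] =
  rows
    .groupBy(c => (keep(c.age, keepAge), keep(c.sex, keepSex)))
    .map { case ((age, sex), grp) => AggRow(age, sex, grp.size) }
    .toSeq

// GROUPING SETS ((age, sex), sex, ()) as the concatenation (UNION ALL) of three passes.
val result: Seq[AggRow] =
  groupByPass(customers, keepAge = true, keepSex = true) ++
  groupByPass(customers, keepAge = false, keepSex = true) ++
  groupByPass(customers, keepAge = false, keepSex = false)

result.foreach(println)
```

The last pass keeps neither column, so every input row falls into one group; that is the grand-total row produced by the empty set ().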

    CUBE

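    CUBE aggregates over every possible combination of the GROUP BY dimensions. For the two dimensions age and sex that means four grouping sets: (age, sex), (age), (sex), and (), so the query below returns the same rows as GROUPING SETS((age,sex),sex,age,()) above.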

    scala> hcon.sql("select age,sex,count(1) from gamedw.customers group by age,sex with cube order by age,sex").show
    +----+----+--------+
    | age| sex|count(1)|
    +----+----+--------+
    |null|null|      51|
    |null|   0|      19|
    |null|   1|      32|
    |   5|null|       4|
    |   5|   1|       4|
    |  20|null|       7|
    |  20|   1|       7|
    |  32|null|       7|
    |  32|   1|       7|
    |  47|null|       7|
    |  47|   0|       7|
    |  50|null|       7|
    |  50|   1|       7|
    |  56|null|       7|
    |  56|   0|       7|
    |  85|null|       7|
    |  85|   1|       7|
    | 100|null|       5|
    | 100|   0|       5|
    +----+----+--------+
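CUBE can be read as GROUPING SETS over every subset of the dimension list. The sketch below enumerates those subsets and aggregates over each one using plain Scala collections. It is a rough model under stated assumptions, not how Hive or Spark implement CUBE: `cubeSets`, `aggregate`, and `cube` are hypothetical helpers, rows are simple maps with invented values, and `None` stands in for SQL NULL.

```scala
// Every subset of the dimension list: 2^n grouping sets for n dimensions.
def cubeSets(dims: List[String]): List[List[String]] = dims match {
  case Nil => List(Nil)
  case head :: tail =>
    val rest = cubeSets(tail)
    rest.map(head :: _) ++ rest
}

// Count rows for one grouping set; dimensions outside the set become None.
def aggregate(
    rows: Seq[Map[String, Int]],
    dims: List[String],
    set: List[String]
): Seq[(List[Option[Int]], Int)] =
  rows
    .groupBy(r => dims.map(d => if (set.contains(d)) r.get(d) else None))
    .map { case (key, grp) => (key, grp.size) }
    .toSeq

// CUBE: aggregate over each grouping set and concatenate the results.
def cube(rows: Seq[Map[String, Int]], dims: List[String]): List[(List[Option[Int]], Int)] =
  cubeSets(dims).flatMap(set => aggregate(rows, dims, set))

// Invented sample rows, not taken from gamedw.customers.
val rows = Seq(
  Map("age" -> 20, "sex" -> 1),
  Map("age" -> 20, "sex" -> 1),
  Map("age" -> 47, "sex" -> 0)
)

println(cubeSets(List("age", "sex")))
// List(List(age, sex), List(age), List(sex), List())

val cubed = cube(rows, List("age", "sex"))
cubed.foreach(println)
```

Because the number of grouping sets doubles with each added dimension, this model also shows why CUBE output grows quickly as columns are added.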

    ROLLUP

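    ROLLUP produces a subset of the CUBE result. It aggregates hierarchically, starting from the leftmost dimension: for age, sex the grouping sets are (age, sex), (age), and (), so the per-sex subtotals that CUBE returns are absent.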

    scala> hcon.sql("select age,sex,count(1) from gamedw.customers group by age,sex with rollup order by age,sex").show
    +----+----+--------+
    | age| sex|count(1)|
    +----+----+--------+
    |null|null|      51|
    |   5|null|       4|
    |   5|   1|       4|
    |  20|null|       7|
    |  20|   1|       7|
    |  32|null|       7|
    |  32|   1|       7|
    |  47|null|       7|
    |  47|   0|       7|
    |  50|null|       7|
    |  50|   1|       7|
    |  56|null|       7|
    |  56|   0|       7|
    |  85|null|       7|
    |  85|   1|       7|
    | 100|null|       5|
    | 100|   0|       5|
    +----+----+--------+
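The hierarchy ROLLUP follows corresponds to the prefixes of the dimension list, from the full list down to the empty one. A minimal sketch with plain Scala collections, assuming hypothetical helpers `rollupSets`, `aggregate`, and `rollup`, invented sample rows, and `None` in place of SQL NULL:

```scala
// ROLLUP grouping sets: the prefixes of the dimension list, longest first.
def rollupSets(dims: List[String]): List[List[String]] =
  dims.inits.toList

// Count rows for one grouping set; dimensions outside the set become None.
def aggregate(
    rows: Seq[Map[String, Int]],
    dims: List[String],
    set: List[String]
): Seq[(List[Option[Int]], Int)] =
  rows
    .groupBy(r => dims.map(d => if (set.contains(d)) r.get(d) else None))
    .map { case (key, grp) => (key, grp.size) }
    .toSeq

// ROLLUP: aggregate over each prefix and concatenate the results.
def rollup(rows: Seq[Map[String, Int]], dims: List[String]): List[(List[Option[Int]], Int)] =
  rollupSets(dims).flatMap(set => aggregate(rows, dims, set))

// Invented sample rows, not taken from gamedw.customers.
val rows = Seq(
  Map("age" -> 20, "sex" -> 1),
  Map("age" -> 20, "sex" -> 1),
  Map("age" -> 47, "sex" -> 0)
)

println(rollupSets(List("age", "sex")))
// List(List(age, sex), List(age), List())

val rolled = rollup(rows, List("age", "sex"))
rolled.foreach(println)
```

Only prefixes are generated, so a set such as (sex) alone never appears. Swapping the column order to sex, age would yield per-sex subtotals instead of per-age ones.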

  • Original article: https://www.cnblogs.com/playforever/p/9336445.html