  • group by and distinct in Hive

    Preface

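    Only today did I realize for certain that group by also removes duplicates. On reflection this is obvious: when rows are classified by some column, identical values fall into the same class, which amounts to deduplication. Let's look at it in detail.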

    group by
    • Example 1:
    hive> select * from test;
    OK
    zhao	15	20170807
    zhao	14	20170809
    zhao	15	20170809
    zhao	16	20170809
    
    hive> select name from test;
    OK
    zhao
    zhao
    zhao
    zhao
    
    hive> select name from test group by name;
    
    ...
    
    OK
    zhao
    Time taken: 40.273 seconds, Fetched: 1 row(s)
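GROUP BY can be pictured as putting rows into buckets keyed by the grouping column(s); each bucket yields exactly one output row, which is why repeated values collapse. The Python below is a minimal sketch of that mental model, not Hive's implementation. The four rows copy the `test` table above; the helper name `group_by` and the third column name `dt` are hypothetical (the original does not name that column).

```python
from collections import defaultdict

# Rows of the sample table; "dt" is a hypothetical name for the third column.
TEST = [
    {"name": "zhao", "age": 15, "dt": "20170807"},
    {"name": "zhao", "age": 14, "dt": "20170809"},
    {"name": "zhao", "age": 15, "dt": "20170809"},
    {"name": "zhao", "age": 16, "dt": "20170809"},
]


def group_by(rows, keys):
    """Bucket rows by the tuple of their key columns (hypothetical helper).

    Returns a dict mapping each distinct key tuple to the rows in that group;
    groups appear in the order their key is first seen.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    return dict(groups)


# SELECT name FROM test GROUP BY name: one output row per bucket.
name_groups = group_by(TEST, ["name"])
print([key[0] for key in name_groups])  # ['zhao']
print(len(name_groups[("zhao",)]))      # 4 rows fell into the single bucket
```

All four rows share the key `('zhao',)`, so only one group, and therefore one output row, remains.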
    

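    Grouping by this column leaves a single result, so the duplicates are removed. Of course, deduplication only merges values that are identical; next, let's see what happens with two columns: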

    hive> select name,age from test group by name,age;
    ...
    
    OK
    zhao	14
    zhao	15
    zhao	16
    Time taken: 36.943 seconds, Fetched: 3 row(s)
    
    hive> select name,age from test group by name;
    FAILED: SemanticException [Error 10025]: Line 1:12 Expression not in GROUP BY key 'age'
    

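    Grouping only by name fails. That is reasonable: if rows are grouped by name alone, the different age values inside the same group cannot be reduced to a single output value.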
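Both results follow from treating the GROUP BY columns as a bucket key. With `group by name,age` the key is the pair, so the sample rows fall into three buckets; with `group by name` alone, the single bucket still holds three different ages, so a bare `age` has no single value to print. The sketch below mimics that check under the assumption of the same sample rows; the function `select_grouped` and its error text are hypothetical and only loosely imitate Hive's SemanticException. In real HiveQL the usual way out is to aggregate the column, for example `max(age)` or `collect_set(age)`.

```python
from collections import defaultdict

# Same sample rows as the test table; "dt" is a hypothetical column name.
TEST = [
    {"name": "zhao", "age": 15, "dt": "20170807"},
    {"name": "zhao", "age": 14, "dt": "20170809"},
    {"name": "zhao", "age": 15, "dt": "20170809"},
    {"name": "zhao", "age": 16, "dt": "20170809"},
]


def select_grouped(rows, select_cols, group_cols):
    """Hypothetical model of SELECT <select_cols> ... GROUP BY <group_cols>.

    Every selected column must be a grouping key; otherwise a group could
    hold several different values for it and there would be no single
    value to output, so an error is raised instead.
    """
    for col in select_cols:
        if col not in group_cols:
            raise ValueError(f"Expression not in GROUP BY key '{col}'")
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in group_cols)].append(row)
    # One output row per group; all rows in a group share the key values.
    return [tuple(members[0][c] for c in select_cols) for members in groups.values()]


# GROUP BY name, age: the key is the (name, age) pair, giving three groups.
print(select_grouped(TEST, ["name", "age"], ["name", "age"]))
# [('zhao', 15), ('zhao', 14), ('zhao', 16)]

# GROUP BY name only: the one group holds ages 15, 14, 15, 16, so 'age' fails.
try:
    select_grouped(TEST, ["name", "age"], ["name"])
except ValueError as err:
    print(err)  # Expression not in GROUP BY key 'age'
```

The row order differs from the Hive output above; neither SQL nor this sketch guarantees an order without ORDER BY.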

    distinct

    This one is also simple; it just removes duplicates:

    hive> select distinct name from test;
    ...
    
    OK
    zhao
    Time taken: 37.047 seconds, Fetched: 1 row(s)
    
    hive> select distinct name,age from test;
    OK
    zhao	14
    zhao	15
    zhao	16
    Time taken: 39.131 seconds, Fetched: 3 row(s)
    
    hive> select distinct(name),age from test;
    OK
    zhao	14
    zhao	15
    zhao	16
    Time taken: 37.739 seconds, Fetched: 3 row(s)
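DISTINCT applies to the whole select list, not only to the first column: in `distinct(name),age` the parentheses merely wrap the expression `name`, which is why that query returned the same three rows as `distinct name,age`. The sketch below checks this, together with the equivalence to GROUP BY on the same columns, using Python's built-in `sqlite3` as a stand-in for Hive. That is an assumption: SQLite is not Hive, but these basic DISTINCT and GROUP BY semantics are standard SQL. The table mirrors `test` above; the third column name `dt` is hypothetical.

```python
import sqlite3

# In-memory stand-in for the Hive table "test"; "dt" is a hypothetical column name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (name TEXT, age INTEGER, dt TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [("zhao", 15, "20170807"), ("zhao", 14, "20170809"),
     ("zhao", 15, "20170809"), ("zhao", 16, "20170809")],
)


def rows(sql):
    # Sort so results are comparable; SQL gives no order without ORDER BY.
    return sorted(conn.execute(sql).fetchall())


distinct_name = rows("SELECT DISTINCT name FROM test")
group_name = rows("SELECT name FROM test GROUP BY name")
distinct_pair = rows("SELECT DISTINCT name, age FROM test")
distinct_paren = rows("SELECT DISTINCT(name), age FROM test")
group_pair = rows("SELECT name, age FROM test GROUP BY name, age")

print(distinct_name)  # [('zhao',)]
print(distinct_pair)  # [('zhao', 14), ('zhao', 15), ('zhao', 16)]
# The parentheses do not narrow DISTINCT to one column, and GROUP BY on the
# same columns returns the same set of rows.
print(distinct_name == group_name, distinct_pair == distinct_paren == group_pair)  # True True
```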
    
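    Differences
    • With large amounts of data, distinct is generally less efficient, so group by is usually recommended.
    • For the reasons behind this, I recommend this article.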
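The recommended article is not preserved here, so what follows is only one commonly cited explanation, not necessarily the author's. For a global `count(distinct col)` on the MapReduce engine, Hive routes every value to a single reducer, while the rewrite `select count(*) from (select col from t group by col) t2` lets the GROUP BY stage spread keys over several reducers. The Python below simulates those two plans under that assumption; `NUM_REDUCERS`, the partitioning via Python's `hash`, and the function names are hypothetical stand-ins, not Hive internals.

```python
from collections import Counter

NUM_REDUCERS = 3  # hypothetical number of reducers for the GROUP BY stage


def count_distinct_one_reducer(values):
    """Plan A (assumed for a global COUNT(DISTINCT col)): one reducer sees every row."""
    return len(set(values)), len(values)  # (distinct count, rows on busiest reducer)


def count_distinct_two_stage(values, n=NUM_REDUCERS):
    """Plan B: GROUP BY col on n reducers, then COUNT(*) over the groups."""
    buckets = [set() for _ in range(n)]
    load = Counter()
    for v in values:
        r = hash(v) % n    # equal keys always land on the same reducer
        buckets[r].add(v)  # stage 1: each reducer dedups only its own keys
        load[r] += 1
    # Stage 2: a key sits in exactly one bucket, so bucket sizes simply add up.
    return sum(len(b) for b in buckets), max(load.values())


# 10,000 rows holding 1,000 distinct ints (ints hash deterministically, unlike str).
data = [i % 1000 for i in range(10_000)]
print(count_distinct_one_reducer(data))  # (1000, 10000)
print(count_distinct_two_stage(data))    # (1000, 3340)
```

Both plans return 1,000, but the busiest reducer in the two-stage plan handles about a third of the rows. Real Hive also does map-side aggregation and has version-specific optimizations, so treat this only as an illustration of the idea.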
  • Original article: https://www.cnblogs.com/wswang/p/7718085.html