  • Hive ERROR: Out of memory due to hash maps used in map-side aggregation

    When Hive runs an aggregation query over a large data set, it often fails with the following out-of-memory (OOM) error:

    Possible error: Out of memory due to hash maps used in map-side aggregation.
    
    Solution: Currently hive.map.aggr.hash.percentmemory is set to 0.5. Try setting it to a lower value. i.e 'set hive.map.aggr.hash.percentmemory = 0.25;'

    The failure details for the task show:

    Error:GC overhead limit exceeded

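    This error usually has one of two causes: (1) the Hive SQL is poorly written, so the hash map grows too large during execution; (2) the Hive SQL has no room left for optimization (the required data can only be obtained with SQL written this way).

    For (1), rewrite the SQL statement to reduce the size of the hash map. For (2), adjust the parameters.

    Cases (1) and (2) are described in turn below:

    (1) Rewriting the SQL statement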

    select count(distinct v) from tbl;
    can be rewritten as select count(1) from (select v from tbl group by v) t;

    Note: this reduces the number of keys in the hash map.

    select collect_set(messageDate)[0],count(*) from incidents_hive group by substr(messageDate,8,2);
    can be rewritten as select hourNum, count(1) from (select substr(messageDate,9,2) as hourNum from incidents_hive ) t group by hourNum;

    Note: this does not reduce the number of keys in the hash map, but it does reduce the size of each value.
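
    If one representative messageDate per group is still wanted, a sketch along the following lines keeps a single value per key instead of a whole set. It assumes that any one value (here the smallest) is an acceptable substitute for collect_set(messageDate)[0], which the original text does not state; sampleDate and cnt are illustrative aliases.

    ```sql
    -- Sketch only: min() stores one string per group key, whereas collect_set()
    -- accumulates every distinct messageDate seen for that key.
    -- Assumption: any single representative value per group is acceptable.
    select min(messageDate) as sampleDate, count(*) as cnt
    from incidents_hive
    group by substr(messageDate,8,2);
    ```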

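    (2) Adjusting parameters

    The following SQL statement cannot be optimized to reduce the hash map size, because the values of keywords are rarely repeated, so the in-memory Map object maintained during the map phase becomes very large: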

    INSERT OVERWRITE TABLE hbase_table_poi_keywords_count SELECT concat(substr(key,0,8), svccode, keywords), substr(key,0,8), svccode, keywords, count(*) where substr(key,0,8)="$yesterday" AND length(keywords)>0 AND svccode is not null GROUP BY substr(key,0,8),svccode,keywords;
    The tuning parameters related to mapjoin and map-side aggregation are:

    hive.map.aggr

    hive.groupby.mapaggr.checkinterval

    hive.map.aggr.hash.min.reduction

    hive.map.aggr.hash.percentmemory

    hive.groupby.skewindata
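
    As a rough guide to what each of these parameters controls, the session settings below are annotated with the behaviour described in the Hive configuration documentation. The values are illustrative assumptions only (0.25 comes from the hint in the error message above), not recommendations from the original text.

    ```sql
    -- Use map-side (hash) aggregation for GROUP BY queries.
    set hive.map.aggr=true;
    -- Number of rows after which the size of the grouping keys/aggregations is checked.
    set hive.groupby.mapaggr.checkinterval=100000;
    -- Map-side hash aggregation is turned off when the ratio of
    -- hash table size to input rows exceeds this value.
    set hive.map.aggr.hash.min.reduction=0.5;
    -- Portion of total memory the map-side aggregation hash table may use.
    set hive.map.aggr.hash.percentmemory=0.25;
    -- Tells Hive whether the group-by keys are skewed so it can plan accordingly.
    set hive.groupby.skewindata=false;
    ```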

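    These parameters can be tuned by consulting their descriptions in the configuration file and the documentation. If the requirement truly cannot be met by adjusting them, set hive.map.aggr=false is the last resort. It will certainly do the job, although it runs somewhat slower than map join and map-side aggregation; when you run it on real data, you may well find that it is not slow at all.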
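
    The fallback can be applied per session, immediately before the failing query. The sketch below reuses the tbl and v placeholder names from the earlier count(distinct v) example purely for illustration.

    ```sql
    -- Disable map-side aggregation for this session only; grouping is then
    -- done on the reduce side, so mappers keep no aggregation hash map.
    set hive.map.aggr=false;

    -- Illustrative query (tbl and v are placeholder names).
    select v, count(*) from tbl group by v;
    ```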

    References:

    http://blog.csdn.net/macyang/article/details/9260777
    http://www.myexception.cn/open-source/1487747.html
    http://blog.csdn.net/lixucpf/article/details/20458617

     

    Reposted from http://blog.csdn.net/xyls12345/article/details/25418671

  • Original article: https://www.cnblogs.com/xd502djj/p/3852921.html