  • error in shuffle in fetcher: analysis and solution


    ShuffleError message:

    Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#3
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) 
    Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:56)
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:46)
    at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.<init>(InMemoryMapOutput.java:63)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:305)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:295)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:514)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:336)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
    

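    Cause: once the map phase has progressed to a certain ratio, the reducer starts several fetcher threads that pull the map outputs into the reducer's memory or onto its disk, and then merges them. When the data volume is large, the map outputs pulled into memory can trigger an OutOfMemoryError. The remedy is to lower the share of memory that a single fetch may occupy, so that large fetched outputs are written straight to disk instead.
    Relevant parameter: mapreduce.reduce.shuffle.memory.limit.percent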
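To make the mechanism concrete, here is a minimal Java sketch of how a fetched map output might be routed. It is loosely modeled on the reserve logic visible in the stack trace above (MergeManagerImpl.reserve in Hadoop 2.x) and is not Hadoop's actual code: the class, enum, and method names are hypothetical, and the 1000 MiB heap, the 120 MiB map output, and the 0.70 value for mapreduce.reduce.shuffle.input.buffer.percent are assumptions chosen for illustration.

```java
// Hypothetical sketch; names are illustrative and are not part of Hadoop's API.
final class ShuffleReserveSketch {

    enum Placement { IN_MEMORY, ON_DISK, STALL }

    // Bytes of reducer heap set aside for holding shuffled map outputs
    // (heap size times the assumed input buffer percent).
    static long memoryLimit(long heapBytes, double inputBufferPercent) {
        return (long) (heapBytes * inputBufferPercent);
    }

    // Largest single map output allowed in memory:
    // memoryLimit * mapreduce.reduce.shuffle.memory.limit.percent.
    static long maxSingleShuffleLimit(long memoryLimit, double memoryLimitPercent) {
        return (long) (memoryLimit * memoryLimitPercent);
    }

    // Decide where one fetched map output goes.
    static Placement place(long requestedSize, long usedMemory,
                           long memoryLimit, long maxSingle) {
        if (requestedSize >= maxSingle) {
            return Placement.ON_DISK;   // too large for memory: spill to disk
        }
        if (usedMemory > memoryLimit) {
            return Placement.STALL;     // buffer is over the limit: fetcher waits
        }
        return Placement.IN_MEMORY;     // allocate a byte array on the heap
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long heap = 1000 * mib;                 // assumed reducer heap
        long limit = memoryLimit(heap, 0.70);   // assumed input buffer percent
        long mapOutput = 120 * mib;             // assumed size of one map output

        for (double percent : new double[] {0.25, 0.15}) {
            long maxSingle = maxSingleShuffleLimit(limit, percent);
            System.out.println(percent + " -> " + place(mapOutput, 0, limit, maxSingle));
        }
    }
}
```

Under these assumptions, the same 120 MiB map output is held in memory when the parameter is 0.25 (single-output limit of about 175 MiB) and goes to disk when it is 0.15 (limit of about 105 MiB), which is the effect the parameter change aims for.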

    Default configuration:

    <property>
       <name>mapreduce.reduce.shuffle.memory.limit.percent</name>
      <value>0.25</value>
      <description>Expert: Maximum percentage of the in-memory limit that a
      single shuffle can consume</description>
    </property>
    

    Or check the value currently in effect from Hive:

    ## hive
    hive>set mapreduce.reduce.shuffle.memory.limit.percent;
    mapreduce.reduce.shuffle.memory.limit.percent=0.15
    

    Solution: limit the memory used by the reducer's shuffle

    For Hive SQL, add the following statement before the query runs:

    set mapreduce.reduce.shuffle.memory.limit.percent=0.15;
    

    For a MapReduce program, set it in the job configuration:

    // The property is a single float, so setFloat is a better fit than setStrings.
    job.getConfiguration().setFloat("mapreduce.reduce.shuffle.memory.limit.percent", 0.15f);
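As a rough check on why a lower value leaves more headroom, the following Java sketch estimates the worst-case share of the heap held by in-memory map outputs. It rests on assumptions not stated in this post: that a new output smaller than the single-output limit can still be admitted while the buffer is exactly at its limit, that the input buffer percent is 0.70, and that memory used by merges and the rest of the task is ignored. The class and method names are hypothetical.

```java
// Hypothetical sketch; a back-of-the-envelope bound, not Hadoop's accounting.
final class ShufflePeakSketch {

    // Worst case under the stated assumptions: the buffer is full
    // (inputBufferPercent of the heap) and one more output of nearly the
    // single-output limit (memoryLimitPercent of the buffer) is admitted.
    static double peakHeapFraction(double inputBufferPercent, double memoryLimitPercent) {
        return inputBufferPercent * (1.0 + memoryLimitPercent);
    }

    public static void main(String[] args) {
        double inputBufferPercent = 0.70; // assumed, not taken from this post
        for (double percent : new double[] {0.25, 0.15}) {
            double peak = peakHeapFraction(inputBufferPercent, percent);
            System.out.printf("memory.limit.percent=%.2f -> up to about %.1f%% of heap%n",
                    percent, 100.0 * peak);
        }
    }
}
```

With these numbers the bound drops from roughly 87.5% of the heap at 0.25 to roughly 80.5% at 0.15, leaving more room for everything else the reduce task keeps on the heap.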
    

    Reference: http://www.sqlparty.com/yarn在shuffle阶段内存不足问题error-in-shuffle-in-fetcher/

  • Original article: https://www.cnblogs.com/myblog1900/p/10031873.html