zoukankan      html  css  js  c++  java
  • 2.28 MapReduce在实际应用中常见的优化

    一、优化的点

    • Reduce Task Number
    • Map Task输出压缩
    • Shuffle Phase 参数
    • map、reduce分配的虚拟CPU

    二、Reduce Task Number

    Reduce Task 默认是一个;

    Reduce Task的数目也不是越多越好,实际中需要测试调整,以调整到最优的个数, 如下;

    job.setNumReduceTasks(2);

     

    三、Map Task输出压缩

    上一节已经讲到了;

    四、Shuffle Phase 参数

    具体可参考:mapred-default.xml

    可调的有如下几点:

    mapreduce.task.io.sort.factor:

    <property>
      <name>mapreduce.task.io.sort.factor</name>
      <value>10</value>
      <description>The number of streams to merge at once while sorting
      files.  This determines the number of open file handles.</description>
    </property>

    mapreduce.task.io.sort.mb:

    <property>
      <name>mapreduce.task.io.sort.mb</name>
      <value>100</value>
      <description>The total amount of buffer memory to use while sorting 
      files, in megabytes.  By default, gives each merge stream 1MB, which
      should minimize seeks.</description>
    </property>

    mapreduce.map.sort.spill.percent:

    <property>
      <name>mapreduce.map.sort.spill.percent</name>
      <value>0.80</value>
      <description>The soft limit in the serialization buffer. Once reached, a
      thread will begin to spill the contents to disk in the background. Note that
      collection will not block if this threshold is exceeded while a spill is
      already in progress, so spills may be larger than this threshold when it is
      set to less than .5</description>
    </property>

    五、map、reduce分配的虚拟CPU

    默认都是一个虚拟CPU,实际中也可以调整;

    1、map

    mapreduce.map.cpu.vcores:

    <property>
      <name>mapreduce.map.cpu.vcores</name>
      <value>1</value>
      <description>
          The number of virtual cores required for each map task.
      </description>
    </property>

    2、reduce

    mapreduce.reduce.cpu.vcores:

    <property>
      <name>mapreduce.reduce.cpu.vcores</name>
      <value>1</value>
      <description>
          The number of virtual cores required for each reduce task.
      </description>
    </property>
  • 相关阅读:
    nginx -s reload 时报错 [error] open() "/run/nginx.pid" failed (2: No such file or directory)
    系统调用(四):SSDT
    系统调用(三): 分析KiFastCallEntry(二)
    系统调用(二): 分析KiFastCallEntry(一)
    系统调用(一): 重写WriteProcessMemory
    CVE-2009-0927分析
    CVE-2008-0015分析
    CVE-2006-4868分析
    在linux上使用impdp命令时提示ORA-12154: TNS:could not resolve the connect identifier specified的问题
    破解myeclipse10失败的一个奇葩原因
  • 原文地址:https://www.cnblogs.com/weiyiming007/p/10717143.html
Copyright © 2011-2022 走看看