  • Hadoop 2.2.0 Documentation (Chinese Translation): MapReduce Next Generation, Setting Up a Single Node Cluster

    MapReduce tarball

    Obtain the MapReduce tarball from the release page. If it is not available there, build the tarball from source:

    $ mvn clean install -DskipTests
    $ cd hadoop-mapreduce-project
    $ mvn clean install assembly:assembly -Pnative

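    Note: protoc 2.5.0 must be installed.

    To skip building the native MapReduce components, omit the -Pnative argument from the maven command.

    The tarball should then be in the target/ directory.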
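    Because the build needs protoc 2.5.0 specifically, it can help to check the installed version before starting maven. A minimal sketch, assuming `protoc --version` prints a line of the form "libprotoc 2.5.0"; the helper name is_protoc_250 is hypothetical:

```shell
# Hypothetical helper: succeed only if the given `protoc --version` output
# reports exactly 2.5.0, the version this guide says the build requires.
is_protoc_250() {
  # Output looks like "libprotoc 2.5.0"; strip everything up to the
  # last space so only the version number remains.
  version=${1##* }
  [ "$version" = "2.5.0" ]
}

# Example use before running the maven build (skipped if protoc is absent).
if command -v protoc >/dev/null 2>&1; then
  found=$(protoc --version)
  if is_protoc_250 "$found"; then
    echo "protoc 2.5.0 found, OK to build"
  else
    echo "protoc 2.5.0 required, found: $found" >&2
  fi
fi
```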

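    Setting up the environment

    Assuming you have installed hadoop-common/hadoop-hdfs and exported $HADOOP_COMMON_HOME/$HADOOP_HDFS_HOME, untar the Hadoop MapReduce tarball and set the environment variable $HADOOP_MAPRED_HOME to the directory you extracted it into. Set $HADOOP_YARN_HOME to the same directory as $HADOOP_MAPRED_HOME.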
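    The environment described above might look like the following in your shell profile. This is a sketch: the /opt/... directories are placeholder assumptions, so substitute the directories where you actually installed hadoop-common, hadoop-hdfs and the MapReduce tarball.

```shell
# Placeholder install locations (assumptions, not paths from the guide).
export HADOOP_COMMON_HOME=/opt/hadoop-common
export HADOOP_HDFS_HOME=/opt/hadoop-hdfs
# Directory the MapReduce tarball was extracted into (placeholder).
export HADOOP_MAPRED_HOME=/opt/hadoop-mapreduce
# The guide sets the YARN home to the same directory as the MapReduce home.
export HADOOP_YARN_HOME="$HADOOP_MAPRED_HOME"
```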

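    Note: the following instructions assume that HDFS is already running.

    Setting up the configuration

    To start the ResourceManager and NodeManager, you must update the configuration. Assuming $HADOOP_CONF_DIR is your configuration directory and already contains the configuration for HDFS and core-site.xml, two more configuration files must be set up: mapred-site.xml and yarn-site.xml.

    Setting up mapred-site.xml

    Add the following configuration to your mapred-site.xml.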

      <property>
        <name>mapreduce.cluster.temp.dir</name>
        <value></value>
        <description>No description</description>
        <final>true</final>
      </property>
    
      <property>
        <name>mapreduce.cluster.local.dir</name>
        <value></value>
        <description>No description</description>
        <final>true</final>
      </property>
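    If you prefer to script the configuration, the sketch below writes the two properties above into mapred-site.xml. The helper names (hadoop_property, write_mapred_site) and the /tmp/hadoop-mapred/... directory values are illustrative assumptions, since the guide leaves both values empty. Note that it overwrites any existing mapred-site.xml in the target directory.

```shell
# Hypothetical helper: print one Hadoop <property> element.
# An optional third argument "final" adds <final>true</final>.
hadoop_property() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n' "$1" "$2"
  if [ "${3:-}" = "final" ]; then
    printf '    <final>true</final>\n'
  fi
  printf '  </property>\n'
}

# Write mapred-site.xml into the given configuration directory.
# The directory values are placeholders chosen for illustration.
write_mapred_site() {
  conf_dir=$1
  {
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    hadoop_property mapreduce.cluster.temp.dir /tmp/hadoop-mapred/temp final
    hadoop_property mapreduce.cluster.local.dir /tmp/hadoop-mapred/local final
    echo '</configuration>'
  } > "$conf_dir/mapred-site.xml"
}

# Usage: write_mapred_site "$HADOOP_CONF_DIR"
```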

    Setting up yarn-site.xml

    Add the following configuration to your yarn-site.xml.

      <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>host:port</value>
        <description>host is the hostname of the resource manager and 
        port is the port on which the NodeManagers contact the Resource Manager.
        </description>
      </property>
    
      <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>host:port</value>
        <description>host is the hostname of the resourcemanager and port is the port
        on which the Applications in the cluster talk to the Resource Manager.
        </description>
      </property>
    
      <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
        <description>In case you do not want to use the default scheduler</description>
      </property>
    
      <property>
        <name>yarn.resourcemanager.address</name>
        <value>host:port</value>
        <description>the host is the hostname of the ResourceManager and the port is the port on
        which the clients can talk to the Resource Manager. </description>
      </property>
    
      <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value></value>
        <description>the local directories used by the nodemanager</description>
      </property>
    
      <property>
        <name>yarn.nodemanager.address</name>
        <value>0.0.0.0:port</value>
        <description>the nodemanagers bind to this port</description>
      </property>  
    
      <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>10240</value>
        <description>the amount of memory on the NodeManager in MB</description>
      </property>
     
      <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/app-logs</value>
        <description>directory on hdfs where the application logs are moved to </description>
      </property>
    
      <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value></value>
        <description>the directories used by Nodemanagers as log directories</description>
      </property>
    
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
        <description>shuffle service that needs to be set for Map Reduce to run </description>
      </property>
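    The host:port placeholders can be filled in the same way. The sketch below writes a subset of the yarn-site.xml properties above for a single node. It assumes the ResourceManager ports 8031 (resource tracker), 8030 (scheduler) and 8032 (clients), which are the defaults in Hadoop 2.x yarn-default.xml; check them against your release. It also shows the unit of yarn.nodemanager.resource.memory-mb: the value is in megabytes, so 10 GB is 10 * 1024 = 10240. The directory-valued properties and the scheduler class are left out, so add them before relying on the generated file; the helper names are hypothetical.

```shell
# Hypothetical helper: print one Hadoop <property> element.
hadoop_property() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# yarn.nodemanager.resource.memory-mb is in megabytes: 10 GB -> 10240.
gb_to_mb() {
  echo $(( $1 * 1024 ))
}

# Write a partial yarn-site.xml: write_yarn_site CONF_DIR RM_HOST MEM_GB
write_yarn_site() {
  conf_dir=$1
  rm_host=$2   # hostname of the ResourceManager
  mem_gb=$3    # memory the NodeManager may hand out, in GB
  {
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    # Assumed ports: 8031, 8030 and 8032 (Hadoop 2.x yarn-default.xml defaults).
    hadoop_property yarn.resourcemanager.resource-tracker.address "$rm_host:8031"
    hadoop_property yarn.resourcemanager.scheduler.address "$rm_host:8030"
    hadoop_property yarn.resourcemanager.address "$rm_host:8032"
    hadoop_property yarn.nodemanager.resource.memory-mb "$(gb_to_mb "$mem_gb")"
    hadoop_property yarn.nodemanager.aux-services mapreduce_shuffle
    echo '</configuration>'
  } > "$conf_dir/yarn-site.xml"
}

# Usage: write_yarn_site "$HADOOP_CONF_DIR" localhost 10
```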


    Setting up capacity-scheduler.xml

    Make sure the root queues are defined in capacity-scheduler.xml.

      <property>
        <name>yarn.scheduler.capacity.root.queues</name>
        <value>unfunded,default</value>
      </property>
      
      <property>
        <name>yarn.scheduler.capacity.root.capacity</name>
        <value>100</value>
      </property>
      
      <property>
        <name>yarn.scheduler.capacity.root.unfunded.capacity</name>
        <value>50</value>
      </property>
      
      <property>
        <name>yarn.scheduler.capacity.root.default.capacity</name>
        <value>50</value>
      </property>
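    The capacities of the queues listed under yarn.scheduler.capacity.root.queues are percentages of the root queue and should add up to 100, as unfunded (50) and default (50) do above. A rough sketch of a check, assuming whole-number capacities and that each <name> and its <value> sit on separate lines as in these snippets; get_property and check_root_capacity are hypothetical line-based helpers, not an XML parser:

```shell
# Print the <value> on the first <value> line after <name>NAME</name>.
# Line-based on purpose: assumes the layout used in the snippets above.
get_property() {
  awk -v want="<name>$2</name>" '
    index($0, want) { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }' "$1"
}

# Sum the capacities of the root's child queues; succeed if the sum is 100.
# Shell arithmetic is integer-only, so fractional capacities are not handled.
check_root_capacity() {
  file=$1
  queues=$(get_property "$file" yarn.scheduler.capacity.root.queues)
  total=0
  old_ifs=$IFS
  IFS=','
  for q in $queues; do
    cap=$(get_property "$file" "yarn.scheduler.capacity.root.$q.capacity")
    if [ -z "$cap" ]; then
      echo "no capacity set for queue $q" >&2
      IFS=$old_ifs
      return 1
    fi
    total=$(( total + cap ))
  done
  IFS=$old_ifs
  echo "$total"
  [ "$total" -eq 100 ]
}

# Usage: check_root_capacity "$HADOOP_CONF_DIR/capacity-scheduler.xml"
```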

    Running the daemons

    This assumes that the environment variables $HADOOP_COMMON_HOME, $HADOOP_HDFS_HOME, $HADOOP_MAPRED_HOME, $HADOOP_YARN_HOME, $JAVA_HOME and $HADOOP_CONF_DIR are set correctly. Set $YARN_CONF_DIR to the same value as $HADOOP_CONF_DIR.
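    Before starting the daemons, you can check that none of these variables is missing. A minimal sketch; the function name missing_hadoop_vars is hypothetical, and it only checks that each variable is non-empty, not that it points to a valid directory:

```shell
# Print the names of required variables that are unset or empty;
# succeed only if none are missing.
missing_hadoop_vars() {
  missing=''
  for name in HADOOP_COMMON_HOME HADOOP_HDFS_HOME HADOOP_MAPRED_HOME \
              HADOOP_YARN_HOME JAVA_HOME HADOOP_CONF_DIR YARN_CONF_DIR; do
    # POSIX-compatible indirect lookup of the variable called $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  echo "${missing# }"
  [ -z "$missing" ]
}

# Usage:
#   export YARN_CONF_DIR="$HADOOP_CONF_DIR"
#   missing_hadoop_vars || echo "set the variables listed above first" >&2
```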

    Start the ResourceManager and NodeManager as follows:

    $ cd $HADOOP_MAPRED_HOME
    $ sbin/yarn-daemon.sh start resourcemanager
    $ sbin/yarn-daemon.sh start nodemanager
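    To confirm that both daemons actually started, you can look for them in the output of jps, the JVM process lister shipped with the JDK, which prints one "<pid> <MainClass>" line per JVM. A sketch under that assumption; yarn_daemons_up is a hypothetical helper that takes the jps output as its argument so it can be tried without a running cluster:

```shell
# Succeed if both YARN daemons appear in the given jps output.
yarn_daemons_up() {
  jps_output=$1
  for daemon in ResourceManager NodeManager; do
    # -w matches the whole class name rather than a substring.
    if ! printf '%s\n' "$jps_output" | grep -qw "$daemon"; then
      echo "$daemon is not running" >&2
      return 1
    fi
  done
  return 0
}

# Usage:
#   yarn_daemons_up "$(jps)" && echo "YARN is up"
```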

    You should now be up and running. You can run randomwriter as follows:

    $ $HADOOP_COMMON_HOME/bin/hadoop jar hadoop-examples.jar randomwriter out
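    The command above refers to hadoop-examples.jar. In Hadoop 2.x binary releases the examples usually ship as share/hadoop/mapreduce/hadoop-mapreduce-examples-<version>.jar instead; treat that layout as an assumption and adjust if your tarball differs. A small sketch for locating the jar (find_examples_jar is a hypothetical helper):

```shell
# Print the path of the first MapReduce examples jar under the given
# directory, skipping the sources and tests jars.
find_examples_jar() {
  find "$1" -name 'hadoop-mapreduce-examples-*.jar' \
    ! -name '*sources*' ! -name '*tests*' 2>/dev/null | head -n 1
}

# Usage:
#   jar=$(find_examples_jar "$HADOOP_MAPRED_HOME")
#   "$HADOOP_COMMON_HOME/bin/hadoop" jar "$jar" randomwriter out
```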

    Good luck.

  • Original article: https://www.cnblogs.com/yxwkf/p/5037435.html