Hadoop HA on Yarn: Cluster Configuration

Cluster Setup

Because the number of servers is limited, each server here runs quite a few processes:

Machine      Installed software    Running processes
hadoop001    Hadoop, Zookeeper     NameNode, DFSZKFailoverController, ResourceManager, DataNode, NodeManager, QuorumPeerMain, JournalNode
hadoop002    Hadoop, Zookeeper     NameNode, DFSZKFailoverController, ResourceManager, DataNode, NodeManager, QuorumPeerMain, JournalNode
hadoop003    Hadoop, Zookeeper     DataNode, NodeManager, QuorumPeerMain
Notes [2]:
In Hadoop 2.x an HDFS HA deployment normally consists of two NameNodes, one active and one standby. The active NameNode serves client requests; the standby NameNode does not, and only synchronizes the active NameNode's state so that it can take over quickly if the active one fails.
Hadoop 2.0 officially provides two HDFS HA solutions: one based on NFS and one based on QJM (proposed by Cloudera, similar in principle to Zookeeper). QJM is used here. The active and standby NameNodes synchronize metadata through a group of JournalNodes; an edit is considered written once it has been persisted to a majority of the JournalNodes, so an odd number of JournalNodes is usually configured.

The installation of the JDK, Hadoop, and Zookeeper and the corresponding environment variables are omitted here.
     

Passwordless SSH Login

Pay close attention to the passwordless login setup:
ssh-keygen -t rsa

This generates two files, id_rsa and id_rsa.pub, in the ~/.ssh/ directory.

To log in from hadoop001 to hadoop002 without a password, run the following on hadoop001:

ssh-copy-id -i ~/.ssh/id_rsa.pub [username]@hadoop002

To allow passwordless login between any pair of machines, run the command above twice more on hadoop001 (changing the hostname after @ to hadoop001 and hadoop003), and finally copy the resulting authorized_keys file to all nodes, as sketched below.
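
For reference, here is a minimal sketch of the whole procedure, run on hadoop001 and assuming the cluster user is named hadoop (a hypothetical name; substitute your own):

# generate the key pair: ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub
ssh-keygen -t rsa

# append the public key to authorized_keys on every node, including this one
for host in hadoop001 hadoop002 hadoop003; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@$host
done

# push the accumulated authorized_keys to the other nodes
scp ~/.ssh/authorized_keys hadoop@hadoop002:~/.ssh/
scp ~/.ssh/authorized_keys hadoop@hadoop003:~/.ssh/

# each of these should now log in without prompting for a password
ssh hadoop@hadoop002 hostname
ssh hadoop@hadoop003 hostname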

Hadoop Configuration

    core-site.xml

    <configuration>
<!-- Default file system: the HDFS nameservice -->
    <property>
    <name>fs.defaultFS</name>
    <value>hdfs://appcluster</value>
    </property>
    
<!-- Hadoop temporary directory -->
    <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/storage/tmp</value>
    </property>
    
<!-- Zookeeper quorum addresses -->
    <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
    </property>
    
    <property>
    <name>ha.zookeeper.session-timeout.ms</name>
    <value>2000</value>
    </property>
    </configuration>

    hdfs-site.xml

    <configuration>
<!-- Where the NameNode stores its namespace (fsimage) data -->
    <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/hadoop/storage/hdfs/name</value>
    </property>
    
<!-- Where DataNodes store block data -->
    <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data/hadoop/storage/hdfs/data</value>
    </property>
    
<!-- Number of block replicas -->
    <property>
    <name>dfs.replication</name>
    <value>2</value>
    </property>
    
<!-- HDFS nameservice name (appcluster); must match fs.defaultFS in core-site.xml -->
    <property> 
    <name>dfs.nameservices</name> 
    <value>appcluster</value> 
    </property>
    
<!-- appcluster has two NameNodes: nn1 and nn2 -->
    <property> 
    <name>dfs.ha.namenodes.appcluster</name> 
    <value>nn1,nn2</value> 
    </property> 
    
<!-- RPC address of nn1 -->
    <property> 
    <name>dfs.namenode.rpc-address.appcluster.nn1</name> 
    <value>hadoop001:8020</value> 
    </property> 
    
<!-- RPC address of nn2 -->
    <property> 
    <name>dfs.namenode.rpc-address.appcluster.nn2</name> 
    <value>hadoop002:8020</value> 
    </property> 
    
<!-- HTTP address of nn1 -->
    <property> 
    <name>dfs.namenode.http-address.appcluster.nn1</name> 
    <value>hadoop001:50070</value> 
    </property> 
    
<!-- HTTP address of nn2 -->
    <property> 
    <name>dfs.namenode.http-address.appcluster.nn2</name> 
    <value>hadoop002:50070</value> 
    </property> 
    
<!-- Where the NameNodes' shared edit log is stored on the JournalNodes -->
    <property> 
    <name>dfs.namenode.shared.edits.dir</name> 
    <value>qjournal://hadoop001:8485;hadoop002:8485;hadoop003:8485/appcluster</value> 
    </property> 
    
    <property> 
    <name>dfs.ha.automatic-failover.enabled.appcluster</name> 
    <value>true</value> 
    </property> 
    
<!-- Failover proxy provider used by HDFS clients -->
    <property> 
    <name>dfs.client.failover.proxy.provider.appcluster</name> 
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value> 
    </property> 
    
<!-- Fencing method -->
    <property> 
    <name>dfs.ha.fencing.methods</name> 
    <value>sshfence</value> 
    </property>
    
<!-- sshfence requires passwordless SSH; private key used for fencing -->
    <property> 
    <name>dfs.ha.fencing.ssh.private-key-files</name> 
<value>/home/[username]/.ssh/id_rsa</value> 
    </property> 
    
<!-- Local directory where JournalNodes store the edit log -->
    <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/data/hadoop/tmp/journal</value>
    </property>
    </configuration>

    mapred-site.xml

    <configuration>
    <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    </property>
    
<!-- MapReduce JobHistory Server address; default port 10020 -->
    <property>
    <name>mapreduce.jobhistory.address</name>
    <value>0.0.0.0:10020</value>
    </property>
    
<!-- MapReduce JobHistory Server web UI address; default port 19888 -->
    <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>0.0.0.0:19888</value>
    </property>
    </configuration>  
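
Note that start-yarn.sh does not start the JobHistory Server; if you want it running, start it separately with the standard Hadoop 2.x script on the node that should host it:

sbin/mr-jobhistory-daemon.sh start historyserver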

    yarn-site.xml

    <?xml version="1.0"?>
    <configuration>
<!-- Interval between attempts to reconnect to the RM after losing contact -->
    <property>
    <name>yarn.resourcemanager.connect.retry-interval.ms</name>
    <value>2000</value>
    </property>
    
<!-- Enable ResourceManager HA (default: false) -->
    <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
    </property>
    
<!-- Logical IDs of the ResourceManagers -->
    <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
    </property>
    
    <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
    </property>
    
<!-- Enable automatic failover -->
    <property>
    <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
    <value>true</value>
    </property>
    
    <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hadoop001</value>
    </property>
    
    <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hadoop002</value>
    </property>
    
<!--
Set this to rm1 on hadoop001 and to rm2 on hadoop002.
Note: it is common to copy the finished config files to the other machines, but this value
must be changed on the other YARN ResourceManager machine (see the sketch after this file).
-->
    <property>
    <name>yarn.resourcemanager.ha.id</name>
    <value>rm1</value>
<description>If we want to launch more than one RM in a single node, we need this configuration</description>
    </property>
    
<!-- Enable RM state recovery -->
    <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
    </property>
    
<!-- Zookeeper address used by the RM state store -->
    <property>
    <name>yarn.resourcemanager.zk-state-store.address</name>
    <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
    </property>
    
    <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
    
    <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
    </property>
    
    <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>appcluster-yarn</value>
    </property>
    
<!-- How long the ApplicationMaster waits between attempts to reconnect to the scheduler -->
    <property>
    <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
    <value>5000</value>
    </property>
    
<!-- Addresses for rm1 -->
    <property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>hadoop001:8032</value>
    </property>
    
    <property>
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>hadoop001:8030</value>
    </property>
    
    <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>hadoop001:8088</value>
    </property>
    
    <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>hadoop001:8031</value>
    </property>
    
    <property>
    <name>yarn.resourcemanager.admin.address.rm1</name>
    <value>hadoop001:8033</value>
    </property>
    
    <property>
    <name>yarn.resourcemanager.ha.admin.address.rm1</name>
    <value>hadoop001:23142</value>
    </property>
    
<!-- Addresses for rm2 -->
    <property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>hadoop002:8032</value>
    </property>
    
    <property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>hadoop002:8030</value>
    </property>
    
    <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>hadoop002:8088</value>
    </property>
    
    <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    <value>hadoop002:8031</value>
    </property>
    
    <property>
    <name>yarn.resourcemanager.admin.address.rm2</name>
    <value>hadoop002:8033</value>
    </property>
    
    <property>
    <name>yarn.resourcemanager.ha.admin.address.rm2</name>
    <value>hadoop002:23142</value>
    </property>
    
    <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    </property>
    
    <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    
    <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data/hadoop/yarn/local</value>
    </property>
    
    <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/data/hadoop/yarn/log</value>
    </property>
    
    <property>
    <name>mapreduce.shuffle.port</name>
    <value>23080</value>
    </property>
    
<!-- Client failover proxy provider class -->
    <property>
    <name>yarn.client.failover-proxy-provider</name>
    <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
    </property>
    
    <property>
    <name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
    <value>/yarn-leader-election</value>
<description>Optional setting. The default value is /yarn-leader-election</description>
    </property>
    </configuration>
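
As the comment on yarn.resourcemanager.ha.id says, after copying the configuration to the other machines that value must be changed to rm2 on hadoop002. A minimal sketch, assuming the configuration lives in $HADOOP_HOME/etc/hadoop (adjust paths to your layout):

# on hadoop001: push the finished configuration to the other nodes
scp $HADOOP_HOME/etc/hadoop/*.xml hadoop002:$HADOOP_HOME/etc/hadoop/
scp $HADOOP_HOME/etc/hadoop/*.xml hadoop003:$HADOOP_HOME/etc/hadoop/

# on hadoop002: switch the ResourceManager id from rm1 to rm2
# (the only <value>rm1</value> element in yarn-site.xml is yarn.resourcemanager.ha.id)
sed -i 's#<value>rm1</value>#<value>rm2</value>#' $HADOOP_HOME/etc/hadoop/yarn-site.xml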

    hadoop-env.sh & mapred-env.sh & yarn-env.sh

    export JAVA_HOME=/usr/java/jdk1.7.0_60 
export CLASSPATH=$JAVA_HOME/lib:$JAVA_HOME/jre/lib 
      
    export HADOOP_HOME=/data/hadoop-2.6.0
    export HADOOP_PID_DIR=/data/hadoop/pids 
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native 
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native" 
      
    export HADOOP_PREFIX=$HADOOP_HOME 
      
    export HADOOP_MAPRED_HOME=$HADOOP_HOME 
    export HADOOP_COMMON_HOME=$HADOOP_HOME 
    export HADOOP_HDFS_HOME=$HADOOP_HOME 
    export YARN_HOME=$HADOOP_HOME 
      
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop 
    export HDFS_CONF_DIR=$HADOOP_HOME/etc/hadoop 
    export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop 
      
    export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native 
      
    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
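
With the configuration in place, the first startup of an HA cluster has to follow a particular order. Below is a minimal sketch of one common Hadoop 2.x sequence; treat it as a guideline rather than the exact procedure used here, and only format HDFS on a brand-new cluster:

# 1. on hadoop001/002/003: start Zookeeper
zkServer.sh start

# 2. on every JournalNode host listed in dfs.namenode.shared.edits.dir
hadoop-daemon.sh start journalnode

# 3. on hadoop001: format HDFS and the failover znode, then start the first NameNode
hdfs namenode -format
hdfs zkfc -formatZK
hadoop-daemon.sh start namenode

# 4. on hadoop002: copy the namespace from nn1 and start the standby NameNode
hdfs namenode -bootstrapStandby
hadoop-daemon.sh start namenode

# 5. on hadoop001: start the remaining HDFS daemons and YARN
start-dfs.sh
start-yarn.sh

# 6. on hadoop002: start-yarn.sh does not start the standby ResourceManager
yarn-daemon.sh start resourcemanager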

      

References

[1] hdfs-site.xml: http://www.21ops.com/front-tech/10744.html  

[2] yarn-site.xml: http://www.aboutyun.com/thread-10572-1-1.html (the comments there are also worth reading)

Configuring from those two references alone produced the following error:

    15/07/17 13:58:55 FATAL ha.ZKFailoverController: Automatic failover is not enabled for NameNode at hadoop001/**.**.**.**:8020. Please ensure that automatic failover is enabled in the configuration before running the ZK failover controller.

After additionally consulting

[3] http://www.cnblogs.com/meiyuanbao/p/3545929.html (does not cover YARN HA)

it became clear that the following property needs to be added to hdfs-site.xml:

    <property> 
    <name>dfs.ha.automatic-failover.enabled.appcluster</name> 
    <value>true</value> 
    </property> 
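
After restarting, the HA state can be checked with the standard admin commands as a quick sanity check; one NameNode and one ResourceManager should report active and the other standby:

hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2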