  • Notes on high-availability configuration for HDFS and YARN clusters

    HDFS/YARN cluster high-availability configuration (with ZooKeeper):

    [hadoop@master01 hadoop]$ vi core-site.xml
    Configuration:
    ----
    <configuration>

    <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1</value>
    </property>

    <property>
    <name>ha.zookeeper.quorum</name>
    <value>slaver01:2181,slaver02:2181,slaver03:2181</value>
    </property>

    <property>
    <name>hadoop.tmp.dir</name>
    <value>/software/hadoop-2.7.3/work</value>
    </property>

    </configuration>

    [hadoop@master01 hadoop]$ vi hdfs-site.xml
    Configuration (includes the qjournal/JournalNode cluster):
    ------
    <configuration>

    <property>
    <name>dfs.replication</name>
    <value>3</value>
    </property>

    <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
    </property>

    <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn2</value>
    </property>

    <property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>master01:9000</value>
    </property>

    <property>
    <name>dfs.namenode.http-address.ns1.nn1</name>
    <value>master01:50070</value>
    </property>

    <property>
    <name>dfs.namenode.rpc-address.ns1.nn2</name>
    <value>master02:9000</value>
    </property>

    <property>
    <name>dfs.namenode.http-address.ns1.nn2</name>
    <value>master02:50070</value>
    </property>

    <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://slaver01:8485;slaver02:8485;slaver03:8485/QJID</value>
    </property>

    <!-- Local directory where each JournalNode stores its data -->

    <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/software/hadoop-2.7.3/QJMateData</value>
    </property>

    <!-- Enable automatic failover when the active NameNode process dies -->

    <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
    </property>

    <!-- Failover proxy provider class used by clients -->

    <property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <!-- Fencing methods, tried in order: sshfence first, then the ensure.sh script -->
    <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
    sshfence
    shell(/software/hadoop-2.7.3/ensure.sh)
    </value>
    </property>

    <!-- SSH private key used by sshfence -->
    <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
    </property>

    <!-- SSH connect timeout for sshfence, in milliseconds -->
    <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
    </property>

    </configuration>
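
The fencing list above refers to /software/hadoop-2.7.3/ensure.sh, but the post does not show that script. Below is a minimal sketch of what such a fallback might look like, assuming its only job is to log the event and report success so that failover can continue when sshfence cannot reach the failed host (for example, when the whole machine is powered off). The function name fence_fallback and the log path are hypothetical; target_host and target_port are variables that Hadoop's shell fencing method exports to the command it runs.

```shell
#!/bin/bash
# Hypothetical contents for ensure.sh; the original script is not shown in the post.

fence_fallback() {
    # Log file location is an assumption; override it with FENCE_LOG.
    log="${FENCE_LOG:-/tmp/ensure-fence.log}"
    # target_host / target_port are set by Hadoop's shell fencing method.
    echo "$(date '+%Y-%m-%d %H:%M:%S') fallback fencing for ${target_host:-unknown}:${target_port:-unknown}" >> "$log"
    # Status 0 tells the failover controller that fencing succeeded.
    return 0
}

fence_fallback
```

Returning 0 unconditionally tells the ZKFC that the old active NameNode has been fenced, so it is only safe when that node really is down; the script must also be executable by the hadoop user on both master nodes.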

    <!-- Configure high availability for the YARN cluster -->
    [hadoop@master01 hadoop]$ vi yarn-site.xml
    Configuration:
    <!--HA Config-->
    <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
    </property>

    <!-- Cluster ID; any value you like -->
    <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>RMHA</value>
    </property>
    <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
    </property>

    <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>master01</value>
    </property>

    <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>master02</value>
    </property>
    <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>slaver01:2181,slaver02:2181,slaver03:2181</value>
    </property>
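
The same nameservice and ZooKeeper ensemble are named in more than one file, and a mistyped host (slaver01 vs. slave01, say) is easy to miss by eye. The following is a rough sketch of a consistency check, not part of the original post: get_prop and check_ha_config are hypothetical helper names, and the awk parsing assumes that each property it reads has its <name> and its <value> on a single line each. It also assumes, as in this setup, that HDFS and YARN share one ZooKeeper ensemble.

```shell
#!/bin/bash
# Print the value of property $2 from Hadoop XML file $1.
# Assumes <name>...</name> and <value>...</value> each fit on one line.
get_prop() {
    awk -v key="$2" '
        index($0, "<name>" key "</name>") { found = 1 }
        found && /<value>.*<\/value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$1"
}

# Compare the settings that have to agree across the three files.
# $1 = core-site.xml, $2 = hdfs-site.xml, $3 = yarn-site.xml
check_ha_config() {
    rc=0
    default_fs=$(get_prop "$1" fs.defaultFS)
    nameservice=$(get_prop "$2" dfs.nameservices)
    if [ "$default_fs" != "hdfs://$nameservice" ]; then
        echo "MISMATCH: fs.defaultFS=$default_fs, dfs.nameservices=$nameservice"
        rc=1
    fi
    hdfs_zk=$(get_prop "$1" ha.zookeeper.quorum)
    yarn_zk=$(get_prop "$3" yarn.resourcemanager.zk-address)
    if [ "$hdfs_zk" != "$yarn_zk" ]; then
        echo "MISMATCH: ha.zookeeper.quorum=$hdfs_zk, yarn.resourcemanager.zk-address=$yarn_zk"
        rc=1
    fi
    if [ "$rc" -eq 0 ]; then
        echo "HA settings are consistent"
    fi
    return "$rc"
}
```

From /software/hadoop-2.7.3/etc/hadoop it could be called as check_ha_config core-site.xml hdfs-site.xml yarn-site.xml before the files are copied to the other nodes.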


    <!-- Copy the configuration above to every other node (master02 shown; repeat for each slaver node) -->
    [hadoop@master01 hadoop]$ scp -r core-site.xml hdfs-site.xml yarn-site.xml master02:/software/hadoop-2.7.3/etc/hadoop/


    Start the ZooKeeper cluster on the slaver nodes (run on each one):
    --------
    [hadoop@slaver01 hadoop-2.7.3]$ cd /software/zookeeper-3.4.10/bin/ && ./zkServer.sh start && cd - && jps
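
Once ZooKeeper has been started on all three slaver nodes, it can help to confirm that the ensemble has actually formed before going further. The sketch below loops over the hosts and reports the role each server prints from zkServer.sh status. The host names and install path come from the post; passwordless SSH from the machine running the check and the helper name zk_modes are assumptions.

```shell
#!/bin/bash
# Hosts and path are taken from the post; passwordless SSH is an assumption.
ZK_HOSTS="slaver01 slaver02 slaver03"
ZK_BIN=/software/zookeeper-3.4.10/bin

# Print one line per host: "<host> <mode>", or "<host> not-running".
zk_modes() {
    for host in $ZK_HOSTS; do
        # zkServer.sh status prints a line such as "Mode: follower".
        mode=$(ssh "$host" "$ZK_BIN/zkServer.sh status" 2>&1 | sed -n 's/^Mode: //p')
        echo "$host ${mode:-not-running}"
    done
}
```

Calling zk_modes on a healthy three-node ensemble should list one leader and two followers.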


    Start the qjournal (JournalNode) cluster on the slaver nodes:
    ----------
    1. Start a JournalNode on every slaver node:
    [hadoop@slaver01 hadoop-2.7.3]$ hadoop-daemon.sh start journalnode

    2. Format HDFS:
    [hadoop@master01 software]$ hdfs namenode -format

    3. Copy the work directory to the same location on master02:
    [hadoop@master01 software]$ scp -r hadoop-2.7.3/work/ master02:/software/hadoop-2.7.3/

    4. Format the ZKFC state in ZooKeeper:
    [hadoop@master01 software]$ hdfs zkfc -formatZK

    5. Start the HDFS cluster:
    [hadoop@master01 software]$ start-dfs.sh

    6. Start the YARN cluster (the ResourceManager on master02 has to be started by hand):
    [hadoop@master01 hadoop]$ start-yarn.sh
    [hadoop@master02 ~]$ yarn-daemon.sh start resourcemanager

    7. Check which master is currently active (the web UI on port 50070 shows the same information):
    [hadoop@master01 software]$ hdfs haadmin -getServiceState nn1
    [hadoop@master01 hadoop]$ yarn rmadmin -getServiceState rm1
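
The first-time start-up order in the steps above can be collected into one script run from master01. This is only a sketch under stated assumptions: passwordless SSH to the other nodes, the install paths used in the post, and the Hadoop daemon scripts living in sbin under the install directory. The names run, first_time_ha_start, HADOOP_DIR and DRY_RUN are hypothetical. By default it prints the commands instead of executing them.

```shell
#!/bin/bash
# Sketch of the first-time HA start-up order, intended to run on master01.
HADOOP_DIR=/software/hadoop-2.7.3
ZK_BIN=/software/zookeeper-3.4.10/bin

# With DRY_RUN=1 (the default) each command is printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

first_time_ha_start() {
    # ZooKeeper and the JournalNodes must be up before formatting.
    for host in slaver01 slaver02 slaver03; do
        run ssh "$host" "$ZK_BIN/zkServer.sh" start
        run ssh "$host" "$HADOOP_DIR/sbin/hadoop-daemon.sh" start journalnode
    done
    run hdfs namenode -format
    run scp -r "$HADOOP_DIR/work/" "master02:$HADOOP_DIR/"
    run hdfs zkfc -formatZK
    run start-dfs.sh
    run start-yarn.sh
    # start-yarn.sh starts only the local ResourceManager.
    run ssh master02 "$HADOOP_DIR/sbin/yarn-daemon.sh" start resourcemanager
}

first_time_ha_start
```

Run as shown, it only lists the twelve commands; setting DRY_RUN=0 executes them. Formatting wipes existing NameNode metadata, so this order applies to a first-time setup only, and a real version would want to stop at the first failing step.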

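    Summary:
    HDFS cluster high availability:
    ---------------
    Clients talk to the NameNode process on whichever master node is currently active, as tracked through
    ZooKeeper. If that master goes down, the standby master takes over automatically and clients switch
    over to it, so data access continues normally.
    YARN cluster high availability:
    ---------------
    A client submits a Job to the ResourceManager process on the active master node. If that node goes down,
    the standby master only takes over Jobs submitted afterwards; the Job that was running ends in failure.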

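    <!-- Settings for when, for whatever reason, you do not want automatic high availability -->
    <!-- Set automatic failover on NameNode process failure to false -->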
    <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>false</value>
    </property>
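    If the master you started ends up in standby, you can force it into the active state. Drawback: automatic
    failover becomes manual, and every later transition also has to be done by hand: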
    hdfs haadmin -transitionToActive --forceactive nn1

    <!-- Split-brain -->
    ----- Both master nodes are in the active state (or both in standby) at the same time!
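
A quick way to spot the situation described above is to query both NameNodes and count how many report active. This is a sketch, not something from the original post: check_nn_states is a hypothetical helper, nn1 and nn2 are the IDs configured in dfs.ha.namenodes.ns1, and the return codes are an arbitrary convention.

```shell
#!/bin/bash
# Query both NameNodes and report whether exactly one of them is active.
# Returns 0 for one active node, 1 for none, 2 for two (split-brain).
check_nn_states() {
    s1=$(hdfs haadmin -getServiceState nn1 2>/dev/null)
    s2=$(hdfs haadmin -getServiceState nn2 2>/dev/null)
    echo "nn1=${s1:-unreachable} nn2=${s2:-unreachable}"

    active=0
    for state in "$s1" "$s2"; do
        if [ "$state" = "active" ]; then
            active=$((active + 1))
        fi
    done

    case "$active" in
        1) return 0 ;;
        2) echo "WARNING: both NameNodes report active (split-brain)"
           return 2 ;;
        *) echo "WARNING: no NameNode reports active"
           return 1 ;;
    esac
}
```

The same pattern should carry over to the ResourceManagers by swapping in yarn rmadmin -getServiceState with rm1 and rm2.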

  • Original article: https://www.cnblogs.com/pandazhao/p/8074993.html