    Hadoop HA Architecture Setup

    Seven servers in total; the node roles are assigned as follows:

    192.168.133.21 (BFLN-01): namenode, zookeeper, journalnode, DFSZKFailoverController
    192.168.133.23 (BFLN-02): namenode, resourcemanager, zookeeper, journalnode, DFSZKFailoverController
    192.168.133.24 (BFLN-03): resourcemanager, zookeeper, journalnode, DFSZKFailoverController
    192.168.133.25 (BFLN-04): datanode, nodemanager
    192.168.133.26 (BFLN-05): datanode, nodemanager
    192.168.133.27 (BFLN-06): datanode, nodemanager
    192.168.133.28 (BFLN-07): datanode, nodemanager

    Why HA: running two namenodes and two resourcemanagers prevents a single point of failure in Hadoop's core components from taking the whole cluster down.

    Configuration steps:

    Environment setup

    1. Synchronize time across all cluster nodes:

     ntpdate
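    A minimal sketch of one way to do this on every node, assuming the servers can reach a public NTP server (the server name and cron interval below are illustrative, not from the original setup):

    # one-off sync against an NTP server
    /usr/sbin/ntpdate pool.ntp.org

    # optionally keep the clocks in sync with a periodic cron job
    (crontab -l 2>/dev/null; echo '*/30 * * * * /usr/sbin/ntpdate pool.ntp.org >/dev/null 2>&1') | crontab -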

    2. Configure hostname resolution in /etc/hosts on all 7 servers (every node needs these entries):

    192.168.133.21  BFLN-01
    192.168.133.23  BFLN-02
    192.168.133.24  BFLN-03
    192.168.133.25  BFLN-04
    192.168.133.26  BFLN-05
    192.168.133.27  BFLN-06
    192.168.133.28  BFLN-07

    3. Configure the SSH client options in /etc/ssh/ssh_config:

    StrictHostKeyChecking no

    UserKnownHostsFile /dev/null

    Otherwise starting the HDFS services may hang on host-key confirmation prompts such as:

    Starting namenodes on [BFLN-01 BFLN-02]
    The authenticity of host 'BFLN-02 (192.168.133.23)' can't be established.
    ECDSA key fingerprint is 79:d1:ec:82:d3:1c:50:8a:17:c2:2d:f0:87:20:53:44.
    Are you sure you want to continue connecting (yes/no)? The authenticity of host 'BFLN-01 (192.168.133.21)' can't be established.
    ECDSA key fingerprint is 30:75:04:10:93:d2:57:d7:3d:b1:cc:31:92:30:1a:a1.
    Are you sure you want to continue connecting (yes/no)? yes

    4. Set up passwordless SSH between every pair of servers, including from each host to itself:

    ssh-keygen  : generate a key pair

    ssh-copy-id : copy the public key to the target server
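    As a sketch, run as root on each node (the host list mirrors the /etc/hosts entries above; ssh-copy-id will prompt once per target host):

    ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
    for h in BFLN-01 BFLN-02 BFLN-03 BFLN-04 BFLN-05 BFLN-06 BFLN-07; do
        ssh-copy-id root@$h
    done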

    5. Install Java and configure the JAVA and Hadoop environment variables:

    export JAVA_HOME=/usr/java/jdk1.8.0_51/
    export HADOOP_HOME=/opt/hadoop-spark/hadoop/hadoop-2.9.1
    export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
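    Assuming the exports were added to /etc/profile, a quick check that both are picked up:

    source /etc/profile
    java -version       # should report 1.8.0_51
    hadoop version      # should report Hadoop 2.9.1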
    

    Install the ZooKeeper cluster:

    6. Extract the ZooKeeper tarball.

    7. Edit the ZooKeeper configuration file (conf/zoo.cfg):

    # The number of milliseconds of each tick
    tickTime=2000
    # The number of ticks that the initial
    # synchronization phase can take
    initLimit=10
    # The number of ticks that can pass between
    # sending a request and getting an acknowledgement
    syncLimit=5
    # the directory where the snapshot is stored.
    # do not use /tmp for storage, /tmp here is just
    # example sakes.
    dataDir=/data/zookeeper
    # the port at which the clients will connect
    clientPort=2181
    server.1=192.168.133.21:2888:3888
    server.2=192.168.133.23:2888:3888
    server.3=192.168.133.24:2888:3888
    # the maximum number of client connections.
    # increase this if you need to handle more clients
    #maxClientCnxns=60
    #
    # Be sure to read the maintenance section of the
    # administrator guide before turning on autopurge.
    #
    # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
    #
    # The number of snapshots to retain in dataDir
    #autopurge.snapRetainCount=3
    # Purge task interval in hours
    # Set to "0" to disable auto purge feature

    8. In the ZooKeeper data directory, create a myid file holding the node's ZooKeeper id (with dataDir=/data/zookeeper as configured above, the ids are 1, 2 and 3), as shown below.
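    A minimal sketch matching the server.N entries in zoo.cfg (run the matching command on each node):

    # on 192.168.133.21 (server.1)
    echo 1 > /data/zookeeper/myid
    # on 192.168.133.23 (server.2)
    echo 2 > /data/zookeeper/myid
    # on 192.168.133.24 (server.3)
    echo 3 > /data/zookeeper/myid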

    9. Start the ZooKeeper cluster (on all three ZooKeeper nodes):

    ./zkServer.sh start
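    Once all three nodes are started, the election result can be checked on each node; one should report Mode: leader and the other two Mode: follower:

    ./zkServer.sh status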

    Install and configure Hadoop HA:

    10. Download and extract the Hadoop tarball (hadoop-2.9.1); keep the installation path identical on all 7 servers as far as possible.

    Configure on 192.168.133.21:

    cd $HADOOP_HOME/etc/hadoop/

    vi core-site.xml

    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://BFLN</value>   <!-- BFLN is the logical name of the namenode nameservice; it must match dfs.nameservices in hdfs-site.xml -->
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/data/hadoop-spark/hadoop/tmp</value>    <!-- HDFS working/temporary directory -->
        </property>
        <property>
            <name>ha.zookeeper.quorum</name>
            <value>BFLN-01:2181,BFLN-02:2181,BFLN-03:2181</value>  <!-- ZooKeeper cluster addresses -->
        </property>
    </configuration>
    

     

    vi hdfs-site.xml

    <configuration>
        <!-- BFLN is the logical name of the namenode nameservice; it must match fs.defaultFS in core-site.xml -->
        <property>
            <name>dfs.nameservices</name>
            <value>BFLN</value>
        </property>

        <!-- the BFLN nameservice contains two namenodes, BFLN1 and BFLN2 -->
        <property>
            <name>dfs.ha.namenodes.BFLN</name>
            <value>BFLN1,BFLN2</value>
        </property>

        <!-- RPC address of the first namenode -->
        <property>
            <name>dfs.namenode.rpc-address.BFLN.BFLN1</name>
            <value>BFLN-01:9000</value>
        </property>

        <!-- HTTP address of the first namenode -->
        <property>
            <name>dfs.namenode.http-address.BFLN.BFLN1</name>
            <value>BFLN-01:50070</value>
        </property>

        <!-- RPC address of the second namenode -->
        <property>
            <name>dfs.namenode.rpc-address.BFLN.BFLN2</name>
            <value>BFLN-02:9000</value>
        </property>

        <!-- HTTP address of the second namenode -->
        <property>
            <name>dfs.namenode.http-address.BFLN.BFLN2</name>
            <value>BFLN-02:50070</value>
        </property>

        <!-- journalnode addresses and ports; an odd number of journalnodes is recommended -->
        <property>
            <name>dfs.namenode.shared.edits.dir</name>
            <value>qjournal://BFLN-01:8485;BFLN-02:8485;BFLN-03:8485/BFLN</value>
        </property>

        <!-- where the journalnode stores its data on local disk -->
        <property>
            <name>dfs.journalnode.edits.dir</name>
            <value>/data/hadoop-spark/hadoop/tmp/jn</value>
        </property>

        <!-- enable automatic failover when a namenode fails -->
        <property>
            <name>dfs.ha.automatic-failover.enabled</name>
            <value>true</value>
        </property>

        <!-- failover proxy provider; without it clients treat the nameservice name BFLN as a hostname and fail -->
        <property>
            <name>dfs.client.failover.proxy.provider.BFLN</name>
            <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        </property>

        <!-- fencing method used on split-brain: sshfence logs in to the old active namenode over SSH and kills it, so the standby can become active -->
        <property>
            <name>dfs.ha.fencing.methods</name>
            <value>sshfence</value>
        </property>

        <!-- SSH private key used by sshfence -->
        <property>
            <name>dfs.ha.fencing.ssh.private-key-files</name>
            <value>/root/.ssh/id_rsa</value>
        </property>

        <!-- replication factor -->
        <property>
            <name>dfs.replication</name>
            <value>3</value>
        </property>

        <!-- whether to check permissions -->
        <property>
            <name>dfs.permissions</name>
            <value>false</value>
        </property>
    </configuration>
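    As a quick sanity check of the HA wiring (assuming the Hadoop binaries are on PATH), hdfs getconf can echo back what was just configured:

    hdfs getconf -confKey dfs.nameservices    # should print: BFLN
    hdfs getconf -namenodes                   # should list: BFLN-01 BFLN-02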
    


    vi yarn-site.xml

    <configuration>
      <!-- enable resourcemanager HA; default is false -->
      <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
      </property>

      <!-- enable RM recovery: if an RM goes down while jobs are running, setting this to true lets the restarted RM recover the unfinished applications -->
      <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
      </property>

      <!-- RM cluster id -->
      <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>BFLN-yarn</value>
      </property>

      <!-- the two RM node names in the RM cluster -->
      <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>BFLN-yarn1,BFLN-yarn2</value>
      </property>

      <!-- address of BFLN-yarn1 -->
      <property>
        <name>yarn.resourcemanager.hostname.BFLN-yarn1</name>
        <value>BFLN-02</value>
      </property>

      <!-- address of BFLN-yarn2 -->
      <property>
        <name>yarn.resourcemanager.hostname.BFLN-yarn2</name>
        <value>BFLN-03</value>
      </property>

      <!-- ZooKeeper cluster addresses -->
      <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>BFLN-01:2181,BFLN-02:2181,BFLN-03:2181</value>
      </property>

      <!-- class used for state storage; the default is the filesystem-based store, here the ZooKeeper-based store is used -->
      <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
      </property>

      <!-- auxiliary service run by the NodeManager; must be mapreduce_shuffle for MapReduce jobs to run -->
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
    </configuration>
    


    vi slaves:

    (lists the datanode nodes)

    192.168.133.25
    192.168.133.26
    192.168.133.27
    192.168.133.28

     

    vi hadoop-env.sh:

    export JAVA_HOME=/usr/java/jdk1.8.0_51/ 
    

     

    11. At this point all configuration is complete; copy these configuration files to the other servers (a distribution sketch follows).
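    A minimal sketch, assuming $HADOOP_HOME is identical on every node and the files were edited on BFLN-01:

    for h in BFLN-02 BFLN-03 BFLN-04 BFLN-05 BFLN-06 BFLN-07; do
        scp $HADOOP_HOME/etc/hadoop/{core-site.xml,hdfs-site.xml,yarn-site.xml,slaves,hadoop-env.sh} root@$h:$HADOOP_HOME/etc/hadoop/
    done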

    Start the HDFS services:

    Note: the startup order matters; getting it wrong leads to recurring errors later.

    1. On the journalnode hosts (BFLN-01, BFLN-02, BFLN-03), start the journalnode: ./sbin/hadoop-daemon.sh start journalnode  # start the journalnode daemon

    2. On BFLN-01, format the namenode: ./bin/hdfs namenode -format  # format the namenode storage directory

    3. On BFLN-01, register with ZooKeeper: ./bin/hdfs zkfc -formatZK  # register HDFS with the ZooKeeper cluster

    4. On BFLN-01, start HDFS: ./sbin/start-dfs.sh  # starts the HDFS services; note that only the namenode on BFLN-01 comes up at this point

    5. On BFLN-02, bootstrap the standby namenode: ./bin/hdfs namenode -bootstrapStandby  # the namenode on BFLN-02 copies its metadata from the namenode on BFLN-01

    6. On BFLN-02, start the namenode: ./sbin/hadoop-daemon.sh start namenode  # start the namenode on BFLN-02

    7. On BFLN-02, start the resourcemanager: ./sbin/start-yarn.sh  # starts the RM and NM services

    8. On BFLN-03, start the standby resourcemanager: ./sbin/yarn-daemon.sh start resourcemanager  # start the standby RM service
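    After these steps, jps on each node should show daemons matching the role table at the top, roughly:

    jps
    # namenode hosts (BFLN-01/02)        : NameNode, DFSZKFailoverController, JournalNode, QuorumPeerMain
    # resourcemanager hosts (BFLN-02/03) : ResourceManager (alongside the ZooKeeper/JournalNode daemons)
    # worker hosts (BFLN-04..07)         : DataNode, NodeManager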

     

    Test: kill the active namenode/resourcemanager and check whether the corresponding standby node switches to active:

    Commands to check the namenode states:

    ./bin/hdfs  haadmin -getServiceState BFLN1

    ./bin/hdfs  haadmin -getServiceState BFLN2

    Commands to check the resourcemanager states:

    ./bin/yarn  rmadmin -getServiceState BFLN-yarn1

    ./bin/yarn  rmadmin -getServiceState BFLN-yarn2
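    A sketch of the namenode failover test (the resourcemanager test is analogous, using yarn rmadmin):

    # on the host whose namenode is currently active
    jps | grep NameNode          # note the NameNode pid
    kill -9 <pid>                # simulate a crash

    # from any node, re-check the states; the former standby should now report "active"
    ./bin/hdfs haadmin -getServiceState BFLN1
    ./bin/hdfs haadmin -getServiceState BFLN2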

    If the standby does not switch to active after the active node is killed, the system may need an extra package installed:

    psmisc
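    psmisc provides the fuser command that the sshfence fencing method calls when killing the old active namenode; on a yum-based system it can be installed with:

    yum install -y psmisc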

     

     
