
    Hadoop Study Notes: HA Verification

    Cluster Planning

    HOST     NN  NN  JNN  DN  ZKFC  ZK
    node01   *       *        *
    node02       *   *   *    *     *
    node03           *   *          *
    node04               *          *

    Configuration Files

    hdfs-site.xml

    • Because there are multiple NameNode nodes, you cannot point clients at a single NameNode address; instead you configure a logical nameservice that maps to the whole set of NameNodes

    • dfs.nameservices - the logical name for this new nameservice (the name you choose for the NameNode group)

      For example:

      <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
      </property>
      
    • dfs.ha.namenodes.[nameservice ID] - unique identifiers for each NameNode in the nameservice (which NameNodes belong to the group). For example:

      <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
      </property>
      

      This cluster defines two NameNode nodes, nn1 and nn2 (the official docs allow 2-5 NameNodes and recommend 3)

    • dfs.namenode.rpc-address.[nameservice ID].[name node ID] - the fully-qualified RPC address for each NameNode to listen on (the RPC address and port that each NameNode listens on)

      For example:

      <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>node01:8020</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>node02:8020</value>
      </property>
      
    • dfs.namenode.http-address.[nameservice ID].[name node ID] - the fully-qualified HTTP address for each NameNode to listen on (the web UI address and port)

      <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>node01:9870</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>node02:9870</value>
      </property>
      
    • dfs.namenode.shared.edits.dir - the URI which identifies the group of JNs where the NameNodes will write/read edits (the URI through which the NameNodes read and write the JournalNode group; this is where the JournalNodes are configured)

      <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://node01:8485;node02:8485;node03:8485/mycluster</value>
      </property>
      

      Here /mycluster is the directory written on the JournalNodes. A future cluster can share the same set of JournalNodes, but to avoid overwriting each other's metadata, each cluster must use a different journal identifier after the trailing slash

    • dfs.journalnode.edits.dir - the path where the JournalNode daemon will store its local state (the absolute local path for the JournalNode's files)

      <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/var/bigdata/hadoop/ha/dfs/jn</value>
      </property>
      
    • dfs.client.failover.proxy.provider.[nameservice ID] - the Java class that HDFS clients use to contact the Active NameNode

      <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
      </property>
      
    • dfs.ha.fencing.methods - a list of scripts or Java classes which will be used to fence the Active NameNode during a failover (the fencing method and the path to the private key used for passwordless SSH)

      <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
      </property>
      
      <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
      </property>
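
      sshfence can block a failover indefinitely if the machine hosting the failed NameNode is powered off, because the SSH connection never completes. A common addition (not in the original post; shell(...) is a standard Hadoop fencing method, offered here as a hedged suggestion) is to append a fallback method that always succeeds, so automatic failover can still proceed:

```xml
<!-- Fencing methods are tried in order, one per line; shell(/bin/true)
     always "succeeds" so failover proceeds even when sshfence cannot connect -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence
shell(/bin/true)</value>
</property>
```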
      
    • ZKFC configuration (automatic failover)

       <property>
         <name>dfs.ha.automatic-failover.enabled</name>
         <value>true</value>
       </property>
      

    core-site.xml

    • fs.defaultFS - the default path prefix used by the Hadoop FS client when none is given

      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
      </property>
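
      With fs.defaultFS set to the logical nameservice, clients never address a specific NameNode; the configured failover proxy provider resolves the active one. A quick smoke test after the cluster is up might look like this (a sketch; it assumes the mycluster name and nn1/nn2 IDs configured above):

```shell
# List the root of the cluster through the logical URI
hdfs dfs -ls hdfs://mycluster/

# Ask which NameNode is currently active / standby
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
```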
      
    • Configuring automatic failover (the ZooKeeper quorum nodes)

       <property>
         <name>ha.zookeeper.quorum</name>
         <value>node02:2181,node03:2181,node04:2181</value>
       </property>
      

    Initialization

    1. Passwordless SSH

      • The machine that runs start-dfs.sh must distribute its public key to the other nodes
      • In HA mode, every NameNode starts a ZKFC, and the ZKFC uses passwordless SSH to control the NameNode state on its own node and on the other NameNode nodes
    2. Start the JournalNodes first: hadoop-daemon.sh start journalnode

    3. Pick one NameNode and format it: hdfs namenode -format

    4. Start the formatted NameNode so the other one can sync from it: hadoop-daemon.sh start namenode

    5. On the other NameNode machine: hdfs namenode -bootstrapStandby

    6. Format ZooKeeper (create the HA znode): hdfs zkfc -formatZK

    7. start-dfs.sh
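
    Put together, the preparation and first-start sequence above can be sketched as follows (an illustrative outline, not a turnkey script: the hostnames come from the cluster plan above, and each command must be run on the host named in its comment):

```shell
# On the node that runs start-dfs.sh (node01): generate and distribute keys
ssh-keygen -t rsa -f ~/.ssh/id_rsa -N ''
for h in node01 node02 node03 node04; do ssh-copy-id root@$h; done

# On node01, node02, node03: start the JournalNodes first
hadoop-daemon.sh start journalnode

# On node01 only: format one NameNode and start it
hdfs namenode -format
hadoop-daemon.sh start namenode

# On node02: pull the formatted metadata from the running NameNode
hdfs namenode -bootstrapStandby

# On node01: create the HA znode in ZooKeeper
hdfs zkfc -formatZK

# On node01: bring up the whole cluster (NameNodes, DataNodes, JournalNodes, ZKFCs)
start-dfs.sh
```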

  • Original post: https://www.cnblogs.com/xun-/p/14337375.html