  • Hadoop Cluster Setup

    I. HDFS installation:

    1. Upload the Hadoop installation package to hdp-01.

    2. Edit the configuration files.

    Key points

    Core configuration parameters:

    1) Set Hadoop's default file system to HDFS.

    2) Specify which machine runs the HDFS NameNode.

    3) Specify the local directory where the NameNode stores its metadata.

    4) Specify the local directory where the DataNode stores file blocks.

    The Hadoop configuration files live in /root/apps/<Hadoop installation directory>/etc/hadoop/.

    1) Edit hadoop-env.sh

    export JAVA_HOME=/root/apps/jdk1.8.0_60

    2) Edit core-site.xml

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hdp-01:9000</value>
      </property>
    </configuration>

    3) Edit hdfs-site.xml

    <configuration>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>/root/dfs/name</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>/root/dfs/data</value>
      </property>
    </configuration>
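    Once HADOOP_HOME and PATH are configured (see step 5 below), you can double-check that these two files are being picked up. A minimal check, assuming the values above:

    hdfs getconf -confKey fs.defaultFS              # expect hdfs://hdp-01:9000
    hdfs getconf -confKey dfs.namenode.name.dir     # expect /root/dfs/name
    hdfs getconf -confKey dfs.datanode.data.dir     # expect /root/dfs/data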

    4) Copy the entire Hadoop installation directory to the other machines

    scp -r /root/apps/hadoop-2.8.0  hdp-02:/root/apps/

    scp -r /root/apps/hadoop-2.8.0  hdp-03:/root/apps/

    scp -r /root/apps/hadoop-2.8.0  hdp-04:/root/apps/
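    With more machines, the same copy can be done in a loop; a sketch equivalent to the three commands above:

    for h in hdp-02 hdp-03 hdp-04; do
      scp -r /root/apps/hadoop-2.8.0  $h:/root/apps/
    done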

    5) Start HDFS

    Starting HDFS simply means starting the right software on the right machines.

    Key point: to run Hadoop commands, the HADOOP_HOME and PATH environment variables must be configured in the Linux environment.

    vi /etc/profile

    export JAVA_HOME=/root/apps/jdk1.8.0_60

    export HADOOP_HOME=/root/apps/hadoop-2.8.0

    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
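    The new variables only take effect in a new shell; to apply them to the current session and verify the setup, something like:

    source /etc/profile
    hadoop version        # should report Hadoop 2.8.0 if HADOOP_HOME and PATH are correct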

    First, initialize the NameNode metadata directory.

    On hdp-01, run a Hadoop command to initialize the NameNode's metadata storage directory:

    hadoop namenode -format

    This command:

    - creates a brand-new metadata storage directory
    - generates fsimage, the file that records the metadata
    - generates the cluster identifiers, e.g. the cluster ID (clusterID)
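    As a quick sanity check, the freshly formatted directory can be inspected (using the dfs.namenode.name.dir configured above):

    ls /root/dfs/name/current
    # expect fsimage_* files and a VERSION file; VERSION contains the clusterID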

    Next, start the NameNode process (on hdp-01):

    hadoop-daemon.sh start namenode

    After it starts, first check with jps that the NameNode process is running.

    Then, from a browser on Windows, open the web port exposed by the NameNode, 50070:

    http://hdp-01:50070

    Next, start the DataNodes (this can be run on any machine):

    hadoop-daemon.sh start datanode
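    Besides jps on each node, the NameNode's view of the cluster can be verified from the command line:

    hdfs dfsadmin -report       # lists the live DataNodes and their capacity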

    6) Start HDFS with the automatic batch startup script

    1) First, set up passwordless SSH login from hdp-01 to every machine in the cluster (including itself); see the sketch after this list.

    2) Once passwordless login is configured, run ssh 0.0.0.0 once (so that the startup scripts can later connect to 0.0.0.0 without a host-key prompt).

    3) In the Hadoop installation directory, edit etc/hadoop/slaves and list the nodes that should run a DataNode process:

    hdp-01

    hdp-02

    hdp-03

    hdp-04

    4) On hdp-01, start the whole cluster with the script start-dfs.sh.

    5) To stop it, use the script stop-dfs.sh.
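    A sketch of the passwordless-login setup from step 1), run on hdp-01 (assuming RSA keys and the four host names above):

    ssh-keygen -t rsa            # accept the defaults, empty passphrase
    for h in hdp-01 hdp-02 hdp-03 hdp-04; do
      ssh-copy-id $h             # installs the public key on each node, including hdp-01 itself
    done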

    II. YARN installation:

    A YARN cluster has two roles:

    Master node: ResourceManager, 1 instance

    Worker nodes: NodeManager, N instances

    The ResourceManager is usually installed on a dedicated machine.

    NodeManagers should be co-located with the HDFS DataNodes.

    (1) Edit the slaves file: fill in the worker nodes' addresses, which is used to batch-start the NodeManagers.

    (2) Edit hadoop-env.sh and add JAVA_HOME.

    (3) Edit the configuration file:

    yarn-site.xml

    <configuration>

    <!-- Site specific YARN configuration properties -->

    <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    </property>

    <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
    </property>

    <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster-yarn1</value>
    </property>

    <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
    </property>

    <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>x.xxx.xxx.xxx</value>
    </property>

    <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>x.xxx.xxx.xxx</value>
    </property>

    <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>x.xxx.xxx.xxx:2181,x.xxx.xxx.xxx:2181,x.xxx.xxx.xxx:2181</value>
    </property>

    <!-- Enable automatic recovery -->
    <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
    </property>

    <!-- Store the ResourceManager state in the ZooKeeper cluster -->
    <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>

    <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>29696</value>
    </property>

    <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>16</value>
    </property>

    <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>29696</value>
    </property>
    <property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
    </property>
    <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
    </property>

    </configuration>

    Then copy it to every machine in the cluster.
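    A sketch of that copy step, reusing the host names from earlier; only the changed file needs to be distributed:

    for h in hdp-02 hdp-03 hdp-04; do
      scp /root/apps/hadoop-2.8.0/etc/hadoop/yarn-site.xml  $h:/root/apps/hadoop-2.8.0/etc/hadoop/
    done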

    YARN capacity scheduler configuration (capacity-scheduler.xml):

    <configuration>
    
      <property>
        <name>yarn.scheduler.capacity.maximum-applications</name>
        <value>10</value>
        <description>
          Maximum number of applications that can be pending and running.
        </description>
      </property>
    
      <property>
        <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
        <value>0.1</value>
        <description>
          Maximum percent of resources in the cluster which can be used to run
          application masters i.e. controls number of concurrent running
          applications.
        </description>
      </property>
    
      <property>
        <name>yarn.scheduler.capacity.resource-calculator</name>
        <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
        <description>
          The ResourceCalculator implementation to be used to compare
          Resources in the scheduler.
          The default i.e. DefaultResourceCalculator only uses Memory while
          DominantResourceCalculator uses dominant-resource to compare
          multi-dimensional resources such as Memory, CPU etc.
        </description>
      </property>
    
      <property>
        <name>yarn.scheduler.capacity.root.queues</name>
        <value>prod,dev</value>
        <description>
          The queues at this level (root is the root queue).
        </description>
      </property>
    
      <property>
        <name>yarn.scheduler.capacity.root.prod.capacity</name>
        <value>80</value>
      </property>
    
      <property>
        <name>yarn.scheduler.capacity.root.dev.capacity</name>
        <value>20</value>
      </property>
    
      <property>
        <name>yarn.scheduler.capacity.root.prod.user-limit-factor</name>
        <value>1</value>
        <description>
          prod queue user limit a percentage from 0.0 to 1.0.
        </description>
      </property>
    
      <property>
        <name>yarn.scheduler.capacity.root.dev.user-limit-factor</name>
        <value>1</value>
        <description>
          dev queue user limit a percentage from 0.0 to 1.0.
        </description>
      </property>
    
      <property>
        <name>yarn.scheduler.capacity.root.prod.maximum-capacity</name>
        <value>100</value>
        <description>
          The maximum capacity of the prod queue.
        </description>
      </property>
    
      <property>
        <name>yarn.scheduler.capacity.root.dev.maximum-capacity</name>
        <value>100</value>
        <description>
          The maximum capacity of the dev queue.
        </description>
      </property>
    
      <property>
        <name>yarn.scheduler.capacity.root.prod.state</name>
        <value>RUNNING</value>
        <description>
          The state of the prod queue. State can be one of RUNNING or STOPPED.
        </description>
      </property>
    
      <property>
        <name>yarn.scheduler.capacity.root.dev.state</name>
        <value>RUNNING</value>
        <description>
          The state of the dev queue. State can be one of RUNNING or STOPPED.
        </description>
      </property>
    
    
      <property>
        <name>yarn.scheduler.capacity.root.prod.acl_submit_applications</name>
        <value>*</value>
        <description>
          The ACL of who can submit jobs to the default queue.
        </description>
      </property>
    
      <property>
        <name>yarn.scheduler.capacity.root.prod.acl_administer_queue</name>
        <value>*</value>
        <description>
          The ACL of who can administer jobs on the default queue.
        </description>
      </property>
    
      <property>
        <name>yarn.scheduler.capacity.node-locality-delay</name>
        <value>40</value>
        <description>
          Number of missed scheduling opportunities after which the CapacityScheduler
          attempts to schedule rack-local containers.
          Typically this should be set to the number of nodes in the cluster. By default
          it is set to approximately the number of nodes in one rack, which is 40.
        </description>
      </property>
    
      <property>
        <name>yarn.scheduler.capacity.queue-mappings</name>
        <value></value>
        <description>
          A list of mappings that will be used to assign jobs to queues
          The syntax for this list is [u|g]:[name]:[queue_name][,next mapping]*
          Typically this list will be used to map users to queues,
          for example, u:%user:%user maps all users to queues with the same name
          as the user.
        </description>
      </property>
    
      <property>
        <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
        <value>false</value>
        <description>
          If a queue mapping is present, will it override the value specified
          by the user? This can be used by administrators to place jobs in queues
          that are different than the one specified by the user.
          The default is false.
        </description>
      </property>
    
    </configuration>
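    Two related commands that may come in handy once the cluster is running: queue changes can be applied to a live ResourceManager with yarn rmadmin -refreshQueues, and a job can be sent to one of the queues defined above through mapreduce.job.queuename. A sketch, assuming the examples jar shipped with Hadoop 2.8.0:

    yarn rmadmin -refreshQueues
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar pi \
      -Dmapreduce.job.queuename=prod 2 10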

    (4) Start the YARN cluster

    Then, on hdp-04, edit Hadoop's slaves file and list the machines that should run a NodeManager. After that, the YARN cluster can be started with the script:

    sbin/start-yarn.sh

    To stop it:

    sbin/stop-yarn.sh

    To start the other ResourceManager separately: yarn-daemon.sh start resourcemanager

    Once startup finishes, open the ResourceManager's web port from a browser on Windows:

    http://hdp-04:8088

    and check whether the ResourceManager has registered all of the NodeManager nodes.
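    The same check can be done from the command line:

    yarn node -list        # every registered NodeManager should appear with state RUNNING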

    (5) Check the state of the two ResourceManagers to verify high availability:

    ./bin/yarn rmadmin -getServiceState rm1

    active

    ./bin/yarn rmadmin -getServiceState rm2

    standby

    III. Related environment issues

    1. After changing the configuration files and restarting the HDFS cluster, both NameNodes came up in standby state.

    After a lot of troubleshooting, I found that when stopping and restarting the NameNode components I had not been shutting them down cleanly through the pid files and had to kill them one by one; that, however, was not the root cause.

    Changing the path of the pid files fixed the batch stop/start problem: the system periodically cleans the /tmp directory, so the recorded process IDs no longer matched the running processes.
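    The pid-file location is controlled by HADOOP_PID_DIR; a sketch of the kind of change involved (the target path is only an example):

    # in etc/hadoop/hadoop-env.sh
    export HADOOP_PID_DIR=/root/apps/hadoop-2.8.0/pids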

    During the batch start, zkfc kept failing to start, and starting it manually did not solve the problem either.

    I then tried reformatting the NameNode information stored in ZooKeeper, shut down and restarted the HDFS cluster while keeping an eye on the zkfc startup state, and this time it succeeded.
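    For reference, the usual way to reinitialize the NameNode HA state in ZooKeeper (presumably the step taken here) is hdfs zkfc -formatZK, run on a NameNode host while HDFS is stopped:

    stop-dfs.sh
    hdfs zkfc -formatZK      # recreates the /hadoop-ha znode in ZooKeeper
    start-dfs.sh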

    2. An HBase table failed to fetch its data blocks from HDFS, mainly because I had added new DataNodes and two of them were not running; starting them by hand did not fix it.

    Running the HBase consistency check still reported that the table's region could not be found. I force-dropped the table, which also removes its meta information from ZooKeeper, and then recreated it; that solved the problem.
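    The consistency check referred to above is presumably hbase hbck; a minimal usage sketch (the table name t1 is just a placeholder):

    hbase hbck                 # report inconsistencies across all tables
    hbase hbck t1              # limit the check to a single table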
