
    Hadoop Cluster Setup

    1. Create three virtual machines (CentOS 7 is used here) and turn off the firewall on every machine.

    1. Stop the firewall:

      [hadoop@localhost ~]$ systemctl stop firewalld.service
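
      Note that stop only keeps the firewall off until the next reboot. To keep it disabled permanently (a common convenience on an isolated test cluster; weigh it against your security requirements), also disable the service:

      [hadoop@localhost ~]$ systemctl disable firewalld.service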
      
    2. Change the hostnames to make the virtual machines easy to tell apart.

      Name the main node master and the other two nodes slave1 and slave2.

      Check the hostname and change it:

      [hadoop@localhost ~]$ hostname
      localhost.localdomain
      [hadoop@localhost ~]$ hostnamectl set-hostname master
      [hadoop@localhost ~]$ hostname
      master
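
      On the other two machines, run the same command with their respective names:

      [hadoop@localhost ~]$ hostnamectl set-hostname slave1    # on the first slave
      [hadoop@localhost ~]$ hostnamectl set-hostname slave2    # on the second slave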
      

      Reboot after the change:

      [hadoop@localhost ~]$ reboot
      

    2. Edit the IP-to-hostname mapping table /etc/hosts

    Add the IP and hostname of every machine to the hosts file on every node; this acts as a local DNS for the cluster:

    172.16.46.161	master
    172.16.46.163	slave1
    172.16.46.162	slave2
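
    As a quick sanity check (not strictly required), you can confirm the mappings resolve by pinging each node by name from every machine:

    [hadoop@master ~]$ ping -c 3 slave1
    [hadoop@master ~]$ ping -c 3 slave2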
    

    3. Passwordless SSH login

    See the separate post on passwordless SSH login.
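
    In case that post is not at hand, a minimal sketch (assuming the hadoop user exists on every node and the hostnames mapped above):

    [hadoop@master ~]$ ssh-keygen -t rsa           # accept the defaults; creates ~/.ssh/id_rsa and id_rsa.pub
    [hadoop@master ~]$ ssh-copy-id hadoop@master   # master must also reach itself without a password
    [hadoop@master ~]$ ssh-copy-id hadoop@slave1
    [hadoop@master ~]$ ssh-copy-id hadoop@slave2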

    4. Install the JDK

    See the separate post on JDK installation.
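
    Likewise, a minimal sketch of a tarball-based install (the jdk-8u211 file name and path are only examples; substitute your actual version):

    [hadoop@master ~]$ tar -zxvf jdk-8u211-linux-x64.tar.gz

    # then append the two lines below to /etc/profile and reload with: source /etc/profile
    export JAVA_HOME=/home/hadoop/jdk1.8.0_211
    export PATH=$JAVA_HOME/bin:$PATH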

    5. Install Hadoop

    Download Hadoop from the download page; get the .tar.gz package.

    Unpack it:

    [hadoop@master ~]$ tar -zxvf hadoop-2.9.2.tar.gz
    

    Set the environment variables by appending the following at the bottom of /etc/profile:

    export HADOOP_HOME=/home/hadoop/hadoop-2.9.2
    export PATH=.:$HADOOP_HOME/bin:$PATH
    

    Reload the environment variables:

    source /etc/profile
    

    Verify that Hadoop was installed successfully:

    [hadoop@master ~]$ hadoop
    Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
      CLASSNAME            run the class named CLASSNAME
     or
      where COMMAND is one of:
      fs                   run a generic filesystem user client
      version              print the version
      jar <jar>            run a jar file
                           note: please use "yarn jar" to launch
                                 YARN applications, not this command.
      checknative [-a|-h]  check native hadoop and compression libraries availability
      distcp <srcurl> <desturl> copy file or directories recursively
      archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
      classpath            prints the class path needed to get the
                           Hadoop jar and the required libraries
      credential           interact with credential providers
      daemonlog            get/set the log level for each daemon
      trace                view and modify Hadoop tracing settings
    
    Most commands print help when invoked w/o parameters.
    

    If you see the usage output above, the installation succeeded.

    If there is no such output, try restarting the machine so the environment variables are reloaded.

    6. Configure Hadoop

    Change into the Hadoop installation directory.

    6.1 Configure etc/hadoop/hadoop-env.sh

    Set JAVA_HOME to the absolute path of the JDK installation directory.
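
    For example (the path is illustrative; use your actual JDK location):

    # etc/hadoop/hadoop-env.sh
    export JAVA_HOME=/home/hadoop/jdk1.8.0_211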

    6.2 Configure etc/hadoop/core-site.xml

    Set the address of the HDFS NameNode and the storage path for Hadoop's runtime temporary files:

    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://172.16.46.161:9000</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>file:/home/hadoop/hadoop-2.9.2/tmp</value>
        </property>
        <property>
            <name>io.file.buffer.size</name>
            <value>131072</value>
        </property>
    </configuration>
    

    If hadoop.tmp.dir is not configured, data is stored under /tmp/hadoop-${user.name} by default.

    6.3 Configure etc/hadoop/hdfs-site.xml

    Note that dfs.replication should not exceed the number of DataNodes; with the three nodes used here, 3 is the effective maximum, and the value 4 below will leave blocks under-replicated.

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>4</value>
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/hadoop/hadoop-2.9.2/hdfs/name</value>
        <final>true</final>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/hadoop/hadoop-2.9.2/hdfs/data</value>
        <final>true</final>
      </property>
      <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>172.16.46.161:9001</value>
      </property>
      <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
      </property>
      <property>
        <name>dfs.permissions</name>
        <value>false</value>
      </property>
    </configuration>
    

    6.4 Configure etc/hadoop/mapred-site.xml

    Rename mapred-site.xml.template to mapred-site.xml:

    [hadoop@master hadoop-2.9.2]$ mv etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
    

    Edit mapred-site.xml:

    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    </configuration>
    

    This tells MapReduce jobs to run on YARN.

    6.5 Configure etc/hadoop/slaves

    Delete the existing content and list the IP addresses of all worker nodes (the master is listed too, so it also runs a DataNode):

    172.16.46.161
    172.16.46.163
    172.16.46.162
    

    6.6 Configure etc/hadoop/yarn-env.sh and etc/hadoop/mapred-env.sh

    Set JAVA_HOME to the absolute path of the JDK installation directory, as in 6.1.

    6.7 Configure etc/hadoop/yarn-site.xml

    <configuration>
    
    <!-- Site specific YARN configuration properties -->
      <property>
        <name>yarn.resourcemanager.address</name>
        <value>172.16.46.161:18040</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>172.16.46.161:18030</value>
      </property>
      <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>172.16.46.161:18088</value>
      </property>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>172.16.46.161:18025</value>
      </property>
      <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>172.16.46.161:18141</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
    </configuration>
    

    7. Copy the modified configuration to the other nodes:

    scp -r etc/ hadoop@slave1:~/hadoop-2.9.2/
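
    And likewise for the remaining node:

    scp -r etc/ hadoop@slave2:~/hadoop-2.9.2/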
    

    8. Start the cluster

    8.1 Format the NameNode

    With the cluster set up, format the NameNode before storing any data; this clears out stale data and initializes the required metadata structures.

    Only the very first startup requires formatting.

    Run the following command on whichever node hosts the NameNode:

    bin/hdfs namenode -format
    

    8.2 Start HDFS: the NameNode and DataNodes must be running before the cluster can be used

    Start the NameNode on a single node:

    [hadoop@master hadoop-2.9.2]# sbin/hadoop-daemon.sh start namenode
    starting namenode, logging to /home/hadoop/hadoop-2.9.2/logs/hadoop-root-namenode-master.out
    [hadoop@master hadoop-2.9.2]# jps
    3877 NameNode
    3947 Jps
    

    Start the DataNode on a single node:

    [hadoop@master hadoop-2.9.2]# sbin/hadoop-daemon.sh start datanode
    starting datanode, logging to /home/hadoop/hadoop-2.9.2/logs/hadoop-root-datanode-master.out
    [hadoop@master hadoop-2.9.2]# jps
    3877 NameNode
    4060 Jps
    3982 DataNode
    

    Then start the DataNode on each of the other nodes in the same way.

    Starting HDFS like this is tedious, and notice that the SecondaryNameNode has not been started, so Hadoop provides other startup scripts.

    Start the HDFS cluster in one step: NameNode, DataNodes, and SecondaryNameNode (as the output below shows, start-all.sh also launches the YARN daemons):

    [hadoop@master hadoop-2.9.2]$ sbin/start-all.sh
    This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
    Starting namenodes on [master]
    master: starting namenode, logging to /home/hadoop/hadoop-2.9.2/logs/hadoop-hadoop-namenode-master.out
    172.16.46.162: starting datanode, logging to /home/hadoop/hadoop-2.9.2/logs/hadoop-hadoop-datanode-slave2.out
    172.16.46.161: starting datanode, logging to /home/hadoop/hadoop-2.9.2/logs/hadoop-hadoop-datanode-master.out
    172.16.46.163: starting datanode, logging to /home/hadoop/hadoop-2.9.2/logs/hadoop-hadoop-datanode-slave1.out
    Starting secondary namenodes [master]
    master: starting secondarynamenode, logging to /home/hadoop/hadoop-2.9.2/logs/hadoop-hadoop-secondarynamenode-master.out
    starting yarn daemons
    starting resourcemanager, logging to /home/hadoop/hadoop-2.9.2/logs/yarn-hadoop-resourcemanager-master.out
    172.16.46.163: starting nodemanager, logging to /home/hadoop/hadoop-2.9.2/logs/yarn-hadoop-nodemanager-slave1.out
    172.16.46.162: starting nodemanager, logging to /home/hadoop/hadoop-2.9.2/logs/yarn-hadoop-nodemanager-slave2.out
    172.16.46.161: starting nodemanager, logging to /home/hadoop/hadoop-2.9.2/logs/yarn-hadoop-nodemanager-master.out
    [hadoop@master hadoop-2.9.2]$ jps
    4192 Jps
    3237 NameNode
    3543 SecondaryNameNode
    3374 DataNode
    

    8.3 Start YARN

    Run the following command on whichever node the ResourceManager is configured for:

    [hadoop@master hadoop-2.9.2]# sbin/start-yarn.sh 
    starting yarn daemons
    starting resourcemanager, logging to /home/hadoop/hadoop-2.9.2/logs/yarn-root-resourcemanager-master.out
    172.16.46.162: starting datanode, logging to /home/hadoop/hadoop-2.9.2/logs/hadoop-hadoop-datanode-slave2.out
    172.16.46.161: starting datanode, logging to /home/hadoop/hadoop-2.9.2/logs/hadoop-hadoop-datanode-master.out
    172.16.46.163: starting datanode, logging to /home/hadoop/hadoop-2.9.2/logs/hadoop-hadoop-datanode-slave1.out
    [hadoop@master hadoop-2.9.2]$ jps
    4192 Jps
    3237 NameNode
    3814 NodeManager
    3543 SecondaryNameNode
    3374 DataNode
    3695 ResourceManager
    

    The ResourceManager and NodeManager are both running.

    8.4 The Hadoop cluster is now up, including HDFS, YARN, and MapReduce

    Starting daemons one by one is tedious; Hadoop also provides one-command start and stop scripts (deprecated in favor of start-dfs.sh plus start-yarn.sh, as their output notes, but still functional):

    sbin/start-all.sh 
    sbin/stop-all.sh
    

    9. Access the Hadoop cluster remotely

    HDFS web UI: http://172.16.46.161:50070/

    YARN web UI (the yarn.resourcemanager.webapp.address configured above): http://172.16.46.161:18088/

    10. A simple test

    Create directories in the HDFS file system; there are two equivalent command forms:

    bin/hdfs dfs -mkdir -p /usr/input
    bin/hadoop fs -mkdir -p /usr/output
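
    As a further smoke test, a sketch that uploads a file and runs the wordcount example bundled with the 2.9.2 tarball (note the job's output directory must not already exist):

    bin/hdfs dfs -put etc/hadoop/core-site.xml /usr/input
    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /usr/input /usr/output/wc
    bin/hdfs dfs -cat /usr/output/wc/part-r-00000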
    

    Cluster deployment plan

    The steps above complete the Hadoop cluster, but they place the NameNode, SecondaryNameNode, and ResourceManager all on one machine.

    That concentrates load on one server and squeezes the resources available to each component, so the daemons can instead be spread across three machines:

           hadoop11              hadoop12                        hadoop13
    HDFS   NameNode, DataNode    DataNode                        SecondaryNameNode
    YARN   NodeManager           ResourceManager, NodeManager    NodeManager

    The three core daemons are thus distributed across three machines.

    Troubleshooting notes

    • jps not found

      jps lists Java processes; if it cannot be found, the JDK is not installed properly and the Java environment variables need to be set.

    • DataNodes fail to start after a restart

      This usually works on the first setup but fails after the NameNode is re-formatted: the DataNodes cannot start because the NameNode no longer recognizes them.

      When the NameNode is formatted, it generates two identifiers, blockPoolId and clusterId.

      When a DataNode joins, it records these two identifiers to mark itself as belonging to that NameNode; this is what forms the cluster.

      Once the NameNode is re-formatted, both identifiers change;

      the DataNodes then show up with the old identifiers and are turned away.

      Fix: delete the data on all nodes (the tmp directory, including the NameNode's data), re-format, and start again.
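
      With the directories configured above, that amounts to the following (the rm must run on every node; adjust the paths if yours differ):

      rm -rf /home/hadoop/hadoop-2.9.2/tmp /home/hadoop/hadoop-2.9.2/hdfs/name /home/hadoop/hadoop-2.9.2/hdfs/data
      bin/hdfs namenode -format    # on the NameNode host only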

    • Every operation prints the following warning

      WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
      

      This can be safely ignored; it is only a warning. If you really do want to resolve it, see the linked fix.
