zoukankan      html  css  js  c++  java
  • Hadoop 2.7.2 集群安装配置

    一、虚拟机环境配置

    准备三台虚拟机:

    Linux系统

    IP

    hostname

    Centos 7

    192.168.107.2

    Hadoop01

    Centos 7

    192.168.107.3

    Hadoop02

    Centos 7

    192.168.107.4

    Hadoop03

    (1)下载安装VMware

    链接:https://pan.baidu.com/s/1a_9pW6-nesgl_GUINFfqsA 提取码:6b6b

    (2)在VMware上安装虚拟机

    (3)克隆虚拟机 (百度一下)

    (4)配置虚拟机

    4.1 修改克隆虚拟机的静态IP

    4.2 修改主机名

    4.3 关闭防火墙

    4.4 创建hadoop用户

    4.5 配置hadoop用户具有root权限

      修改 /etc/sudoers 文件,找到下面一行,在root下面添加一行

      ## Allow root to run any commands anywhere

      root ALL=(ALL) ALL

      hadoop ALL=(ALL) ALL

    (5)在/opt目录下创建module、software文件夹

    [hadoop@hadoop01 ~]$ sudo mkdir -p /opt/module

    [hadoop@hadoop01 ~]$ sudo mkdir -p /opt/software

    (6)修改module、software文件夹的所有者为hadoop

    [hadoop@hadoop01 opt]$ sudo chown hadoop:hadoop /opt/module/ /opt/software/ 

    (7)配置SSH免密登录

    生成公钥和私钥:

    [hadoop@hadoop01 ~]$ ssh-keygen -t rsa

    然后敲(三个回车),就会生成两个文件id_rsa(私钥)、id_rsa.pub(公钥)

    将公钥拷贝到要免密登录的目标机器上

    [hadoop@hadoop01 ~]$ ssh-copy-id hadoop01

    [hadoop@hadoop01 ~]$ ssh-copy-id hadoop02

    验证

    [hadoop@hadoop01 ~]$ ssh hadoop02

    Last login: Tue Oct 27 19:10:02 2020 from gateway

    [hadoop@hadoop02 ~]$

     

    (8)编写集群分发脚本xsync

    (a)在/home/hadoop目录下创建bin目录,并在bin目录下xsync创建文件,文件内容如下:

    [hadoop@hadoop01 ~]$ mkdir bin

    [hadoop@hadoop01 ~]$ cd bin/

    [hadoop@hadoop01 bin]$ touch xsync

    [hadoop@hadoop01 bin]$ vi xsync

    在该文件中编写如下代码

    #!/bin/bash
    
    #1 获取输入参数个数,如果没有参数,直接退出
    pcount=$#
    if((pcount==0)); then
    echo no args;
    exit;
    fi
    
    #2 获取文件名称
    p1=$1
    fname=`basename $p1`
    echo fname=$fname
    
    #3 获取上级目录到绝对路径
    pdir=`cd -P $(dirname $p1); pwd`
    echo pdir=$pdir
    
    #4 获取当前用户名称
    user=`whoami`
    
    #5 循环
    for((host=2; host<=3; host++)); do
            echo ------------------- hadoop$host --------------
            rsync -rvl $pdir/$fname $user@hadoop0$host:$pdir
    done

    (b)修改脚本 xsync 具有执行权限

    [hadoop@hadoop01 bin]$ chmod 777 xsync

    (c)调用脚本形式:xsync 文件名称

    [hadoop@hadoop01 bin]$ xsync /home/hadoop/bin

     注意:如果将xsync放到/home/hadoop/bin目录下仍然不能实现全局使用,可以将xsync移动到/usr/local/bin目录下。

     

    二、安装JDK

    (1)下载JDK

    链接:https://pan.baidu.com/s/1wuR2FQe_RYO5mBpAWytGLw 提取码:td3y

    (2)用FTP/SFTP工具将下载的JDK压缩包上传至/opt/software/

    (3)解压JDK到/opt/module目录下

    [hadoop@hadoop01 software]$ tar -zvxf jdk-8u144-linux-x64.tar.gz -C /opt/module/

    (4) 配置JDK

       4.1 获取JDK路径

    [hadoop@hadoop01 software]$ cd /opt/module/jdk1.8.0_144/

    [hadoop@hadoop01 jdk1.8.0_144]$ pwd

         /opt/module/jdk1.8.0_144

            4.2 编辑 /etc/profile文件source /etc/profile

              [hadoop@hadoop01 jdk1.8.0_144]$ sudo vi /etc/profile

                添加:

    #JAVA_HOME

    export JAVA_HOME=/opt/module/jdk1.8.0_144

    export PATH=$PATH:$JAVA_HOME/bin

            4.3 让修改后的 /etc/profile文件生效

    [hadoop@hadoop01 jdk1.8.0_144]$ source /etc/profile

    4.4 测试JDK安装是否成功

    [hadoop@hadoop01 jdk1.8.0_144]$ java -version

    java version "1.8.0_144"

            

    三、安装Hadoop

    (1)下载Hadoop

    链接:https://pan.baidu.com/s/1Bb3h5OwAymfBtF0jKGwhXg  提取码:o061

    (2)用FTP/SFTP工具将下载的Hadoop压缩包上传至/opt/software/

    (3)解压Hadoop压缩包到/opt/module目录下

    [hadoop@hadoop01 software]$ tar -zvxf hadoop-2.7.2.tar.gz -C /opt/module/

    (4)将Hadoop添加到环境变量

            4.1 获取Hadoop_home路径

    [hadoop@hadoop01 ~]$ cd /opt/module/hadoop-2.7.2/

    [hadoop@hadoop01 hadoop-2.7.2]$ pwd

    /opt/module/hadoop-2.7.2

            4.2 编辑 /etc/profile 文件

    [hadoop@hadoop01 hadoop-2.7.2]$ vi /etc/profile

    添加:

    # Hadoop

    export HADOOP_HOME=/opt/module/hadoop-2.7.2

    export PATH=$PATH:$HADOOP_HOME/bin

    export PATH=$PATH:$HADOOP_HOME/sbin

            4.3 让修改后的 /etc/profile文件生效

      [hadoop@hadoop01 hadoop-2.7.2]$ source /etc/profile

    (5)Hadoop目录结构

    drwxr-xr-x. 2 hadoop hadoop   194 May 22  2017    bin

    drwxr-xr-x. 3 hadoop hadoop    20 May 22  2017    etc

    drwxr-xr-x. 2 hadoop hadoop   106 May 22  2017    include

    drwxr-xr-x. 3 hadoop hadoop    20 May 22  2017    lib

    drwxr-xr-x. 2 hadoop hadoop   239 May 22  2017    libexec

    -rw-r--r--. 1 hadoop hadoop 15429 May 22  2017   LICENSE.txt

    -rw-r--r--. 1 hadoop hadoop   101 May 22  2017    NOTICE.txt

    -rw-r--r--. 1 hadoop hadoop  1366 May 22  2017   README.txt

    drwxr-xr-x. 2 hadoop hadoop  4096 May 22  2017   sbin

    drwxr-xr-x. 4 hadoop hadoop    31 May 22  2017    share

    bin目录:存放对Hadoop相关服务(HDFS,YARN)进行操作的脚本

    etc目录:Hadoop的配置文件目录,存放Hadoop的配置文件

    lib目录:存放Hadoop的本地库(对数据进行压缩解压缩功能)

    sbin目录:存放启动或停止Hadoop相关服务的脚本

    share目录:存放Hadoop的依赖jar包、文档、和官方案例

    四、配置集群

    hadoop01(192.168.107.2)

    hadoop02(192.168.107.3)

    hadoop03(192.168.107.4)

    NameNode

    Resourcemanager

    SecondaryNameNode

    DataNode

    DataNode

    DataNode

    NodeManager

    NodeManager

    NodeManager



      [hadoop@hadoop01 ~]$ cd /opt/module/hadoop-2.7.2/etc/hadoop/

    (1)配置HDFS相关文件

    配置core-site.xml

    [hadoop@hadoop01 hadoop]$ vi core-site.xml

    在该文件中编写如下配置:

    <configuration>

      <property>    <!-- 指定HDFS中NameNode的地址 -->

         <name>fs.defaultFS</name>

         <value>hdfs://hadoop01:9000</value>

      </property>

      <property>    <!-- 指定Hadoop运行时产生文件的存储目录 -->

         <name>hadoop.tmp.dir</name>

         <value>/opt/module/hadoop-2.7.2/data/tmp</value>

      </property>

    </configuration>

    配置hadoop-env.sh

    [hadoop@hadoop01 hadoop]$ vi hadoop-env.sh

    export JAVA_HOME=/opt/module/jdk1.8.0_144

    配置hdfs-site.xml

    [hadoop@hadoop01 hadoop]$ vi hdfs-site.xml

      在该文件中编写如下配置:

     <configuration>

        <property>  <!--设置备份数量-->

         <name>dfs.replication</name>

         <value>3</value>

        </property>

        <property>  <!-- 指定HDFS中SecondaryNameNode的配置 -->

         <name>dfs.namenode.secondary.http-address</name>

         <value>hadoop03:9001</value>

        </property>

        <property>

         <name>dfs.permissions.enabled</name>

         <value>false</value>

        </property>

        <property> 

         <name>dfs.datanode.max.xcievers</name>

         <value>4096</value>

         <dedication> Datanode 有一个同时处理文件的上限,至少要有4096</dedication>

        </property>

        <property>  <!--设置为true,可以在浏览器中IP+port查看-->

         <name>dfs.webhdfs.enabled</name>

         <value>true</value>

        </property>

    </configuration>

     (2)配置MapReduce

    配置mapred-env.sh

    [hadoop@hadoop01 hadoop]$ vi mapred-env.sh

    export JAVA_HOME=/opt/module/jdk1.8.0_144

      配置mapred-site.xml

        [hadoop@hadoop01 hadoop]$ cp mapred-site.xml.template mapred-site.xml

        [hadoop@hadoop01 hadoop]$ vi mapred-site.xml

        在该文件中编写如下配置:

    <configuration>

        <property>

            <name>mapreduce.framework.name</name>

            <value>yarn</value>

        </property>

        <property>  <!--配置实际的主机名和端口-->

           <name>mapreduce.jobhistory.address</name>

           <value>hadoop02:10020</value>

        </property>

        <property>

           <name>mapreduce.jobhistory.webapp.address</name>

           <value>hadoop02:19888</value>

        </property>

    </configuration>

    (3)配置Yarn

    配置yarn-env.sh

    [hadoop@hadoop01 hadoop]$ vi yarn-env.sh

    export JAVA_HOME=/opt/module/jdk1.8.0_144

    配置yarn-site.xml

    [hadoop@hadoop01 hadoop]$ vi yarn-site.xml

    在该文件中编写如下配置:

    <configuration>

        <property>  <!-- Reducer获取数据的方式 -->

            <name>yarn.nodemanager.aux-services</name>

            <value>mapreduce_shuffle</value>

        </property>

        <property>

           <name>yarn.log-aggregation-enable</name>

           <value>true</value>

        </property>

        <property>  <!--日志保存时间 默认保存3-7-->  

           <name>yarn.log-aggregation.retain-seconds</name>

           <value>86400</value>

        </property>

      <property>    <!-- 指定YARN的ResourceManager的地址 -->

         <name>yarn.resourcemanager.hostname</name>

         <value>hadoop02</value>

      </property>

      <property> <!--ResourceManager 对外web暴露的地址,可在浏览器查看-->  

         <name>yarn.resourcemanager.webapp.address</name>

         <value>hadoop02:8088</value>

      </property>

      <property> <!--ResourceManager 对ApplicationMaster暴露的地址--> 

         <name>yarn.resourcemanager.scheduler.address</name>

         <value>hadoop02:8030</value>

      </property>

      <property> <!--ResourceManager 对NodeManager暴露的地址-->

         <name>yarn.resourcemanager.resource-tracker.address</name> 

         <value>hadoop02:8031</value>

      </property>

      <property> <!--ResourceManager 对客户端暴露的地址-->

         <name>yarn.resourcemanager.address</name>

         <value>hadoop02:8032</value>

      </property>

      <property> <!--ResourceManager 对管理员暴露的地址-->

         <name>yarn.resourcemanager.admin.address</name>  

         <value>hadoop02:8033</value>

      </property>

    </configuration>

    (4)配置slaves

    [hadoop@hadoop01 hadoop]$ vi slaves

    在该文件中增加如下内容:

    Hadoop01

    Hadoop02

    Hadoop03

    (5)拷贝配置好的Hadoop到其他节点

    [hadoop@hadoop01 ~]$ xsync /opt/module/hadoop-2.7.2/

    (6)拷贝 /etc/profile 文件到其他节点

    [hadoop@hadoop01 ~]$ su root

    Password:

    [root@hadoop01 hadoop]# /home/hadoop/bin/xsync /etc/profile

    五、启动关闭集群

    (1)如果集群是第一次启动,需要格式化NameNode(注意格式化之前,一定要先停止上次启动的所有namenode和datanode进程,然后再删除data和log数据)

    [hadoop@hadoop01 hadoop-2.7.2]$ bin/hdfs namenode -format

    (2) 启动HDFS  (在NameNode节点启动)

    [hadoop@hadoop01 hadoop-2.7.2]$ sbin/start-dfs.sh

    [hadoop@hadoop01 hadoop-2.7.2]$ jps

    3716 DataNode

    3941 Jps

    3574 NameNode

    (3)启动Yarn (在Resourcemanager节点启动)

    [hadoop@hadoop02 hadoop-2.7.2]$ sbin/start-yarn.sh

    [hadoop@hadoop02 hadoop-2.7.2]$ jps

    3876 Jps

    3271 DataNode

    3422 ResourceManager

    3535 NodeManager

    启动顺序:HDFS > Yarn

    关闭顺序:Yarn > HDFS

    启动HDFS:start-dfs.sh

    启动Yarn:start-yarn.sh

    关闭HDFS:stop-dfs.sh

    关闭Yarn:stop-yarn.sh

    Web查看HDFS:http://192.168.107.2:50070

    Web查看SecondaryNameNode:http://192.168.107.4:9001/status.html

    Web查看Yarn:http://192.168.107.3:8088

     

  • 相关阅读:
    python学习笔记番外:linux文件拷贝程序
    好看视频爬虫2.0
    Serverless 如何在阿里巴巴实现规模化落地?
    开放下载!2021 解锁 Serverless 从入门到实战大“橙”就
    这位硬核程序员,想好怎么过春节了吗?
    mysql error 1130 hy000:Host 'localhost' is not allowed to connect to this mysql server 解决方案
    DockerFile
    一种移动端的token设计方案(适合app+h5)
    elk7.5搭建详解
    关于rabbitmq与kafka的异同
  • 原文地址:https://www.cnblogs.com/smandar/p/13901448.html
Copyright © 2011-2022 走看看