Hadoop 2.7.2 Cluster Installation and Configuration

    1. Virtual Machine Environment Setup

    Prepare three virtual machines:

    OS         IP              hostname
    CentOS 7   192.168.107.2   hadoop01
    CentOS 7   192.168.107.3   hadoop02
    CentOS 7   192.168.107.4   hadoop03

    (1) Download and install VMware

    Link: https://pan.baidu.com/s/1a_9pW6-nesgl_GUINFfqsA  Extraction code: 6b6b

    (2) Install a virtual machine in VMware

    (3) Clone the virtual machine (any standard VMware cloning guide works)

    (4) Configure each virtual machine

    4.1 Assign the planned static IP to each cloned virtual machine

    4.2 Set the hostname

    4.3 Disable the firewall

    4.4 Create a hadoop user

    4.5 Grant the hadoop user root (sudo) privileges

      Edit the /etc/sudoers file (preferably with visudo, which checks the syntax before saving), find the line below, and add a matching line for hadoop under root:

      ## Allow root to run any commands anywhere

      root ALL=(ALL) ALL

      hadoop ALL=(ALL) ALL
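    On CentOS 7, steps 4.1-4.4 can be sketched as below, run as root on each node. The interface name ens33 is an assumption (check yours with `ip addr`), and the /etc/hosts mapping is added here because the rest of the guide addresses the nodes by hostname:

    ```shell
    # 4.1 Static IP: set BOOTPROTO=static plus IPADDR/GATEWAY/DNS1 in
    #     /etc/sysconfig/network-scripts/ifcfg-ens33 (interface name assumed),
    #     then restart networking:
    systemctl restart network

    # 4.2 Hostname (hadoop01 on the first node, hadoop02/hadoop03 on the others)
    hostnamectl set-hostname hadoop01

    # 4.3 Disable the firewall now and on boot
    systemctl stop firewalld
    systemctl disable firewalld

    # 4.4 Create the hadoop user and set its password
    useradd hadoop
    passwd hadoop

    # Map hostnames to IPs on every node so hadoop01..03 resolve
    cat >> /etc/hosts <<'EOF'
    192.168.107.2 hadoop01
    192.168.107.3 hadoop02
    192.168.107.4 hadoop03
    EOF
    ```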

    (5) Create module and software directories under /opt

    [hadoop@hadoop01 ~]$ sudo mkdir -p /opt/module

    [hadoop@hadoop01 ~]$ sudo mkdir -p /opt/software

    (6) Change the owner of the module and software directories to hadoop

    [hadoop@hadoop01 opt]$ sudo chown hadoop:hadoop /opt/module/ /opt/software/ 

    (7) Set up passwordless SSH login

    Generate a key pair:

    [hadoop@hadoop01 ~]$ ssh-keygen -t rsa

    Press Enter three times to accept the defaults; this produces two files, id_rsa (the private key) and id_rsa.pub (the public key).

    Copy the public key to every node you want to reach without a password, including hadoop03 (the xsync script and the cluster start scripts need passwordless access to all three nodes):

    [hadoop@hadoop01 ~]$ ssh-copy-id hadoop01

    [hadoop@hadoop01 ~]$ ssh-copy-id hadoop02

    [hadoop@hadoop01 ~]$ ssh-copy-id hadoop03

    Verify:

    [hadoop@hadoop01 ~]$ ssh hadoop02

    Last login: Tue Oct 27 19:10:02 2020 from gateway

    [hadoop@hadoop02 ~]$

     

    (8) Write a cluster distribution script, xsync

    (a) Create a bin directory under /home/hadoop, and inside it create a file named xsync with the following content:

    [hadoop@hadoop01 ~]$ mkdir bin

    [hadoop@hadoop01 ~]$ cd bin/

    [hadoop@hadoop01 bin]$ touch xsync

    [hadoop@hadoop01 bin]$ vi xsync

    Put the following code in the file:

    #!/bin/bash
    
    # 1. Get the number of arguments; exit if none were given
    pcount=$#
    if ((pcount == 0)); then
        echo "no args"
        exit 1
    fi
    
    # 2. Get the file name
    p1=$1
    fname=$(basename "$p1")
    echo "fname=$fname"
    
    # 3. Resolve the parent directory to an absolute path
    pdir=$(cd -P "$(dirname "$p1")"; pwd)
    echo "pdir=$pdir"
    
    # 4. Get the current user name
    user=$(whoami)
    
    # 5. Loop over the other nodes and sync
    for ((host = 2; host <= 3; host++)); do
            echo "------------------- hadoop0$host --------------"
            rsync -rvl "$pdir/$fname" "$user@hadoop0$host:$pdir"
    done

    (b) Make the xsync script executable

    [hadoop@hadoop01 bin]$ chmod +x xsync

    (c) Usage: xsync <file or directory>

    [hadoop@hadoop01 bin]$ xsync /home/hadoop/bin

     Note: if xsync in /home/hadoop/bin still cannot be invoked from anywhere, move it to /usr/local/bin.
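    Step #3 of the script turns a possibly relative argument into an absolute parent path before handing it to rsync. That trick can be checked in isolation; the /tmp path below is just a scratch location for the demo:

    ```shell
    # Resolve <absolute parent dir> + <basename>, exactly as xsync does in steps 2-3
    p1="/tmp/xsync_demo/sub"                 # scratch path for the demo
    mkdir -p "$p1"
    fname=$(basename "$p1")                  # -> sub
    pdir=$(cd -P "$(dirname "$p1")"; pwd)    # -> /tmp/xsync_demo
    echo "fname=$fname pdir=$pdir"
    ```

    Because of the `cd -P`, the result is an absolute, symlink-free path even if the script is invoked with a relative argument like `xsync ../module`.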

     

    2. Install the JDK

    (1) Download the JDK

    Link: https://pan.baidu.com/s/1wuR2FQe_RYO5mBpAWytGLw  Extraction code: td3y

    (2) Upload the JDK archive to /opt/software/ with an FTP/SFTP tool

    (3) Extract the JDK into /opt/module

    [hadoop@hadoop01 software]$ tar -zvxf jdk-8u144-linux-x64.tar.gz -C /opt/module/

    (4) Configure the JDK

       4.1 Get the JDK path

    [hadoop@hadoop01 software]$ cd /opt/module/jdk1.8.0_144/

    [hadoop@hadoop01 jdk1.8.0_144]$ pwd

         /opt/module/jdk1.8.0_144

       4.2 Edit the /etc/profile file

    [hadoop@hadoop01 jdk1.8.0_144]$ sudo vi /etc/profile

    Add:

    #JAVA_HOME

    export JAVA_HOME=/opt/module/jdk1.8.0_144

    export PATH=$PATH:$JAVA_HOME/bin

    4.3 Apply the changes to /etc/profile

    [hadoop@hadoop01 jdk1.8.0_144]$ source /etc/profile

    4.4 Check that the JDK installed correctly

    [hadoop@hadoop01 jdk1.8.0_144]$ java -version

    java version "1.8.0_144"

            

    3. Install Hadoop

    (1) Download Hadoop

    Link: https://pan.baidu.com/s/1Bb3h5OwAymfBtF0jKGwhXg  Extraction code: o061

    (2) Upload the Hadoop archive to /opt/software/ with an FTP/SFTP tool

    (3) Extract the Hadoop archive into /opt/module

    [hadoop@hadoop01 software]$ tar -zvxf hadoop-2.7.2.tar.gz -C /opt/module/

    (4) Add Hadoop to the environment variables

       4.1 Get the Hadoop home path

    [hadoop@hadoop01 ~]$ cd /opt/module/hadoop-2.7.2/

    [hadoop@hadoop01 hadoop-2.7.2]$ pwd

    /opt/module/hadoop-2.7.2

       4.2 Edit the /etc/profile file

    [hadoop@hadoop01 hadoop-2.7.2]$ sudo vi /etc/profile

    Add:

    # Hadoop

    export HADOOP_HOME=/opt/module/hadoop-2.7.2

    export PATH=$PATH:$HADOOP_HOME/bin

    export PATH=$PATH:$HADOOP_HOME/sbin

       4.3 Apply the changes to /etc/profile

      [hadoop@hadoop01 hadoop-2.7.2]$ source /etc/profile
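    Mirroring the JDK check above, the Hadoop install can be verified once the profile has been sourced. This only works on a node where the steps above were completed; the expected first line of output for this release is "Hadoop 2.7.2":

    ```shell
    # Confirm that binaries from both PATH entries are visible
    hadoop version       # first line should read: Hadoop 2.7.2
    which start-dfs.sh   # should resolve under /opt/module/hadoop-2.7.2/sbin
    ```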

    (5) Hadoop directory layout

    drwxr-xr-x. 2 hadoop hadoop   194 May 22  2017    bin

    drwxr-xr-x. 3 hadoop hadoop    20 May 22  2017    etc

    drwxr-xr-x. 2 hadoop hadoop   106 May 22  2017    include

    drwxr-xr-x. 3 hadoop hadoop    20 May 22  2017    lib

    drwxr-xr-x. 2 hadoop hadoop   239 May 22  2017    libexec

    -rw-r--r--. 1 hadoop hadoop 15429 May 22  2017   LICENSE.txt

    -rw-r--r--. 1 hadoop hadoop   101 May 22  2017    NOTICE.txt

    -rw-r--r--. 1 hadoop hadoop  1366 May 22  2017   README.txt

    drwxr-xr-x. 2 hadoop hadoop  4096 May 22  2017   sbin

    drwxr-xr-x. 4 hadoop hadoop    31 May 22  2017    share

    bin: scripts for operating the Hadoop services (HDFS, YARN)

    etc: Hadoop's configuration files

    lib: Hadoop's native libraries (e.g. compression/decompression support)

    sbin: scripts for starting and stopping the Hadoop services

    share: Hadoop's dependency jars, documentation, and official examples

    4. Configure the Cluster

    Planned daemon layout:

    hadoop01 (192.168.107.2): NameNode, DataNode, NodeManager
    hadoop02 (192.168.107.3): ResourceManager, DataNode, NodeManager
    hadoop03 (192.168.107.4): SecondaryNameNode, DataNode, NodeManager



      [hadoop@hadoop01 ~]$ cd /opt/module/hadoop-2.7.2/etc/hadoop/

    (1) Configure the HDFS-related files

    Configure core-site.xml

    [hadoop@hadoop01 hadoop]$ vi core-site.xml

    Add the following configuration to the file:

    <configuration>

      <property>    <!-- Address of the HDFS NameNode -->

         <name>fs.defaultFS</name>

         <value>hdfs://hadoop01:9000</value>

      </property>

      <property>    <!-- Base directory for files Hadoop generates at runtime -->

         <name>hadoop.tmp.dir</name>

         <value>/opt/module/hadoop-2.7.2/data/tmp</value>

      </property>

    </configuration>

    Configure hadoop-env.sh by setting JAVA_HOME:

    [hadoop@hadoop01 hadoop]$ vi hadoop-env.sh

    export JAVA_HOME=/opt/module/jdk1.8.0_144

    Configure hdfs-site.xml

    [hadoop@hadoop01 hadoop]$ vi hdfs-site.xml

    Add the following configuration to the file:

     <configuration>

        <property>  <!-- Number of block replicas -->

         <name>dfs.replication</name>

         <value>3</value>

        </property>

        <property>  <!-- Where the SecondaryNameNode runs -->

         <name>dfs.namenode.secondary.http-address</name>

         <value>hadoop03:9001</value>

        </property>

        <property>  <!-- Disable HDFS permission checking -->

         <name>dfs.permissions.enabled</name>

         <value>false</value>

        </property>

        <property>  <!-- Upper bound on files a DataNode serves at once; at least 4096.
                         This is the current name of the deprecated dfs.datanode.max.xcievers -->

         <name>dfs.datanode.max.transfer.threads</name>

         <value>4096</value>

        </property>

        <property>  <!-- When true, HDFS can be browsed over HTTP (WebHDFS) at IP:port -->

         <name>dfs.webhdfs.enabled</name>

         <value>true</value>

        </property>

    </configuration>

     (2) Configure MapReduce

    Configure mapred-env.sh by setting JAVA_HOME:

    [hadoop@hadoop01 hadoop]$ vi mapred-env.sh

    export JAVA_HOME=/opt/module/jdk1.8.0_144

    Configure mapred-site.xml

        [hadoop@hadoop01 hadoop]$ cp mapred-site.xml.template mapred-site.xml

        [hadoop@hadoop01 hadoop]$ vi mapred-site.xml

        Add the following configuration to the file:

    <configuration>

        <property>

            <name>mapreduce.framework.name</name>

            <value>yarn</value>

        </property>

        <property>  <!-- JobHistory server host and port -->

           <name>mapreduce.jobhistory.address</name>

           <value>hadoop02:10020</value>

        </property>

        <property>

           <name>mapreduce.jobhistory.webapp.address</name>

           <value>hadoop02:19888</value>

        </property>

    </configuration>

    (3) Configure YARN

    Configure yarn-env.sh by setting JAVA_HOME:

    [hadoop@hadoop01 hadoop]$ vi yarn-env.sh

    export JAVA_HOME=/opt/module/jdk1.8.0_144

    Configure yarn-site.xml

    [hadoop@hadoop01 hadoop]$ vi yarn-site.xml

    Add the following configuration to the file:

    <configuration>

        <property>  <!-- How reducers fetch map output (shuffle service) -->

            <name>yarn.nodemanager.aux-services</name>

            <value>mapreduce_shuffle</value>

        </property>

        <property>  <!-- Aggregate application logs -->

           <name>yarn.log-aggregation-enable</name>

           <value>true</value>

        </property>

        <property>  <!-- How long to keep aggregated logs, in seconds (86400 = 1 day) -->

           <name>yarn.log-aggregation.retain-seconds</name>

           <value>86400</value>

        </property>

      <property>    <!-- Host that runs the YARN ResourceManager -->

         <name>yarn.resourcemanager.hostname</name>

         <value>hadoop02</value>

      </property>

      <property> <!-- ResourceManager web UI address, viewable in a browser -->

         <name>yarn.resourcemanager.webapp.address</name>

         <value>hadoop02:8088</value>

      </property>

      <property> <!-- Address the ResourceManager exposes to ApplicationMasters -->

         <name>yarn.resourcemanager.scheduler.address</name>

         <value>hadoop02:8030</value>

      </property>

      <property> <!-- Address the ResourceManager exposes to NodeManagers -->

         <name>yarn.resourcemanager.resource-tracker.address</name> 

         <value>hadoop02:8031</value>

      </property>

      <property> <!-- Address the ResourceManager exposes to clients -->

         <name>yarn.resourcemanager.address</name>

         <value>hadoop02:8032</value>

      </property>

      <property> <!-- Address the ResourceManager exposes to administrators -->

         <name>yarn.resourcemanager.admin.address</name>  

         <value>hadoop02:8033</value>

      </property>

    </configuration>

    (4) Configure slaves

    [hadoop@hadoop01 hadoop]$ vi slaves

    Put the three hostnames in the file, written exactly as set in step 4.2 (lowercase, one per line, no trailing spaces):

    hadoop01

    hadoop02

    hadoop03

    (5) Distribute the configured Hadoop tree to the other nodes

    [hadoop@hadoop01 ~]$ xsync /opt/module/hadoop-2.7.2/

    (6) Distribute the /etc/profile file to the other nodes

    [hadoop@hadoop01 ~]$ su root

    Password:

    [root@hadoop01 hadoop]# /home/hadoop/bin/xsync /etc/profile

    Remember to run source /etc/profile on hadoop02 and hadoop03 afterwards.

    5. Starting and Stopping the Cluster

    (1) If the cluster is being started for the first time, format the NameNode. (Note: before re-formatting an existing cluster, always stop all running NameNode and DataNode processes first, then delete the data and logs directories.)

    [hadoop@hadoop01 hadoop-2.7.2]$ bin/hdfs namenode -format

    (2) Start HDFS (on the NameNode node, hadoop01)

    [hadoop@hadoop01 hadoop-2.7.2]$ sbin/start-dfs.sh

    [hadoop@hadoop01 hadoop-2.7.2]$ jps

    3716 DataNode

    3941 Jps

    3574 NameNode

    (3) Start YARN (on the ResourceManager node, hadoop02)

    [hadoop@hadoop02 hadoop-2.7.2]$ sbin/start-yarn.sh

    [hadoop@hadoop02 hadoop-2.7.2]$ jps

    3876 Jps

    3271 DataNode

    3422 ResourceManager

    3535 NodeManager

    Start order: HDFS, then YARN

    Stop order: YARN, then HDFS

    Start HDFS: start-dfs.sh

    Start YARN: start-yarn.sh

    Stop HDFS: stop-dfs.sh

    Stop YARN: stop-yarn.sh

    HDFS web UI: http://192.168.107.2:50070

    SecondaryNameNode web UI: http://192.168.107.4:9001/status.html

    YARN web UI: http://192.168.107.3:8088
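    To confirm that every daemon from the layout table in part 4 is actually running, one loop over the nodes is enough. This relies on the passwordless SSH set up in step (7) and should be run as the hadoop user on hadoop01:

    ```shell
    # Print the Java processes on each node; compare against the planned layout
    for h in hadoop01 hadoop02 hadoop03; do
        echo "=== $h ==="
        ssh "$h" jps
    done
    ```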

     

  • Original post: https://www.cnblogs.com/smandar/p/13901448.html