  • Linux: Hadoop deployment

    0. Prerequisites

    • 3 hosts
    • a Java (JDK) environment
    • hostname mapping in /etc/hosts (see the sketch below)
    • passwordless SSH login between the nodes (see the sketch below)
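
    A minimal sketch of the host mapping and passwordless login, assuming the three hostnames used throughout this guide (hadoop01, hadoop02, hadoop03) and the IP addresses that appear later in the dfsadmin report; replace them with your own:

    # On every node: map the three hostnames (IPs taken from the dfsadmin report below)
    echo "10.20.1.188 hadoop01" >> /etc/hosts
    echo "10.20.1.189 hadoop02" >> /etc/hosts
    echo "10.20.1.190 hadoop03" >> /etc/hosts
    # On the Master node: generate a key pair and push it to every node (including itself)
    ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
    for host in hadoop01 hadoop02 hadoop03; do ssh-copy-id root@$host; done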

    1. Download the installation package

    We download the Hadoop package on the Master node first and modify the configuration there; the result is then copied to the Slave nodes, where only minor adjustments are needed.

    1.1 Download the package and create the Hadoop directory
    # Download
    wget http://apache.claz.org/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
    # Extract to /usr/local
    sudo tar -xzvf  hadoop-3.2.1.tar.gz    -C /usr/local
    # Fix the ownership of the extracted Hadoop directory
    sudo chown -R ubuntu:ubuntu /usr/local/hadoop-3.2.1
    # Rename the directory (run from /usr/local) ...
    sudo mv  hadoop-3.2.1  hadoop
    # ... or, preferably, create a symlink instead of renaming
    ln -s hadoop-3.2.1 hadoop
    

    2. Configure Hadoop environment variables on the Master node

    2.1 Global environment variables

    Note: from here on this guide assumes the Hadoop directory is /apprun/hadoop; if you kept the /usr/local location from step 1.1, adjust the paths accordingly.

    [root@hadoop01 hadoop]# vi /etc/profile
    
    export HADOOP_HOME=/apprun/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin 
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    
    [root@hadoop01 apprun]# source /etc/profile
    [root@hadoop01 ~]# echo $HADOOP_HOME
    /apprun/hadoop
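
    To confirm the variables took effect, a quick check (assumes the package is installed under /apprun/hadoop as configured above):

    # Prints the Hadoop 3.2.1 banner if HADOOP_HOME and PATH are set correctly
    hadoop version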
    

    2.2 Hadoop environment variables

    Point the JAVA_HOME parameter in hadoop-env.sh at your system's JDK path; Hadoop reads this file at runtime to pick up its environment. (In the stock hadoop-env.sh nearly every line is commented out, so these are the only active lines.)

    [root@hadoop01 hadoop]# vi etc/hadoop/hadoop-env.sh
    export JAVA_HOME=/apprun/jdk
    export HADOOP_HOME=/apprun/hadoop
    export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
    export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
    

    3. Configure the Master node

    Each Hadoop component is configured through an XML file. The configuration files all live in $HADOOP_HOME/etc/hadoop (here /apprun/hadoop/etc/hadoop):

    • core-site.xml: common properties, e.g. I/O settings shared by HDFS and MapReduce
    • hdfs-site.xml: HDFS daemon configuration, covering the namenode, secondary namenode and datanodes
    • mapred-site.xml: MapReduce daemon configuration
    • yarn-site.xml: resource-scheduling configuration

    3.1 Edit core-site.xml with the following content:

    <configuration>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>file:/apprun/hadoop/tmp</value>
            <description>Abase for other temporary directories.</description>
        </property>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://hadoop01:9000</value>
        </property>
    </configuration>
    

    Parameter notes:

    • fs.defaultFS: the default filesystem URI; HDFS clients need this parameter to reach HDFS
    • hadoop.tmp.dir: the base directory for Hadoop's temporary data; several other directories are derived from it, so point it at a location with enough space rather than the default under /tmp

    If hadoop.tmp.dir is not set, Hadoop falls back to the default temporary directory /tmp/hadoop-${user.name}. That directory is wiped on every reboot, after which the namenode must be re-formatted or HDFS will fail to start.

    3.2 Edit hdfs-site.xml with the following content:

    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>3</value>
        </property>
        <property>
            <name>dfs.name.dir</name>
            <value>/apprun/hadoop/hdfs/name</value>
        </property>
        <property>
            <name>dfs.data.dir</name>
            <value>/apprun/hadoop/hdfs/data</value>
        </property>
        <!-- namenode and secondary namenode web UI addresses -->
        <property>
            <name>dfs.http.address</name>
            <value>hadoop01:50070</value>
        </property>
        <property>
            <name>dfs.secondary.http.address</name>
            <value>hadoop02:50090</value>
        </property>
    </configuration>
    

    Parameter notes:

    • dfs.replication: number of replicas kept for each data block
    • dfs.name.dir: storage directory for the namenode's metadata (in Hadoop 3 this is a deprecated alias of dfs.namenode.name.dir, but it still works)
    • dfs.data.dir: storage directory for the datanodes' blocks (deprecated alias of dfs.datanode.data.dir)
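
    Hadoop will normally create these directories itself on the first format/start, but pre-creating them on every node with the right owner avoids permission surprises. A small sketch, assuming the paths configured above:

    # Pre-create the storage directories referenced in core-site.xml and hdfs-site.xml
    mkdir -p /apprun/hadoop/tmp /apprun/hadoop/hdfs/name /apprun/hadoop/hdfs/data
    # They must be writable by the user that runs Hadoop (root in the prompts below)
    chown -R root:root /apprun/hadoop/tmp /apprun/hadoop/hdfs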

    3.3 Edit mapred-site.xml with the following content:

    <configuration>
      <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
      </property>
      <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_HOME/share/hadoop/mapreduce/*:$HADOOP_HOME/share/hadoop/mapreduce/lib/*</value>
      </property>
    </configuration>
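
    The classpath value above points at the MapReduce jars shipped with Hadoop; a quick, optional sanity check that they exist where $HADOOP_HOME says they should:

    # The jars referenced by mapreduce.application.classpath should be listed here
    ls $HADOOP_HOME/share/hadoop/mapreduce/*.jar | head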
    

    3.4 Edit yarn-site.xml with the following content:

    <configuration>
    <!-- Site specific YARN configuration properties -->
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>hadoop01</value>
        </property>
        <property>
            <name>yarn.nodemanager.env-whitelist</name>
            <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME</value>
        </property>
    </configuration>
    

    3.5 Edit the workers file with the following content:

    hadoop01
    hadoop02
    hadoop03
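
    In Hadoop 3 the workers file (which replaced the old slaves file) lives at $HADOOP_HOME/etc/hadoop/workers; a one-liner to write it, assuming the three hostnames above:

    printf 'hadoop01\nhadoop02\nhadoop03\n' > $HADOOP_HOME/etc/hadoop/workers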
    

    4. Configure the Slave (worker) nodes

    Package the Hadoop directory configured on the Master node and send it to the other two nodes:

    # Package the Hadoop directory (use -C so the archive stores a relative path and extracts cleanly)
    tar -czf hadoop.tar.gz -C /apprun hadoop
    # Copy it to the other two nodes
    scp hadoop.tar.gz root@hadoop02:~
    scp hadoop.tar.gz root@hadoop03:~
    

    On the other nodes, extract the Hadoop package to /apprun:

    sudo tar -xzvf hadoop.tar.gz -C /apprun/
    

    Configure the Hadoop environment variables on the two Slave nodes (hadoop02 and hadoop03), the same way as on the Master:

    [root@hadoop02 hadoop]# vi /etc/profile
    
    export HADOOP_HOME=/apprun/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin 
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    
    [root@hadoop02 apprun]# source /etc/profile
    [root@hadoop02 ~]# echo $HADOOP_HOME
    /apprun/hadoop
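
    Alternatively, the same export lines can be pushed from the Master over SSH (a sketch, assuming the passwordless root login set up in the prerequisites); remember to run source /etc/profile on each Slave afterwards:

    for host in hadoop02 hadoop03; do
        ssh root@$host "echo 'export HADOOP_HOME=/apprun/hadoop' >> /etc/profile"
        ssh root@$host "echo 'export PATH=\$PATH:\$HADOOP_HOME/bin' >> /etc/profile"
        ssh root@$host "echo 'export HADOOP_CONF_DIR=\$HADOOP_HOME/etc/hadoop' >> /etc/profile"
    done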
    

    5. Start the cluster

    5.1 Format the HDFS filesystem

    Go to the Hadoop directory on the Master node and run the following:

    bin/hdfs namenode -format
    

    This formats the namenode. It is done once, before the first start of the cluster, and must not be run again afterwards.

    If a line like the following appears in the log output, the namenode was formatted successfully.

    common.Storage: Storage directory /apprun/elk/hadoop_repo/dfs/name has been successfully formatted.
    

    5.2 Start the Hadoop cluster

    Start the services

    [root@hadoop01 hadoop]# sbin/start-all.sh
    Starting namenodes on [hadoop01]
    Starting datanodes
    Starting secondary namenodes [hadoop02]
    2020-08-21 14:08:09,698 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Starting resourcemanager
    Starting nodemanagers
    [root@hadoop01 hadoop]#
    [root@hadoop01 hadoop]# jps
    31188 NodeManager
    30439 NameNode
    30615 DataNode
    31531 Jps
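
    Besides jps, the web UIs give a quick health view. A small check sketch using curl; the NameNode and SecondaryNameNode ports follow the hdfs-site.xml settings above, while 8088 is YARN's default ResourceManager web port (an assumption, since this guide does not set it explicitly):

    # NameNode web UI (dfs.http.address configured above)
    curl -s -o /dev/null -w '%{http_code}\n' http://hadoop01:50070
    # SecondaryNameNode web UI (dfs.secondary.http.address configured above)
    curl -s -o /dev/null -w '%{http_code}\n' http://hadoop02:50090
    # YARN ResourceManager web UI (default port 8088, not set in this guide)
    curl -s -o /dev/null -w '%{http_code}\n' http://hadoop01:8088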
    

    Check the service status

    [root@hadoop01 hadoop]# hadoop dfsadmin -report
    WARNING: Use of this script to execute dfsadmin is deprecated.
    WARNING: Attempting to execute replacement "hdfs dfsadmin" instead.
    
    2020-08-21 14:13:48,047 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Configured Capacity: 303759667200 (282.90 GB)
    Present Capacity: 261995925504 (244.00 GB)
    DFS Remaining: 261995827200 (244.00 GB)
    DFS Used: 98304 (96 KB)
    DFS Used%: 0.00%
    Replicated Blocks:
            Under replicated blocks: 0
            Blocks with corrupt replicas: 0
            Missing blocks: 0
            Missing blocks (with replication factor 1): 0
            Low redundancy blocks with highest priority to recover: 0
            Pending deletion blocks: 0
    Erasure Coded Block Groups:
            Low redundancy block groups: 0
            Block groups with corrupt internal blocks: 0
            Missing block groups: 0
            Low redundancy blocks with highest priority to recover: 0
            Pending deletion blocks: 0
    
    -------------------------------------------------
    Live datanodes (3):
    
    Name: 10.20.1.188:9866 (hadoop01)
    Hostname: hadoop01
    Decommission Status : Normal
    Configured Capacity: 101253222400 (94.30 GB)
    DFS Used: 32768 (32 KB)
    Non DFS Used: 6546034688 (6.10 GB)
    DFS Remaining: 89563734016 (83.41 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 88.46%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Fri Aug 21 14:13:47 CST 2020
    Last Block Report: Fri Aug 21 14:08:05 CST 2020
    Num of Blocks: 0
    
    
    Name: 10.20.1.189:9866 (hadoop02)
    Hostname: hadoop02
    Decommission Status : Normal
    Configured Capacity: 101253222400 (94.30 GB)
    DFS Used: 32768 (32 KB)
    Non DFS Used: 10085851136 (9.39 GB)
    DFS Remaining: 86023917568 (80.12 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 84.96%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Fri Aug 21 14:13:47 CST 2020
    Last Block Report: Fri Aug 21 14:08:05 CST 2020
    Num of Blocks: 0
    
    
    Name: 10.20.1.190:9866 (hadoop03)
    Hostname: hadoop03
    Decommission Status : Normal
    Configured Capacity: 101253222400 (94.30 GB)
    DFS Used: 32768 (32 KB)
    Non DFS Used: 9701593088 (9.04 GB)
    DFS Remaining: 86408175616 (80.47 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 85.34%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Fri Aug 21 14:13:47 CST 2020
    Last Block Report: Fri Aug 21 14:08:05 CST 2020
    Num of Blocks: 0
    
    
    [root@hadoop01 hadoop]#
    

    Stop the services

    [root@hadoop01 hadoop]# sbin/stop-all.sh
    Stopping namenodes on [hadoop01]
    Stopping datanodes
    Stopping secondary namenodes [hadoop02]
    2020-08-21 14:06:07,290 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Stopping nodemanagers
    hadoop01: WARNING: nodemanager did not stop gracefully after 5 seconds: Trying to kill with kill -9
    hadoop02: WARNING: nodemanager did not stop gracefully after 5 seconds: Trying to kill with kill -9
    hadoop03: WARNING: nodemanager did not stop gracefully after 5 seconds: Trying to kill with kill -9
    Stopping resourcemanager
    [root@hadoop01 hadoop]#
    

