  • Linux: Hadoop deployment

    0. Prerequisites

    • 3 hosts
    • a Java environment (JDK installed on every node)
    • /etc/hosts entries mapping each hostname
    • passwordless SSH login between the nodes (see the sketch below)
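
    The host mapping and passwordless SSH are the two prerequisites that most often trip up a first deployment. A minimal sketch, assuming the three hosts are named hadoop01–hadoop03 with the IPs that appear later in this post:

    # Append to /etc/hosts on every node (substitute your own IPs):
    #   10.20.1.188 hadoop01
    #   10.20.1.189 hadoop02
    #   10.20.1.190 hadoop03

    # On the Master (hadoop01): create a key pair once, then push the
    # public key to every node, including hadoop01 itself.
    ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
    for h in hadoop01 hadoop02 hadoop03; do
        ssh-copy-id "root@$h"
    done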

    1. Download the installation package

    We download the Hadoop package on the Master node first and edit the configuration there; afterwards we copy it to the Slave nodes, which only need minor adjustments.

    1.1 Download the package and create the Hadoop directory
    # Download
    wget http://apache.claz.org/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
    # Extract into /usr/local
    sudo tar -xzvf hadoop-3.2.1.tar.gz -C /usr/local
    # Fix ownership of the extracted tree
    sudo chown -R ubuntu:ubuntu /usr/local/hadoop-3.2.1
    # Either rename the directory ...
    sudo mv /usr/local/hadoop-3.2.1 /usr/local/hadoop
    # ... or (recommended) keep the versioned directory and use a symlink
    sudo ln -s /usr/local/hadoop-3.2.1 /usr/local/hadoop
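
    Optionally, verify the integrity of the downloaded tarball. Apache publishes a .sha512 checksum next to each release; the archive.apache.org URL below follows the standard layout but is an assumption, so adjust it if your mirror differs.

    # Fetch the published checksum and compare it against the tarball
    wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz.sha512
    sha512sum -c hadoop-3.2.1.tar.gz.sha512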
    

    2. Configure Hadoop environment variables on the Master node

    2.1 Global environment variables

    (From here on this post assumes the Hadoop installation lives at /apprun/hadoop; if you extracted to /usr/local as in step 1.1, substitute that path.)

    [root@hadoop01 hadoop]# vi /etc/profile
    
    export HADOOP_HOME=/apprun/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin 
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    
    [root@hadoop01 apprun]# source /etc/profile
    [root@hadoop01 ~]# echo $HADOOP_HOME
    /apprun/hadoop
    

    2.2 Hadoop environment variables

    Point the JAVA_HOME variable in hadoop-env.sh at your own JDK path; this is where Hadoop picks up its environment at run time. (It is the only active line in the whole script; everything else is commented out.)

    [root@hadoop01 hadoop]# vi etc/hadoop/hadoop-env.sh
    export JAVA_HOME=/apprun/jdk
    export HADOOP_HOME=/apprun/hadoop
    export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
    export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
    

    3. Configure the Master node

    Each Hadoop component is configured through an XML file, and the configuration files all live in $HADOOP_CONF_DIR (here /apprun/hadoop/etc/hadoop):

    • core-site.xml: common properties, such as I/O settings shared by HDFS and MapReduce
    • hdfs-site.xml: HDFS daemon configuration: namenode, secondary namenode, and datanodes
    • mapred-site.xml: MapReduce daemon configuration
    • yarn-site.xml: resource-scheduling configuration

    3.1 Edit core-site.xml as follows:

    <configuration>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/apprun/hadoop/tmp</value>
            <description>A base for other temporary directories.</description>
        </property>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://hadoop01:9000</value>
        </property>
    </configuration>
    

    Parameter notes:

    • fs.defaultFS: the default file system; HDFS clients need this to reach the cluster
    • hadoop.tmp.dir: the base directory for Hadoop's working data; many other paths are derived from it, so point it at a volume with enough space rather than the default under /tmp

    If hadoop.tmp.dir is not set, Hadoop falls back to the default /tmp/hadoop-${user.name}. That directory is wiped on every reboot, after which the namenode must be formatted again or it will fail to start.
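
    Creating the directory up front avoids permission surprises; a one-line sketch (run on every node, path from core-site.xml above):

    # Create the working directory named in hadoop.tmp.dir
    mkdir -p /apprun/hadoop/tmp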

    3.2 Edit hdfs-site.xml as follows:

    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>3</value>
        </property>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>/apprun/hadoop/hdfs/name</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>/apprun/hadoop/hdfs/data</value>
        </property>
        <!-- namenode and secondary namenode web addresses -->
        <property>
            <name>dfs.namenode.http-address</name>
            <value>hadoop01:50070</value>
        </property>
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>hadoop02:50090</value>
        </property>
    </configuration>
    

    Parameter notes:

    • dfs.replication: number of replicas kept per block
    • dfs.namenode.name.dir: where the namenode stores its metadata (dfs.name.dir is the deprecated Hadoop 2 name)
    • dfs.datanode.data.dir: where each datanode stores block data (dfs.data.dir is the deprecated Hadoop 2 name); both directories can be pre-created, see the sketch below
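
    A minimal sketch: run the name-directory line on hadoop01 only, and the data-directory line on every node listed in workers.

    # Namenode metadata directory (hadoop01 only)
    mkdir -p /apprun/hadoop/hdfs/name
    # Datanode block directory (every worker node)
    mkdir -p /apprun/hadoop/hdfs/data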

    3.3 Edit mapred-site.xml as follows:

    <configuration>
      <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
      </property>
      <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_HOME/share/hadoop/mapreduce/*:$HADOOP_HOME/share/hadoop/mapreduce/lib/*</value>
      </property>
    </configuration>
    

    3.4 Edit yarn-site.xml as follows:

    <configuration>
    <!-- Site specific YARN configuration properties -->
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>hadoop01</value>
        </property>
        <property>
            <name>yarn.nodemanager.env-whitelist</name>
            <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME</value>
        </property>
    </configuration>
    

    3.5 Edit the workers file as follows:

    hadoop01
    hadoop02
    hadoop03
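
    With passwordless SSH in place, a quick sanity loop confirms every host in workers is resolvable and reachable (not part of the official procedure, just a check):

    for h in $(cat /apprun/hadoop/etc/hadoop/workers); do
        ssh "root@$h" hostname
    done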
    

    4. Configure the Slave (worker) nodes

    Pack up the Hadoop directory configured on the Master node and ship it to the other two nodes:

    # Pack the hadoop directory; -C keeps the paths in the archive relative
    # (add -h if /apprun/hadoop is a symlink, so the real files are archived)
    tar -czf hadoop.tar.gz -C /apprun hadoop
    # Copy to the other two nodes
    scp hadoop.tar.gz root@hadoop02:~
    scp hadoop.tar.gz root@hadoop03:~
    

    On the other two nodes, extract the Hadoop package into the /apprun directory:

    sudo tar -xzvf hadoop.tar.gz -C /apprun/
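
    Equivalently, the copy and extract steps can be driven from the Master in one loop, a sketch assuming the passwordless root SSH set up in the prerequisites:

    for h in hadoop02 hadoop03; do
        scp hadoop.tar.gz "root@$h:~" &&
        ssh "root@$h" "tar -xzf ~/hadoop.tar.gz -C /apprun/"
    done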
    

    Configure the Hadoop environment variables on the two Slave nodes (hadoop02 and hadoop03):

    [root@hadoop02 hadoop]# vi /etc/profile
    
    export HADOOP_HOME=/apprun/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin 
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    
    [root@hadoop02 apprun]# source /etc/profile
    [root@hadoop02 ~]# echo $HADOOP_HOME
    /apprun/hadoop
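
    To confirm the variable landed on every node, loop over the hosts; note the explicit source, since a non-interactive ssh shell does not read /etc/profile:

    for h in hadoop01 hadoop02 hadoop03; do
        ssh "root@$h" 'source /etc/profile && echo "$HOSTNAME: $HADOOP_HOME"'
    done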
    

    5. Start the cluster

    5.1 Format the HDFS file system

    Go into the Hadoop directory on the Master node and run the following:

    bin/hdfs namenode -format
    

    This formats the namenode. It is done once, before the first start of the services, and must not be run again afterwards (re-formatting wipes the HDFS metadata).

    If a line like the following appears in the log output, the namenode was formatted successfully:

    common.Storage: Storage directory /apprun/elk/hadoop_repo/dfs/name has been successfully formatted.
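
    The storage path in the sample log above reflects the original author's environment; with the hdfs-site.xml used in this post, a quick check that formatting populated the name directory is:

    ls /apprun/hadoop/hdfs/name/current/VERSION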
    

    5.2 Start the Hadoop cluster

    Start the services

    [root@hadoop01 hadoop]# sbin/start-all.sh
    Starting namenodes on [hadoop01]
    Starting datanodes
    Starting secondary namenodes [hadoop02]
    2020-08-21 14:08:09,698 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Starting resourcemanager
    Starting nodemanagers
    [root@hadoop01 hadoop]#
    [root@hadoop01 hadoop]# jps
    31188 NodeManager
    30439 NameNode
    30615 DataNode
    31531 Jps
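
    jps on the Master only lists the local daemons. To see the whole cluster, run it on every node; a sketch, assuming jps is on the PATH of the remote shells:

    for h in hadoop01 hadoop02 hadoop03; do
        echo "== $h =="
        ssh "root@$h" jps
    done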
    

    Check the service status

    [root@hadoop01 hadoop]# hadoop dfsadmin -report
    WARNING: Use of this script to execute dfsadmin is deprecated.
    WARNING: Attempting to execute replacement "hdfs dfsadmin" instead.
    
    2020-08-21 14:13:48,047 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Configured Capacity: 303759667200 (282.90 GB)
    Present Capacity: 261995925504 (244.00 GB)
    DFS Remaining: 261995827200 (244.00 GB)
    DFS Used: 98304 (96 KB)
    DFS Used%: 0.00%
    Replicated Blocks:
            Under replicated blocks: 0
            Blocks with corrupt replicas: 0
            Missing blocks: 0
            Missing blocks (with replication factor 1): 0
            Low redundancy blocks with highest priority to recover: 0
            Pending deletion blocks: 0
    Erasure Coded Block Groups:
            Low redundancy block groups: 0
            Block groups with corrupt internal blocks: 0
            Missing block groups: 0
            Low redundancy blocks with highest priority to recover: 0
            Pending deletion blocks: 0
    
    -------------------------------------------------
    Live datanodes (3):
    
    Name: 10.20.1.188:9866 (hadoop01)
    Hostname: hadoop01
    Decommission Status : Normal
    Configured Capacity: 101253222400 (94.30 GB)
    DFS Used: 32768 (32 KB)
    Non DFS Used: 6546034688 (6.10 GB)
    DFS Remaining: 89563734016 (83.41 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 88.46%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Fri Aug 21 14:13:47 CST 2020
    Last Block Report: Fri Aug 21 14:08:05 CST 2020
    Num of Blocks: 0
    
    
    Name: 10.20.1.189:9866 (hadoop02)
    Hostname: hadoop02
    Decommission Status : Normal
    Configured Capacity: 101253222400 (94.30 GB)
    DFS Used: 32768 (32 KB)
    Non DFS Used: 10085851136 (9.39 GB)
    DFS Remaining: 86023917568 (80.12 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 84.96%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Fri Aug 21 14:13:47 CST 2020
    Last Block Report: Fri Aug 21 14:08:05 CST 2020
    Num of Blocks: 0
    
    
    Name: 10.20.1.190:9866 (hadoop03)
    Hostname: hadoop03
    Decommission Status : Normal
    Configured Capacity: 101253222400 (94.30 GB)
    DFS Used: 32768 (32 KB)
    Non DFS Used: 9701593088 (9.04 GB)
    DFS Remaining: 86408175616 (80.47 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 85.34%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Fri Aug 21 14:13:47 CST 2020
    Last Block Report: Fri Aug 21 14:08:05 CST 2020
    Num of Blocks: 0
    
    
    [root@hadoop01 hadoop]#
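
    Beyond the report, a short write/read round trip is the simplest end-to-end smoke test (the path below is an arbitrary example):

    hdfs dfs -mkdir -p /smoke-test
    hdfs dfs -put /etc/hosts /smoke-test/
    hdfs dfs -cat /smoke-test/hosts
    hdfs dfs -rm -r /smoke-test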
    

    Stop the services

    [root@hadoop01 hadoop]# sbin/stop-all.sh
    Stopping namenodes on [hadoop01]
    Stopping datanodes
    Stopping secondary namenodes [hadoop02]
    2020-08-21 14:06:07,290 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Stopping nodemanagers
    hadoop01: WARNING: nodemanager did not stop gracefully after 5 seconds: Trying to kill with kill -9
    hadoop02: WARNING: nodemanager did not stop gracefully after 5 seconds: Trying to kill with kill -9
    hadoop03: WARNING: nodemanager did not stop gracefully after 5 seconds: Trying to kill with kill -9
    Stopping resourcemanager
    [root@hadoop01 hadoop]#
    

    References

    Hadoop3.2.1版本的环境搭建 (Setting up a Hadoop 3.2.1 environment)

  • Original post: https://www.cnblogs.com/renzhuo/p/13554088.html