  • Hadoop 2.7.3 Distributed Cluster Installation

    1. Cluster plan:

    192.168.1.252 palo252 NameNode + DataNode
    192.168.1.253 palo253 ResourceManager + DataNode + SecondaryNameNode
    192.168.1.254 palo254 DataNode

    2. Configure a static IP address

    vi /etc/sysconfig/network-scripts/ifcfg-eth0
    TYPE=Ethernet
    BOOTPROTO=static
    DEFROUTE=yes
    NAME=eth0
    UUID=7ac09286-c35b-4f15-a9ba-701c093832bf
    DEVICE=eth0
    IPV4_FAILURE_FATAL=no
    IPV6INIT=yes
    IPV6_AUTOCONF=yes
    IPV6_DEFROUTE=yes
    IPV6_FAILURE_FATAL=no
    IPV6_ADDR_GEN_MODE=stable-privacy
    IPV6_PEERDNS=yes
    IPV6_PEERROUTES=yes
    IPV6_PRIVACY=no
    ONBOOT=yes
    DNS1=192.168.1.1
    IPADDR=192.168.1.252 #set a different address on each of the three machines
    PREFIX=24
    GATEWAY=192.168.1.1
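
    After saving the file, restart the networking service so the new address takes effect. A minimal check (the legacy network service of CentOS 7 is assumed; the service name may differ on other distributions):

    systemctl restart network   #apply the new static IP configuration
    ip addr show eth0           #confirm that the expected 192.168.1.x address is assigned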

    3. Set the hostname:
    192.168.1.252

    hostnamectl set-hostname palo252
    hostnamectl --static set-hostname palo252

    192.168.1.253

    hostnamectl set-hostname palo253
    hostnamectl --static set-hostname palo253

    192.168.1.254

    hostnamectl set-hostname palo254
    hostnamectl --static set-hostname palo254
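
    A quick check that the hostname was applied on each machine:

    hostnamectl status   #the static hostname should show palo252/palo253/palo254 respectively
    hostname             #prints the current hostname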

    4. Edit the hosts file

    vi /etc/hosts
    127.0.0.1 localhost
    ::1 localhost
    
    
    192.168.1.252 palo252
    192.168.1.253 palo253
    192.168.1.254 palo254
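
    To confirm that name resolution works on every node, a simple check such as the following can be used:

    for h in palo252 palo253 palo254; do ping -c 1 $h; done   #every hostname should resolve and answer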

    5. Install the JDK (all nodes)
    Download it from the Oracle website.
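
    As a sketch (the package file name and version below are assumptions; use whichever JDK 8 build was actually downloaded), installing the Oracle JDK RPM on each node could look like:

    yum localinstall -y jdk-8u172-linux-x64.rpm      #hypothetical file name; the RPM installs under /usr/java/jdk1.8.0_172-amd64
    /usr/java/jdk1.8.0_172-amd64/bin/java -version   #verify the installation (JAVA_HOME/PATH are set in step 8)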

    6. Passwordless SSH login

    Precondition: install the SSH server if it is not available.

    #install ssh client and server
    sudo yum install -y openssh-clients openssh-server
    #enable ssh server to start at system start up
    systemctl enable sshd.service
    #start ssh server service
    systemctl start sshd.service

    A) Generate a key pair on each machine and copy the public key to the directory 192.168.1.252:/home/workspace
    192.168.1.252:

    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    cp ~/.ssh/authorized_keys /home/workspace/authorized_keys252
    rm -rf ~/.ssh/authorized_keys #remove the local authorized_keys file

    192.168.1.253:

    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    scp ~/.ssh/authorized_keys 192.168.1.252:/home/workspace/authorized_keys253
    rm -rf ~/.ssh/authorized_keys #remove the local authorized_keys file

    192.168.1.254:

    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    scp ~/.ssh/authorized_keys 192.168.1.252:/home/workspace/authorized_keys254
    rm -rf ~/.ssh/authorized_keys #remove the local authorized_keys file

    B) On 192.168.1.252, merge all the public keys into a single authorized_keys file

    cat /home/workspace/authorized_keys252 >> /home/workspace/authorized_keys
    cat /home/workspace/authorized_keys253 >> /home/workspace/authorized_keys
    cat /home/workspace/authorized_keys254 >> /home/workspace/authorized_keys

    C) Copy the merged key file to every host in the cluster

    scp /home/workspace/authorized_keys 192.168.1.253:~/.ssh/
    scp /home/workspace/authorized_keys 192.168.1.254:~/.ssh/
    cp /home/workspace/authorized_keys ~/.ssh/ #we are already on host 252, so cp is used instead of scp

    Note: ssh-copy-id -i ~/.ssh/id_rsa.pub {ip or hostname} can also be used to copy the public key to a remote machine

    Using this cluster as an example, steps A, B and C above can also be done as follows:

    Run the commands below on each of 192.168.1.252, 192.168.1.253 and 192.168.1.254; this generates the private key and distributes the public key.

    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa   #generate this machine's key pair
    ssh-copy-id -i ~/.ssh/id_rsa.pub palo252   #copy this machine's public key to palo252; it is appended to the remote ~/.ssh/authorized_keys, which is created if it does not exist
    ssh-copy-id -i ~/.ssh/id_rsa.pub palo253   #copy this machine's public key to palo253, same behaviour as above
    ssh-copy-id -i ~/.ssh/id_rsa.pub palo254   #copy this machine's public key to palo254, same behaviour as above

    D) On every machine:

    chmod 755 ~                      #permissions on the current user's home directory
    chmod 700 ~/.ssh/                #permissions on the .ssh directory
    chmod 600 ~/.ssh/id_rsa          #permissions on id_rsa
    chmod 644 ~/.ssh/id_rsa.pub      #permissions on id_rsa.pub
    chmod 644 ~/.ssh/authorized_keys #permissions on authorized_keys

    Notes:

    If SSH login fails or still prompts for a password, check the sshd log at /var/log/secure.
    You may find entries like the following:
    Jul 22 14:20:33 v138020.go sshd[4917]: Authentication refused: bad ownership or modes for directory /home/edw

    In that case fix the permissions: for security reasons, sshd checks the ownership and permissions of the user's directories and files. If they are wrong, passwordless SSH login will not work.

    The home directory must be 755 or 700; it must not be 77x.
    The .ssh directory should generally be 755 or 700.
    id_rsa.pub and authorized_keys should generally be 644.
    id_rsa must be 600.

    The SSH log can be inspected with:

    cat /var/log/secure
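
    Once the keys are distributed and the permissions are correct, passwordless login can be verified from any node, for example:

    for h in palo252 palo253 palo254; do ssh $h hostname; done   #each ssh should print the remote hostname without asking for a password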

    7. Configure Hadoop
    7-1) Unpack
    Download URL: https://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz

    tar xzvf hadoop-2.7.3.tar.gz -C /opt/

    7-2) Create the data directories (they must be created in advance, otherwise errors will be reported)

    mkdir -p /opt/hadoop-2.7.3/data/full/tmp/
    mkdir -p /opt/hadoop-2.7.3/data/full/tmp/dfs/name
    mkdir -p /opt/hadoop-2.7.3/data/full/tmp/dfs/data

    7-3) Edit the configuration files under /opt/hadoop-2.7.3/etc/hadoop

    cd /opt/hadoop-2.7.3/etc/hadoop #change to the configuration directory

    7-3-1) core-site.xml

    <configuration>
        <!-- address of the HDFS NameNode -->
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://palo252:9000</value>
            <description>hdfs://palo252:9000</description>
        </property>
        <!-- base directory for files generated while Hadoop is running -->
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/opt/hadoop-2.7.3/data/full/tmp</value>
            <description>Base directory that the Hadoop file system depends on; many other paths are derived from it. If the NameNode and DataNode storage locations are not configured in hdfs-site.xml, they default to subdirectories of this path.</description>
        </property>
        <!-- enable webhdfs -->
        <property>
            <name>dfs.webhdfs.enabled</name>
            <value>true</value>
            <description>enable webhdfs</description>
        </property>

        <!-- use the hadoop native library -->
        <property>
            <name>hadoop.native.lib</name>
            <value>true</value>
            <description>Should native hadoop libraries, if present, be used.</description>
        </property>
    </configuration>
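
    Because dfs.webhdfs.enabled is turned on, the WebHDFS REST interface can be checked once the cluster is running (step 9), for example:

    curl "http://palo252:50070/webhdfs/v1/?op=LISTSTATUS"   #lists the HDFS root directory through WebHDFS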

    7-3-2) yarn-site.xml

    <configuration>
        <property>
             <!-- how reducers fetch data -->
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
             <!-- address of the YARN ResourceManager -->
            <name>yarn.resourcemanager.hostname</name>
            <value>palo253</value>
        </property>
        <property>  
            <name>yarn.resourcemanager.address</name>  
            <value>palo253:8032</value>  
        </property>  
        <property>  
            <name>yarn.resourcemanager.scheduler.address</name>  
            <value>palo253:8030</value>  
        </property>  
        <property>  
            <name>yarn.resourcemanager.resource-tracker.address</name>  
            <value>palo253:8031</value>  
        </property>
        <property>  
            <name>yarn.nodemanager.resource.memory-mb</name>  
            <value>10240</value>  
        </property>  
        <property>  
            <name>yarn.scheduler.minimum-allocation-mb</name>  
            <value>1024</value>  
        </property>  
        <property>  
            <name>yarn.nodemanager.vmem-pmem-ratio</name>  
            <value>2.1</value>  
        </property>  
    </configuration>

    7-3-3) slaves

    palo252
    palo253
    palo254

    7-3-4) mapred-site.xml

    <configuration>
        <!-- run MapReduce on YARN -->
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.address</name>
            <value>palo252:10020</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>palo252:19888</value>
        </property>
    </configuration>

    7-3-5) hdfs-site.xml

    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
            <description>Must not be larger than the number of DataNodes; the default is 3.</description>
        </property>
        <!-- port of the SecondaryNameNode -->
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>palo253:50090</value>
        </property>
        <property>
            <name>dfs.data.dir</name>
            <value>file:/opt/hadoop-2.7.3/data/full/tmp/dfs/data</value>
            <description>Directory in which HDFS stores its data blocks. It can be set to multiple directories on different partitions, which places HDFS data across those partitions.</description>
        </property>
        <property>
            <name>dfs.name.dir</name>
            <value>file:/opt/hadoop-2.7.3/data/full/tmp/dfs/name</value>
            <description>Directory in which HDFS stores its metadata. If multiple directories are configured, each of them holds a copy of the metadata.</description>
        </property>

        <!-- proxy user settings for Hadoop -->
        <property>
            <name>hadoop.proxyuser.hadoop.hosts</name>
            <value>*</value>
            <description>Setting this to * means the proxy user "hadoop" can access the HDFS cluster from any node.</description>
        </property>
        <property>
            <name>hadoop.proxyuser.hadoop.groups</name>
            <value>*</value>
            <description>Groups that the proxy user may impersonate.</description>
        </property>
    </configuration>

    7-3-6) hadoop-env.sh

    Set JAVA_HOME

    # Set Hadoop-specific environment variables here.

    # The only required environment variable is JAVA_HOME.  All others are
    # optional.  When running a distributed configuration it is best to
    # set JAVA_HOME in this file, so that it is correctly defined on
    # remote nodes.

    # The java implementation to use.
    #export JAVA_HOME=${JAVA_HOME}
    export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64
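
    The configuration above is done on one node; if the configured /opt/hadoop-2.7.3 directory is not yet present on the other nodes, it can be copied over, for example (a sketch, assuming the same /opt layout and sufficient permissions on the target hosts):

    scp -r /opt/hadoop-2.7.3 palo253:/opt/
    scp -r /opt/hadoop-2.7.3 palo254:/opt/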

    8. Configure environment variables (must be done on every machine)

    vi /etc/profile

    Append at the end of the file:

    ##### set jdk environment
    export JAVA_HOME=/usr/java/jdk1.8.0_172-amd64
    export JRE_HOME=$JAVA_HOME/jre
    export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
    export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
    
    ##### set hadoop_home environment
    export HADOOP_HOME=/opt/hadoop-2.7.3
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
    export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
    export YARN_HOME=$HADOOP_HOME
    export YARN_CONF_DIR=${YARN_HOME}/etc/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    ###enable hadoop native library
    export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native 

    Run source /etc/profile in the terminal so the configured environment variables take effect:

    source /etc/profile #### make the environment variables take effect immediately
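
    A quick check that the new variables are in effect:

    echo $HADOOP_HOME   #should print /opt/hadoop-2.7.3
    hadoop version      #should report Hadoop 2.7.3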

    9. Start the cluster:


    NameNode:(master 252)
    #format the NameNode

    hdfs namenode -format

    #start HDFS

    start-dfs.sh # (master 252)

    #start YARN on the YARN node (253)
    #Note: if the NameNode and the ResourceManager are not on the same machine,
    #do not start YARN on the NameNode;
    #start it on the machine where the ResourceManager runs.

    start-yarn.sh

    #verify the startup:
    jps #list the Java processes
    http://palo252:50070/ #NameNode web UI
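
    A simple smoke test after everything is up (run from any node):

    hdfs dfsadmin -report                 #should list the three DataNodes
    hdfs dfs -mkdir -p /tmp/test
    hdfs dfs -put /etc/hosts /tmp/test/   #write a small file into HDFS
    hdfs dfs -ls /tmp/test                #verify the file is there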

    10. Ways to start and stop Hadoop

    1) Start each service component individually
    Start HDFS components separately: hadoop-daemon.sh  start|stop  namenode|datanode|secondarynamenode
    Start YARN components:            yarn-daemon.sh    start|stop  resourcemanager|nodemanager
    
    2) Start each module separately (passwordless SSH is a prerequisite); most commonly used
    start|stop-dfs.sh     start|stop-yarn.sh
    
    3) Start everything at once (not recommended)
    start|stop-all.sh

    4) Start the history server (can be started on any node)
    mr-jobhistory-daemon.sh start|stop historyserver
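
    For example, to start the JobHistory server and check its web UI (the port matches mapreduce.jobhistory.webapp.address configured above):

    mr-jobhistory-daemon.sh start historyserver
    # then browse to http://palo252:19888/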

