  • Hadoop Cluster Installation with CDH5 (a 3-server cluster)

    CDH5 package download: http://archive.cloudera.com/cdh5/

    Host plan:

    | IP             | Host                       | Deployed Modules                 | Processes                                          |
    |----------------|----------------------------|----------------------------------|----------------------------------------------------|
    | 192.168.107.82 | Hadoop-NN-01               | NameNode, ResourceManager        | NameNode, DFSZKFailoverController, ResourceManager |
    | 192.168.107.83 | Hadoop-DN-01, Zookeeper-01 | DataNode, NodeManager, Zookeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain |
    | 192.168.107.84 | Hadoop-DN-02, Zookeeper-02 | DataNode, NodeManager, Zookeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain |

    Process descriptions:

    • NameNode
    • ResourceManager
    • DFSZKFC: DFS Zookeeper Failover Controller; activates the standby NameNode
    • DataNode
    • NodeManager
    • JournalNode: the shared edit-log service for the NameNodes (if NFS is used for sharing instead, this process and all of its startup configuration can be omitted)
    • QuorumPeerMain: the main ZooKeeper process

    Directory plan:

    | Name         | Path                                   |
    |--------------|----------------------------------------|
    | $HADOOP_HOME | /home/hadoopuser/hadoop-2.6.0-cdh5.6.0 |
    | Data         | $HADOOP_HOME/data                      |
    | Log          | $HADOOP_HOME/logs                      |

     

    Configuration:

    1. Disable the firewall (it can be configured properly later)

    2. Install the JDK (omitted)

    3. Set the hostname and configure /etc/hosts (all 3 servers)

    [root@Linux01 ~]# vim /etc/sysconfig/network
    [root@Linux01 ~]# vim /etc/hosts
    
    192.168.107.82 Hadoop-NN-01
    192.168.107.83 Hadoop-DN-01 Zookeeper-01
    192.168.107.84 Hadoop-DN-02 Zookeeper-02
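The three mappings above can be staged in a scratch file and sanity-checked from the shell before touching the real /etc/hosts. A minimal sketch; the file name hosts.cluster is an assumption, not part of the original setup:

```shell
# Write the planned mappings to a scratch file for review
# (append them to /etc/hosts only after checking).
TARGET="${TARGET:-./hosts.cluster}"
cat > "$TARGET" <<'EOF'
192.168.107.82 Hadoop-NN-01
192.168.107.83 Hadoop-DN-01 Zookeeper-01
192.168.107.84 Hadoop-DN-02 Zookeeper-02
EOF

# Every planned hostname should appear in the file
for name in Hadoop-NN-01 Hadoop-DN-01 Hadoop-DN-02 Zookeeper-01 Zookeeper-02; do
  grep -w "$name" "$TARGET"
done
```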

    4. For safety, create a dedicated login user for Hadoop (all 3 servers)

    [root@Linux01 ~]# useradd hadoopuser
    [root@Linux01 ~]# passwd hadoopuser
    [root@Linux01 ~]# su - hadoopuser        # switch to the new user

    5. Configure passwordless SSH login (the NameNode hosts)

    [hadoopuser@Linux05 hadoop-2.6.0-cdh5.6.0]$ ssh-keygen   # generate the key pair
    [hadoopuser@Linux05 hadoop-2.6.0-cdh5.6.0]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoopuser@Hadoop-NN-01

    -i selects the identity (public key) file to install

    ~/.ssh/id_rsa.pub is the public key being copied

    Or, in shortened form:

    [hadoopuser@Linux05 hadoop-2.6.0-cdh5.6.0]$ ssh-copy-id Hadoop-NN-01   # copy the public key to the remote server (the IP, 192.168.107.82, also works)
    [hadoopuser@Linux05 hadoop-2.6.0-cdh5.6.0]$ ssh-copy-id -p 6000 Hadoop-NN-01   # use this form if sshd listens on a non-default port

    Note: in that case, also update the Hadoop configuration file hadoop-env.sh:

    export HADOOP_SSH_OPTS="-p 6000"

    [hadoopuser@Linux05 hadoop-2.6.0-cdh5.6.0]$ ssh Hadoop-NN-01           # verify (end the session with: exit or logout)
    [hadoopuser@Linux05 hadoop-2.6.0-cdh5.6.0]$ ssh Hadoop-NN-01 -p 6000   # with a non-default port

    6. Configure environment variables: vi ~/.bashrc, then source ~/.bashrc (all 3 servers)

    [hadoopuser@Linux01 ~]$ vi ~/.bashrc
    # hadoop cdh5
    export HADOOP_HOME=/home/hadoopuser/hadoop-2.6.0-cdh5.6.0
    export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
    
    [hadoopuser@Linux01 ~]$ source ~/.bashrc  # apply the changes
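The effect of the two export lines can be confirmed immediately in the same shell (a quick sanity check; running `hadoop version` is the real test once the tarball is unpacked):

```shell
# After sourcing ~/.bashrc, both Hadoop script directories should be on PATH.
export HADOOP_HOME=/home/hadoopuser/hadoop-2.6.0-cdh5.6.0
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

# List PATH entries and keep the Hadoop ones (expect sbin and bin)
echo "$PATH" | tr ':' '\n' | grep "hadoop-2.6.0-cdh5.6.0"
```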

    7. Install ZooKeeper (the 2 DataNodes)

    1. Unpack the archive

    2. Configure environment variables: vi ~/.bashrc

    [hadoopuser@Linux01 ~]$ vi ~/.bashrc
    # zookeeper cdh5
    export ZOOKEEPER_HOME=/home/hadoopuser/zookeeper-3.4.5-cdh5.6.0
    export PATH=$PATH:$ZOOKEEPER_HOME/bin
    
    [hadoopuser@Linux01 ~]$ source ~/.bashrc  # apply the changes

    3. Redirect the log output

    [hadoopuser@Linux01 ~]$ vi $ZOOKEEPER_HOME/libexec/zkEnv.sh
    Around line 56, find and change the assignment to: ZOO_LOG_DIR="$ZOOKEEPER_HOME/logs"

    4. Edit the configuration file

    [hadoopuser@Linux01 ~]$ vi $ZOOKEEPER_HOME/conf/zoo.cfg
    
    # zookeeper
    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/home/hadoopuser/zookeeper-3.4.5-cdh5.6.0/data
    clientPort=2181
    
    # cluster
    server.1=Zookeeper-01:2888:3888
    server.2=Zookeeper-02:2888:3888

    5. Set each node's myid

    (1) Hadoop-DN-01:

    mkdir $ZOOKEEPER_HOME/data
    echo 1 > $ZOOKEEPER_HOME/data/myid

    (2) Hadoop-DN-02:

    mkdir $ZOOKEEPER_HOME/data
    echo 2 > $ZOOKEEPER_HOME/data/myid
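The two per-host steps above can be wrapped in one parameterized snippet, so the identical command runs on both DataNodes with only MYID changing. A sketch: ZOOKEEPER_HOME falls back to a scratch path here for a dry run, and on the real nodes it comes from ~/.bashrc:

```shell
# MYID must be 1 on Hadoop-DN-01 and 2 on Hadoop-DN-02,
# matching the server.1 / server.2 lines in zoo.cfg.
ZOOKEEPER_HOME="${ZOOKEEPER_HOME:-./zookeeper-3.4.5-cdh5.6.0}"
MYID="${MYID:-1}"

mkdir -p "$ZOOKEEPER_HOME/data"
echo "$MYID" > "$ZOOKEEPER_HOME/data/myid"
cat "$ZOOKEEPER_HOME/data/myid"
```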

    6. Start ZooKeeper on each node:

    [hadoopuser@Linux01 ~]$ zkServer.sh start

    7. Verify

    [hadoopuser@Linux01 ~]$ jps
    
    3051 Jps
    2829 QuorumPeerMain

    8. Check the status

    [hadoopuser@Linux01 ~]$ zkServer.sh status
    
    JMX enabled by default
    Using config: /home/zero/zookeeper/zookeeper-3.4.5-cdh5.0.1/bin/../conf/zoo.cfg
    Mode: follower

    9. Appendix: zoo.cfg configuration reference

     

    | Property       | Meaning |
    |----------------|---------|
    | tickTime       | The basic time unit (ms). Heartbeats are exchanged every tickTime, and the minimum session timeout is twice the tickTime. |
    | dataDir        | Where data is kept: in-memory snapshots and the transaction update log. |
    | clientPort     | The port clients use to connect. |
    | initLimit      | How many tickTime intervals ZooKeeper allows for an initial connection from a "client". Here "client" does not mean a user client of the ZooKeeper service, but a Follower server in the ensemble connecting to the Leader. If the Leader has not heard back after this many heartbeats (tickTimes), the connection is considered failed; with the values above, the total allowed time is 10 * 2000 ms = 20 seconds. |
    | syncLimit      | The maximum length, in tickTime intervals, of a request/response exchange between the Leader and a Follower; with the values above, that is 5 * 2000 ms = 10 seconds. |
    | server.A=B:C:D | One entry per cluster node; see below. |

    In server.A=B:C:D:

    A is a number identifying which server this is;
    B is the server's IP address;
    C is the port this server uses to exchange information with the ensemble's Leader;
    D is the port used to run a new leader election if the current Leader fails. In a pseudo-cluster configuration, B is the same for every instance, so the ZooKeeper instances cannot share communication ports and each must be assigned its own.
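The timeout arithmetic behind initLimit and syncLimit is easy to check with shell arithmetic, using the values from the zoo.cfg above:

```shell
tickTime=2000   # ms
initLimit=10    # tickTime intervals
syncLimit=5     # tickTime intervals

echo "initial sync timeout   : $(( tickTime * initLimit )) ms"   # 20000 ms = 20 s
echo "request/response cap   : $(( tickTime * syncLimit )) ms"   # 10000 ms = 10 s
echo "minimum session timeout: $(( tickTime * 2 )) ms"           # 4000 ms
```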

     

    8. Install and configure Hadoop (install on one node only; after configuration, distribute it to the other nodes)

    1. Unpack the archive

    2. Edit the configuration files

    (1) Edit $HADOOP_HOME/etc/hadoop/masters

    Hadoop-NN-01

    (2) Edit $HADOOP_HOME/etc/hadoop/slaves

    Hadoop-DN-01
    Hadoop-DN-02

    (3) Edit $HADOOP_HOME/etc/hadoop/core-site.xml

    <configuration>
            <property>
                   <name>fs.defaultFS</name>
                   <value>hdfs://Hadoop-NN-01:9000</value>
                   <description>URI and port of the Hadoop master (the default filesystem)</description>
            </property>
            <property>
                   <name>io.file.buffer.size</name>
                   <value>131072</value>
                   <description>Read/write buffer size used when processing sequence files</description>
            </property>
            <property>
                   <name>hadoop.tmp.dir</name>
                   <value>/home/hadoopuser/hadoop-2.6.0-cdh5.6.0/tmp</value>
                   <description>Directory for temporary data</description>
            </property>
    </configuration>

    (4) Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml

    <configuration>
            <property>
                   <name>dfs.namenode.name.dir</name>
                   <value>/home/hadoopuser/hadoop-2.6.0-cdh5.6.0/dfs/name</value>
                   <description>Local directory where the NameNode stores the name table (fsimage) (change as needed)</description>
            </property>
            <property>
                   <name>dfs.datanode.data.dir</name>
                   <value>/home/hadoopuser/hadoop-2.6.0-cdh5.6.0/dfs/data</value>
                   <description>Local directory where the DataNode stores blocks (change as needed)</description>
            </property>
            <property>
                   <name>dfs.replication</name>
                   <value>1</value>
                   <description>Number of file replicas; the default is 3</description>
            </property>
            <property>
                <name>dfs.blocksize</name>
                <value>134217728</value>
                <description>Block size: 128 MB</description>
            </property>
            <property>
                <name>dfs.permissions</name>
                <value>false</value>
                <description>Whether to enforce permissions on files in DFS (usually false while testing)</description>
            </property>
    </configuration>
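As a quick sanity check on dfs.blocksize above, 134217728 is exactly 128 MB expressed in bytes:

```shell
# 128 MB in bytes, the dfs.blocksize value used above
echo $(( 128 * 1024 * 1024 ))
```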

    (5) Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml

    <configuration>
            <property>
                   <name>yarn.resourcemanager.address</name>
                   <value>Hadoop-NN-01:8032</value>
            </property>
            <property>
                   <name>yarn.resourcemanager.scheduler.address</name>
                   <value>Hadoop-NN-01:8030</value>
            </property>
            <property>
                   <name>yarn.resourcemanager.resource-tracker.address</name>
                   <value>Hadoop-NN-01:8031</value>
            </property>
            <property>
                   <name>yarn.resourcemanager.admin.address</name>
                   <value>Hadoop-NN-01:8033</value>
            </property>
            <property>
                   <name>yarn.resourcemanager.webapp.address</name>
                   <value>Hadoop-NN-01:8088</value>
            </property>
            <property>
                   <name>yarn.nodemanager.aux-services</name>
                   <value>mapreduce_shuffle</value>
            </property>
            <property>
                   <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
                   <value>org.apache.hadoop.mapred.ShuffleHandler</value>
            </property>
    </configuration>

    (6) Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml

    <configuration>
            <property>
                   <name>mapreduce.framework.name</name>
                   <value>yarn</value>
            </property>
            <property>
                   <name>mapreduce.jobhistory.address</name>
                   <value>Hadoop-NN-01:10020</value>
            </property>
            <property>
                   <name>mapreduce.jobhistory.webapp.address</name>
                   <value>Hadoop-NN-01:19888</value>
            </property>
    </configuration>

    (7) Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh

    #--------------------Java Env------------------------------
    export JAVA_HOME="/usr/java/jdk1.8.0_73"
    #--------------------Hadoop Env----------------------------
    #export HADOOP_PID_DIR=${HADOOP_PID_DIR}
    export HADOOP_PREFIX="/home/hadoopuser/hadoop-2.6.0-cdh5.6.0"
    #--------------------Hadoop Daemon Options-----------------
    # export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
    # export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"
    #--------------------Hadoop Logs---------------------------
    #export HADOOP_LOG_DIR=${HADOOP_LOG_DIR}/$USER
    #--------------------SSH PORT-------------------------------
    export HADOOP_SSH_OPTS="-p 6000"        # if you changed the SSH login port, you must set this accordingly.

    (8) Edit $HADOOP_HOME/etc/hadoop/yarn-env.sh

    #Yarn Daemon Options
    #export YARN_RESOURCEMANAGER_OPTS
    #export YARN_NODEMANAGER_OPTS
    #export YARN_PROXYSERVER_OPTS
    #export HADOOP_JOB_HISTORYSERVER_OPTS
    
    #Yarn Logs
    export YARN_LOG_DIR="/home/hadoopuser/hadoop-2.6.0-cdh5.6.0/logs"

    3. Distribute the installation to the other nodes

    scp -r /home/hadoopuser/hadoop-2.6.0-cdh5.6.0 hadoopuser@Hadoop-DN-01:/home/hadoopuser
    scp -r /home/hadoopuser/hadoop-2.6.0-cdh5.6.0 hadoopuser@Hadoop-DN-02:/home/hadoopuser
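With more worker nodes, the two scp commands generalize to a loop. A sketch that first writes the commands to a plan file for review (scp_plan.txt is an assumed scratch name, not part of the original setup); execute the plan once passwordless SSH works:

```shell
SRC=/home/hadoopuser/hadoop-2.6.0-cdh5.6.0

# Build the copy plan: one scp command per worker node
: > scp_plan.txt
for host in Hadoop-DN-01 Hadoop-DN-02; do
  echo "scp -r $SRC hadoopuser@$host:/home/hadoopuser" >> scp_plan.txt
done

cat scp_plan.txt   # review, then run with: sh scp_plan.txt
```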

    4. Format the NameNode

    [hadoopuser@Linux01 hadoop-2.6.0-cdh5.6.0]$ hadoop namenode -format

    5. Start the JournalNodes

    [hadoopuser@Linux01 hadoop-2.6.0-cdh5.6.0]$ hadoop-daemon.sh start journalnode
    starting journalnode, logging to /home/hadoopuser/hadoop-2.6.0-cdh5.6.0/logs/hadoop-puppet-journalnode-BigData-03.out

    Verify the JournalNode:

    [hadoopuser@Linux01 hadoop-2.6.0-cdh5.6.0]$ jps
    
    9076 Jps
    9029 JournalNode

    6. Start HDFS

    Cluster start, from Hadoop-NN-01: start-dfs.sh

    [hadoopuser@Linux01 hadoop-2.6.0-cdh5.6.0]$ start-dfs.sh

    Per-process start:

    <1> NameNode (Hadoop-NN-01): hadoop-daemon.sh start namenode

    <2> DataNode (Hadoop-DN-01, Hadoop-DN-02): hadoop-daemon.sh start datanode

    <3> JournalNode (Hadoop-DN-01, Hadoop-DN-02): hadoop-daemon.sh start journalnode

    7. Start YARN

    <1> Cluster start

    From Hadoop-NN-01, start YARN; the script lives in $HADOOP_HOME/sbin:

    [hadoopuser@Linux01 hadoop-2.6.0-cdh5.6.0]$ start-yarn.sh

    <2> Per-process start

    ResourceManager (Hadoop-NN-01): yarn-daemon.sh start resourcemanager

    NodeManager (Hadoop-DN-01, Hadoop-DN-02): yarn-daemon.sh start nodemanager

    Verification (omitted)
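The verification left out above can be sketched as a jps cross-check against the process table at the top of this post. check_procs is a hypothetical helper, not part of Hadoop; on each host, pass it the processes planned for that host:

```shell
# Compare jps output against an expected process list.
check_procs() {
  out="$1"; shift
  for proc in "$@"; do
    if printf '%s\n' "$out" | grep -qw "$proc"; then
      echo "$proc: running"
    else
      echo "$proc: NOT FOUND"
    fi
  done
}

# On Hadoop-NN-01:  check_procs "$(jps)" NameNode ResourceManager
# On Hadoop-DN-01:  check_procs "$(jps)" DataNode NodeManager JournalNode QuorumPeerMain
check_procs "2829 QuorumPeerMain
3051 Jps" QuorumPeerMain   # → QuorumPeerMain: running
```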

  • Original article: https://www.cnblogs.com/hunttown/p/5452159.html