  • Hadoop Cluster Installation - CDH5 (3-server cluster)

    CDH5 package download: http://archive.cloudera.com/cdh5/

    Host plan:

    IP              Host                        Deployed modules                  Processes
    192.168.107.82  Hadoop-NN-01                NameNode, ResourceManager         NameNode, DFSZKFailoverController, ResourceManager
    192.168.107.83  Hadoop-DN-01, Zookeeper-01  DataNode, NodeManager, Zookeeper  DataNode, NodeManager, JournalNode, QuorumPeerMain
    192.168.107.84  Hadoop-DN-02, Zookeeper-02  DataNode, NodeManager, Zookeeper  DataNode, NodeManager, JournalNode, QuorumPeerMain

    What each process does:

    • NameNode
    • ResourceManager
    • DFSZKFC: DFS Zookeeper Failover Controller, which activates the Standby NameNode
    • DataNode
    • NodeManager
    • JournalNode: the service hosting the NameNodes' shared edit log (if you share the edit log via NFS instead, this process and all of its related configuration can be omitted).
    • QuorumPeerMain: the main ZooKeeper process

    Directory plan:

    Name          Path
    $HADOOP_HOME  /home/hadoopuser/hadoop-2.6.0-cdh5.6.0
    Data          $HADOOP_HOME/data
    Log           $HADOOP_HOME/logs

    Configuration:

    Step 1: Disable the firewall (it can be configured later)

    Step 2: Install the JDK (omitted)

    Step 3: Set the hostname and configure /etc/hosts (all 3 machines)

    [root@Linux01 ~]# vim /etc/sysconfig/network
    [root@Linux01 ~]# vim /etc/hosts
    
    192.168.107.82 Hadoop-NN-01
    192.168.107.83 Hadoop-DN-01 Zookeeper-01
    192.168.107.84 Hadoop-DN-02 Zookeeper-02
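
    A quick check that the mappings work (a minimal sketch; run it from any node once /etc/hosts is in place on all 3):

    # each alias should resolve and answer one ping
    for h in Hadoop-NN-01 Hadoop-DN-01 Hadoop-DN-02; do
        ping -c 1 "$h" > /dev/null && echo "$h OK" || echo "$h UNREACHABLE"
    done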

    Step 4: For security, create a dedicated user for Hadoop logins (all 3 machines)

    [root@Linux01 ~]# useradd hadoopuser
    [root@Linux01 ~]# passwd hadoopuser
    [root@Linux01 ~]# su - hadoopuser        # switch to the new user

    Step 5: Configure passwordless SSH login (from the NameNode)

    [hadoopuser@Linux05 hadoop-2.6.0-cdh5.6.0]$ ssh-keygen   # generate the public/private key pair
    [hadoopuser@Linux05 hadoop-2.6.0-cdh5.6.0]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoopuser@Hadoop-NN-01

    -i specifies the identity file whose public key is installed

    ~/.ssh/id_rsa.pub is the public key to copy

    Or, in shortened form:

    [hadoopuser@Linux05 hadoop-2.6.0-cdh5.6.0]$ ssh-copy-id Hadoop-NN-01   # push the public key to the remote server (the IP, e.g. 10.10.51.231, also works)
    [hadoopuser@Linux05 hadoop-2.6.0-cdh5.6.0]$ ssh-copy-id -p 6000 Hadoop-NN-01  # use this form when sshd listens on a non-default port

    Note: if you use a non-default SSH port, also adjust the Hadoop configuration file hadoop-env.sh:

    export HADOOP_SSH_OPTS="-p 6000"

    [hadoopuser@Linux05 hadoop-2.6.0-cdh5.6.0]$ ssh Hadoop-NN-01  # verify (leave the remote session with: exit or logout)
    [hadoopuser@Linux05 hadoop-2.6.0-cdh5.6.0]$ ssh Hadoop-NN-01 -p 6000  # use this form with a non-default port
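
    Since start-dfs.sh and the scp distribution in step 8 reach every node over SSH, it is convenient to push the key to all hosts, not only Hadoop-NN-01. A loop sketch, assuming the default SSH port and the host aliases above:

    for h in Hadoop-NN-01 Hadoop-DN-01 Hadoop-DN-02; do
        ssh-copy-id "hadoopuser@$h"    # prompts for the password once per host
    done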

    Step 6: Configure environment variables: vi ~/.bashrc, then source ~/.bashrc (all 3 machines)

    [hadoopuser@Linux01 ~]$ vi ~/.bashrc
    # hadoop cdh5
    export HADOOP_HOME=/home/hadoopuser/hadoop-2.6.0-cdh5.6.0
    export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
    
    [hadoopuser@Linux01 ~]$ source ~/.bashrc  # apply the changes

    Step 7: Install ZooKeeper (the 2 DataNodes)

    1. Unpack the tarball

    2. Configure environment variables: vi ~/.bashrc

    [hadoopuser@Linux01 ~]$ vi ~/.bashrc
    # zookeeper cdh5
    export ZOOKEEPER_HOME=/home/hadoopuser/zookeeper-3.4.5-cdh5.6.0
    export PATH=$PATH:$ZOOKEEPER_HOME/bin
    
    [hadoopuser@Linux01 ~]$ source ~/.bashrc  # apply the changes

    3. Change the log output location

    [hadoopuser@Linux01 ~]$ vi $ZOOKEEPER_HOME/libexec/zkEnv.sh
    Line 56: find and change the assignment to: ZOO_LOG_DIR="$ZOOKEEPER_HOME/logs"

    4. Edit the configuration file

    [hadoopuser@Linux01 ~]$ vi $ZOOKEEPER_HOME/conf/zoo.cfg
    
    # zookeeper
    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/home/hadoopuser/zookeeper-3.4.5-cdh5.6.0/data
    clientPort=2181
    
    # cluster
    server.1=Zookeeper-01:2888:3888
    server.2=Zookeeper-02:2888:3888

    5. Set the myid

    (1) Hadoop-DN-01:

    mkdir $ZOOKEEPER_HOME/data
    echo 1 > $ZOOKEEPER_HOME/data/myid

    (2) Hadoop-DN-02:

    mkdir $ZOOKEEPER_HOME/data
    echo 2 > $ZOOKEEPER_HOME/data/myid
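
    The number in myid must match the N of that host's server.N line in zoo.cfg. A quick cross-check (a sketch, assuming passwordless SSH to the DataNodes is in place):

    for h in Zookeeper-01 Zookeeper-02; do
        echo -n "$h: myid = "
        ssh "hadoopuser@$h" cat /home/hadoopuser/zookeeper-3.4.5-cdh5.6.0/data/myid
    done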

    6. Start it on each node:

    [hadoopuser@Linux01 ~]$ zkServer.sh start

    7. Verify

    [hadoopuser@Linux01 ~]$ jps
    
    3051 Jps
    2829 QuorumPeerMain

    8. Check the status

    [hadoopuser@Linux01 ~]$ zkServer.sh status
    
    JMX enabled by default
    Using config: /home/hadoopuser/zookeeper-3.4.5-cdh5.6.0/bin/../conf/zoo.cfg
    Mode: follower
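
    ZooKeeper's four-letter commands report the same information over the client port, which helps when zkServer.sh is not on the PATH (assuming nc is installed):

    echo ruok | nc Zookeeper-01 2181   # a healthy server replies "imok"
    echo stat | nc Zookeeper-01 2181   # shows the mode (leader/follower) and client connections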

    9. Appendix: zoo.cfg property reference

    tickTime: the basic time unit in milliseconds. Heartbeats are sent at this interval, and the minimum session timeout is twice tickTime.

    dataDir: where the data is stored, i.e. the in-memory snapshots and the transaction update log.

    clientPort: the port clients connect to.

    initLimit: the maximum number of tickTime intervals a connecting "client" may take to complete its initial handshake. The client here is not a user client of the ZooKeeper service but a Follower in the ensemble connecting to the Leader. If the Leader has received nothing back after that many heartbeats, the connection is considered failed. With initLimit=10 and tickTime=2000 ms, the total allowance is 10 * 2000 ms = 20 seconds.

    syncLimit: the maximum number of tickTime intervals a request/response exchange between the Leader and a Follower may take. With syncLimit=5 the allowance is 5 * 2000 ms = 10 seconds.

    server.A=B:C:D: the ensemble member list:

    A: a number, the id of this server (it must match the myid on that server);

    B: the server's IP address (or hostname);

    C: the port this server uses to exchange information with the ensemble's Leader;

    D: the port used to hold a new election if the Leader dies. In a pseudo-cluster, where B is the same for every instance, each ZooKeeper instance must be assigned its own distinct ports.

    Step 8: Install and configure Hadoop (install on one node only; after configuring, distribute it to the other nodes)

    1. Unpack the tarball

    2. Edit the configuration files

    (1) Edit $HADOOP_HOME/etc/hadoop/masters

    Hadoop-NN-01

    (2) Edit $HADOOP_HOME/etc/hadoop/slaves

    Hadoop-DN-01
    Hadoop-DN-02

    (3) Edit $HADOOP_HOME/etc/hadoop/core-site.xml

    <configuration>
            <property>
                   <name>fs.defaultFS</name>
                   <value>hdfs://Hadoop-NN-01:9000</value>
                   <description>URI and port of the Hadoop master (NameNode)</description>
            </property>
            <property>
                   <name>io.file.buffer.size</name>
                   <value>131072</value>
                   <description>Read/write buffer size used for SequenceFile processing</description>
            </property>
            <property>
                   <name>hadoop.tmp.dir</name>
                   <value>/home/hadoopuser/hadoop-2.6.0-cdh5.6.0/tmp</value>
                   <description>Base directory for temporary data</description>
            </property>
    </configuration>

    (4) Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml

    <configuration>
            <property>
                   <name>dfs.namenode.name.dir</name>
                   <value>/home/hadoopuser/hadoop-2.6.0-cdh5.6.0/dfs/name</value>
                   <description>Local directory where the NameNode stores the name table (fsimage); change as needed</description>
            </property>
            <property>
                   <name>dfs.datanode.data.dir</name>
                   <value>/home/hadoopuser/hadoop-2.6.0-cdh5.6.0/dfs/data</value>
                   <description>Local directory where the DataNode stores blocks; change as needed</description>
            </property>
            <property>
                   <name>dfs.replication</name>
                   <value>1</value>
                   <description>Number of replicas per file; the default is 3</description>
            </property>
            <property>
                <name>dfs.blocksize</name>
                <value>134217728</value>
                <description>Block size: 128 MB</description>
            </property>
            <property>
                <name>dfs.permissions</name>
                <value>false</value>
                <description>Whether to enforce permissions on HDFS files (usually false while testing)</description>
            </property>
    </configuration>
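
    To confirm Hadoop actually picks these values up, hdfs getconf can query the client configuration without starting any daemon:

    hdfs getconf -confKey fs.defaultFS     # expect hdfs://Hadoop-NN-01:9000
    hdfs getconf -confKey dfs.replication  # expect 1
    hdfs getconf -confKey dfs.blocksize    # expect 134217728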

    (5) Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml

    <configuration>
            <property>
                   <name>yarn.resourcemanager.address</name>
                   <value>Hadoop-NN-01:8032</value>
            </property>
            <property>
                   <name>yarn.resourcemanager.scheduler.address</name>
                   <value>Hadoop-NN-01:8030</value>
            </property>
            <property>
                   <name>yarn.resourcemanager.resource-tracker.address</name>
                   <value>Hadoop-NN-01:8031</value>
            </property>
            <property>
                   <name>yarn.resourcemanager.admin.address</name>
                   <value>Hadoop-NN-01:8033</value>
            </property>
            <property>
                   <name>yarn.resourcemanager.webapp.address</name>
                   <value>Hadoop-NN-01:8088</value>
            </property>
            <property>
                   <name>yarn.nodemanager.aux-services</name>
                   <value>mapreduce_shuffle</value>
            </property>
            <property>
                   <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
                   <value>org.apache.hadoop.mapred.ShuffleHandler</value>
            </property>
    </configuration>

    (6) Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml

    <configuration>
            <property>
                   <name>mapreduce.framework.name</name>
                   <value>yarn</value>
            </property>
            <property>
                   <name>mapreduce.jobhistory.address</name>
                   <value>Hadoop-NN-01:10020</value>
            </property>
            <property>
                   <name>mapreduce.jobhistory.webapp.address</name>
                   <value>Hadoop-NN-01:19888</value>
            </property>
    </configuration>

    (7) Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh

    #--------------------Java Env------------------------------
    export JAVA_HOME="/usr/java/jdk1.8.0_73"
    #--------------------Hadoop Env----------------------------
    #export HADOOP_PID_DIR=${HADOOP_PID_DIR}
    export HADOOP_PREFIX="/home/hadoopuser/hadoop-2.6.0-cdh5.6.0"
    #--------------------Hadoop Daemon Options-----------------
    # export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
    # export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"
    #--------------------Hadoop Logs---------------------------
    #export HADOOP_LOG_DIR=${HADOOP_LOG_DIR}/$USER
    #--------------------SSH PORT-------------------------------
    export HADOOP_SSH_OPTS="-p 6000"        # required if you changed the SSH login port

    (8) Edit $HADOOP_HOME/etc/hadoop/yarn-env.sh

    #Yarn Daemon Options
    #export YARN_RESOURCEMANAGER_OPTS
    #export YARN_NODEMANAGER_OPTS
    #export YARN_PROXYSERVER_OPTS
    #export HADOOP_JOB_HISTORYSERVER_OPTS
    
    #Yarn Logs
    export YARN_LOG_DIR="/home/hadoopuser/hadoop-2.6.0-cdh5.6.0/logs"

    3. Distribute the software

    scp -r /home/hadoopuser/hadoop-2.6.0-cdh5.6.0 hadoopuser@Hadoop-DN-01:/home/hadoopuser
    scp -r /home/hadoopuser/hadoop-2.6.0-cdh5.6.0 hadoopuser@Hadoop-DN-02:/home/hadoopuser
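
    The ~/.bashrc from step 6 must also exist on the DataNodes; a loop form of the distribution (a sketch, assuming the same home-directory layout on every node):

    for h in Hadoop-DN-01 Hadoop-DN-02; do
        scp ~/.bashrc "hadoopuser@$h:~/"
        scp -r /home/hadoopuser/hadoop-2.6.0-cdh5.6.0 "hadoopuser@$h:/home/hadoopuser"
    done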

    4. Format the NameNode

    [hadoopuser@Linux01 hadoop-2.6.0-cdh5.6.0]$ hadoop namenode -format

    5. Start the JournalNodes

    [hadoopuser@Linux01 hadoop-2.6.0-cdh5.6.0]$ hadoop-daemon.sh start journalnode
    starting journalnode, logging to /home/hadoopuser/hadoop-2.6.0-cdh5.6.0/logs/hadoop-puppet-journalnode-BigData-03.out

    Verify the JournalNode:

    [hadoopuser@Linux01 hadoop-2.6.0-cdh5.6.0]$ jps
    
    9076 Jps
    9029 JournalNode

    6. Start HDFS

    Cluster start, on Hadoop-NN-01: start-dfs.sh

    [hadoopuser@Linux01 hadoop-2.6.0-cdh5.6.0]$ start-dfs.sh

    Per-process start:

    <1>NameNode (Hadoop-NN-01): hadoop-daemon.sh start namenode

    <2>DataNode (Hadoop-DN-01, Hadoop-DN-02): hadoop-daemon.sh start datanode

    <3>JournalNode (Hadoop-DN-01, Hadoop-DN-02): hadoop-daemon.sh start journalnode
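
    Either way, jps on each node should now show the HDFS-side processes from the host plan (the YARN processes appear after step 7):

    # on Hadoop-NN-01
    jps    # expect: NameNode
    # on Hadoop-DN-01 and Hadoop-DN-02
    jps    # expect: DataNode, JournalNode, QuorumPeerMain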

    7. Start YARN

    <1>Cluster start

    Start YARN on Hadoop-NN-01; the script lives in $HADOOP_HOME/sbin:

    [hadoopuser@Linux01 hadoop-2.6.0-cdh5.6.0]$ start-yarn.sh

    <2>Per-process start

    ResourceManager (Hadoop-NN-01): yarn-daemon.sh start resourcemanager

    NodeManager (Hadoop-DN-01, Hadoop-DN-02): yarn-daemon.sh start nodemanager

    Verification: a minimal smoke test is sketched below.
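
    A sketch of such a test; the examples jar path assumes the stock CDH 5.6.0 tarball layout, so adjust it if yours differs:

    # HDFS round trip
    hdfs dfs -mkdir -p /tmp/smoke
    hdfs dfs -put /etc/hosts /tmp/smoke/
    hdfs dfs -cat /tmp/smoke/hosts

    # cluster reports
    hdfs dfsadmin -report   # both DataNodes should be listed as live
    yarn node -list         # both NodeManagers should be RUNNING

    # run the bundled pi estimator on YARN
    yarn jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.6.0.jar pi 2 10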
