  • Installing a highly available Hadoop HA cluster based on zookeeper

    (1) Building hadoop 2.7.1 from source http://aperise.iteye.com/blog/2246856
    (2) Preparing to install hadoop 2.7.1 http://aperise.iteye.com/blog/2253544
    (3) Cluster installation supported by both 1.x and 2.x http://aperise.iteye.com/blog/2245547
    (4) Preparing to install hbase http://aperise.iteye.com/blog/2254451
    (5) Installing hbase http://aperise.iteye.com/blog/2254460
    (6) Installing snappy http://aperise.iteye.com/blog/2254487
    (7) hbase performance tuning http://aperise.iteye.com/blog/2282670
    (8) Benchmarking hbase with Yahoo YCSB http://aperise.iteye.com/blog/2248863
    (9) spring-hadoop in practice http://aperise.iteye.com/blog/2254491
    (10) Installing a ZK-based Hadoop HA cluster http://aperise.iteye.com/blog/2305809

    1. Hadoop cluster modes

        1.1 The namenode + secondarynamenode mode, supported by both hadoop 1.x and 2.x


             Pros: simple to set up, well suited to developing and debugging programs.

             Cons: the namenode is a critical service and a single point of failure; if it goes down, the entire cluster becomes unavailable.

        

        1.2 The active namenode + standby namenode mode, supported only by hadoop 2.x



           Pros: designed to eliminate the namenode single point of failure of 1.x, fully safeguarding the high availability of the Hadoop cluster.

           Cons: requires at least 3 zookeeper nodes and at least 3 journalnode nodes, and currently supports at most 2 namenodes. The roles can share machines, though that is not recommended.

        1.3 Cluster modes in the official Hadoop documentation

            1) Single-node Hadoop setup

            http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SingleCluster.html

            2) Cluster modes

                Mode one (the namenode + secondarynamenode mode supported by both hadoop 1.x and 2.x)

                http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/ClusterSetup.html

                Mode two (the active namenode + standby namenode mode supported only by hadoop 2.x, also called HADOOP HA). The documentation splits it into HDFS HA and YARN HA, covered separately:

                         HDFS HA (zookeeper + journalnode) http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html

                         HDFS HA (zookeeper + NFS) http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html

                         YARN HA (zookeeper) http://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html

            Production environments mostly run HDFS HA (zookeeper + journalnode: active NameNode + standby NameNode + JournalNode + DFSZKFailoverController + DataNode) together with YARN HA (zookeeper: active ResourceManager + standby ResourceManager + NodeManager). That zookeeper-based Hadoop HA mode, supported only by hadoop 2.x and aimed at production use, is what this article walks through.

    2. Installing the zookeeper-based Hadoop HA cluster

        2.1 Environment

            The cluster used throughout this article consists of five CentOS 7 hosts, hadoop31 through hadoop35 (192.168.185.31 to 192.168.185.35), all managed under a dedicated hadoop user.

        2.2 Pre-installation preparation

            1) Disable the firewall

    Overview of firewall operations on centos7
    #centos7: start firewalld
    systemctl start firewalld.service
    #centos7: restart firewalld
    systemctl restart firewalld.service
    #centos7: stop firewalld
    systemctl stop firewalld.service 
    #centos7: disable firewalld at boot
    systemctl disable firewalld.service 
    #centos7: check the firewall state
    firewall-cmd --state
    #to open individual ports instead (iptables style), add rules like the following; the port numbers below are only an example, not the ports Hadoop uses
    vi /etc/sysconfig/iptables
    -A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 6379 -j ACCEPT
    -A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 6380 -j ACCEPT
    -A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 6381 -j ACCEPT
    -A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 16379 -j ACCEPT
    -A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 16380 -j ACCEPT
    -A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 16381 -j ACCEPT

             Here I simply disable the firewall. As root, run:

    systemctl stop firewalld.service 
    systemctl disable firewalld.service
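
             If you would rather keep firewalld running, you could open just the ports this setup actually uses instead; a minimal sketch (the port list is my own summary of the configs later in this article: zookeeper 2181/2888/3888, journalnode 8485, namenode 9000/50070, resourcemanager webapp 8088):

    #run as root on every node
    for p in 2181 2888 3888 8485 9000 50070 8088; do
        firewall-cmd --permanent --add-port=${p}/tcp
    done
    firewall-cmd --reload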

            2) Disable SELinux

            Why: the Hadoop master manages the worker nodes over SSH; with SELinux enabled this does not work, since SELinux can block passwordless SSH login.

            Edit /etc/selinux/config. Before the change:

    # This file controls the state of SELinux on the system.
    # SELINUX= can take one of these three values:
    # enforcing - SELinux security policy is enforced.
    # permissive - SELinux prints warnings instead of enforcing.
    # disabled - No SELinux policy is loaded.
    SELINUX=enforcing
    # SELINUXTYPE= can take one of these two values:
    # targeted - Targeted processes are protected,
    # minimum - Modification of targeted policy. Only selected processes are protected. 
    # mls - Multi Level Security protection.
    SELINUXTYPE=targeted

             After the change:

    # This file controls the state of SELinux on the system.
    # SELINUX= can take one of these three values:
    # enforcing - SELinux security policy is enforced.
    # permissive - SELinux prints warnings instead of enforcing.
    # disabled - No SELinux policy is loaded.
    #SELINUX=enforcing
    SELINUX=disabled
    # SELINUXTYPE= can take one of these two values:
    # targeted - Targeted processes are protected,
    # minimum - Modification of targeted policy. Only selected processes are protected. 
    # mls - Multi Level Security protection.
    #SELINUXTYPE=targeted

             Run the following command to apply the SELinux change immediately:

    setenforce 0
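
             To confirm the change took effect, query the current mode:

    #prints Permissive right after setenforce 0; it only shows Disabled after a reboot with SELINUX=disabled
    getenforce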

        3) Hostname configuration

            Why: node IPs in a Hadoop cluster may change and interrupt services between machines, so it is best to configure Hadoop with hostnames.

            Edit the file /etc/hosts on each machine and map IPs to hostnames as follows:

    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
    192.168.185.31 hadoop31
    192.168.185.32 hadoop32
    192.168.185.33 hadoop33
    192.168.185.34 hadoop34
    192.168.185.35 hadoop35

             The hostname itself is set per machine in /etc/hostname on centos7. Taking node hadoop31 as an example:

    #localdomain
    hadoop31
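
             On centos7 you can also set the hostname without editing the file by hand; for example, on 192.168.185.31:

    #writes /etc/hostname and applies the name immediately
    hostnamectl set-hostname hadoop31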

        4) Create the hadoop user and group

            Why: the cluster will later be managed exclusively by the hadoop user, which prevents other users from shutting down the Hadoop cluster by mistake.

    #as root, create the hadoop group and user 
    groupadd hadoop 
    useradd -g hadoop hadoop 
    #set the user's password
    passwd hadoop

        5) Passwordless SSH for the hadoop user

            Why: Hadoop master nodes manage the worker nodes by logging into them over SSH, and a normal SSH login asks for a password. So that a master can conveniently manage hundreds of workers, its public key is copied to every worker, making SSH passwordless. Here I set up passwordless login between all nodes, masters and workers alike.

    #first switch to the hadoop user created above; I am working on hadoop31 here 
    ssh hadoop31
    su hadoop 
    #generate the RSA key pair; this must be done on every node in the cluster, just keep pressing Enter 
    ssh-keygen -t rsa 
    #when you ssh to a remote machine, it checks you against /home/hadoop/.ssh/authorized_keys there; by collecting every node's /home/hadoop/.ssh/id_rsa.pub into authorized_keys and distributing it, the following commands, run only on the master, give all nodes passwordless SSH to one another 
    cd /home/hadoop/.ssh/ 
    #first append the Master node's own public key to authorized_keys 
    cat id_rsa.pub>>authorized_keys 
    #then append each Slave node's public key to authorized_keys; I am running this on hadoop31 
    ssh hadoop@192.168.185.32 cat /home/hadoop/.ssh/id_rsa.pub>> authorized_keys 
    ssh hadoop@192.168.185.33 cat /home/hadoop/.ssh/id_rsa.pub>> authorized_keys 
    ssh hadoop@192.168.185.34 cat /home/hadoop/.ssh/id_rsa.pub>> authorized_keys 
    ssh hadoop@192.168.185.35 cat /home/hadoop/.ssh/id_rsa.pub>> authorized_keys 
    #the permissions on /home/hadoop/.ssh/authorized_keys must be set to 600 
    chmod 600 /home/hadoop/.ssh/authorized_keys 
    #finally distribute the Master's authorized_keys to the slave nodes 
    scp -r /home/hadoop/.ssh/authorized_keys hadoop@192.168.185.32:/home/hadoop/.ssh/ 
    scp -r /home/hadoop/.ssh/authorized_keys hadoop@192.168.185.33:/home/hadoop/.ssh/ 
    scp -r /home/hadoop/.ssh/authorized_keys hadoop@192.168.185.34:/home/hadoop/.ssh/ 
    scp -r /home/hadoop/.ssh/authorized_keys hadoop@192.168.185.35:/home/hadoop/.ssh/
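
             To verify that passwordless login now works, a quick check is to loop over all hosts; each command should print the remote hostname without asking for a password:

    #run as the hadoop user; repeat on every node for a full mesh check
    for h in hadoop31 hadoop32 hadoop33 hadoop34 hadoop35; do
        ssh hadoop@${h} hostname
    done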

        6) Install the JDK

            Why: Hadoop needs a Java runtime, and Hadoop 2.7.1 requires at least Java 1.7. Install it as follows:

    #switch to the hadoop user
    su hadoop
    #download jdk-7u65-linux-x64.gz into /home/hadoop/java and unpack it
    cd /home/hadoop/java
    tar -zxvf jdk-7u65-linux-x64.gz
    #edit /home/hadoop/.bashrc and append the following at the end
    export JAVA_HOME=/home/hadoop/java/jdk1.7.0_65 
    export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar 
    export PATH=$PATH:$JAVA_HOME/bin 
    #reload /home/hadoop/.bashrc so the settings take effect
    source /home/hadoop/.bashrc
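
             To confirm the JDK on the PATH is the one just installed:

    #should report java version "1.7.0_65"
    java -version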

             Many people configure this globally in /etc/profile. I advise against it: if someone downgrades or removes the Java environment there, things break. Instead, configure the environment variables in the .bashrc of the dedicated user that manages the Hadoop cluster.

        7) Install zookeeper

    #1 as the hadoop user, download and unpack zookeeper 3.4.6
    su hadoop
    cd /home/hadoop 
    tar -zxvf zookeeper-3.4.6.tar.gz 

    #2 configure /etc/hosts on every node in the cluster with the following content:
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
    192.168.185.31 hadoop31 
    192.168.185.32 hadoop32 
    192.168.185.33 hadoop33 
    192.168.185.34 hadoop34 
    192.168.185.35 hadoop35

    #3 create the zookeeper data directory on every node in the cluster
    ssh hadoop31
    cd /home/hadoop 
    #zookeeper data directory
    mkdir -p /opt/hadoop/zookeeper 
    ssh hadoop32
    cd /home/hadoop 
    #zookeeper data directory
    mkdir -p /opt/hadoop/zookeeper 
    ssh hadoop33
    cd /home/hadoop 
    #zookeeper data directory
    mkdir -p /opt/hadoop/zookeeper 
    ssh hadoop34
    cd /home/hadoop 
    #zookeeper data directory
    mkdir -p /opt/hadoop/zookeeper 
    ssh hadoop35
    cd /home/hadoop 
    #zookeeper data directory
    mkdir -p /opt/hadoop/zookeeper 

    #4 configure zoo.cfg
    ssh hadoop31
    cd /home/hadoop/zookeeper-3.4.6/conf
    cp zoo_sample.cfg zoo.cfg
    vi zoo.cfg
    #contents as follows
    initLimit=10 
    syncLimit=5 
    dataDir=/opt/hadoop/zookeeper 
    clientPort=2181 
    #keep only the 3 most recent snapshots; by default everything is kept, which eventually takes a lot of disk space
    autopurge.snapRetainCount=3
    #purge interval in hours; clean up snapshot data once per hour
    autopurge.purgeInterval=1
    server.1=hadoop31:2888:3888 
    server.2=hadoop32:2888:3888 
    server.3=hadoop33:2888:3888
    server.4=hadoop34:2888:3888 
    server.5=hadoop35:2888:3888 
    #5 from hadoop31, copy the zookeeper installation to the other nodes
    scp -r /home/hadoop/zookeeper-3.4.6 hadoop@hadoop32:/home/hadoop/ 
    scp -r /home/hadoop/zookeeper-3.4.6 hadoop@hadoop33:/home/hadoop/ 
    scp -r /home/hadoop/zookeeper-3.4.6 hadoop@hadoop34:/home/hadoop/ 
    scp -r /home/hadoop/zookeeper-3.4.6 hadoop@hadoop35:/home/hadoop/ 

    #6 on each node write its myid; the value must be the number N from that node's server.N line in zoo.cfg 
    ssh hadoop31 
    echo "1" > /opt/hadoop/zookeeper/myid 
    ssh hadoop32 
    echo "2" > /opt/hadoop/zookeeper/myid 
    ssh hadoop33 
    echo "3" > /opt/hadoop/zookeeper/myid 
    ssh hadoop34 
    echo "4" > /opt/hadoop/zookeeper/myid 
    ssh hadoop35 
    echo "5" > /opt/hadoop/zookeeper/myid 

    #7 how to start zookeeper on each node
    ssh hadoop31
    /home/hadoop/zookeeper-3.4.6/bin/zkServer.sh start

    #8 how to stop zookeeper on each node
    ssh hadoop31
    /home/hadoop/zookeeper-3.4.6/bin/zkServer.sh stop 

    #9 how to check zookeeper status on each node
    ssh hadoop31
    /home/hadoop/zookeeper-3.4.6/bin/zkServer.sh status 

    #10 how to browse zookeeper data with the client from any node
    ssh hadoop31
    /home/hadoop/zookeeper-3.4.6/bin/zkCli.sh -server hadoop31:2181,hadoop32:2181,hadoop33:2181,hadoop34:2181,hadoop35:2181
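
            Beyond zkServer.sh status, each server can also be probed with zookeeper's four-letter commands; a small sketch, assuming nc (netcat) is installed on the node you run it from:

    #every healthy server answers "imok"
    for h in hadoop31 hadoop32 hadoop33 hadoop34 hadoop35; do
        echo ruok | nc ${h} 2181
    done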

        2.3 Installing Hadoop HA

            1)hadoop-2.7.1.tar.gz

    #download hadoop-2.7.1.tar.gz into /home/hadoop and unpack it; I am working on hadoop31 here
    ssh hadoop31
    su hadoop
    cd /home/hadoop
    tar -zxvf hadoop-2.7.1.tar.gz

          

            2)core-site.xml

            Edit the configuration file /home/hadoop/hadoop-2.7.1/etc/hadoop/core-site.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <!-- Enable the HDFS trash: deleted files first go to the trash and are kept there for at most one day (1440 minutes) before being removed permanently -->
        <property>
            <name>fs.trash.interval</name>
            <value>1440</value>
        </property>
        <!-- The HDFS access URI under HA; bigdatacluster-ha is a nameservice ID you can choose freely, and it is referenced again in hdfs-site.xml -->
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://bigdatacluster-ha</value>
        </property>
        <!-- All of hadoop's file IO goes through a common code path, so io.file.buffer.size sets the read/write buffer size used, for example, by SequenceFile. Larger buffers give higher throughput for both disk and network operations at the cost of more memory and latency. Set it to a multiple of the system page size, in bytes; the default is 4KB, 64KB (65536) is typical, and here it is set to 128KB -->
        <property>
            <name>io.file.buffer.size</name>
            <value>131072</value>
        </property>
        <!-- hadoop temporary directory -->
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/opt/hadoop/tmp</value>
        </property>
        <!-- zookeeper quorum -->
        <property>
            <name>ha.zookeeper.quorum</name>
            <value>hadoop31:2181,hadoop32:2181,hadoop33:2181,hadoop34:2181,hadoop35:2181</value>
        </property>
        <property>
            <name>ha.zookeeper.session-timeout.ms</name>
            <value>300000</value>
        </property>
        <!-- Compression codec. The binaries downloaded from the Apache site do not support snappy; you have to build hadoop yourself, which I cover at http://aperise.iteye.com/blog/2254487. If you do not use snappy, omit this property -->
        <property>
            <name>io.compression.codecs</name>
            <value>org.apache.hadoop.io.compress.SnappyCodec</value>
        </property>
    </configuration>

            3)hdfs-site.xml

            Edit the configuration file /home/hadoop/hadoop-2.7.1/etc/hadoop/hdfs-site.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <!-- The hdfs nameservice is bigdatacluster-ha; it must match core-site.xml -->
        <property>
            <name>dfs.nameservices</name>
            <value>bigdatacluster-ha</value>
        </property>
        <!-- Disk space to reserve per volume, in bytes, so the disk never fills up completely -->
        <property>
            <name>dfs.datanode.du.reserved</name>
            <value>107374182400</value>
        </property>
        <!-- bigdatacluster-ha has two NameNodes: namenode1 and namenode2 -->
        <property>
            <name>dfs.ha.namenodes.bigdatacluster-ha</name>
            <value>namenode1,namenode2</value>
        </property>
        <!-- RPC address of namenode1 -->
        <property>
            <name>dfs.namenode.rpc-address.bigdatacluster-ha.namenode1</name>
            <value>hadoop31:9000</value>
        </property>
        <!-- HTTP address of namenode1 -->
        <property>
            <name>dfs.namenode.http-address.bigdatacluster-ha.namenode1</name>
            <value>hadoop31:50070</value>
        </property>
        <!-- RPC address of namenode2 -->
        <property>
            <name>dfs.namenode.rpc-address.bigdatacluster-ha.namenode2</name>
            <value>hadoop32:9000</value>
        </property>
        <!-- HTTP address of namenode2 -->
        <property>
            <name>dfs.namenode.http-address.bigdatacluster-ha.namenode2</name>
            <value>hadoop32:50070</value>
        </property>

        <!-- Where the NameNode's shared edit log is stored on the JournalNodes -->
        <property>
            <name>dfs.namenode.shared.edits.dir</name>
            <value>qjournal://hadoop31:8485;hadoop32:8485;hadoop33:8485;hadoop34:8485;hadoop35:8485/bigdatacluster-ha</value>
        </property>

        <!-- How clients locate the active NameNode during failover -->
        <property>
            <name>dfs.client.failover.proxy.provider.bigdatacluster-ha</name>
            <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        </property>

        <!-- Fencing method, used to log in remotely and isolate the failed NameNode's services during failover -->
        <property>
            <name>dfs.ha.fencing.methods</name>
            <value>sshfence</value>
        </property>
        <!-- sshfence requires passwordless ssh login -->
        <property>
            <name>dfs.ha.fencing.ssh.private-key-files</name>
            <value>/home/hadoop/.ssh/id_rsa</value>
        </property>

        <!-- Local directory where each JournalNode stores its edits -->
        <property>
            <name>dfs.journalnode.edits.dir</name>
            <value>/opt/hadoop/journal</value>
        </property>

        <!-- Enable automatic failover -->
        <property>
            <name>dfs.ha.automatic-failover.enabled</name>
            <value>true</value>
        </property>

        <!-- Local storage path for the namenode namespace -->
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>file:/opt/hadoop/hdfs/name</value>
        </property>

        <!-- Local storage path for datanode blocks -->
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>file:/opt/hadoop/hdfs/data</value>
        </property>

        <!-- Replication factor -->
        <property>
            <name>dfs.replication</name>
            <value>3</value>
        </property>

        <!-- Allow hdfs directories to be accessed over webhdfs -->
        <property>
            <name>dfs.webhdfs.enabled</name>
            <value>true</value>
        </property>

        <property>
            <name>ha.zookeeper.quorum</name>
            <value>hadoop31:2181,hadoop32:2181,hadoop33:2181,hadoop34:2181,hadoop35:2181</value>
        </property>

        <property>
            <name>dfs.namenode.handler.count</name>
            <value>600</value>
            <description>The number of server threads for the namenode.</description>
        </property>
        <property>
            <name>dfs.datanode.handler.count</name>
            <value>600</value>
            <description>The number of server threads for the datanode.</description>
        </property>
        <property>
            <name>dfs.client.socket-timeout</name>
            <value>600000</value>
        </property>
        <property>
            <!-- Maximum number of transfer threads a datanode may open; the default 4096 is too low and leads to "xcievers exceeded" errors -->
            <name>dfs.datanode.max.transfer.threads</name>
            <value>409600</value>
        </property>
    </configuration>

            4)mapred-site.xml

            Edit the configuration file /home/hadoop/hadoop-2.7.1/etc/hadoop/mapred-site.xml:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <!-- Run MapReduce on yarn -->
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <property>
            <name>mapreduce.job.maps</name>
            <value>12</value>
        </property>
        <property>
            <name>mapreduce.job.reduces</name>
            <value>12</value>
        </property>

        <!-- Compression settings. The binaries from the Apache site do not support snappy; you have to build hadoop yourself, which I cover at http://aperise.iteye.com/blog/2254487. If you do not use snappy, omit these properties -->
        <property>
            <name>mapreduce.output.fileoutputformat.compress</name>
            <value>true</value>
            <description>Should the job outputs be compressed?
            </description>
        </property>
        <property>
            <name>mapreduce.output.fileoutputformat.compress.type</name>
            <value>RECORD</value>
            <description>If the job outputs are to compressed as SequenceFiles, how should
                   they be compressed? Should be one of NONE, RECORD or BLOCK.
            </description>
        </property>
        <property>
            <name>mapreduce.output.fileoutputformat.compress.codec</name>
            <value>org.apache.hadoop.io.compress.SnappyCodec</value>
            <description>If the job outputs are compressed, how should they be compressed?
            </description>
        </property>
        <property>
            <name>mapreduce.map.output.compress</name>
            <value>true</value>
            <description>Should the outputs of the maps be compressed before being
                   sent across the network. Uses SequenceFile compression.
            </description>
        </property>
        <property>
            <name>mapreduce.map.output.compress.codec</name>
            <value>org.apache.hadoop.io.compress.SnappyCodec</value>
            <description>If the map outputs are compressed, how should they be
                   compressed?
            </description>
        </property>
    </configuration>

            5)yarn-site.xml

            Edit the configuration file /home/hadoop/hadoop-2.7.1/etc/hadoop/yarn-site.xml:

    <?xml version="1.0"?>
    <configuration>
        <!-- log aggregation start -->
        <property>
            <name>yarn.log-aggregation-enable</name>
            <value>true</value>
        </property>
        <!-- How long aggregated logs are kept on HDFS, in seconds; here 3 days -->
        <property>
            <name>yarn.log-aggregation.retain-seconds</name>
            <value>259200</value>
        </property>
        <!-- log aggregation end -->

        <!-- How long to wait before retrying the connection after the resourcemanager is lost -->
        <property>
            <name>yarn.resourcemanager.connect.retry-interval.ms</name>
            <value>2000</value>
        </property>

        <!-- resourcemanager configuration start -->
        <property>
            <name>yarn.resourcemanager.zk-address</name>
            <value>hadoop31:2181,hadoop32:2181,hadoop33:2181,hadoop34:2181,hadoop35:2181</value>
        </property>
        <property>
            <name>yarn.resourcemanager.cluster-id</name>
            <value>besttonecluster-yarn</value>
        </property>
        <!-- Enable resourcemanager HA; the default is false -->
        <property>
            <name>yarn.resourcemanager.ha.enabled</name>
            <value>true</value>
        </property>
        <property>
            <name>yarn.resourcemanager.ha.rm-ids</name>
            <value>rm1,rm2</value>
        </property>
        <property>
            <name>yarn.resourcemanager.hostname.rm1</name>
            <value>hadoop31</value>
        </property>
        <property>
            <name>yarn.resourcemanager.hostname.rm2</name>
            <value>hadoop32</value>
        </property>
        <!-- rm1 web address -->
        <property>
            <name>yarn.resourcemanager.webapp.address.rm1</name>
            <value>hadoop31:8088</value>
        </property>
        <!-- rm2 web address -->
        <property>
            <name>yarn.resourcemanager.webapp.address.rm2</name>
            <value>hadoop32:8088</value>
        </property>
        <!-- Enable automatic failover -->
        <property>
            <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
            <value>true</value>
        </property>
        <property>
            <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
            <value>true</value>
        </property>
        <property>
            <name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
            <value>/yarn-leader-election</value>
        </property>

        <!-- Enable recovery of running applications after a resourcemanager restart -->
        <property>
            <name>yarn.resourcemanager.recovery.enabled</name>
            <value>true</value>
        </property>
        <property>
            <name>yarn.resourcemanager.store.class</name>
            <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
        </property>
        <!-- resourcemanager configuration end -->

        <!-- nodemanager configuration start -->
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        <!-- nodemanager configuration end -->
    </configuration>

            6)slaves

            Edit the configuration file /home/hadoop/hadoop-2.7.1/etc/hadoop/slaves:

    hadoop31
    hadoop32
    hadoop33
    hadoop34
    hadoop35

            7) hadoop-env.sh and yarn-env.sh

            Set JAVA_HOME in both /home/hadoop/hadoop-2.7.1/etc/hadoop/hadoop-env.sh and /home/hadoop/hadoop-2.7.1/etc/hadoop/yarn-env.sh:

    export JAVA_HOME=/home/hadoop/java/jdk1.7.0_65

            8)bashrc

            This takes effect for the hadoop user only; append the following to /home/hadoop/.bashrc:

    export HADOOP_HOME=/home/hadoop/hadoop-2.7.1
    export PATH=${HADOOP_HOME}/bin:${PATH}
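
            After reloading .bashrc, the hadoop command should resolve from the PATH:

    source /home/hadoop/.bashrc
    #should print Hadoop 2.7.1
    hadoop version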

            9) Distribute the installation to the other machines

    #I am running this on hadoop31
    scp -r /home/hadoop/hadoop-2.7.1 hadoop@hadoop32:/home/hadoop/
    scp -r /home/hadoop/hadoop-2.7.1 hadoop@hadoop33:/home/hadoop/
    scp -r /home/hadoop/hadoop-2.7.1 hadoop@hadoop34:/home/hadoop/ 
    scp -r /home/hadoop/hadoop-2.7.1 hadoop@hadoop35:/home/hadoop/

        2.4 First startup of the Hadoop HA cluster

            1) Start zookeeper

    ssh hadoop31
    /home/hadoop/zookeeper-3.4.6/bin/zkServer.sh start
    ssh hadoop32
    /home/hadoop/zookeeper-3.4.6/bin/zkServer.sh start
    ssh hadoop33
    /home/hadoop/zookeeper-3.4.6/bin/zkServer.sh start
    ssh hadoop34
    /home/hadoop/zookeeper-3.4.6/bin/zkServer.sh start
    ssh hadoop35
    /home/hadoop/zookeeper-3.4.6/bin/zkServer.sh start

             #use jps to check that a QuorumPeerMain process is running

            #/home/hadoop/zookeeper-3.4.6/bin/zkServer.sh status shows the zookeeper state

            #/home/hadoop/zookeeper-3.4.6/bin/zkServer.sh stop shuts zookeeper down

            2) Format the hadoop-ha znode in zookeeper

    /home/hadoop/hadoop-2.7.1/bin/hdfs zkfc -formatZK
    #you can check as follows that the Hadoop HA znode now exists in zookeeper
    # /home/hadoop/zookeeper-3.4.6/bin/zkCli.sh -server hadoop31:2181,hadoop32:2181,hadoop33:2181,hadoop34:2181,hadoop35:2181 
    #ls /
            3) Start journalnode, the namenode edit-log sync service

    ssh hadoop31
    /home/hadoop/hadoop-2.7.1/sbin/hadoop-daemon.sh start journalnode 
    ssh hadoop32
    /home/hadoop/hadoop-2.7.1/sbin/hadoop-daemon.sh start journalnode 
    ssh hadoop33
    /home/hadoop/hadoop-2.7.1/sbin/hadoop-daemon.sh start journalnode 
    ssh hadoop34
    /home/hadoop/hadoop-2.7.1/sbin/hadoop-daemon.sh start journalnode 
    ssh hadoop35
    /home/hadoop/hadoop-2.7.1/sbin/hadoop-daemon.sh start journalnode
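
            Before formatting the namenode, it is worth confirming that all five JournalNodes are actually up; a quick sketch using the passwordless SSH set up earlier (assumes jps resolves on the remote PATH):

    #each line should show a running JournalNode process
    for h in hadoop31 hadoop32 hadoop33 hadoop34 hadoop35; do
        ssh hadoop@${h} jps | grep JournalNode
    done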

            4) Format the namenode

    #this step must be run on exactly one of the namenode hosts, hadoop31 or hadoop32
    ssh hadoop31
    /home/hadoop/hadoop-2.7.1/bin/hdfs namenode -format

            5) Start the namenode, then sync and start the standby namenode

    #start the namenode
    ssh hadoop31
    /home/hadoop/hadoop-2.7.1/sbin/hadoop-daemon.sh start namenode 
    #sync the standby namenode from the active one, then start it
    ssh hadoop32
    /home/hadoop/hadoop-2.7.1/bin/hdfs namenode -bootstrapStandby 
    /home/hadoop/hadoop-2.7.1/sbin/hadoop-daemon.sh start namenode

            6) Start the DFSZKFailoverController

    ssh hadoop31
    /home/hadoop/hadoop-2.7.1/sbin/hadoop-daemon.sh start zkfc 
    ssh hadoop32
    /home/hadoop/hadoop-2.7.1/sbin/hadoop-daemon.sh start zkfc
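
            Once both zkfc daemons are running, one namenode becomes active and the other standby; you can verify this with hdfs haadmin (namenode1 and namenode2 are the IDs defined in hdfs-site.xml above):

    #one should report "active", the other "standby"
    /home/hadoop/hadoop-2.7.1/bin/hdfs haadmin -getServiceState namenode1
    /home/hadoop/hadoop-2.7.1/bin/hdfs haadmin -getServiceState namenode2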

            7) Start the datanodes

    #note: hadoop-daemons.sh start datanode starts a datanode on all slave nodes, while hadoop-daemon.sh start datanode starts only the local datanode
    ssh hadoop31
    /home/hadoop/hadoop-2.7.1/sbin/hadoop-daemons.sh start datanode
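
            To check that every datanode registered with the active namenode:

    #the report should list 5 live datanodes
    /home/hadoop/hadoop-2.7.1/bin/hdfs dfsadmin -report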

            8) Start yarn

    #start the resourcemanager on hadoop31 and the nodemanagers on hadoop31, hadoop32, hadoop33, hadoop34, hadoop35
    ssh hadoop31
    /home/hadoop/hadoop-2.7.1/sbin/start-yarn.sh 
    #start the standby resourcemanager on hadoop32
    ssh hadoop32
    /home/hadoop/hadoop-2.7.1/sbin/yarn-daemon.sh start resourcemanager

             At this point, the zookeeper-based highly available Hadoop cluster is fully installed and running.
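
             As a last check, you can confirm the ResourceManager HA state and run a small end-to-end job; a sketch (rm1/rm2 are the IDs from yarn-site.xml above, and the pi example ships in the hadoop 2.7.1 distribution; note that with the snappy settings above, jobs need a snappy-enabled build):

    #one resourcemanager should report "active", the other "standby"
    /home/hadoop/hadoop-2.7.1/bin/yarn rmadmin -getServiceState rm1
    /home/hadoop/hadoop-2.7.1/bin/yarn rmadmin -getServiceState rm2
    #run the bundled pi example as a smoke test
    /home/hadoop/hadoop-2.7.1/bin/hadoop jar /home/hadoop/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 5 10
    #to exercise failover, kill the active namenode process and re-check the service state
    #kill -9 <pid of the active NameNode>
    #/home/hadoop/hadoop-2.7.1/bin/hdfs haadmin -getServiceState namenode2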
