Apache Hadoop Cluster Installation (NameNode HA + YARN HA + SPARK + Rack Awareness)


    1. Host Planning

    No.  Hostname  IP address      Roles
    1    nn-1      192.168.9.21    NameNode, ZKFC, mr-jobhistory, ZooKeeper, JournalNode
    2    nn-2      192.168.9.22    NameNode (standby), ZKFC, JournalNode
    3    dn-1      192.168.9.23    DataNode, JournalNode, ZooKeeper, ResourceManager, NodeManager
    4    dn-2      192.168.9.24    DataNode, ZooKeeper, ResourceManager, NodeManager
    5    dn-3      192.168.9.25    DataNode, NodeManager

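    Cluster notes:
    (1) For clusters of 7 nodes or fewer, NameNode HA can be omitted.
    (2) An HA cluster needs at least 3 JournalNodes; 5 are recommended. JournalNode is a lightweight service, so for locality two of the JournalNodes share hosts with the two NameNodes, and the remaining JournalNodes are spread across other nodes.
    (3) An HA cluster needs at least 3 ZooKeeper nodes; 5 or 7 are recommended. ZooKeeper can share hosts with DataNodes.
    (4) In an HA cluster, the ResourceManager is best given a node of its own. For larger clusters with spare hosts, consider enabling ResourceManager HA.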

    2. Host Environment Setup

    2.1 Configure the JDK


    Uninstall OpenJDK:
    -- Check the Java version
    [root@dtgr ~]# java -version
    java version "1.7.0_45"
    OpenJDK Runtime Environment (rhel-2.4.3.3.el6-x86_64 u45-b15)
    OpenJDK 64-Bit Server VM (build 24.45-b08, mixed mode)
    -- Find the installed package
    [root@dtgr ~]# rpm -qa | grep java
    java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.x86_64
    -- Uninstall it
    [root@dtgr ~]# rpm -e --nodeps java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.x86_64

    -- Verify that it was removed
    [root@dtgr ~]# rpm -qa | grep java
    [root@dtgr ~]# java -version
    -bash: /usr/bin/java: No such file or directory

    Install the JDK:
    -- Download and extract the JDK tarball
    [root@dtgr java]# mkdir /usr/local/java
    [root@dtgr java]# mv jdk-7u79-linux-x64.tar.gz /usr/local/java
    [root@dtgr java]# cd /usr/local/java
    [root@dtgr java]# tar xvf jdk-7u79-linux-x64.tar.gz
    [root@dtgr java]# ls
    jdk1.7.0_79 jdk-7u79-linux-x64.tar.gz
    -- Add the environment variables
    [root@dtgr java]# vim /etc/profile
    [root@dtgr java]# tail /etc/profile
    export JAVA_HOME=/usr/local/java/jdk1.7.0_79
    export JRE_HOME=/usr/local/java/jdk1.7.0_79/jre
    export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib:$CLASSPATH
    export PATH=$JAVA_HOME/bin:$PATH
    -- Apply the environment variables
    [root@dtgr ~]# source /etc/profile
    -- Verify
    [root@dtgr ~]# java -version
    java version "1.7.0_79"
    Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
    Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)
    [root@dtgr ~]# javac -version
    javac 1.7.0_79

    2.2 Set hostnames and configure hostname resolution

    On every node, set the hostname according to the plan and add all hostnames to /etc/hosts.
    Set the hostname (HOSTNAME in /etc/sysconfig/network makes it persistent; the hostname command applies it immediately):
    [root@dn-3 ~]# cat /etc/sysconfig/network
    NETWORKING=yes
    HOSTNAME=dn-3
    [root@dn-3 ~]# hostname dn-3

    Configure /etc/hosts and distribute it to all nodes:
    [root@dn-3 ~]# cat /etc/hosts
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
    192.168.9.21 nn-1
    192.168.9.22 nn-2
    192.168.9.23 dn-1
    192.168.9.24 dn-2
    192.168.9.25 dn-3
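    With five hosts it is easy to mistype an entry. A minimal sketch, assuming bash 4 and that the HOSTS/IPS arrays (hypothetical names) mirror the planning table in section 1: it prints the cluster entries and appends only the hostnames a hosts file does not already contain, so it can be re-run safely on every node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: HOSTS/IPS mirror the host plan in section 1.
HOSTS=(nn-1 nn-2 dn-1 dn-2 dn-3)
IPS=(192.168.9.21 192.168.9.22 192.168.9.23 192.168.9.24 192.168.9.25)

# Print one "IP hostname" line per planned host, in plan order.
gen_hosts_entries() {
  local i
  for i in "${!HOSTS[@]}"; do
    printf '%s %s\n' "${IPS[$i]}" "${HOSTS[$i]}"
  done
}

# Append missing entries to a hosts file (default /etc/hosts, needs root).
# A hostname already present as a whole word is skipped, so re-runs are harmless.
add_hosts_entries() {
  local file=${1:-/etc/hosts} ip name
  while read -r ip name; do
    if ! grep -qw -- "$name" "$file"; then
      printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
  done < <(gen_hosts_entries)
}

# Usage (as root, on each node): add_hosts_entries
```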

    2.3 Create the hadoop account

    On every node, create the hadoop user and group, with password hadoop and home directory /hadoop:
    [root@dn-3 ~]# useradd -d /hadoop hadoop
    [root@dn-3 ~]# passwd hadoop

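    2.4 Configure NTP time synchronization

    Use nn-1 as the time source. On the other nodes, edit /etc/ntp.conf, comment out the default servers and point to nn-1:
    # vi /etc/ntp.conf
    #server 0.centos.pool.ntp.org
    #server 1.centos.pool.ntp.org
    #server 2.centos.pool.ntp.org
    server nn-1

    Enable ntpd at boot:
    # chkconfig ntpd on
    Start the ntp service:
    # service ntpd start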

    2.5 Disable the iptables firewall and SELinux

    (1) Disable iptables
    [root@dn-3 ~]# service iptables stop
    [root@dn-3 ~]# chkconfig iptables off
    [root@dn-3 ~]# chkconfig --list | grep iptables
    iptables 0:off 1:off 2:off 3:off 4:off 5:off 6:off

    (2) Disable SELinux
    setenforce 0 switches SELinux to permissive mode immediately; SELINUX=disabled in the config file makes the change permanent from the next reboot:
    [root@dn-3 ~]# setenforce 0
    setenforce: SELinux is disabled
    [root@dn-3 ~]# vim /etc/sysconfig/selinux
    SELINUX=disabled
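    To make the same SELinux change on every node without opening an editor, a sketch like the following can be used. It assumes GNU sed and that, as is usual on RHEL/CentOS 6, /etc/sysconfig/selinux is a symlink to /etc/selinux/config; the function edits the real file, because sed -i on the symlink would replace it with a regular file. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set SELINUX=disabled in the SELinux config file.
# Edits /etc/selinux/config (the symlink target) by default; needs root.
disable_selinux_config() {
  local cfg=${1:-/etc/selinux/config}
  if grep -q '^SELINUX=' "$cfg"; then
    # Rewrite the existing SELINUX= line in place (GNU sed syntax).
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
  else
    # No SELINUX= line yet: append one.
    echo 'SELINUX=disabled' >> "$cfg"
  fi
}

# Usage (as root, on every node): disable_selinux_config
```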

    2.6 Set up passwordless SSH login

    (1) Generate a key pair on every node
    On every node, switch to the hadoop user and generate a key pair, pressing Enter at every prompt:
    [hadoop@nn-1 ~]$ ssh-keygen -t rsa

    (2) On nn-1, append every node's public key to authorized_keys:
    Command: $ ssh <hostname> 'cat ./.ssh/id_rsa.pub' >> ~/.ssh/authorized_keys
    Replace <hostname> with each actual hostname and run the command on nn-1 once per host, including nn-1 itself, for example:
    [hadoop@nn-1 ~]$ ssh nn-1 'cat ./.ssh/id_rsa.pub' >> ~/.ssh/authorized_keys
    hadoop@nn-1's password:

    (3) Set permissions
    [hadoop@nn-1 .ssh]$ chmod 644 authorized_keys

    (4) Distribute authorized_keys to $HOME/.ssh/ on all nodes, for example:
    [hadoop@nn-1 .ssh]$ scp authorized_keys hadoop@nn-2:/hadoop/.ssh/
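    Before moving on, it is worth confirming that the hadoop user can reach every node, including itself, without a password; the sshfence fencing method configured in hdfs-site.xml depends on it. A minimal sketch (the function name is hypothetical): ssh -o BatchMode=yes fails instead of prompting, so a node with a missing key shows up as FAIL rather than hanging at a password prompt.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which hosts accept key-based SSH login.
NODES=(nn-1 nn-2 dn-1 dn-2 dn-3)

check_passwordless_ssh() {
  local host failed=0
  for host in "$@"; do
    # BatchMode=yes: never prompt; ConnectTimeout: don't hang on dead hosts.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
      echo "OK   $host"
    else
      echo "FAIL $host"
      failed=1
    fi
  done
  return "$failed"
}

# Usage (as hadoop, on nn-1): check_passwordless_ssh "${NODES[@]}"
```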

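    3. Install and configure Hadoop


    Note: edit the configuration on nn-1 first, then distribute it to the other nodes in one batch.

    3.1 Upload the Hadoop and ZooKeeper packages

    Copy the packages to /hadoop.
    Extract the package: [hadoop@nn-1 ~]$ tar -xzvf hadoop2-js-0121.tar.gz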

    3.2 Edit hadoop-env.sh

    export JAVA_HOME=/usr/local/java/jdk1.7.0_79
    export HADOOP_HEAPSIZE=2000
    export HADOOP_NAMENODE_INIT_HEAPSIZE=10000
    export HADOOP_OPTS="-server $HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
    export HADOOP_NAMENODE_OPTS="-Xmx15000m -Xms15000m -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"

    3.3 Edit core-site.xml

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://dpi</value>
      </property>
      <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/hadoop/hdfs/temp</value>
        <description>A base for other temporary directories.</description>
      </property>
      <property>
        <name>hadoop.proxyuser.hduser.hosts</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.proxyuser.hduser.groups</name>
        <value>*</value>
      </property>
      <property>
        <!-- must list the ZooKeeper hosts from zoo.cfg (nn-1, dn-1, dn-2) -->
        <name>ha.zookeeper.quorum</name>
        <value>nn-1:2181,dn-1:2181,dn-2:2181</value>
      </property>
    </configuration>

    3.4 Edit hdfs-site.xml

    <configuration>
      <property>
        <!-- Not used with NameNode HA: the standby NameNode performs checkpointing -->
        <name>dfs.namenode.secondary.http-address</name>
        <value>nn-1:9001</value>
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/hadoop/hdfs/name</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/hadoop/hdfs/data,file:/hadoopdata/hdfs/data</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
      </property>
      <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
      </property>
      <property>
        <name>dfs.nameservices</name>
        <value>dpi</value>
      </property>
      <property>
        <name>dfs.ha.namenodes.dpi</name>
        <value>nn-1,nn-2</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.dpi.nn-1</name>
        <value>nn-1:9000</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.dpi.nn-1</name>
        <value>nn-1:50070</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.dpi.nn-2</name>
        <value>nn-2:9000</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.dpi.nn-2</name>
        <value>nn-2:50070</value>
      </property>
      <property>
        <name>dfs.namenode.servicerpc-address.dpi.nn-1</name>
        <value>nn-1:53310</value>
      </property>
      <property>
        <name>dfs.namenode.servicerpc-address.dpi.nn-2</name>
        <value>nn-2:53310</value>
      </property>
      <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
      </property>
      <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://nn-1:8485;nn-2:8485;dn-1:8485/dpi</value>
      </property>
      <property>
        <name>dfs.client.failover.proxy.provider.dpi</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
      </property>
      <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/hadoop/hdfs/journal</value>
      </property>
      <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
      </property>
      <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/hadoop/.ssh/id_rsa</value>
      </property>
    </configuration>
    Create the directories referenced in the configuration:
    mkdir -p /hadoop/hdfs/name
    mkdir -p /hadoop/hdfs/data
    mkdir -p /hadoop/hdfs/temp
    mkdir -p /hadoop/hdfs/journal
    chmod 755 /hadoop/hdfs
    mkdir -p /hadoopdata/hdfs/data
    chmod 755 /hadoopdata/hdfs

    Change the owner and group to hadoop:hadoop:
    chown -R hadoop:hadoop /hadoop/hdfs /hadoopdata
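    The same directories are needed on the relevant nodes (name on the NameNodes, data on the DataNodes, journal on the JournalNodes). A sketch that creates the whole layout in one go; the function name and its optional prefix argument are hypothetical, the prefix only serving to try the layout in a scratch directory. chown is attempted only when running as root and a hadoop account exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the local paths used by hadoop.tmp.dir,
# dfs.namenode.name.dir, dfs.datanode.data.dir and dfs.journalnode.edits.dir.
make_hdfs_dirs() {
  local prefix=${1:-} d
  local -a dirs
  dirs=(/hadoop/hdfs/name /hadoop/hdfs/data /hadoop/hdfs/temp
        /hadoop/hdfs/journal /hadoopdata/hdfs/data)
  for d in "${dirs[@]}"; do
    mkdir -p "$prefix$d"
  done
  chmod 755 "$prefix/hadoop/hdfs" "$prefix/hadoopdata/hdfs"
  # Ownership change needs root and an existing hadoop account.
  if [ "$(id -u)" -eq 0 ] && id hadoop >/dev/null 2>&1; then
    chown -R hadoop:hadoop "$prefix/hadoop/hdfs" "$prefix/hadoopdata"
  fi
}

# Usage (as root): make_hdfs_dirs
```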


    3.5 Edit mapred-site.xml

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
      <property>
        <name>mapreduce.jobhistory.address</name>
        <value>nn-1:10020</value>
      </property>
      <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>nn-1:19888</value>
      </property>
    </configuration>


    3.6 Edit yarn-site.xml

    Enable YARN HA. Per the plan, dn-1 and dn-2 are the ResourceManager nodes:
    <configuration>
      <!-- Site specific YARN configuration properties -->
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
      <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
      </property>
      <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
      </property>
      <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>dn-1</value>
      </property>
      <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>dn-2</value>
      </property>
      <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
      </property>
      <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
      </property>
      <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>nn-1:2181,dn-1:2181,dn-2:2181</value>
        <description>For multiple zk services, separate them with comma</description>
      </property>
      <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yarn-ha</value>
      </property>
    </configuration>

    3.7 Edit slaves

    Add all DataNode hosts to the slaves file:
    dn-1
    dn-2
    dn-3


    3.8 Edit yarn-env.sh

    # some Java parameters
    # export JAVA_HOME=/home/y/libexec/jdk1.6.0/
    if [ "$JAVA_HOME" != "" ]; then
      #echo "run java in $JAVA_HOME"
      JAVA_HOME=/usr/local/java/jdk1.7.0_79
    fi
    JAVA_HEAP_MAX=-Xmx15000m
    YARN_HEAPSIZE=15000
    export YARN_RESOURCEMANAGER_HEAPSIZE=5000
    export YARN_TIMELINESERVER_HEAPSIZE=10000
    export YARN_NODEMANAGER_HEAPSIZE=10000

    3.9 Distribute the configured hadoop directory to all nodes

    [hadoop@nn-1 ~]$ scp -rp hadoop hadoop@nn-2:/hadoop
    [hadoop@nn-1 ~]$ scp -rp hadoop hadoop@dn-1:/hadoop
    [hadoop@nn-1 ~]$ scp -rp hadoop hadoop@dn-2:/hadoop
    [hadoop@nn-1 ~]$ scp -rp hadoop hadoop@dn-3:/hadoop
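    The per-host scp commands above can be folded into a loop. A minimal sketch; distribute, run and the DRY_RUN switch are hypothetical names, DRY_RUN=1 printing the commands instead of running them so the host list can be checked first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, or just print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# distribute DIR HOST...: copy DIR (relative to the current directory)
# into /hadoop on each host; scp -p keeps modes and timestamps.
distribute() {
  local dir=$1 host
  shift
  for host in "$@"; do
    run scp -rp "$dir" "hadoop@$host:/hadoop" || return 1
  done
}

# Usage on nn-1, from /hadoop:
#   distribute hadoop nn-2 dn-1 dn-2 dn-3
#   distribute zookeeper dn-1 dn-2
```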

    4 Install and configure ZooKeeper

    Work in the /hadoop directory. Per the plan, the three ZooKeeper nodes are nn-1, dn-1 and dn-2.
    Configure ZooKeeper on nn-1 first, then distribute it to the other ZooKeeper nodes:

    4.1 Upload and extract ZooKeeper on nn-1


    4.2 Edit the configuration file /hadoop/zookeeper/conf/zoo.cfg

    dataDir=/hadoop/zookeeper/data/
    dataLogDir=/hadoop/zookeeper/log/
    # the port at which the clients will connect
    clientPort=2181
    server.1=nn-1:2887:3887
    server.2=dn-1:2888:3888
    server.3=dn-2:2889:3889
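    The host list in ha.zookeeper.quorum (core-site.xml) and yarn.resourcemanager.zk-address (yarn-site.xml) has to name the machines that actually run ZooKeeper. A small sketch that derives that string from zoo.cfg, assuming the simple key=value layout shown above with no spaces around "="; the function name is hypothetical. For this zoo.cfg it yields nn-1:2181,dn-1:2181,dn-2:2181.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build "host:clientPort,..." from the server.N lines.
zk_quorum_from_cfg() {
  local cfg=${1:-/hadoop/zookeeper/conf/zoo.cfg}
  awk -F'=' '
    $1 == "clientPort" { port = $2 }
    $1 ~ /^server\.[0-9]+$/ { split($2, a, ":"); hosts[++n] = a[1] }
    END {
      # Join the hosts in file order, each with the client port.
      for (i = 1; i <= n; i++) printf "%s%s:%s", (i > 1 ? "," : ""), hosts[i], port
      print ""
    }' "$cfg"
}

# Usage: zk_quorum_from_cfg /hadoop/zookeeper/conf/zoo.cfg
```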

    4.3 Distribute the configured zookeeper directory from nn-1 to the other nodes

    [hadoop@nn-1 ~]$ scp -rp zookeeper hadoop@dn-1:/hadoop
    [hadoop@nn-1 ~]$ scp -rp zookeeper hadoop@dn-2:/hadoop

    4.4 Create the data and log directories on all ZooKeeper nodes

    [hadoop@dn-1 ~]$ mkdir -p /hadoop/zookeeper/data/
    [hadoop@dn-1 ~]$ mkdir -p /hadoop/zookeeper/log/

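    4.5 Create myid

    On every ZooKeeper node, create a myid file in /hadoop/zookeeper/data.
    Note: myid contains the number that follows "server." in zoo.cfg (1 for nn-1, 2 for dn-1, 3 for dn-2).
    On nn-1, myid contains 1:
    [hadoop@nn-1 data]$ echo 1 > /hadoop/zookeeper/data/myid

    Create the myid file on the other ZooKeeper nodes in the same way.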
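    Because myid must agree with zoo.cfg, it can be derived from it instead of typed by hand. A sketch under the same key=value assumption as above; the function names are hypothetical, and the host defaults to the machine's short hostname, which in this plan matches the names used in zoo.cfg.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: derive a node's ZooKeeper id from zoo.cfg.

# Print N for the server.N entry whose host equals $2; exit status 1 if none.
zk_myid_for_host() {
  local cfg=$1 host=$2
  awk -F'=' -v h="$host" '
    $1 ~ /^server\.[0-9]+$/ {
      split($2, a, ":")
      if (a[1] == h) { sub(/^server\./, "", $1); print $1; found = 1; exit }
    }
    END { if (!found) exit 1 }' "$cfg"
}

# Write the id for this host (default: short hostname) into <dataDir>/myid.
write_myid() {
  local cfg=${1:-/hadoop/zookeeper/conf/zoo.cfg} host=${2:-$(hostname -s)}
  local datadir id
  datadir=$(awk -F'=' '$1 == "dataDir" { print $2 }' "$cfg")
  id=$(zk_myid_for_host "$cfg" "$host") || {
    echo "no server.N entry for $host in $cfg" >&2
    return 1
  }
  echo "$id" > "${datadir%/}/myid"
}

# Usage on each ZooKeeper node (nn-1, dn-1, dn-2): write_myid
```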


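    4.6 Set environment variables

    $ echo "export ZOOKEEPER_HOME=/hadoop/zookeeper" >> $HOME/.bash_profile
    $ echo 'export PATH=$ZOOKEEPER_HOME/bin:$PATH' >> $HOME/.bash_profile
    $ source $HOME/.bash_profile

    The single quotes on the second line keep $ZOOKEEPER_HOME and $PATH from being expanded by echo, so they are evaluated when .bash_profile is sourced.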


    5 Start the cluster

    5.1 Start ZooKeeper

    Per the plan, the ZooKeeper nodes are nn-1, dn-1 and dn-2; start ZooKeeper on each of the three.

    Start command:
    [hadoop@nn-1 ~]$ /hadoop/zookeeper/bin/zkServer.sh start
    JMX enabled by default
    Using config: /hadoop/zookeeper/bin/../conf/zoo.cfg
    Starting zookeeper ... STARTED

    Check the processes; QuorumPeerMain should be listed:
    [hadoop@nn-1 ~]$ jps
    9382 QuorumPeerMain
    9407 Jps

    Check the status; Mode: follower means this node is a ZooKeeper follower:
    [hadoop@nn-1 ~]$ /hadoop/zookeeper/bin/zkServer.sh status
    JMX enabled by default
    Using config: /hadoop/zookeeper/bin/../conf/zoo.cfg
    Mode: follower

    On another node, Mode: leader means that node is the ZooKeeper leader:
    [hadoop@dn-1 data]$ /hadoop/zookeeper/bin/zkServer.sh status
    JMX enabled by default
    Using config: /hadoop/zookeeper/bin/../conf/zoo.cfg
    Mode: leader
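    Checking each node by hand gets tedious. A sketch that gathers the Mode: line from all three ZooKeeper nodes over SSH, assuming passwordless SSH from section 2.6 and the install path above (function names are hypothetical); a healthy three-node ensemble should report exactly one leader and two followers.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: check the ZooKeeper ensemble from one node.
ZK_NODES=(nn-1 dn-1 dn-2)

# Read zkServer.sh status output on stdin; print the mode (leader/follower/standalone).
zk_mode() {
  sed -n 's/^Mode: //p'
}

# Print "host mode" for each host; "unknown" if no Mode: line came back.
zk_cluster_modes() {
  local host mode
  for host in "$@"; do
    mode=$(ssh "$host" /hadoop/zookeeper/bin/zkServer.sh status 2>&1 | zk_mode)
    echo "$host ${mode:-unknown}"
  done
}

# Count the leaders in zk_cluster_modes output; a healthy ensemble has exactly 1.
count_leaders() {
  grep -c ' leader$'
}

# Usage: zk_cluster_modes "${ZK_NODES[@]}" | tee /dev/stderr | count_leaders
```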

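    5.2 Initialize the HA state in ZooKeeper (once only; run on nn-1)

    [hadoop@nn-1 ~]$ /hadoop/hadoop/bin/hdfs zkfc -formatZK
    There is an interactive confirmation along the way; enter Y.

    Open the ZooKeeper CLI to check that the znode was created (inside the shell, ls /hadoop-ha should list dpi):
    [hadoop@nn-1 bin]$ ./zkCli.sh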

    5.3 Start ZKFC (run on nn-1 and nn-2)

    [hadoop@nn-1 ~]$ /hadoop/hadoop/sbin/hadoop-daemon.sh start zkfc
    starting zkfc, logging to /hadoop/hadoop/logs/hadoop-hadoop-zkfc-nn-1.out

    jps shows the DFSZKFailoverController process:
    [hadoop@nn-1 ~]$ jps
    9681 Jps
    9638 DFSZKFailoverController
    9382 QuorumPeerMain

     

    5.4 Start the JournalNodes

    Per the plan, the JournalNodes are nn-1, nn-2 and dn-1; start the service on each of the three with:
    [hadoop@nn-1 ~]$ /hadoop/hadoop/sbin/hadoop-daemon.sh start journalnode
    starting journalnode, logging to /hadoop/hadoop/logs/hadoop-hadoop-journalnode-nn-1.out

    jps shows the JournalNode process:
    [hadoop@nn-1 ~]$ jps
    9714 JournalNode
    9638 DFSZKFailoverController
    9382 QuorumPeerMain
    9762 Jps

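    5.5 Format the NameNode (run on nn-1)

    [hadoop@nn-1 ~]$ /hadoop/hadoop/bin/hdfs namenode -format

    Check the log output for a message that the storage directory /hadoop/hdfs/name has been successfully formatted.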

    5.6 Start the NameNode (run on nn-1)

    [hadoop@nn-1 ~]$ /hadoop/hadoop/sbin/hadoop-daemon.sh start namenode
    starting namenode, logging to /hadoop/hadoop/logs/hadoop-hadoop-namenode-nn-1.out
    jps shows the NameNode process:
    [hadoop@nn-1 ~]$ jps
    9714 JournalNode
    9638 DFSZKFailoverController
    9382 QuorumPeerMain
    10157 NameNode
    10269 Jps

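    5.7 Bootstrap the standby NameNode (run on nn-2)

    [hadoop@nn-2 ~]$ /hadoop/hadoop/bin/hdfs namenode -bootstrapStandby
    This copies the formatted namespace metadata from the running NameNode on nn-1; check its log output for errors.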

    5.8 Start the NameNode (run on nn-2)

    [hadoop@nn-2 ~]$ /hadoop/hadoop/sbin/hadoop-daemon.sh start namenode
    starting namenode, logging to /hadoop/hadoop/logs/hadoop-hadoop-namenode-nn-2.out
    jps shows the NameNode process:
    [hadoop@nn-2 ~]$ jps
    53990 NameNode
    54083 Jps
    53824 JournalNode
    53708 DFSZKFailoverController

    5.9 Start the DataNodes (run on dn-1 through dn-3)

    [hadoop@dn-1 ~]$ /hadoop/hadoop/sbin/hadoop-daemon.sh start datanode
    jps shows the DataNode process:
    [hadoop@dn-1 temp]$ jps
    57007 Jps
    56927 DataNode
    56223 QuorumPeerMain


    5.10 Start the ResourceManagers

    Per the plan, the ResourceManager runs in HA mode on dn-1 and dn-2; start it on both nodes:
    [hadoop@dn-1 ~]$ /hadoop/hadoop/sbin/yarn-daemon.sh start resourcemanager
    starting resourcemanager, logging to /hadoop/hadoop/logs/yarn-hadoop-resourcemanager-dn-1.out

    jps shows the ResourceManager process:
    [hadoop@dn-1 ~]$ jps
    57173 QuorumPeerMain
    58317 Jps
    57283 JournalNode
    58270 ResourceManager
    58149 DataNode

    5.11 Start the JobHistory server

    Per the plan, the JobHistory server runs on nn-1; start it with:
    [hadoop@nn-1 ~]$ /hadoop/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
    starting historyserver, logging to /hadoop/hadoop/logs/mapred-hadoop-historyserver-nn-1.out

    jps shows the JobHistoryServer process:
    [hadoop@nn-1 ~]$ jps
    11210 JobHistoryServer
    9714 JournalNode
    9638 DFSZKFailoverController
    9382 QuorumPeerMain
    11039 NameNode
    11303 Jps

    5.12 Start the NodeManagers

    Per the plan, dn-1, dn-2 and dn-3 are NodeManagers; start NodeManager on all three:
    [hadoop@dn-1 ~]$ /hadoop/hadoop/sbin/yarn-daemon.sh start nodemanager
    starting nodemanager, logging to /hadoop/hadoop/logs/yarn-hadoop-nodemanager-dn-1.out

    jps shows the NodeManager process:
    [hadoop@dn-1 ~]$ jps
    58559 NodeManager
    57173 QuorumPeerMain
    58668 Jps
    57283 JournalNode
    58270 ResourceManager
    58149 DataNode
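    The start-up order used in 5.1 to 5.12 can be collected into one script run from nn-1. A sketch under these assumptions: passwordless SSH as in 2.6, the install paths used above, and JAVA_HOME being visible to non-interactive SSH sessions. The one-time steps (zkfc -formatZK, namenode -format, namenode -bootstrapStandby) are deliberately left out and should only be run by hand on first install. The helper names and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical start-up script for the cluster planned in section 1.
HADOOP=/hadoop/hadoop
ZK=/hadoop/zookeeper

# on HOST CMD...: run CMD on HOST over ssh, or print it when DRY_RUN is set.
on() {
  local host=$1
  shift
  if [ -n "${DRY_RUN:-}" ]; then
    echo "[$host] $*"
  else
    ssh "$host" "$@"
  fi
}

start_cluster() {
  local h
  for h in nn-1 dn-1 dn-2; do on "$h" "$ZK/bin/zkServer.sh" start; done
  for h in nn-1 nn-2; do on "$h" "$HADOOP/sbin/hadoop-daemon.sh" start zkfc; done
  for h in nn-1 nn-2 dn-1; do on "$h" "$HADOOP/sbin/hadoop-daemon.sh" start journalnode; done
  for h in nn-1 nn-2; do on "$h" "$HADOOP/sbin/hadoop-daemon.sh" start namenode; done
  for h in dn-1 dn-2 dn-3; do on "$h" "$HADOOP/sbin/hadoop-daemon.sh" start datanode; done
  for h in dn-1 dn-2; do on "$h" "$HADOOP/sbin/yarn-daemon.sh" start resourcemanager; done
  on nn-1 "$HADOOP/sbin/mr-jobhistory-daemon.sh" start historyserver
  for h in dn-1 dn-2 dn-3; do on "$h" "$HADOOP/sbin/yarn-daemon.sh" start nodemanager; done
}

# Usage: DRY_RUN=1 start_cluster   (review the plan), then: start_cluster
```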


    6. Post-installation checks and verification

    6.1 HDFS commands

    Check a NameNode's HA state:
    [hadoop@nn-2 ~]$ /hadoop/hadoop/bin/hdfs haadmin -getServiceState nn-1
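    When scripting, it helps to find out which NameNode is currently active rather than querying each ID by hand. A sketch (the function name and the HDFS_BIN override are hypothetical) that loops over the NameNode IDs from dfs.ha.namenodes.dpi and prints the first one whose state is active:

```shell
#!/usr/bin/env bash
# Hypothetical helper; HDFS_BIN defaults to the path used in this guide.
HDFS_BIN=${HDFS_BIN:-/hadoop/hadoop/bin/hdfs}

find_active_nn() {
  local id state
  for id in "$@"; do
    # getServiceState prints "active" or "standby"; warnings go to stderr.
    state=$("$HDFS_BIN" haadmin -getServiceState "$id" 2>/dev/null)
    if [ "$state" = "active" ]; then
      echo "$id"
      return 0
    fi
  done
  echo "no active NameNode among: $*" >&2
  return 1
}

# Usage: find_active_nn nn-1 nn-2
```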

    Manually fail over the active NameNode from nn-1 to nn-2:
    [hadoop@nn-2 ~]$ /hadoop/hadoop/bin/hdfs haadmin -failover nn-1 nn-2

    NameNode health check (a healthy NameNode prints nothing and returns exit status 0):
    [hadoop@nn-2 ~]$ /hadoop/hadoop/bin/hdfs haadmin -checkHealth nn-1
    After killing one of the NameNodes, check its health status again.
     


    List all DataNodes:
    [hadoop@nn-2 ~]$ /hadoop/hadoop/bin/hdfs dfsadmin -report | more

    List live DataNodes:
    [hadoop@nn-2 ~]$ /hadoop/hadoop/bin/hdfs dfsadmin -report -live
    17/03/01 22:49:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Configured Capacity: 224954695680 (209.51 GB)
    Present Capacity: 180557139968 (168.16 GB)
    DFS Remaining: 179963428864 (167.60 GB)
    DFS Used: 593711104 (566.21 MB)
    DFS Used%: 0.33%
    Under replicated blocks: 2
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    -------------------------------------------------
    Live datanodes (3):
    Name: 192.168.9.23:50010 (dn-1)
    Hostname: dn-1
    Rack: /rack2
    Decommission Status : Normal
    Configured Capacity: 74984898560 (69.84 GB)
    DFS Used: 197902336 (188.73 MB)
    Non DFS Used: 14869356544 (13.85 GB)
    DFS Remaining: 59917639680 (55.80 GB)
    DFS Used%: 0.26%
    DFS Remaining%: 79.91%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Wed Mar 01 22:49:42 CST 2017

    List dead DataNodes:
    [hadoop@nn-2 ~]$ /hadoop/hadoop/bin/hdfs dfsadmin -report -dead

    Information on a specific DataNode (uptime, version, etc.) is reported by hdfs dfsadmin -getDatanodeInfo <datanode_host:ipc_port>. The commands below are NameNode health checks instead; apart from the native-library warning they print nothing, which means both NameNodes are healthy:
    [hadoop@nn-2 ~]$ /hadoop/hadoop/bin/hdfs haadmin -checkHealth nn-2
    17/03/01 22:55:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    [hadoop@nn-2 ~]$ /hadoop/hadoop/bin/hdfs haadmin -checkHealth nn-1
    17/03/01 22:55:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


    6.2 YARN commands

    Check the ResourceManager HA state:
    [hadoop@dn-1 hadoop]$ yarn rmadmin -getServiceState rm1
    active
    [hadoop@dn-1 hadoop]$ yarn rmadmin -getServiceState rm2
    standby

    List all YARN nodes:
    [hadoop@dn-1 hadoop]$ /hadoop/hadoop/bin/yarn node -all -list
    17/03/01 23:06:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Total Nodes:3
    Node-Id Node-State Node-Http-Address Number-of-Running-Containers
    dn-2:55506 RUNNING dn-2:8042 0
    dn-1:56447 RUNNING dn-1:8042 0
    dn-3:37533 RUNNING dn-3:8042 0

    List running YARN nodes:
    [hadoop@dn-1 hadoop]$ /hadoop/hadoop/bin/yarn node -list
    17/03/01 23:07:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Total Nodes:3
    Node-Id Node-State Node-Http-Address Number-of-Running-Containers
    dn-2:55506 RUNNING dn-2:8042 0
    dn-1:56447 RUNNING dn-1:8042 0
    dn-3:37533 RUNNING dn-3:8042 0

    Show details of a specific node:
    /hadoop/hadoop/bin/yarn node -status <NodeId>
    [hadoop@dn-1 hadoop]$ /hadoop/hadoop/bin/yarn node -status dn-2:55506
    17/03/01 23:08:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Node Report :
    Node-Id : dn-2:55506
    Rack : /default-rack
    Node-State : RUNNING
    Node-Http-Address : dn-2:8042
    Last-Health-Update : Wed 01/Mar/17 11:06:21:373CST
    Health-Report :
    Containers : 0
    Memory-Used : 0MB
    Memory-Capacity : 8192MB
    CPU-Used : 0 vcores
    CPU-Capacity : 8 vcores
    Node-Labels :

    List the applications currently running (here, a MapReduce job):
    [hadoop@dn-2 ~]$ /hadoop/hadoop/bin/yarn application -list
    17/03/01 23:10:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1
    Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
    application_1488375590901_0004 QuasiMonteCarlo MAPREDUCE hadoop default RUNNING UNDEFINED


    6.3 Test with the bundled examples

    [hadoop@dn-1 ~]$ cd hadoop/
    [hadoop@dn-1 hadoop]$ ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 200
    Number of Maps = 2
    Samples per Map = 200
    17/02/28 01:51:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Wrote input for Map #0
    Wrote input for Map #1
    Starting Job
    17/02/28 01:51:15 INFO input.FileInputFormat: Total input paths to process : 2
    17/02/28 01:51:15 INFO mapreduce.JobSubmitter: number of splits:2
    17/02/28 01:51:15 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1488216892564_0001
    17/02/28 01:51:16 INFO impl.YarnClientImpl: Submitted application application_1488216892564_0001
    17/02/28 01:51:16 INFO mapreduce.Job: The url to track the job: http://dn-1:8088/proxy/application_1488216892564_0001/
    17/02/28 01:51:16 INFO mapreduce.Job: Running job: job_1488216892564_0001
    17/02/28 01:51:24 INFO mapreduce.Job: Job job_1488216892564_0001 running in uber mode : false
    17/02/28 01:51:24 INFO mapreduce.Job: map 0% reduce 0%
    17/02/28 01:51:38 INFO mapreduce.Job: map 100% reduce 0%
    17/02/28 01:51:49 INFO mapreduce.Job: map 100% reduce 100%
    17/02/28 01:51:49 INFO mapreduce.Job: Job job_1488216892564_0001 completed successfully
    17/02/28 01:51:50 INFO mapreduce.Job: Counters: 49
        File System Counters
            FILE: Number of bytes read=50
            FILE: Number of bytes written=326922
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=510
            HDFS: Number of bytes written=215
            HDFS: Number of read operations=11
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=3
        Job Counters
            Launched map tasks=2
            Launched reduce tasks=1
            Data-local map tasks=2
            Total time spent by all maps in occupied slots (ms)=25604
            Total time spent by all reduces in occupied slots (ms)=7267
            Total time spent by all map tasks (ms)=25604
            Total time spent by all reduce tasks (ms)=7267
            Total vcore-seconds taken by all map tasks=25604
            Total vcore-seconds taken by all reduce tasks=7267
            Total megabyte-seconds taken by all map tasks=26218496
            Total megabyte-seconds taken by all reduce tasks=7441408
        Map-Reduce Framework
            Map input records=2
            Map output records=4
            Map output bytes=36
            Map output materialized bytes=56
            Input split bytes=274
            Combine input records=0
            Combine output records=0
            Reduce input groups=2
            Reduce shuffle bytes=56
            Reduce input records=4
            Reduce output records=0
            Spilled Records=8
            Shuffled Maps =2
            Failed Shuffles=0
            Merged Map outputs=2
            GC time elapsed (ms)=419
            CPU time spent (ms)=6940
            Physical memory (bytes) snapshot=525877248
            Virtual memory (bytes) snapshot=2535231488
            Total committed heap usage (bytes)=260186112
        Shuffle Errors
            BAD_ID=0
            CONNECTION=0
            IO_ERROR=0
            WRONG_LENGTH=0
            WRONG_MAP=0
            WRONG_REDUCE=0
        File Input Format Counters
            Bytes Read=236
        File Output Format Counters
            Bytes Written=97
    Job Finished in 35.466 seconds
    Estimated value of Pi is 3.17000000000000000000
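    The pi example estimates π by generating points in a unit square (the QuasiMonteCarlo class behind it uses a quasi-random Halton sequence) and counting how many fall inside the inscribed circle, whose share of the square's area is π/4. Working backwards from the printed estimate with 2 maps × 200 samples:

```latex
\hat{\pi} = 4 \cdot \frac{N_{\mathrm{inside}}}{N_{\mathrm{maps}} \cdot N_{\mathrm{samples}}}
          = 4 \cdot \frac{N_{\mathrm{inside}}}{2 \cdot 200} = 3.17
\quad\Longrightarrow\quad
N_{\mathrm{inside}} = \frac{3.17 \cdot 400}{4} = 317
```

    So 317 of the 400 points landed inside the circle. With so few samples the estimate is coarse; larger arguments should move it closer to π, at the cost of a longer job.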

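    6.4 Check the NameNode web UI

    The links are:
    http://192.168.9.21:50070
    http://192.168.9.22:50070

    192.168.9.21 and 192.168.9.22 are the addresses of the two NameNodes, nn-1 and nn-2 (in this HA setup nn-2 runs as the standby NameNode, not a Secondary NameNode).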
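    6.5 Verify NameNode HA failover

    Kill the active NameNode process on nn-1 and check whether the NameNode on nn-2 switches from standby to active (for example with hdfs haadmin -getServiceState nn-2).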
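    6.6 Check the ResourceManager web UI

    The active ResourceManager serves its web UI on port 8088, for example http://192.168.9.23:8088 for rm1 on dn-1.

    On the node information page, 192.168.9.23 (dn-1) is the node running the active ResourceManager.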
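    Run a test job to check whether YARN HA fails over automatically:
    [hadoop@dn-2 hadoop]$ ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 200

    While the job is running, kill the ResourceManager on rm1 (dn-1) and check that failover works.
    Check that the standby's HA state has changed to active (yarn rmadmin -getServiceState rm2 should now print active).

    In the log below, the client fails when rm1 is killed mid-run, logs "Trying to fail over immediately", fails over to rm2, and the job eventually completes successfully.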
    [hadoop@dn-2 hadoop]$ ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 200
    Number of Maps = 2
    Samples per Map = 200
    17/02/28 02:11:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Wrote input for Map #0
    Wrote input for Map #1
    Starting Job
    17/02/28 02:11:12 INFO input.FileInputFormat: Total input paths to process : 2
    17/02/28 02:11:12 INFO mapreduce.JobSubmitter: number of splits:2
    17/02/28 02:11:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1488216892564_0002
    17/02/28 02:11:13 INFO impl.YarnClientImpl: Submitted application application_1488216892564_0002
    17/02/28 02:11:13 INFO mapreduce.Job: The url to track the job: http://dn-1:8088/proxy/application_1488216892564_0002/
    17/02/28 02:11:13 INFO mapreduce.Job: Running job: job_1488216892564_0002
    17/02/28 02:11:18 INFO retry.RetryInvocationHandler: Exception while invoking getApplicationReport of class ApplicationClientProtocolPBClientImpl over rm1. Trying to fail over immediately.
    java.io.EOFException: End of File Exception between local host is: "dn-2/192.168.9.24"; destination host is: "dn-1":8032; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
        at org.apache.hadoop.ipc.Client.call(Client.java:1472)
        at org.apache.hadoop.ipc.Client.call(Client.java:1399)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
        at com.sun.proxy.$Proxy14.getApplicationReport(Unknown Source)
        at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:187)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy15.getApplicationReport(Unknown Source)
        at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:399)
        at org.apache.hadoop.mapred.ResourceMgrDelegate.getApplicationReport(ResourceMgrDelegate.java:302)
        at org.apache.hadoop.mapred.ClientServiceDelegate.getProxy(ClientServiceDelegate.java:153)
        at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:322)
        at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:422)
        at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:575)
        at org.apache.hadoop.mapreduce.Job$1.run(Job.java:325)
        at org.apache.hadoop.mapreduce.Job$1.run(Job.java:322)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:322)
        at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:610)
        at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1355)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1317)
        at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
        at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
        at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
    Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1071)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:966)
    17/02/28 02:11:18 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
    17/02/28 02:11:18 INFO retry.RetryInvocationHandler: Exception while invoking getApplicationReport of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 40859ms.
    java.net.ConnectException: Call From dn-2/192.168.9.24 to dn-2:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
        at org.apache.hadoop.ipc.Client.call(Client.java:1472)
        at org.apache.hadoop.ipc.Client.call(Client.java:1399)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
        at com.sun.proxy.$Proxy14.getApplicationReport(Unknown Source)
        at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:187)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy15.getApplicationReport(Unknown Source)
    91. at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:399)
    92. at org.apache.hadoop.mapred.ResourceMgrDelegate.getApplicationReport(ResourceMgrDelegate.java:302)
    93. at org.apache.hadoop.mapred.ClientServiceDelegate.getProxy(ClientServiceDelegate.java:153)
    94. at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:322)
    95. at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:422)
    96. at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:575)
    97. at org.apache.hadoop.mapreduce.Job$1.run(Job.java:325)
    98. at org.apache.hadoop.mapreduce.Job$1.run(Job.java:322)
    99. at java.security.AccessController.doPrivileged(Native Method)
    100. at javax.security.auth.Subject.doAs(Subject.java:415)
    101. at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    102. at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:322)
    103. at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:610)
    104. at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1355)
    105. at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1317)
    106. at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
    107. at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
    108. at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    109. at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
    110. at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    111. at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    112. at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    113. at java.lang.reflect.Method.invoke(Method.java:606)
    114. at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
    115. at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
    116. at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
    117. at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    118. at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    119. at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    120. at java.lang.reflect.Method.invoke(Method.java:606)
    121. at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    122. at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
    123. Caused by: java.net.ConnectException: 拒绝连接
    124. at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    125. at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    126. at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    127. at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
    128. at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
    129. at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
    130. at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
    131. at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
    132. at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
    133. at org.apache.hadoop.ipc.Client.call(Client.java:1438)
    134. ... 43 more
    135. 17/02/28 02:11:59 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm1
    136. 17/02/28 02:11:59 INFO retry.RetryInvocationHandler: Exception while invoking getApplicationReport of class ApplicationClientProtocolPBClientImpl over rm1 after 2 fail over attempts. Trying to fail over after sleeping for 17213ms.
    137. java.net.ConnectException: Call From dn-2/192.168.9.24 to dn-1:8032 failed on connection exception: java.net.ConnectException: 拒绝连接; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    138. at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    139. at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    140. at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    141. at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    142. at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
    143. at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
    144. at org.apache.hadoop.ipc.Client.call(Client.java:1472)
    145. at org.apache.hadoop.ipc.Client.call(Client.java:1399)
    146. at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    147. at com.sun.proxy.$Proxy14.getApplicationReport(Unknown Source)
    148. at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:187)
    149. at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
    150. at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    151. at java.lang.reflect.Method.invoke(Method.java:606)
    152. at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    153. at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    154. at com.sun.proxy.$Proxy15.getApplicationReport(Unknown Source)
    155. at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:399)
    156. at org.apache.hadoop.mapred.ResourceMgrDelegate.getApplicationReport(ResourceMgrDelegate.java:302)
    157. at org.apache.hadoop.mapred.ClientServiceDelegate.getProxy(ClientServiceDelegate.java:153)
    158. at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:322)
    159. at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:422)
    160. at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:575)
    161. at org.apache.hadoop.mapreduce.Job$1.run(Job.java:325)
    162. at org.apache.hadoop.mapreduce.Job$1.run(Job.java:322)
    163. at java.security.AccessController.doPrivileged(Native Method)
    164. at javax.security.auth.Subject.doAs(Subject.java:415)
    165. at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    166. at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:322)
    167. at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:610)
    168. at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1355)
    169. at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1317)
    170. at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
    171. at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
    172. at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    173. at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
    174. at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    175. at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    176. at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    177. at java.lang.reflect.Method.invoke(Method.java:606)
    178. at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
    179. at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
    180. at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
    181. at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    182. at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    183. at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    184. at java.lang.reflect.Method.invoke(Method.java:606)
    185. at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    186. at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
    187. Caused by: java.net.ConnectException: 拒绝连接
    188. at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    189. at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    190. at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    191. at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
    192. at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
    193. at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
    194. at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
    195. at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
    196. at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
    197. at org.apache.hadoop.ipc.Client.call(Client.java:1438)
    198. ... 42 more
    199. 17/02/28 02:12:16 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
    200. 17/02/28 02:12:18 INFO mapreduce.Job: Job job_1488216892564_0002 running in uber mode : false
    201. 17/02/28 02:12:18 INFO mapreduce.Job: map 0% reduce 0%
    202. 17/02/28 02:12:22 INFO mapreduce.Job: map 100% reduce 0%
    203. 17/02/28 02:12:28 INFO mapreduce.Job: map 100% reduce 100%
    204. 17/02/28 02:12:28 INFO mapreduce.Job: Job job_1488216892564_0002 completed successfully
    205. 17/02/28 02:12:28 INFO mapreduce.Job: Counters: 49
    206. File System Counters
    207. FILE: Number of bytes read=50
    208. FILE: Number of bytes written=326931
    209. FILE: Number of read operations=0
    210. FILE: Number of large read operations=0
    211. FILE: Number of write operations=0
    212. HDFS: Number of bytes read=510
    213. HDFS: Number of bytes written=215
    214. HDFS: Number of read operations=11
    215. HDFS: Number of large read operations=0
    216. HDFS: Number of write operations=3
    217. Job Counters
    218. Launched map tasks=2
    219. Launched reduce tasks=1
    220. Data-local map tasks=2
    221. Total time spent by all maps in occupied slots (ms)=22713
    222. Total time spent by all reduces in occupied slots (ms)=3213
    223. Total time spent by all map tasks (ms)=22713
    224. Total time spent by all reduce tasks (ms)=3213
    225. Total vcore-seconds taken by all map tasks=22713
    226. Total vcore-seconds taken by all reduce tasks=3213
    227. Total megabyte-seconds taken by all map tasks=23258112
    228. Total megabyte-seconds taken by all reduce tasks=3290112
    229. Map-Reduce Framework
    230. Map input records=2
    231. Map output records=4
    232. Map output bytes=36
    233. Map output materialized bytes=56
    234. Input split bytes=274
    235. Combine input records=0
    236. Combine output records=0
    237. Reduce input groups=2
    238. Reduce shuffle bytes=56
    239. Reduce input records=4
    240. Reduce output records=0
    241. Spilled Records=8
    242. Shuffled Maps =2
    243. Failed Shuffles=0
    244. Merged Map outputs=2
    245. GC time elapsed (ms)=233
    246. CPU time spent (ms)=12680
    247. Physical memory (bytes) snapshot=517484544
    248. Virtual memory (bytes) snapshot=2548441088
    249. Total committed heap usage (bytes)=260186112
    250. Shuffle Errors
    251. BAD_ID=0
    252. CONNECTION=0
    253. IO_ERROR=0
    254. WRONG_LENGTH=0
    255. WRONG_MAP=0
    256. WRONG_REDUCE=0
    257. File Input Format Counters
    258. Bytes Read=236
    259. File Output Format Counters
    260. Bytes Written=97
    261. Job Finished in 76.447 seconds
    262. Estimated value of Pi is 3.17000000000000000000
    263. [hadoop@dn-2 hadoop]$
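    The job above runs Hadoop's bundled pi example (org.apache.hadoop.examples.QuasiMonteCarlo). The idea is to scatter points from a low-discrepancy sequence over the unit square and count how many land inside the inscribed circle; that fraction approximates Pi/4. The sketch below shows the idea in a single Python process, assuming a 2-D Halton sequence in bases 2 and 3. The function names are hypothetical and this is not Hadoop's code, which roughly splits the sampling across the map tasks and adds up the counts in the reducer. A small number of samples, as in a quick smoke test, gives only a coarse estimate.

```python
import math


def halton(index: int, base: int) -> float:
    """Return element `index` (1-based) of the van der Corput sequence in `base`."""
    result = 0.0
    fraction = 1.0 / base
    i = index
    while i > 0:
        result += fraction * (i % base)  # next digit of index, mirrored after the radix point
        i //= base
        fraction /= base
    return result


def estimate_pi(num_samples: int) -> float:
    """Quasi-Monte Carlo estimate of Pi from 2-D Halton points (bases 2 and 3)."""
    inside = 0
    for k in range(1, num_samples + 1):
        # Shift the point so the unit square is centred on the origin.
        x = halton(k, 2) - 0.5
        y = halton(k, 3) - 0.5
        # Count it if it lies in the inscribed circle of radius 0.5.
        if x * x + y * y <= 0.25:
            inside += 1
    # circle area / square area = Pi / 4
    return 4.0 * inside / num_samples


if __name__ == "__main__":
    for n in (10, 1000, 10000):
        estimate = estimate_pi(n)
        print(n, estimate, abs(estimate - math.pi))
```

    The error shrinks as the sample count grows, which is why the pi example's estimate depends on the number of maps and samples passed to it.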


    7. Install Spark


    Plan: install a Spark standalone cluster on top of the existing Hadoop cluster.
    Master node: nn-1
    Worker nodes: nn-2, dn-1, dn-2, dn-3

    7.1 Install and configure Scala

    Upload the Scala package to /hadoop on nn-1 and extract it:
    [hadoop@nn-1 ~]$ tar -xzvf scala-2.11.7.tgz
    The environment variables are set later, together with Spark's (see 7.3).

    7.2 Install Spark


    Upload spark-1.6.0-bin-hadoop2.6.tgz to /hadoop on nn-1 and extract it:
    [hadoop@nn-1 ~]$ tar -xzvf spark-1.6.0-bin-hadoop2.6.tgz

    Go to /hadoop/spark-1.6.0-bin-hadoop2.6/conf and create spark-env.sh and slaves from their templates:
    [hadoop@nn-1 conf]$ pwd
    /hadoop/spark-1.6.0-bin-hadoop2.6/conf
    [hadoop@nn-1 conf]$ cp spark-env.sh.template spark-env.sh
    [hadoop@nn-1 conf]$ cp slaves.template slaves
    Edit spark-env.sh and add the following:
    export JAVA_HOME=/usr/local/java/jdk1.7.0_79
    export SCALA_HOME=/hadoop/scala-2.11.7
    export SPARK_HOME=/hadoop/spark-1.6.0-bin-hadoop2.6
    export SPARK_MASTER_IP=nn-1
    export SPARK_WORKER_MEMORY=2g
    export HADOOP_CONF_DIR=/hadoop/hadoop/etc/hadoop
    SPARK_WORKER_MEMORY is the total memory a worker may give to executors; set it according to the memory actually available on each node.

    Edit slaves and add the following:
    nn-2
    dn-1
    dn-2
    dn-3
    The slaves file lists the worker nodes, one hostname per line.
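    Because spark-env.sh and slaves are plain text derived from the host plan, they can also be generated. The sketch below is a hypothetical helper that produces the same exports and worker list shown above; the values are copied from this section and are assumptions for any other cluster.

```python
# Hypothetical helper: render conf/spark-env.sh and conf/slaves from the host plan.
MASTER = "nn-1"
WORKERS = ["nn-2", "dn-1", "dn-2", "dn-3"]

# Values copied from the spark-env.sh above; adjust them for another cluster.
SPARK_ENV = {
    "JAVA_HOME": "/usr/local/java/jdk1.7.0_79",
    "SCALA_HOME": "/hadoop/scala-2.11.7",
    "SPARK_HOME": "/hadoop/spark-1.6.0-bin-hadoop2.6",
    "SPARK_MASTER_IP": MASTER,
    "SPARK_WORKER_MEMORY": "2g",
    "HADOOP_CONF_DIR": "/hadoop/hadoop/etc/hadoop",
}


def render_spark_env(settings: dict) -> str:
    """One 'export KEY=VALUE' line per setting, in insertion order."""
    return "".join("export %s=%s\n" % (key, value) for key, value in settings.items())


def render_slaves(workers: list) -> str:
    """One worker hostname per line; duplicates are dropped and order is kept."""
    unique = []
    for host in workers:
        if host not in unique:
            unique.append(host)
    return "".join(host + "\n" for host in unique)


if __name__ == "__main__":
    print(render_spark_env(SPARK_ENV), end="")
    print(render_slaves(WORKERS), end="")
```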

    7.3 Configure environment variables

    Edit .bash_profile of the hadoop user on nn-1:
    [hadoop@nn-1 ~]$ vim .bash_profile
    Append the following:
    export HADOOP_HOME=/hadoop/hadoop
    export SCALA_HOME=/hadoop/scala-2.11.7
    export SPARK_HOME=/hadoop/spark-1.6.0-bin-hadoop2.6
    export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
    Then run source ~/.bash_profile, or log in again, so the settings take effect.

    7.4 Distribute the configured Scala and Spark directories to the other nodes

    [hadoop@nn-1 bin]$ cd /hadoop
    [hadoop@nn-1 ~]$ scp -rp spark-1.6.0-bin-hadoop2.6 hadoop@dn-1:/hadoop
    [hadoop@nn-1 ~]$ scp -rp scala-2.11.7 hadoop@dn-1:/hadoop
    Repeat the two scp commands for the other workers: nn-2, dn-2 and dn-3.
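    With four workers, the copy step is easy to script. The sketch below uses hypothetical helper names. It builds the scp command for every (worker, directory) pair from the plan at the top of this section and, by default, only prints the commands. It assumes the hadoop user can ssh to each node and that both directories sit under /hadoop on nn-1.

```python
import shlex
import subprocess

# Workers from the plan in section 7 and the directories prepared on nn-1.
WORKERS = ["nn-2", "dn-1", "dn-2", "dn-3"]
DIRS = ["spark-1.6.0-bin-hadoop2.6", "scala-2.11.7"]


def scp_commands(workers, dirs, user="hadoop", dest="/hadoop"):
    """Build one 'scp -rp <dir> user@host:dest' argument list per (host, dir) pair."""
    return [["scp", "-rp", d, "%s@%s:%s" % (user, host, dest)]
            for host in workers for d in dirs]


def distribute(workers, dirs, src_root="/hadoop", dry_run=True):
    """Print the commands (dry run) or run them from src_root, stopping at the first failure."""
    for cmd in scp_commands(workers, dirs):
        if dry_run:
            print(" ".join(shlex.quote(arg) for arg in cmd))
        else:
            subprocess.run(cmd, cwd=src_root, check=True)


if __name__ == "__main__":
    distribute(WORKERS, DIRS)  # dry run: prints the commands only
```

    Building argument lists instead of shell strings avoids quoting problems; pass dry_run=False once the printed commands look right.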

    7.5 Start the Spark cluster

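    [hadoop@nn-1 ~]$ /hadoop/spark-1.6.0-bin-hadoop2.6/sbin/start-all.sh
    Use the full path: Hadoop's sbin directory also contains a start-all.sh, and because $HADOOP_HOME/sbin comes first on PATH (see 7.3), a bare start-all.sh would run Hadoop's script.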

    Check the processes on nn-1 and on the worker nodes.
    On nn-1 the Master process is running:
    [hadoop@nn-1 ~]$ jps
    2473 JournalNode
    2541 NameNode
    4401 Jps
    2399 DFSZKFailoverController
    2687 JobHistoryServer
    2775 Master
    2351 QuorumPeerMain

    On the worker nodes a Worker process is running:
    [hadoop@dn-1 ~]$ jps
    2522 NodeManager
    3449 Jps
    2007 QuorumPeerMain
    2141 DataNode
    2688 Worker
    2061 JournalNode
    2258 ResourceManager
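    This check can also be scripted. jps prints one "<pid> <main class>" line per JVM, so it is enough to collect the class names and compare them with the daemons each host should run. The sketch below uses hypothetical helper names and takes the jps output captured above as sample input. It does not show how to collect the output from each node, for example over ssh.

```python
def running_daemons(jps_output: str) -> set:
    """Collect main-class names from jps lines of the form '<pid> <name>'."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit():
            names.add(parts[1])
    return names


def missing_daemons(jps_output: str, expected) -> list:
    """Return the expected daemons absent from the jps output, sorted."""
    return sorted(set(expected) - running_daemons(jps_output))


# jps output captured on nn-1 and dn-1 above.
NN1 = """2473 JournalNode
2541 NameNode
4401 Jps
2399 DFSZKFailoverController
2687 JobHistoryServer
2775 Master
2351 QuorumPeerMain"""

DN1 = """2522 NodeManager
3449 Jps
2007 QuorumPeerMain
2141 DataNode
2688 Worker
2061 JournalNode
2258 ResourceManager"""

if __name__ == "__main__":
    print("nn-1 missing:", missing_daemons(NN1, ["Master"]))
    print("dn-1 missing:", missing_daemons(DN1, ["Worker"]))
```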

    Check the Spark web UI; the standalone master serves it on port 8080 by default, e.g. http://nn-1:8080.

    7.6 Run a test example

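    Run the SparkPi example from /hadoop/spark-1.6.0-bin-hadoop2.6 (the paths below are relative to it):
    ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
        --master yarn --deploy-mode cluster \
        --driver-memory 100M \
        --executor-memory 200M \
        --executor-cores 1 \
        --queue default \
        lib/spark-examples*.jar 10
    The last argument (10) is the number of slices that SparkPi splits the work into.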

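    Or, without the memory options, so that the defaults are used:

    ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
        --master yarn --deploy-mode cluster \
        --executor-cores 1 \
        --queue default \
        lib/spark-examples*.jar 10
    In cluster mode the driver runs inside YARN, so the "Pi is roughly ..." result is written to the application's logs rather than to the console.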
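    When the job is launched from a script rather than typed at a shell, nothing expands the lib/spark-examples*.jar wildcard. The sketch below uses hypothetical helper names. It builds the same spark-submit arguments as the two commands above, treating the memory options as optional, and expands the jar pattern with glob. It assumes the Spark 1.6 layout used in this section, where the examples jar sits under lib/.

```python
import glob
import os
import subprocess


def spark_pi_command(examples_jar, slices=10, driver_memory=None,
                     executor_memory=None, executor_cores=1, queue="default"):
    """Build the spark-submit argument list for SparkPi on YARN in cluster mode."""
    cmd = ["./bin/spark-submit",
           "--class", "org.apache.spark.examples.SparkPi",
           "--master", "yarn", "--deploy-mode", "cluster"]
    # Memory options are optional, matching the two variants above.
    if driver_memory:
        cmd += ["--driver-memory", driver_memory]
    if executor_memory:
        cmd += ["--executor-memory", executor_memory]
    cmd += ["--executor-cores", str(executor_cores), "--queue", queue,
            examples_jar, str(slices)]
    return cmd


def find_examples_jar(spark_home):
    """Expand lib/spark-examples*.jar explicitly, since no shell does it here."""
    matches = sorted(glob.glob(os.path.join(spark_home, "lib", "spark-examples*.jar")))
    if len(matches) != 1:
        raise FileNotFoundError("expected one examples jar, found %r" % matches)
    return os.path.relpath(matches[0], spark_home)


if __name__ == "__main__":
    home = "/hadoop/spark-1.6.0-bin-hadoop2.6"
    try:
        jar = find_examples_jar(home)
    except FileNotFoundError as err:
        print(err)
    else:
        cmd = spark_pi_command(jar, driver_memory="100M", executor_memory="200M")
        subprocess.run(cmd, cwd=home, check=True)
```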





    8. Configure rack awareness

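    On nn-1 and nn-2, add the following property to /hadoop/hadoop/etc/hadoop/core-site.xml:
    <property>
      <name>net.topology.script.file.name</name>
      <value>/hadoop/hadoop/etc/hadoop/RackAware.py</value>
    </property>
    net.topology.script.file.name is the Hadoop 2.x name of this setting; the older name topology.script.file.name is deprecated but still accepted.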
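    Before restarting the NameNodes, it helps to confirm that core-site.xml on each of them actually contains the property. Hadoop 2.x reads it as net.topology.script.file.name and still accepts the deprecated topology.script.file.name, so the hypothetical checker below looks for either name. It uses only Python's standard xml.etree.ElementTree, with a sample string standing in for the real file.

```python
import xml.etree.ElementTree as ET

# Current Hadoop 2.x key first, then the deprecated name found in older guides.
TOPOLOGY_KEYS = ("net.topology.script.file.name", "topology.script.file.name")


def topology_script(core_site_xml: str):
    """Return (key, value) of the topology-script property, or None if it is absent."""
    root = ET.fromstring(core_site_xml)
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        props[name] = value
    for key in TOPOLOGY_KEYS:
        if key in props:
            return key, props[key]
    return None


# Sample content; on a NameNode, read /hadoop/hadoop/etc/hadoop/core-site.xml instead.
SAMPLE = """<configuration>
  <property>
    <name>topology.script.file.name</name>
    <value>/hadoop/hadoop/etc/hadoop/RackAware.py</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(topology_script(SAMPLE))
```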
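    Create the file /hadoop/hadoop/etc/hadoop/RackAware.py on both nn-1 and nn-2 with the following content:
    #!/usr/bin/python
    # -*- coding: UTF-8 -*-
    import sys
    # Rack of each DataNode, keyed by hostname and by IP address.
    rack = {"dn-1": "rack2",
            "dn-2": "rack1",
            "dn-3": "rack1",
            "192.168.9.23": "rack2",
            "192.168.9.24": "rack1",
            "192.168.9.25": "rack1",
            }
    if __name__ == "__main__":
        # Hadoop may pass several hosts in one call and expects one rack path
        # per argument; unknown hosts go to the default rack /rack0.
        for host in sys.argv[1:]:
            print("/" + rack.get(host, "rack0"))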
    Make the script executable:
    [root@nn-1 hadoop]# chmod +x RackAware.py
    [root@nn-1 hadoop]# ll RackAware.py
    -rwxr-xr-x 1 hadoop hadoop 294 Mar  1 21:24 RackAware.py
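    The NameNode may hand the script either a hostname or an IP address, and may pass several of them in one call, so the hostname and IP entries in the rack table must agree. The sketch below uses hypothetical names. It reproduces the lookup for a list of arguments and flags any DataNode whose two entries are missing or disagree. The host-to-IP pairs come from the host plan in section 1.

```python
# DataNode hostnames and IPs from the host plan, and the table from RackAware.py.
HOSTS = {"dn-1": "192.168.9.23", "dn-2": "192.168.9.24", "dn-3": "192.168.9.25"}
RACK = {"dn-1": "rack2", "dn-2": "rack1", "dn-3": "rack1",
        "192.168.9.23": "rack2", "192.168.9.24": "rack1", "192.168.9.25": "rack1"}


def resolve(args, rack, default="rack0"):
    """Mirror the script: one '/<rack>' path per argument, default rack for unknown hosts."""
    return ["/" + rack.get(arg, default) for arg in args]


def inconsistencies(hosts, rack):
    """List (name, ip, rack_by_name, rack_by_ip) where an entry is missing or they differ."""
    problems = []
    for name, ip in sorted(hosts.items()):
        by_name, by_ip = rack.get(name), rack.get(ip)
        if by_name is None or by_ip is None or by_name != by_ip:
            problems.append((name, ip, by_name, by_ip))
    return problems


if __name__ == "__main__":
    print(resolve(["192.168.9.23", "dn-3", "10.0.0.1"], RACK))
    print(inconsistencies(HOSTS, RACK) or "hostname and IP entries agree")
```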

    Restart the NameNode on nn-1 and on nn-2, one at a time so that one NameNode stays up (the commands for nn-1 are shown):
    [hadoop@nn-1 ~]$ hadoop-daemon.sh stop namenode
    stopping namenode
    [hadoop@nn-1 ~]$ hadoop-daemon.sh start namenode
    starting namenode, logging to /hadoop/hadoop/logs/hadoop-hadoop-namenode-nn-1.out

    Check the NameNode log:
    [root@nn-1 logs]# pwd
    /hadoop/hadoop/logs
    [root@nn-1 logs]# vim hadoop-hadoop-namenode-nn-1.log


    Check the topology with:
    [hadoop@dn-3 ~]$ hdfs dfsadmin -printTopology
    17/03/02 00:21:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Rack: /rack1
       192.168.9.24:50010 (dn-2)
       192.168.9.25:50010 (dn-3)
    Rack: /rack2
       192.168.9.23:50010 (dn-1)
    dn-2 and dn-3 are in /rack1 and dn-1 is in /rack2, as defined in RackAware.py.
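    To check the result without reading it by eye, the printTopology output can be parsed and compared with the intended rack of each DataNode. The sketch below assumes the output format shown above: a "Rack: ..." header followed by indented "ip:port (hostname)" lines. The function names are hypothetical.

```python
import re

# "<ip>:<port> (<hostname>)" lines listed under each "Rack: ..." header.
NODE_LINE = re.compile(r"^(\S+):(\d+)\s+\((\S+)\)$")


def parse_topology(text: str) -> dict:
    """Map each rack path to the hostnames listed under it; other lines are skipped."""
    racks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Rack: "):
            current = line[len("Rack: "):]
            racks.setdefault(current, [])
            continue
        match = NODE_LINE.match(line)
        if current is not None and match:
            racks[current].append(match.group(3))
    return racks


def mismatches(racks: dict, expected: dict) -> list:
    """Hostnames whose reported rack differs from the expected one (or is missing)."""
    actual = {host: rack for rack, hosts in racks.items() for host in hosts}
    return sorted(host for host in expected if actual.get(host) != expected[host])


SAMPLE = """17/03/02 00:21:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Rack: /rack1
   192.168.9.24:50010 (dn-2)
   192.168.9.25:50010 (dn-3)
Rack: /rack2
   192.168.9.23:50010 (dn-1)
"""

EXPECTED = {"dn-1": "/rack2", "dn-2": "/rack1", "dn-3": "/rack1"}

if __name__ == "__main__":
    print(mismatches(parse_topology(SAMPLE), EXPECTED) or "topology matches the plan")
```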







    Original post: https://www.cnblogs.com/xiaohe001/p/6484462.html