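Today I started the Hadoop cluster, but the TaskTracker on one of the nodes would not start. A thread at http://forum.hadoop.tw/viewtopic.php?p=149#p149 mentions that this can happen when the NameNode and DataNode have different namespaceIDs, so I checked both and found that they were the same. I then took a closer look at the following log: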
2012-11-14 20:37:43,461 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'master' with reponseId '77
2012-11-14 20:37:44,463 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.21.7.137:9001. Already tried 0 time(s).
2012-11-14 20:37:45,467 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.21.7.137:9001. Already tried 1 time(s).
2012-11-14 20:37:46,471 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.21.7.137:9001. Already tried 2 time(s).
2012-11-14 20:37:47,475 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.21.7.137:9001. Already tried 3 time(s).
2012-11-14 20:37:48,480 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.21.7.137:9001. Already tried 4 time(s).
2012-11-14 20:37:49,484 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.21.7.137:9001. Already tried 5 time(s).
2012-11-14 20:37:50,488 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.21.7.137:9001. Already tried 6 time(s).
2012-11-14 20:37:51,492 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.21.7.137:9001. Already tried 7 time(s).
2012-11-14 20:37:52,496 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.21.7.137:9001. Already tried 8 time(s).
2012-11-14 20:37:53,500 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.21.7.137:9001. Already tried 9 time(s).
2012-11-14 20:37:53,501 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: java.net.ConnectException: Call to master/172.21.7.137:9001 failed on connection exception: java.net.ConnectException: Connection refused
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:767)
    at org.apache.hadoop.ipc.Client.call(Client.java:743)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at org.apache.hadoop.mapred.$Proxy4.heartbeat(Unknown Source)
    at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:1215)
    at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1037)
    at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1720)
    at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2833)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:304)
    at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:860)
    at org.apache.hadoop.ipc.Client.call(Client.java:720)
    ... 6 more
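The log shows the TaskTracker repeatedly reporting "Call to master/172.21.7.137:9001", which means it kept trying to connect to the master node. My guess was that Hadoop had been shut down abnormally and this TaskTracker had never been stopped properly. Running ps -ef | grep java confirmed that the TaskTracker process was still alive, so I killed it with kill -9 followed by the TaskTracker's process ID. After I restarted Hadoop, the TaskTracker started successfully.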
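The check-and-kill step above can be sketched as a small shell helper. This is only a sketch under a few assumptions: the function name tasktracker_pids is made up for illustration, the input is in `ps -ef` format (PID in the second column), and the stale process shows the main class org.apache.hadoop.mapred.TaskTracker (the class seen in the stack trace) on its command line.

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs of processes whose command line
# names the TaskTracker main class.
# Reads `ps -ef`-style lines on stdin; the PID is the second column.
# Lines mentioning awk are skipped so the filter does not report itself.
tasktracker_pids() {
    awk '/org\.apache\.hadoop\.mapred\.TaskTracker/ && !/awk/ { print $2 }'
}

# Typical use (left commented out so sourcing this file kills nothing):
#   pids=$(ps -ef | tasktracker_pids)    # list leftover TaskTracker PIDs
#   [ -n "$pids" ] && kill $pids         # try a normal SIGTERM first
#   [ -n "$pids" ] && kill -9 $pids      # fall back to SIGKILL, as in this post
```

Trying a plain kill before kill -9 gives the process a chance to exit cleanly. In this post kill -9 was used directly, and that was enough to let the TaskTracker start again after Hadoop was restarted.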