  • Hadoop cluster crash caused by accidental deletion of /tmp/hadoop-hadoop/dfs/name

    After running start-all.sh, Hadoop reported a normal startup:

    starting namenode, logging to /opt/hadoop-0.20.2-cdh3u0/logs/hadoop-hadoop-namenode-localhost.localdomain.out
    localhost: starting datanode, logging to /opt/hadoop-0.20.2-cdh3u0/bin/../logs/hadoop-hadoop-datanode-localhost.localdomain.out
    localhost: starting secondarynamenode, logging to /opt/hadoop-0.20.2-cdh3u0/bin/../logs/hadoop-hadoop-secondarynamenode-localhost.localdomain.out
    starting jobtracker, logging to /opt/hadoop-0.20.2-cdh3u0/logs/hadoop-hadoop-jobtracker-localhost.localdomain.out
    localhost: starting tasktracker, logging to /opt/hadoop-0.20.2-cdh3u0/bin/../logs/hadoop-hadoop-tasktracker-localhost.localdomain.out

    But the cluster was unusable. Running a hadoop command printed:

    13/07/19 14:23:29 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s).
    13/07/19 14:23:30 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 1 time(s).
    13/07/19 14:23:31 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 2 time(s).
    13/07/19 14:23:32 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 3 time(s).
    13/07/19 14:23:33 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 4 time(s).
    13/07/19 14:23:34 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 5 time(s).
    13/07/19 14:23:36 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 6 time(s).
    13/07/19 14:23:37 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 7 time(s).
    13/07/19 14:23:38 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 8 time(s).
    13/07/19 14:23:39 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 9 time(s).
    Bad connection to FS. command aborted. exception: Call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException: Connection refused

    jps showed only:

    11885 Jps
    11456 DataNode
    11586 SecondaryNameNode

    So the NameNode had not started.

    I checked the related processes with ps -aux and ps -e, but they showed nothing useful.
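
    A quick way to see which daemons are missing is to compare the jps output against the list you expect. The sketch below assumes the five daemons that start-all.sh launches on a single-node Hadoop 0.20 setup like this one; the function name missing_daemons is made up for this example.

    ```shell
    #!/bin/sh
    # Print each expected Hadoop daemon that is absent from `jps` output read on stdin.
    # Assumption: a single-node Hadoop 0.20 setup running all five daemons.
    missing_daemons() {
        jps_output=$(cat)
        for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
            # Compare the whole second column so that "SecondaryNameNode"
            # is not mistaken for "NameNode".
            if ! printf '%s\n' "$jps_output" |
                awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
                echo "$daemon"
            fi
        done
    }
    ```

    Usage, with a JDK on the PATH: jps | missing_daemons. Fed the jps output above, it prints NameNode, JobTracker and TaskTracker, one per line.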

    Next I looked in the logs directory. tail -1000 hadoop-hadoop-datanode-localhost.localdomain.log showed nothing but failed connection attempts.

    hadoop-hadoop-namenode-localhost.localdomain.log contained:

    2013-07-19 14:14:18,083 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG: host = localhost.localdomain/127.0.0.1
    STARTUP_MSG: args = []
    STARTUP_MSG: version = 0.20.2-cdh3u0
    STARTUP_MSG: build = -r 81256ad0f2e4ab2bd34b04f53d25a6c23686dd14; compiled by 'hudson' on Fri Mar 25 19:56:23 PDT 2011
    ************************************************************/
    2013-07-19 14:14:18,249 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
    2013-07-19 14:14:18,252 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
    2013-07-19 14:14:18,267 INFO org.apache.hadoop.hdfs.util.GSet: VM type = 64-bit
    2013-07-19 14:14:18,268 INFO org.apache.hadoop.hdfs.util.GSet: 2% max memory = 17.77875 MB
    2013-07-19 14:14:18,268 INFO org.apache.hadoop.hdfs.util.GSet: capacity = 2^21 = 2097152 entries
    2013-07-19 14:14:18,268 INFO org.apache.hadoop.hdfs.util.GSet: recommended=2097152, actual=2097152
    2013-07-19 14:14:18,284 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop
    2013-07-19 14:14:18,284 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
    2013-07-19 14:14:18,284 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
    2013-07-19 14:14:18,288 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.block.invalidate.limit=1000
    2013-07-19 14:14:18,288 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
    2013-07-19 14:14:18,437 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
    2013-07-19 14:14:18,460 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name does not exist.
    2013-07-19 14:14:18,462 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
    org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /tmp/hadoop-hadoop/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:305)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:99)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:347)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:321)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:267)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:461)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1208)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1217)
    2013-07-19 14:14:18,463 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /tmp/hadoop-hadoop/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:305)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:99)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:347)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:321)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:267)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:461)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1208)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1217)

    2013-07-19 14:14:18,463 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/127.0.0.1
    ************************************************************/
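
    The line that matters in this log is the Storage message just before the first ERROR. Below is a minimal sketch for pulling the missing directory out of a long NameNode log. It assumes the message wording shown above, which other Hadoop versions may phrase differently, and missing_storage_dirs is a hypothetical helper name.

    ```shell
    #!/bin/sh
    # Print every storage directory that a NameNode log (read on stdin)
    # reports as missing, one path per line, without duplicates.
    # Assumption: the "Storage directory ... does not exist." wording seen in this log.
    missing_storage_dirs() {
        sed -n 's/.*Storage directory \(.*\) does not exist\..*/\1/p' | sort -u
    }
    ```

    Usage: missing_storage_dirs < /opt/hadoop-0.20.2-cdh3u0/logs/hadoop-hadoop-namenode-localhost.localdomain.log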

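    So the name directory under /tmp had been lost when I shut the machine down improperly. The crash-recovery approach described online is to run the SecondaryNameNode and the NameNode on two different machines, so that when the cluster crashes the data can be recovered from the SecondaryNameNode. That was not an option for me, so all I could do was run hadoop namenode -format.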
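
    Before falling back to a format, it may be worth checking whether a SecondaryNameNode checkpoint survived. The sketch below assumes the Hadoop 0.20 defaults, where fs.checkpoint.dir is ${hadoop.tmp.dir}/dfs/namesecondary and a checkpoint keeps its image in current/fsimage. Both the layout and the helper name has_checkpoint are assumptions to verify against your own installation.

    ```shell
    #!/bin/sh
    # Succeed if DIR looks like a SecondaryNameNode checkpoint that still holds an image.
    # Assumption: the Hadoop 0.20 on-disk layout DIR/current/fsimage.
    has_checkpoint() {
        # -s is true only when the file exists and is not empty.
        [ -s "$1/current/fsimage" ]
    }
    ```

    Usage: has_checkpoint /tmp/hadoop-hadoop/dfs/namesecondary && echo "checkpoint found". With default settings that directory sits in the same /tmp tree as the NameNode metadata, so it can be lost together with it.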

    If your SecondaryNameNode is intact, you can recover as follows:

    1. Delete the metadata directory on the NameNode master node

    rm -fr /data/hadoop-tmp/hadoop-hadoop/dfs/name

    2. Start the SecondaryNameNode

    Start the SecondaryNameNode with start-all.sh. The NameNode will fail to start; ignore that.

    3. Recover from the SecondaryNameNode

    Run: hadoop namenode -importCheckpoint
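
    The three steps above can be sketched as one small script. It only prints the commands unless DRY_RUN is set to 0. The names recover_from_checkpoint, run and DRY_RUN are made up for this example. The mkdir step is not in the original steps: it rests on the assumption, taken from the Hadoop documentation on importing a checkpoint, that dfs.name.dir should exist and be empty when -importCheckpoint runs.

    ```shell
    #!/bin/sh
    # Print a command, or execute it when DRY_RUN=0 (hypothetical switch, default 1).
    run() {
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "+ $*"
        else
            "$@"
        fi
    }

    # Sketch of the recovery steps for the NameNode metadata directory given as $1.
    recover_from_checkpoint() {
        name_dir=$1
        # Refuse to continue without a target directory.
        [ -n "$name_dir" ] || { echo "usage: recover_from_checkpoint NAME_DIR" >&2; return 1; }
        run rm -rf "$name_dir"                  # step 1: remove the damaged metadata directory
        run mkdir -p "$name_dir"                # assumption: the import wants an existing, empty dir
        run start-all.sh                        # step 2: start the SecondaryNameNode
        run hadoop namenode -importCheckpoint   # step 3: load the latest checkpoint
    }
    ```

    Usage: recover_from_checkpoint /data/hadoop-tmp/hadoop-hadoop/dfs/name prints the four commands for review. Rerun with DRY_RUN=0 only once the paths are confirmed, since step 1 is destructive.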

  • Original article: https://www.cnblogs.com/cl1024cl/p/6205661.html