zoukankan      html  css  js  c++  java
  • Namenode主节点停止报错 Error: flush failed for required journal

    主节点间歇性报错其他没有问题 ,SNN的NN没有问题,相关的journalNode也都在,就是主节点的NN会停止。

    查看hadoop主节点的NN日志。

    2016-11-21 22:36:40,908 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 19822 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
    2016-11-21 22:36:41,088 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [192.168.58.183:8485, 192.168.58.181:8485, 192.168.58.182:8485], stream=QuorumOutputStream starting at txid 24533))
    java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
    	at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
    	at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
    	at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
    	at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
    	at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:533)
    	at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
    	at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:57)
    	at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:529)
    	at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:639)
    	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2645)
    	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2520)
    	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:579)
    	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:394)
    	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
    	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:975)
    	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
    	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2036)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at javax.security.auth.Subject.doAs(Subject.java:422)
    	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
    	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2034)
    2016-11-21 22:36:41,089 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting QuorumOutputStream starting at txid 24533
    2016-11-21 22:36:41,113 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
    2016-11-21 22:36:41,122 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Slave2/192.168.58.182:8485. Already tried 0 time(s); maxRetries=45
    2016-11-21 22:36:41,123 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Slave1/192.168.58.181:8485. Already tried 0 time(s); maxRetries=45
    2016-11-21 22:36:41,123 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: StandByNameNode/192.168.58.183:8485. Already tried 0 time(s); maxRetries=45
    2016-11-21 22:36:41,137 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 20050ms to send a batch of 1 edits (218 bytes) to remote journal 192.168.58.182:8485
    2016-11-21 22:36:41,137 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 20052ms to send a batch of 1 edits (218 bytes) to remote journal 192.168.58.181:8485
    2016-11-21 22:36:41,137 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 20065ms to send a batch of 1 edits (218 bytes) to remote journal 192.168.58.183:8485
    2016-11-21 22:36:41,145 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: 
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at CentOSMaster/192.168.58.180
    ************************************************************/

      首先保证设置dfs.namenode.edits.dir和dfs.journalnode.edits.dir,然后设置在hdfs-site.xml中超时时间如下:

    <property>
       <name>dfs.qjournal.start-segment.timeout.ms</name>
       <value>600000000</value>
      </property>
    
      <property>
       <name>dfs.qjournal.prepare-recovery.timeout.ms</name>
       <value>600000000</value>
      </property>
    
      <property>
       <name>dfs.qjournal.accept-recovery.timeout.ms</name>
       <value>600000000</value>
      </property>
      <property>
       <name>dfs.qjournal.prepare-recovery.timeout.ms</name>
       <value>600000000</value>
      </property>
    
      <property>
       <name>dfs.qjournal.accept-recovery.timeout.ms</name>
       <value>600000000</value>
      </property>
    
      <property>
       <name>dfs.qjournal.finalize-segment.timeout.ms</name>
       <value>600000000</value>
      </property>
    
      <property>
       <name>dfs.qjournal.select-input-streams.timeout.ms</name>
       <value>600000000</value>
      </property>
    
      <property>
       <name>dfs.qjournal.get-journal-state.timeout.ms</name>
       <value>600000000</value>
      </property>
    
      <property>
       <name>dfs.qjournal.new-epoch.timeout.ms</name>
       <value>600000000</value>
      </property>
    
      <property>
       <name>dfs.qjournal.write-txns.timeout.ms</name>
       <value>600000000</value>
      </property>
    

      貌似解决了,至今今天早上没出问题。

  • 相关阅读:
    1014 Waiting in Line (30)(30 point(s))
    1013 Battle Over Cities (25)(25 point(s))
    1012 The Best Rank (25)(25 point(s))
    1011 World Cup Betting (20)(20 point(s))
    1010 Radix (25)(25 point(s))
    1009 Product of Polynomials (25)(25 point(s))
    1008 Elevator (20)(20 point(s))
    1007 Maximum Subsequence Sum (25)(25 point(s))
    1006 Sign In and Sign Out (25)(25 point(s))
    1005 Spell It Right (20)(20 point(s))
  • 原文地址:https://www.cnblogs.com/hxsyl/p/6087573.html
Copyright © 2011-2022 走看看