NameNode error message:
2020-06-03 04:44:42,313 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [1xx:8485, xxx:8485, xxx:8485], stream=QuorumOutputStream starting at txid 47431233))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
        at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
        at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
        at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
        at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:533)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:57)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:529)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:707)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:188)
        at java.lang.Thread.run(Thread.java:748)
Solution:
By default the NameNode waits dfs.qjournal.write-txns.timeout.ms = 20000 ms (the 20000ms in the log above) for a quorum of JournalNodes to acknowledge each edit-log flush; if the JournalNodes respond too slowly (for example, long GC pauses or slow disks), the flush of a required journal fails and the NameNode aborts. Raising the QJM timeouts works around this.
1. Add the following configuration to hdfs-site.xml:
<property>
  <!-- The edit log is split into many segments. To start writing a new segment the
       NameNode issues a startLogSegment RPC to the JournalNodes; this parameter is
       the timeout for starting a new segment. -->
  <name>dfs.qjournal.start-segment.timeout.ms</name>
  <value>90000</value>
</property>
<property>
  <!-- Timeout for selecting input streams when reading edits from the JournalNodes. -->
  <name>dfs.qjournal.select-input-streams.timeout.ms</name>
  <value>90000</value>
</property>
<property>
  <!-- Timeout for a quorum of JournalNodes to acknowledge written transactions;
       this is the wait that expired in the log above. -->
  <name>dfs.qjournal.write-txns.timeout.ms</name>
  <value>90000</value>
</property>
In core-site.xml, raise the IPC client connect timeout as well (its default is likewise 20000 ms):

<property>
  <name>ipc.client.connect.timeout</name>
  <value>90000</value>
</property>
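To confirm the restarted NameNode actually picks up the new values, here is a minimal sketch that prints the effective timeouts, falling back to the stock 20000 ms defaults. It assumes the Hadoop client jars and the config directory are on the classpath; the class name CheckQjmTimeouts is hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class CheckQjmTimeouts {
    public static void main(String[] args) {
        // HdfsConfiguration loads core-site.xml and hdfs-site.xml from the classpath
        Configuration conf = new HdfsConfiguration();
        // Each key falls back to the 20000 ms default if it was not overridden
        System.out.println("dfs.qjournal.start-segment.timeout.ms = "
                + conf.getInt("dfs.qjournal.start-segment.timeout.ms", 20000));
        System.out.println("dfs.qjournal.select-input-streams.timeout.ms = "
                + conf.getInt("dfs.qjournal.select-input-streams.timeout.ms", 20000));
        System.out.println("dfs.qjournal.write-txns.timeout.ms = "
                + conf.getInt("dfs.qjournal.write-txns.timeout.ms", 20000));
        System.out.println("ipc.client.connect.timeout = "
                + conf.getInt("ipc.client.connect.timeout", 20000));
    }
}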
2. Disable ZooKeeper's forced transaction-log sync, in zoo.cfg:
forceSync=no
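For context, a minimal zoo.cfg sketch showing where the option sits (the tickTime, dataDir, and clientPort values here are placeholders, not from the original). With forceSync=no, ZooKeeper no longer fsyncs its transaction log to disk before acknowledging writes, which removes disk-sync latency from the write path but can lose the most recent transactions if the host crashes; ZooKeeper must be restarted for the change to take effect.

# zoo.cfg (sketch; values are placeholders)
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
# Do not fsync the transaction log before acknowledging writes
forceSync=no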