Kafka: ZK + Kafka + Spark Streaming Cluster Setup (Part 5): on Hadoop 2.9.0, DataNode and NodeManager start normally on the slaves, but the DataNode shuts down after a few seconds

    After starting the cluster, the DataNode and NodeManager processes come up normally on each slave, but a few seconds later the DataNode shuts itself down.

    Taking the error log on slave1 as an example:

    more /opt/hadoop-2.9.0/logs/hadoop-spark-datanode-slave1.log
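
    To jump straight to the failure instead of paging through the whole file, you can filter for WARN and ERROR lines first (a minimal sketch; adjust the log file name for each slave):

    grep -E 'WARN|ERROR' /opt/hadoop-2.9.0/logs/hadoop-spark-datanode-slave1.log | tail -n 20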

    The relevant error entries:

    2018-06-30 22:29:50,944 WARN org.apache.hadoop.hdfs.server.common.Storage: Failed to add storage directory [DISK]file:/opt/hadoop-2.9.0/dfs/data/
    java.io.IOException: Incompatible clusterIDs in /opt/hadoop-2.9.0/dfs/data: namenode clusterID = CID-f1195fc7-ca7c-4a2a-b32f-211131a5d699; datanode clusterID = CID-292293a6-9c34-4de7-aecd-d72657a26dd5
            at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:760)
            at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadStorageDirectory(DataStorage.java:293)
            at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadDataStorage(DataStorage.java:409)
            at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:388)
            at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:556)
            at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
            at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
            at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:374)
            at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
            at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
            at java.lang.Thread.run(Thread.java:748)
    2018-06-30 22:29:50,948 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid f4badff3-7a0b-4db0-bd77-83b370f67eed) service to master/192.168.0.120:9000. Exiting. 
    java.io.IOException: All specified directories have failed to load.
            at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:557)
            at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
            at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
            at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:374)
            at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
            at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
            at java.lang.Thread.run(Thread.java:748)
    2018-06-30 22:29:50,948 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid f4badff3-7a0b-4db0-bd77-83b370f67eed) service to master/192.168.0.120:9000
    2018-06-30 22:29:51,060 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid f4badff3-7a0b-4db0-bd77-83b370f67eed)
    2018-06-30 22:29:53,060 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
    2018-06-30 22:29:53,064 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG: 
    /************************************************************
    SHUTDOWN_MSG: Shutting down DataNode at slave1/192.168.0.121
    ************************************************************/

    Solution

    Root cause: the NameNode was formatted more than once. Each format assigns a new clusterID, so the NameNode's clusterID no longer matches the one the DataNodes recorded during the previous format.
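
    You can confirm the mismatch directly by comparing the clusterID each node has on disk (a quick check, assuming the dfs/name and dfs/data directories used throughout this series):

    # On master: the clusterID the NameNode received at format time
    grep clusterID /opt/hadoop-2.9.0/dfs/name/current/VERSION
    # On each slave: the clusterID the DataNode kept from the previous format
    grep clusterID /opt/hadoop-2.9.0/dfs/data/current/VERSION

    If the cluster held data worth keeping, copying the NameNode's clusterID into each DataNode's VERSION file would be a gentler fix than the full wipe below; on a fresh cluster, reformatting as described here is simpler.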

    1) On master, run sbin/stop-all.sh to shut down Hadoop:

    cd /opt/hadoop-2.9.0
    sbin/stop-all.sh
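
    Once the script finishes, jps on every node should list nothing but Jps itself; a lingering NameNode or DataNode process means the shutdown did not complete:

    jps    # expect only the Jps process itself before proceeding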

    2) On master, slave1, slave2, and slave3 in turn, delete the old storage, log, and temp directories (note that this wipes all existing HDFS data; a scripted variant follows this block):

    cd /opt/hadoop-2.9.0
    rm -r dfs
    rm -r logs
    rm -r tmp
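
    Since passwordless ssh between the nodes is already configured in this series, the same cleanup can be scripted from master instead of logging in to each slave; this loop is a convenience sketch, not part of the original steps:

    # Run the cleanup on every node from master
    for host in master slave1 slave2 slave3; do
        ssh "$host" 'cd /opt/hadoop-2.9.0 && rm -rf dfs logs tmp'
    done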

    3) On master, reformat the NameNode and restart Hadoop:

    cd /opt/hadoop-2.9.0            # enter the Hadoop directory
    bin/hadoop namenode -format     # format the NameNode
    sbin/start-all.sh               # start HDFS and YARN
    [spark@master hadoop-2.9.0]$ bin/hadoop namenode -format
    DEPRECATED: Use of this script to execute hdfs command is deprecated.
    Instead use the hdfs command for it.
    
    18/06/30 22:45:20 INFO namenode.NameNode: STARTUP_MSG: 
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG:   host = master/192.168.0.120
    STARTUP_MSG:   args = [-format]
    STARTUP_MSG:   version = 2.9.0
    STARTUP_MSG:   classpath = /opt/hadoop-2.9.0/etc/hadoop:/opt/hadoop-2.9.0/share/hadoop/common/lib/nimbus-jose-jwt-3.9.jar:/opt/hadoop-2.9.0/share/hadoop/common/lib/java-xmlbuilder-0.4.jar:/opt/hadoop-...
    STARTUP_MSG:   build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r 756ebc8394e473ac25feac05fa493f6d612e6c50; compiled by 'arsuresh' on 2017-11-13T23:15Z
    STARTUP_MSG:   java = 1.8.0_171
    ************************************************************/
    18/06/30 22:45:20 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
    18/06/30 22:45:20 INFO namenode.NameNode: createNameNode [-format]
    Formatting using clusterid: CID-d4e2f108-de3c-4910-9eeb-abbbb1024fe8
    18/06/30 22:45:20 INFO namenode.FSEditLog: Edit logging is async:true
    18/06/30 22:45:20 INFO namenode.FSNamesystem: KeyProvider: null
    18/06/30 22:45:20 INFO namenode.FSNamesystem: fsLock is fair: true
    18/06/30 22:45:20 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
    18/06/30 22:45:20 INFO namenode.FSNamesystem: fsOwner             = spark (auth:SIMPLE)
    18/06/30 22:45:20 INFO namenode.FSNamesystem: supergroup          = supergroup
    18/06/30 22:45:20 INFO namenode.FSNamesystem: isPermissionEnabled = true
    18/06/30 22:45:20 INFO namenode.FSNamesystem: HA Enabled: false
    18/06/30 22:45:20 INFO common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
    18/06/30 22:45:20 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit: configured=1000, counted=60, effected=1000
    18/06/30 22:45:20 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
    18/06/30 22:45:20 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
    18/06/30 22:45:20 INFO blockmanagement.BlockManager: The block deletion will start around 2018 Jun 30 22:45:20
    18/06/30 22:45:20 INFO util.GSet: Computing capacity for map BlocksMap
    18/06/30 22:45:20 INFO util.GSet: VM type       = 64-bit
    18/06/30 22:45:20 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
    18/06/30 22:45:20 INFO util.GSet: capacity      = 2^21 = 2097152 entries
    18/06/30 22:45:20 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
    18/06/30 22:45:20 WARN conf.Configuration: No unit for dfs.namenode.safemode.extension(30000) assuming MILLISECONDS
    18/06/30 22:45:20 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
    18/06/30 22:45:20 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.min.datanodes = 0
    18/06/30 22:45:20 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.extension = 30000
    18/06/30 22:45:20 INFO blockmanagement.BlockManager: defaultReplication         = 3
    18/06/30 22:45:20 INFO blockmanagement.BlockManager: maxReplication             = 512
    18/06/30 22:45:20 INFO blockmanagement.BlockManager: minReplication             = 1
    18/06/30 22:45:20 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
    18/06/30 22:45:20 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
    18/06/30 22:45:20 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
    18/06/30 22:45:20 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
    18/06/30 22:45:20 INFO namenode.FSNamesystem: Append Enabled: true
    18/06/30 22:45:20 INFO util.GSet: Computing capacity for map INodeMap
    18/06/30 22:45:20 INFO util.GSet: VM type       = 64-bit
    18/06/30 22:45:20 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
    18/06/30 22:45:20 INFO util.GSet: capacity      = 2^20 = 1048576 entries
    18/06/30 22:45:20 INFO namenode.FSDirectory: ACLs enabled? false
    18/06/30 22:45:20 INFO namenode.FSDirectory: XAttrs enabled? true
    18/06/30 22:45:20 INFO namenode.NameNode: Caching file names occurring more than 10 times
    18/06/30 22:45:20 INFO snapshot.SnapshotManager: Loaded config captureOpenFiles: falseskipCaptureAccessTimeOnlyChange: false
    18/06/30 22:45:20 INFO util.GSet: Computing capacity for map cachedBlocks
    18/06/30 22:45:20 INFO util.GSet: VM type       = 64-bit
    18/06/30 22:45:20 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
    18/06/30 22:45:20 INFO util.GSet: capacity      = 2^18 = 262144 entries
    18/06/30 22:45:20 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
    18/06/30 22:45:20 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
    18/06/30 22:45:20 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
    18/06/30 22:45:20 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
    18/06/30 22:45:20 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
    18/06/30 22:45:20 INFO util.GSet: Computing capacity for map NameNodeRetryCache
    18/06/30 22:45:20 INFO util.GSet: VM type       = 64-bit
    18/06/30 22:45:20 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
    18/06/30 22:45:20 INFO util.GSet: capacity      = 2^15 = 32768 entries
    18/06/30 22:45:21 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1880726246-192.168.0.120-1530369921005
    18/06/30 22:45:21 INFO common.Storage: Storage directory /opt/hadoop-2.9.0/dfs/name has been successfully formatted.
    18/06/30 22:45:21 INFO namenode.FSImageFormatProtobuf: Saving image file /opt/hadoop-2.9.0/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
    18/06/30 22:45:21 INFO namenode.FSImageFormatProtobuf: Image file /opt/hadoop-2.9.0/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 322 bytes saved in 0 seconds.
    18/06/30 22:45:21 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
    18/06/30 22:45:21 INFO namenode.NameNode: SHUTDOWN_MSG: 
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at master/192.168.0.120
    ************************************************************/
    [spark@master hadoop-2.9.0]$ sbin/start-all.sh               # start HDFS and YARN
    This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
    Starting namenodes on [master]
    master: starting namenode, logging to /opt/hadoop-2.9.0/logs/hadoop-spark-namenode-master.out
    slave1: starting datanode, logging to /opt/hadoop-2.9.0/logs/hadoop-spark-datanode-slave1.out
    slave3: starting datanode, logging to /opt/hadoop-2.9.0/logs/hadoop-spark-datanode-slave3.out
    slave2: starting datanode, logging to /opt/hadoop-2.9.0/logs/hadoop-spark-datanode-slave2.out
    Starting secondary namenodes [master]
    master: starting secondarynamenode, logging to /opt/hadoop-2.9.0/logs/hadoop-spark-secondarynamenode-master.out
    starting yarn daemons
    starting resourcemanager, logging to /opt/hadoop-2.9.0/logs/yarn-spark-resourcemanager-master.out
    slave2: starting nodemanager, logging to /opt/hadoop-2.9.0/logs/yarn-spark-nodemanager-slave2.out
    slave3: starting nodemanager, logging to /opt/hadoop-2.9.0/logs/yarn-spark-nodemanager-slave3.out
    slave1: starting nodemanager, logging to /opt/hadoop-2.9.0/logs/yarn-spark-nodemanager-slave1.out

    4) After about 30 seconds, check whether master, slave1, slave2, and slave3 all started successfully.

    Check whether master started successfully:

    [spark@master hadoop-2.9.0]$ jps
    3808 Jps
    3540 ResourceManager
    3191 NameNode
    3387 SecondaryNameNode
    [spark@master hadoop-2.9.0]$ 

    On slave1, slave2, and slave3, run jps to verify that both the DataNode and NodeManager processes are running.
    Taking slave1 as an example:

    [spark@slave1 hadoop-2.9.0]$ jps
    2160 Jps
    2018 NodeManager
    1909 DataNode
    [spark@slave1 hadoop-2.9.0]$
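
    Beyond checking jps host by host, an HDFS-level check from master confirms that all DataNodes registered under the new clusterID (assuming the same install path):

    bin/hdfs dfsadmin -report    # should report three live DataNodes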

    Reference: https://blog.csdn.net/magggggic/article/details/52503502
