  • HBase error "ERROR: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet": notes from troubleshooting

    1. Error and exception output:

    Exception in thread "main" java.lang.IllegalArgumentException: Failed to find metadata store by url: kylin_metadata@hbase
    	at org.apache.kylin.common.persistence.ResourceStore.createResourceStore(ResourceStore.java:99)
    	at org.apache.kylin.common.persistence.ResourceStore.getStore(ResourceStore.java:111)
    	at org.apache.kylin.rest.service.AclTableMigrationTool.checkIfNeedMigrate(AclTableMigrationTool.java:99)
    	at org.apache.kylin.tool.AclTableMigrationCLI.main(AclTableMigrationCLI.java:43)
    Caused by: java.lang.reflect.InvocationTargetException
    	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    	at org.apache.kylin.common.persistence.ResourceStore.createResourceStore(ResourceStore.java:92)
    	... 3 more
    Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=2, exceptions:
    Wed Aug 04 11:08:45 CST 2021, RpcRetryingCaller{globalStartTime=1628046524833, pause=100, maxAttempts=2}, org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=2, exceptions:
    Wed Aug 04 11:08:45 CST 2021, RpcRetryingCaller{globalStartTime=1628046524855, pause=100, maxAttempts=2}, org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server recbd5.hwwt2.com,16020,1628044355182 is not running yet
    	at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:1501)
    	at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2440)
    	at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
    	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
    	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
    	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
    	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
    
    Wed Aug 04 11:08:45 CST 2021, RpcRetryingCaller{globalStartTime=1628046524855, pause=100, maxAttempts=2}, org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server recbd5.hwwt2.com,16020,1628044355182 is not running yet
    	at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:1501)
    	at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2440)
    	at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
    	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
    	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
    	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
    	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
    
    
    Wed Aug 04 11:08:45 CST 2021, RpcRetryingCaller{globalStartTime=1628046524833, pause=100, maxAttempts=2}, org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=2, exceptions:
    Wed Aug 04 11:08:45 CST 2021, RpcRetryingCaller{globalStartTime=1628046525453, pause=100, maxAttempts=2}, org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server recbd5.hwwt2.com,16020,1628044355182 is not running yet
    	at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:1501)
    	at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2440)
    	at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
    

    2. Resolution:

    1) A check showed that Hadoop (HDFS) was in safe mode at this point, so it first had to be taken out of safe mode:

    hadoop dfsadmin -safemode leave 
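Before forcing HDFS out of safe mode, it can help to confirm that it really is in safe mode. The following is a minimal sketch, assuming the `hdfs` command is on the PATH (`hdfs dfsadmin` is the current form of the older `hadoop dfsadmin` used above) and that `hdfs dfsadmin -safemode get` reports a line containing "Safe mode is ON". The function names are hypothetical helpers, not part of Hadoop.

```shell
#!/usr/bin/env bash

# Returns 0 if the NameNode reports that safe mode is ON, non-zero otherwise.
# Assumption: the status line contains the text "Safe mode is ON".
safemode_is_on() {
  hdfs dfsadmin -safemode get 2>/dev/null | grep -q 'Safe mode is ON'
}

# Leaves safe mode only when it is actually on (hypothetical helper).
leave_safemode_if_on() {
  if safemode_is_on; then
    echo "HDFS is in safe mode; leaving it"
    hdfs dfsadmin -safemode leave
  else
    echo "HDFS is not in safe mode; nothing to do"
  fi
}
```

Calling `leave_safemode_if_on` on a NameNode host runs the check and, if needed, the leave command. If the NameNode stays in safe mode because blocks are missing, forcing it out may not be appropriate; that case is outside this sketch.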

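    2) After restarting HBase, the service was still unusable: both the HMaster and the RegionServers were in an abnormal state. The relevant logs follow.

    Key HMaster exception log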

    2018-05-25 10:19:12,737 DEBUG [hadoop001:60000.activeMasterManager] wal.WALProcedureStore: Opening state-log: FileStatus{path=hdfs://beh/hbase/MasterProcWALs/state-00000000000000036689.log; isDirectory=false; length=45760804; replication=3; blocksize=536870912; modification_time=1527123981127; access_time=1527165673882; owner=hadoop; group=hadoop; permission=rw-rw-r--; isSymlink=false}
    2018-05-25 10:19:12,742 INFO  [hadoop001:60000.activeMasterManager] util.FSHDFSUtils: Recover lease on dfs file hdfs://beh/hbase/MasterProcWALs/state-00000000000000036690.log
    2018-05-25 10:19:12,742 INFO  [hadoop001:60000.activeMasterManager] util.FSHDFSUtils: Recovered lease, attempt=0 on file=hdfs://beh/hbase/MasterProcWALs/state-00000000000000036690.log after 0ms
    2018-05-25 10:19:12,742 DEBUG [hadoop001:60000.activeMasterManager] wal.WALProcedureStore: Opening state-log: FileStatus{path=hdfs://beh/hbase/MasterProcWALs/state-00000000000000036690.log; isDirectory=false; length=45761668; replication=3; blocksize=536870912; modification_time=1527123982242; access_time=1527165673883; owner=hadoop; group=hadoop; permission=rw-rw-r--; isSymlink=false}
    2018-05-25 10:19:12,767 INFO  [hadoop001:60000.activeMasterManager] util.FSHDFSUtils: Recover lease on dfs file hdfs://beh/hbase/MasterProcWALs/state-00000000000000036691.log
    2018-05-25 10:19:12,768 INFO  [hadoop001:60000.activeMasterManager] util.FSHDFSUtils: Recovered lease, attempt=0 on file=hdfs://beh/hbase/MasterProcWALs/state-00000000000000036691.log after 1ms
    . . .
    2018-05-25 10:29:29,656 DEBUG [B.defaultRpcServer.handler=31,queue=13,port=60000] ipc.RpcServer: B.defaultRpcServer.handler=31,queue=13,port=60000: callId: 301 service: RegionServerStatusService methodName: RegionServerStartup size: 46 connection: 172.33.2.22:38698
    org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet
        at org.apache.hadoop.hbase.master.HMaster.checkServiceStarted(HMaster.java:2296)
        at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerStartup(MasterRpcServices.java:361)
        at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8615)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
        at java.lang.Thread.run(Thread.java:745)

    Key RegionServer exception log

    2018-05-25 10:31:14,446 WARN  [regionserver/hadoop003/173...] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
    2018-05-25 10:31:17,446 INFO  [regionserver/hadoop003/173...] regionserver.HRegionServer: reportForDuty to master=hadoop001,60000,1527214458906 with port=60025, startcode=1527214459823
    2018-05-25 10:31:17,447 DEBUG [regionserver/hadoop003/173...] regionserver.HRegionServer: Master is not running yet
    2018-05-25 10:31:17,447 WARN  [regionserver/hadoop003/173...] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
    2018-05-25 10:31:20,447 INFO  [regionserver/hadoop003/173...] regionserver.HRegionServer: reportForDuty to master=hadoop001,60000,1527214458906 with port=60025, startcode=1527214459823
    2018-05-25 10:31:20,448 DEBUG [regionserver/hadoop003/173...] regionserver.HRegionServer: Master is not running yet
    2018-05-25 10:31:20,448 WARN  [regionserver/hadoop003/173...] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
    2018-05-25 10:31:23,448 INFO  [regionserver/hadoop003/173...] regionserver.HRegionServer: reportForDuty to master=hadoop001,60000,1527214458906 with port=60025, startcode=1527214459823
    2018-05-25 10:31:23,449 DEBUG [regionserver/hadoop003/173...] regionserver.HRegionServer: Master is not running yet

    Key DataNode exception log

    2018-05-25 11:04:20,540 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Likely the client has stopped reading, disconnecting it (hadoop028:50010:DataXceiver error processing READ_BLOCK operation src: /172.33.2.17:39882 dst: /172.33.2.44:50010); java.net.SocketTimeoutException: 600000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/172.33.2.44:50010 remote=/172.33.2.17:39882]
    2018-05-25 11:04:20,652 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Likely the client has stopped reading, disconnecting it (hadoop028:50010:DataXceiver error processing READ_BLOCK operation src: /172.33.2.17:39930 dst: /172.33.2.44:50010); java.net.SocketTimeoutException: 600000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/172.33.2.44:50010 remote=/172.33.2.17:39930]
    2018-05-25 11:04:21,088 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Likely the client has stopped reading, disconnecting it (hadoop028:50010:DataXceiver error processing READ_BLOCK operation src: /172.33.2.17:40038 dst: /172.33.2.44:50010); java.net.SocketTimeoutException: 600000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/172.33.2.44:50010 remote=/172.33.2.17:40038]

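    3) Problem analysis

    • HDFS problems had already been ruled out before the fix. The DataNode exceptions were a consequence of the HBase HMaster failing to start normally; 172.33.2.17 is the active HMaster node (confirmed via ZooKeeper).

    • The RegionServer and HMaster logs show

    org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet
    Master is not running yet

    which confirms that the root problem is the HMaster service failing to start normally.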

    • The HMaster log also contains the following warning:

    2018-05-25 10:19:59,868 WARN [hadoop001:60000.activeMasterManager] wal.WALProcedureStore: Unable to read tracker for hdfs://beh/hbase/MasterProcWALs/state-00000000000000040786.log - Missing trailer: size=11 startPos=11

    Inspecting the directory hdfs://beh/hbase/MasterProcWALs showed that its total size was 1.3 TB.

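    • Cause: when an HMaster becomes active, it has to recover the lease on, and read, every one of these procedure log files. With such a huge volume of logs, this put heavy pressure on the NameNode and exhausted the TCP buffer space, so service recovery took extremely long.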
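The directory inspection described in the analysis can be sketched with standard HDFS shell commands. This is a hedged example, assuming `hdfs` is on the PATH; the path is the one that appears in the logs above, and the function names are hypothetical helpers.

```shell
#!/usr/bin/env bash

# Path taken from the HMaster log above; override it for another cluster.
PROC_WAL_DIR="${PROC_WAL_DIR:-hdfs://beh/hbase/MasterProcWALs}"

# Prints the total size of the directory in human-readable form.
procwal_total_size() {
  hdfs dfs -du -s -h "${1:-$PROC_WAL_DIR}"
}

# Prints the number of files in the directory.
# `hdfs dfs -count` prints: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
procwal_file_count() {
  hdfs dfs -count "${1:-$PROC_WAL_DIR}" | awk '{print $2}'
}
```

A large file count together with a size in the terabyte range would be consistent with the slow master startup described above.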

    4) Fix: delete the log files under hdfs://beh/hbase/MasterProcWALs, then restart the HBase cluster.
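The fix could look roughly like the sketch below. Several things here are assumptions rather than details from the original post: that HBASE_HOME points at the HBase installation, that the cluster is restarted with the stock `stop-hbase.sh` and `start-hbase.sh` scripts, and that HBase is stopped before the files are deleted. The function name is hypothetical. Note that deleting these files discards any unfinished master procedures recorded in them.

```shell
#!/usr/bin/env bash

# Path taken from the HMaster log above; override it for another cluster.
PROC_WAL_DIR="${PROC_WAL_DIR:-hdfs://beh/hbase/MasterProcWALs}"

# Stops HBase, deletes the master procedure WAL files, then starts HBase.
# Assumption: HBASE_HOME is set and contains the stock bin/ scripts.
clean_proc_wals_and_restart() {
  local dir="${1:-$PROC_WAL_DIR}"
  "${HBASE_HOME}/bin/stop-hbase.sh" || return 1
  # The glob is quoted so that the HDFS shell, not the local shell, expands it.
  hdfs dfs -rm "${dir}/*" || return 1
  "${HBASE_HOME}/bin/start-hbase.sh"
}
```

If HDFS trash is enabled, `-rm` moves the files to the trash instead of freeing the space immediately; adding `-skipTrash` deletes them outright, at the cost of not being able to restore them.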

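    This is the author's original article. Reposting is welcome, but the source must be credited and the author retains the copyright. The author will pursue, by legal means, reposts of this original article that do not credit the source. Please respect the author's work.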
  • Original article: https://www.cnblogs.com/laoqing/p/15112134.html