  • HBase bug caused by unsynchronized clocks

    Troubleshooting steps:

    Step 1: An exception occurs when reading from the HBase database

    2018-12-10 10:00:13,620 ERROR [hconnection-0x2609b277-metaLookup-shared--pool1-t2] zookeeper.ZooKeeperWatcher - hconnection-0x2609b277-0x267942b66f701d1, quorum=10.100.2.92:2181,10.100.2.93:2181,10.100.2.94:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
    org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/meta-region-server
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:354)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:623)
        at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionState(MetaTableLocator.java:487)
        at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionLocation(MetaTableLocator.java:168)
        at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:608)
        at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:588)
        at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:561)
        at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:61)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateMeta(ConnectionManager.java:1211)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1178)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.relocateRegion(ConnectionManager.java:1152)
        at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:303)
        at org.apache.hadoop.hbase.client.ReversedScannerCallable.prepare(ReversedScannerCallable.java:105)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.prepare(ScannerCallableWithReplicas.java:376)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:134)
        at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:65)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

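    Step 2: First, check the HBase monitoring page at http://masterHostIp:60010/master-status

    One ServerName was missing from the region server list. (The original post included a screenshot of the normal state here.)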

    Step 3: Restart HBase with the commands below. I also tried restarting ZooKeeper first and then starting HBase.

    Start the whole HBase cluster:
    bin/start-hbase.sh
    Start a single HMaster process:
    bin/hbase-daemon.sh start master
    Stop a single HMaster process:
    bin/hbase-daemon.sh stop master
    Start a single HRegionServer process:
    bin/hbase-daemon.sh start regionserver
    Stop a single HRegionServer process:
    bin/hbase-daemon.sh stop regionserver

    Step 4: HBase still failed to start on one of the servers. The HBase log on that server shows:

    2018-12-10 10:40:38,785 FATAL [regionserver/dev-hadoop2/10.100.2.93:16020] regionserver.HRegionServer: Master rejected startup because clock is out of sync
    org.apache.hadoop.hbase.ClockOutOfSyncException: org.apache.hadoop.hbase.ClockOutOfSyncException: Server dev-hadoop2,16020,1544409635868 has been rejected; Reported time is too far out of sync with master.  Time difference of 79232ms > max allowed of 30000ms
        at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:409)
        at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:275)
        at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerStartup(MasterRpcServices.java:361)
        at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8615)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2196)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
        at java.lang.Thread.run(Thread.java:748)
    
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
        at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:330)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2318)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:907)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.ClockOutOfSyncException): org.apache.hadoop.hbase.ClockOutOfSyncException: Server dev-hadoop2,16020,1544409635868 has been rejected; Reported time is too far out of sync with master.  Time difference of 79232ms > max allowed of 30000ms
        at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:409)
        at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:275)
        at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerStartup(MasterRpcServices.java:361)
        at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8615)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2196)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
        at java.lang.Thread.run(Thread.java:748)
    
        at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1267)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)
        at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:8982)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2316)
        ... 2 more
    
    2018-12-10 10:40:38,866 INFO  [regionserver/dev-hadoop2/10.100.2.93:16020] regionserver.HRegionServer: STOPPED: Unhandled: org.apache.hadoop.hbase.ClockOutOfSyncException: Server dev-hadoop2,16020,1544409635868 has been rejected; Reported time is too far out of sync with master.  Time difference of 79232ms > max allowed of 30000ms
        at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:409)
        at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:275)
        at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerStartup(MasterRpcServices.java:361)
        at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8615)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2196)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
        at java.lang.Thread.run(Thread.java:748)
    
    2018-12-10 10:40:38,995 ERROR [main] regionserver.HRegionServerCommandLine: Region server exiting
    java.lang.RuntimeException: HRegionServer Aborted
        at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:68)
        at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:87)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2697)

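    The cause is "Time difference of 79232ms > max allowed of 30000ms": the region server's system clock is about 79 seconds away from the master's, more than the 30 seconds the master tolerates. In short, the system clocks are not synchronized.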
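The rejection can be read directly from the log line. As a minimal sketch (not HBase's own code), the following Python snippet extracts the measured skew and the allowed maximum from the message and reports how far the region server is over the limit. The function name `parse_clock_skew` and the regular expression are illustrative assumptions; the 30000 ms limit is the master's maximum allowed clock skew, which can be changed through the `hbase.master.maxclockskew` setting.

```python
import re

# Pattern for the skew message that appears in the region server log above.
SKEW_RE = re.compile(r"Time difference of (\d+)ms > max allowed of (\d+)ms")


def parse_clock_skew(log_line):
    """Return (skew_ms, max_allowed_ms) from a ClockOutOfSyncException line, or None."""
    match = SKEW_RE.search(log_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Message text copied from the log in this post.
line = ("Server dev-hadoop2,16020,1544409635868 has been rejected; "
        "Reported time is too far out of sync with master.  "
        "Time difference of 79232ms > max allowed of 30000ms")

skew_ms, max_ms = parse_clock_skew(line)
excess_s = (skew_ms - max_ms) / 1000
print(f"skew={skew_ms} ms, limit={max_ms} ms, over by {excess_s:.3f} s")
```

Raising the limit would only hide the symptom; the steps that follow fix the clocks instead.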

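    Step 5: Solution:

    1- Edit /etc/ntp.conf (vi /etc/ntp.conf) and add the line that was highlighted in yellow in the original post, presumably the last one (server 10.100.2.93), so that all nodes synchronize their time with 10.100.2.93.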

    server 127.127.1.0
    fudge 127.127.1.0 stratum 8
    Broadcastdelay 0.008
    server 0.centos.pool.ntp.org
    server 1.centos.pool.ntp.org
    server 2.centos.pool.ntp.org
    server 10.100.2.93

    2- service ntpd restart    Restart ntpd so that the configuration takes effect.

    3- ntpq -pn    Check the status (run this on a node other than 10.100.2.93):

         remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
     185.134.197.4   .STEP.          16 u    -  512    0    0.000    0.000   0.000
     193.228.143.14  .STEP.          16 u    -  512    0    0.000    0.000   0.000
     119.28.206.193  .STEP.          16 u    -  512    0    0.000    0.000   0.000
     85.199.214.100  .STEP.          16 u    -  512    0    0.000    0.000   0.000
    *10.100.2.93     LOCAL(0)         9 u   21   64  377    2.026    0.781   0.804

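    No two things are exactly alike, and the crystal oscillators in computer hardware differ too. As a result, without synchronization the time difference between machines keeps growing.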
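To get a feel for how quickly drift adds up, here is a rough back-of-the-envelope sketch in Python. The drift rate of 50 ppm is a hypothetical figure chosen for illustration, not a measurement from this cluster, and `days_until_skew` is an illustrative helper name.

```python
SECONDS_PER_DAY = 86_400


def days_until_skew(limit_ms, drift_ppm):
    """Days for a free-running clock drifting at drift_ppm to accumulate limit_ms of error."""
    # ppm = microseconds of error per second of elapsed time.
    drift_ms_per_day = drift_ppm * 1e-6 * SECONDS_PER_DAY * 1000
    return limit_ms / drift_ms_per_day


# 50 ppm is an assumed drift rate; 30000 ms is the limit from the log above.
print(round(days_until_skew(30_000, 50), 1))
```

Under that assumption a clock gains or loses about 4.3 seconds per day, so an unsynchronized node would cross the 30-second limit in roughly a week.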

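    refid: the time source that the remote peer itself is synchronized to (its upstream NTP host or reference clock)

    st: the stratum of the remote peer (16 means unsynchronized)

    when: seconds since the last response from this peer was received

    poll: the polling interval, in seconds

    reach: an 8-bit shift register, shown in octal, recording the outcome of the last eight polls (377 means all eight succeeded)

    delay: the network round-trip delay to the peer, in milliseconds

    offset: the time difference between the local clock and the peer, in milliseconds

    jitter: the variation between successive offset samples, in milliseconds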
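The columns can also be checked programmatically. The following is a minimal Python sketch that parses peer rows in the layout shown above, assuming the ten whitespace-separated columns of `ntpq -pn` and a one-character tally code in front of the remote address (`*` marks the peer currently selected for synchronization). The helper names `decode_reach` and `parse_peer` are illustrative. `ntpq` prints `reach` in octal as an 8-bit register of recent poll results, so 377 decodes to eight successful polls.

```python
def decode_reach(reach_octal):
    """Decode the octal 'reach' register into 8 booleans, oldest poll first."""
    value = int(reach_octal, 8)
    return [bool((value >> bit) & 1) for bit in range(7, -1, -1)]


def parse_peer(line):
    """Parse one peer row of `ntpq -pn` output into a dict."""
    tally, fields = line[0], line[1:].split()
    remote, refid, st, _type, when, poll, reach, delay, offset, jitter = fields
    return {
        "selected": tally == "*",            # '*' = current sync source
        "remote": remote,
        "refid": refid,
        "stratum": int(st),
        "reach_ok": sum(decode_reach(reach)),  # successful polls out of the last 8
        "delay_ms": float(delay),
        "offset_ms": float(offset),
        "jitter_ms": float(jitter),
    }


# Two rows copied from the output in this post (without the page indentation).
rows = [
    " 185.134.197.4   .STEP.          16 u    -  512    0    0.000    0.000   0.000",
    "*10.100.2.93     LOCAL(0)         9 u   21   64  377    2.026    0.781   0.804",
]

peers = [parse_peer(row) for row in rows]
for peer in peers:
    print(peer["remote"], peer["selected"], peer["reach_ok"], peer["offset_ms"])
```

In the output above, only 10.100.2.93 is reachable and selected, with an offset below one millisecond; the public pool servers have a reach of 0, meaning none of their recent polls were answered.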

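    One last point: by default, the ntp service only synchronizes the system time. To have ntp synchronize the hardware clock as well, edit the /etc/sysconfig/ntpd file.

    In /etc/sysconfig/ntpd, add SYNC_HWCLOCK=yes so that the hardware clock is synchronized together with the system time.

    4- Use a scheduled job to synchronize the time periodically.

    For details, see: https://my.oschina.net/myaniu/blog/182959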

  • Original article: https://www.cnblogs.com/parent-absent-son/p/10096064.html