  • Summary of problems encountered while operating a Hadoop cluster

    1. ZooKeeper error

    2017-12-13 16:47:55,968 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@975] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
    2017-12-13 16:47:55,968 [myid:] - WARN  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1102] - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
    java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)

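    Cause: the ZooKeeper node was down, so nothing was accepting connections on port 2181. Starting the node again fixes the error.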
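
    "Connection refused" means the TCP connect itself failed. The minimal sketch below checks only that a process is listening on the client port; it does not verify ZooKeeper is healthy. Host and port 2181 come from the log above; the helper name port_is_open is hypothetical.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    A "Connection refused" in the ZooKeeper client log corresponds to this
    returning False: no process is accepting connections on that port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError, timeouts and resolution errors all end up here
        return False


if __name__ == "__main__":
    # 2181 is the client port seen in the log; adjust for your own ensemble
    up = port_is_open("localhost", 2181)
    print("port 2181 is", "open" if up else "closed (start the ZooKeeper node)")
```

    A closed port only tells you the process is not listening. The ZooKeeper server log still has to explain why the node stopped.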

    2. Kafka consumer error: Job aborted due to stage failure: kafka.common.OffsetOutOfRangeException

    Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): kafka.common.OffsetOutOfRangeException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

    Kafka message retention: log.retention.hours=168 (7 days)

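    Solution: the offset stored for the consumer group is older than the earliest message Kafka still retains, because retention has already deleted the older messages. The blog posts under References explain this in more detail.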
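
    The condition can be stated as a range check. A committed offset is only valid while it lies between the earliest retained offset and the latest offset. The sketch below clamps it instead of failing. The function name and the "earliest"/"latest" policy strings are illustrative assumptions, not a Kafka or Spark API.

```python
def resolve_offset(committed: int, earliest: int, latest: int, reset: str = "earliest") -> int:
    """Return the offset a consumer should resume from.

    A committed offset outside [earliest, latest] is what produces
    OffsetOutOfRangeException; here it is clamped according to `reset`
    (a hypothetical policy name) instead of aborting the job.
    """
    if earliest <= committed <= latest:
        return committed  # still retained: resume where the group left off
    if reset == "earliest":
        return earliest  # skip the messages retention already deleted
    if reset == "latest":
        return latest  # jump to the end of the log
    raise ValueError(f"offset {committed} out of range [{earliest}, {latest}]")
```

    Resetting to "earliest" matches the fix applied in this post: it re-reads everything still on disk. "latest" would skip straight to new messages.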

    Get the minimum (earliest) offset of topic mysqlslowlog:

    ./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list=node:9092 --topic topic_name --time -2

    Get the maximum (latest) offset of topic mysqlslowlog:

    ./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list=node:9092 --topic topic_name --time -1
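
    To compare the two results per partition, the sketch below parses the output, assuming GetOffsetShell prints one topic:partition:offset line per partition. The helper names and the sample offsets in the usage note are hypothetical. Only 3546232 appears in this post.

```python
from typing import Dict


def parse_offsets(output: str) -> Dict[int, int]:
    """Parse lines of the form 'topic:partition:offset' into {partition: offset}."""
    offsets: Dict[int, int] = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # split from the right so only the last two fields are separated off
        _topic, partition, offset = line.rsplit(":", 2)
        offsets[int(partition)] = int(offset)
    return offsets


def retained_messages(earliest: Dict[int, int], latest: Dict[int, int]) -> Dict[int, int]:
    """Messages still on disk per partition (latest minus earliest)."""
    return {p: latest[p] - earliest[p] for p in earliest if p in latest}
```

    For example, feeding the --time -2 output into parse_offsets gives the per-partition minimum to compare against the group's committed offsets in ZooKeeper.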

    Update the topic partition offsets in ZooKeeper:

    # Check the offset currently stored for partition 0

    get /rootdir/consumers/[consumer_group]/offsets/mysqlslowlog/0

    # Set partition 0 to the minimum (earliest) offset

    set /rootdir/consumers/[consumer_group]/offsets/mysqlslowlog/0 3546232

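    Alternatively, reset all partitions to the earliest offset in one go. The consumer.properties file supplies zookeeper.connect and group.id:

    ./kafka-run-class.sh kafka.tools.UpdateOffsetsInZK earliest consumer.properties topic_name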
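
    When there are many partitions, the per-partition set commands can be generated rather than typed. The following is a minimal sketch assuming the /rootdir/consumers/<group>/offsets/<topic>/<partition> layout used above. The function name and the group name my_group are hypothetical.

```python
from typing import Dict, List


def zk_reset_commands(root: str, group: str, topic: str, earliest: Dict[int, int]) -> List[str]:
    """Build zkCli.sh 'set' commands that move each partition's committed
    offset to the earliest retained offset.

    Assumes the old consumer layout: <root>/consumers/<group>/offsets/<topic>/<partition>
    """
    base = f"{root.rstrip('/')}/consumers/{group}/offsets/{topic}"
    # sort by partition so the generated script is stable and easy to review
    return [f"set {base}/{p} {off}" for p, off in sorted(earliest.items())]
```

    The resulting lines can be pasted into a zkCli.sh session. The earliest offsets come from GetOffsetShell with --time -2.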

    References:

    http://blog.csdn.net/xueba207/article/details/51135423
    http://blog.csdn.net/xueba207/article/details/51174818

    3. Error when restarting an HBase RegionServer:

    Server ...,1514436003346 has been rejected; Reported time is too far out of sync with master.  Time difference of 136758ms > max allowed of 30000ms

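    This is usually caused by the clocks on the HMaster node and the RegionServer node being out of sync. Synchronize the time (for example with NTP) and restart the node.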
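
    The rejection in the log is a threshold comparison: a 136758 ms difference against a 30000 ms limit. The sketch below reproduces that arithmetic, assuming the check uses the absolute difference. The limit is, to our knowledge, configured by hbase.master.maxclockskew. The function name is hypothetical.

```python
DEFAULT_MAX_SKEW_MS = 30_000  # "max allowed of 30000ms" in the log above


def clock_skew_rejected(master_ms: int, regionserver_ms: int,
                        max_skew_ms: int = DEFAULT_MAX_SKEW_MS) -> bool:
    """Return True when the RegionServer's reported time is too far from the master's.

    Assumption: the difference is compared in absolute value, so a clock that
    runs ahead and one that runs behind are treated the same way.
    """
    return abs(master_ms - regionserver_ms) > max_skew_ms
```

    With the numbers from the log, a 136758 ms difference exceeds 30000 ms, so the server is rejected until the clocks are synchronized.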

    4. When decommissioning an HDFS DataNode, the node stays in the "Decommission In Progress" state

    Check in the web UI:

    # blocks below the required replication factor
    Under replicated blocks :2979
    # blocks with no live replica
    Blocks with no live replicas: 0
    # under-replicated blocks in files that are still being written
    Under Replicated Blocks In files under construction:1

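    Alternatively, check the DataNode status with ./bin/hadoop dfsadmin -report.

    The replication factor here is 2. Under replicated blocks should keep decreasing, and once it reaches 0 the node should finish decommissioning.

    In addition, because a DataNode in the same rack usually already holds one replica, the DataNode can be taken offline faster by lowering the replication factor: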
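
    Watching the counter can be scripted. The sketch below parses "name: value" lines like the web UI counters listed above and applies the rule stated in this post: decommissioning is expected to finish once Under replicated blocks reaches 0. How the text is fetched from the UI is left out, and the helper names are hypothetical.

```python
from typing import Dict


def parse_counters(text: str) -> Dict[str, int]:
    """Turn lines like 'Under replicated blocks :2979' into {'under replicated blocks': 2979}."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        # skip comment lines and anything without a numeric value after the colon
        if sep and value.strip().isdigit():
            counters[name.strip().lower()] = int(value)
    return counters


def decommission_finished(counters: Dict[str, int]) -> bool:
    """True once no under-replicated blocks remain (missing counter counts as not finished)."""
    return counters.get("under replicated blocks", 1) == 0
```

    Polling this every few minutes shows whether the count is actually going down or the node is stuck.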

    # Check the cluster status

    ./bin/hadoop fsck / -blocks -locations -files

    # Lower the replication factor (only when Blocks with no live replicas is 0)

    ./bin/hadoop fs -setrep -R 1 /

    # Stop the DataNode

    ./sbin/hadoop-daemon.sh stop datanode

    # Remove the node from the slaves list and the rack list

    # Refresh the node lists, or restart the NameNodes one at a time

    ./bin/hdfs dfsadmin -refreshNodes
    ./bin/yarn rmadmin -refreshNodes
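
    Before stopping the node, each block should still have a replica on some other DataNode. The sketch below checks that necessary condition on a block-to-locations map. The map is assumed to have been extracted already, for example from the fsck -blocks -locations output. It does not parse fsck itself, and it cannot predict which replica HDFS deletes after setrep. All names are hypothetical.

```python
from typing import Dict, List


def blocks_without_other_replica(node: str, block_locations: Dict[str, List[str]]) -> List[str]:
    """Return block IDs that would have no live replica once `node` is stopped.

    `block_locations` maps a block ID to the DataNodes (e.g. 'host:50010')
    currently holding it. Blocks with an empty list are already missing and
    are reported too.
    """
    return sorted(
        block for block, nodes in block_locations.items()
        if not any(n != node for n in nodes)
    )
```

    An empty result is what "Blocks with no live replicas: 0" should correspond to for the node being removed.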

    5. Removing an HDFS DataNode

    Failed to add xxxxxxxx:50010: You cannot have a rack and a non-rack node at the same level of the network topology.
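
    As we understand it, this error appears when network locations of different depths are mixed. An example is a node that falls back to /default-rack in a topology whose other locations look like /dc1/rack1. The names here are hypothetical. The sketch below groups hosts by location depth, and more than one group signals the conflict. It is a diagnostic aid, not HDFS's own check.

```python
from collections import defaultdict
from typing import Dict, List


def depth(location: str) -> int:
    """Number of path components in a network location such as '/dc1/rack1'."""
    return len([part for part in location.split("/") if part])


def group_by_depth(locations: Dict[str, str]) -> Dict[int, List[str]]:
    """Group hosts by the depth of their network location.

    More than one group means a rack and a non-rack inner node would sit at
    the same level of the topology tree.
    """
    groups: Dict[int, List[str]] = defaultdict(list)
    for host, loc in locations.items():
        groups[depth(loc)].append(host)
    return {d: sorted(hosts) for d, hosts in sorted(groups.items())}
```

    For example, {"dn1": "/dc1/rack1", "dn3": "/default-rack"} yields two groups, pointing at dn3 as the node whose rack mapping is missing.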

    Solution:

    Check the rack list with ./bin/hdfs dfsadmin -printTopology

    Refresh:

    ./bin/hdfs dfsadmin -refreshNodes
    ./bin/yarn rmadmin -refreshNodes

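    This did not help:
    (1) the web UI still showed the removed DataNode as dead;
    (2) the error "You cannot have a rack and a non-rack node at the same level of the network topology." was still reported.

    Restarting the NameNodes one at a time made the change take effect: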

    ./sbin/hadoop-daemon.sh stop namenode
    ./sbin/hadoop-daemon.sh start namenode

    Check the rack information again with

    ./bin/hdfs dfsadmin -printTopology

    The node that was removed should no longer appear.
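
    The final check can be scripted too. The sketch below assumes printTopology prints "Rack: <path>" lines, each followed by indented "ip:port (hostname)" entries. Verify that format against your Hadoop version. It reports whether a given IP, ip:port or hostname is still listed, and the helper names are hypothetical.

```python
from typing import Dict, List, Optional


def parse_topology(output: str) -> Dict[str, List[str]]:
    """Map each rack to the DataNode entries listed under it."""
    racks: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Rack:"):
            current = line[len("Rack:"):].strip()
            racks[current] = []
        elif current is not None:
            racks[current].append(line)
    return racks


def entry_names(entry: str) -> List[str]:
    """All identifiers of an entry like '10.0.0.1:50010 (dn1)': ip:port, ip, hostname."""
    addr, _, rest = entry.partition(" ")
    names = [addr, addr.rsplit(":", 1)[0]]
    hostname = rest.strip().strip("()")
    if hostname:
        names.append(hostname)
    return names


def node_still_listed(output: str, host: str) -> bool:
    """Exact match on identifiers, so 'dn1' does not match 'dn10'."""
    return any(host in entry_names(e) for entries in parse_topology(output).values() for e in entries)
```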

  • Original article: https://www.cnblogs.com/wyett/p/8146044.html