  • Error (unresolved): NoReplicaOnlineException: No replica in ISR for partition __consumer_offsets-8 is alive. Live brokers are: [Set(50, 51, 52)], ISR brokers are: [68]

    Background:

    After integrating the Kafka plugin into CDH, this error was reported as soon as Kafka started.

    Symptom:

    2019-05-17 08:18:06,428 ERROR state.change.logger: [Controller id=50 epoch=4447617] Initiated state change for partition __consumer_offsets-8 from OfflinePartition to OnlinePartition failed
    kafka.common.NoReplicaOnlineException: No replica in ISR for partition __consumer_offsets-8 is alive. Live brokers are: [Set(50, 51, 52)], ISR brokers are: [68]
            at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:65)
            at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:303)
            at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:163)
            at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:84)
            at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:81)
            at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
            at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
            at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
            at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
            at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
            at scala.collection.mutable.HashMap.foreach(HashMap.scala:130)
            at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
            at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:81)
            at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:58)
            at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:298)
            at kafka.controller.KafkaController.elect(KafkaController.scala:1681)
            at kafka.controller.KafkaController$Reelect$.process(KafkaController.scala:1610)
            at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:53)
            at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:53)
            at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:53)
            at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
            at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:52)
            at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64)

    Cause:

    Key message: No replica in ISR for partition __consumer_offsets-8 is alive

    Meaning: none of the replicas in the ISR (in-sync replica set) of partition __consumer_offsets-8 is alive.
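
    To see exactly which replicas the message is complaining about, the broker IDs can be pulled out of the exception text and compared. The following is a minimal sketch in Python, assuming the message has the format shown in the log above; the function name dead_isr_brokers is made up for illustration and is not part of Kafka.

```python
import re


def dead_isr_brokers(message):
    """Return the ISR broker ids that are missing from the live-broker set."""
    live = re.search(r"Live brokers are: \[Set\(([^)]*)\)\]", message)
    isr = re.search(r"ISR brokers are: \[([^\]]*)\]", message)
    if live is None or isr is None:
        raise ValueError("unexpected message format")
    live_ids = {int(x) for x in live.group(1).split(",") if x.strip()}
    isr_ids = [int(x) for x in isr.group(1).split(",") if x.strip()]
    return [b for b in isr_ids if b not in live_ids]


# The message text copied from the log above.
msg = ("No replica in ISR for partition __consumer_offsets-8 is alive. "
       "Live brokers are: [Set(50, 51, 52)], ISR brokers are: [68]")
print(dead_isr_brokers(msg))  # [68]
```

    For this log the result is broker 68: it is the only member of the ISR, and it is not one of the live brokers 50, 51 and 52.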

    Based on material found online, a preliminary analysis is that leader election for this partition failed.

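    The four leader election implementations and the conditions that trigger them are:

    Implementation                             Trigger condition
    OfflinePartitionLeaderSelector             the partition's leader goes offline
    ReassignedPartitionLeaderSelector          a partition reassignment has finished syncing data to the new replicas
    PreferredReplicaPartitionLeaderSelector    preferred-replica leader election, triggered manually or by automatic leader rebalancing
    ControlledShutdownLeaderSelector           a broker sends a ShutDown request to shut itself down gracefully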

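    The leader election logic of OfflinePartitionLeaderSelector is:

    1. If at least one replica in the ISR is alive, the first live ISR replica of the partition is elected as the new leader, and the live ISR replicas become the new ISR;
    2. Otherwise, if unclean leader election (unclean.leader.election.enable) is disabled, a NoReplicaOnlineException is thrown;
    3. Otherwise, i.e. unclean leader election is allowed, one of the live assigned replicas (replicas that are not in the ISR) is chosen as the new leader, and it alone forms the new ISR;
    4. Otherwise, i.e. none of the partition's assigned replicas is alive, a NoReplicaOnlineException is thrown.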
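
    The four rules above can be sketched as a small function. This is an illustrative Python model of the described logic, not Kafka's actual Scala code; select_leader, NoReplicaOnlineError and the parameter names are hypothetical, and the assigned replica list used in the demo is an assumption, because the log does not show it.

```python
class NoReplicaOnlineError(Exception):
    """Stand-in for kafka.common.NoReplicaOnlineException."""


def select_leader(assigned, isr, live, unclean_allowed):
    """Return (leader, new_isr) according to the four rules listed above."""
    live_isr = [b for b in isr if b in live]
    if live_isr:
        # Rule 1: the first live ISR replica leads; the live ISR is the new ISR.
        return live_isr[0], live_isr
    if not unclean_allowed:
        # Rule 2: no live ISR replica and unclean election is disabled.
        raise NoReplicaOnlineError("no replica in ISR is alive")
    live_assigned = [b for b in assigned if b in live]
    if live_assigned:
        # Rule 3: fall back to a live assigned replica outside the ISR.
        return live_assigned[0], [live_assigned[0]]
    # Rule 4: none of the assigned replicas is alive.
    raise NoReplicaOnlineError("no assigned replica is alive")


# Values from the log: ISR = [68], live brokers = 50, 51, 52.
# assigned=[68] is an assumption made only for this demo.
try:
    select_leader(assigned=[68], isr=[68], live={50, 51, 52},
                  unclean_allowed=False)
except NoReplicaOnlineError as err:
    print("election failed:", err)
```

    With the values from the log, the ISR contains no live broker, so rule 1 never applies and the election can only succeed through rule 3, which requires both unclean election to be allowed and a live assigned replica.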

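    From the above, a Kafka replica is down: the only broker in the ISR is 68, which is not among the live brokers (50, 51, 52). I could not pin down why that is, though.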

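    Attempted fix:

    If the error appears under CDH, my approach was to delete all of Kafka's topics. (The command in step 1 uses the topic AlarmHis, while steps 2 and 3 use a topic named test; substitute the actual topic name.)

    1. Delete the topic with the command:
    kafka-topics.sh --delete --zookeeper localhost:2181 --topic AlarmHis
    On its own this does not actually remove the topic (if delete.topic.enable is false, the topic is only marked for deletion).
    2. Go to the /tmp/kafka-logs directory and delete the folders that belong to the topic test (partition directories are named <topic>-<partition>, e.g. test-0).
    3. Go to the ZooKeeper installation directory, then into its bin directory,
    start the ZooKeeper client with the command zookeeper-client,
    run ls /brokers/topics to list the existing topics,
    and remove the topic's node with rmr /brokers/topics/test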
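
    Before deleting anything in step 2, it can help to list which directories actually belong to the topic. The following Python sketch only lists them and deletes nothing; partition_dirs is a hypothetical helper, and the demo runs against a throwaway temporary directory rather than a real Kafka data directory.

```python
import os
import re
import tempfile


def partition_dirs(log_dir, topic):
    """List sub-directories of log_dir named <topic>-<partition number>."""
    pattern = re.compile(re.escape(topic) + r"-\d+")
    return sorted(
        name for name in os.listdir(log_dir)
        if pattern.fullmatch(name)
        and os.path.isdir(os.path.join(log_dir, name))
    )


# Demo: build a fake data directory and look up the topic "test".
with tempfile.TemporaryDirectory() as log_dir:
    for name in ("test-0", "test-1", "test2-0", "__consumer_offsets-8"):
        os.mkdir(os.path.join(log_dir, name))
    print(partition_dirs(log_dir, "test"))  # ['test-0', 'test-1']
```

    On a real broker the first argument would be the configured data directory (/tmp/kafka-logs in the steps above).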

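    After the deletion, stop all services, reboot the machine, and start the cluster again.

    CDH then stopped showing the error, but I later found that the error was still being written to Kafka's log files on the cloud host. This remains unresolved for now.

    Reference: https://www.colabug.com/3174494.html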

  • Original article: https://www.cnblogs.com/chuijingjing/p/10880761.html