
    What should you do if a broker in a Kafka cluster fails and cannot be recovered?

    I was asked this in an interview today. According to material found online, adding a new broker will not automatically sync the existing data to it.

    The brute-force approach

    Environment

    A three-broker cluster, with ZooKeeper and Kafka installed together on each node.

    | broker | IP | broker.id |
    |---------|---------------|-----------|
    | broker1 | 172.18.12.211 | 211 |
    | broker2 | 172.18.12.212 | 212 |
    | broker3 | 172.18.12.213 | 213 |
    

    Create a test topic

    #./bin/kafka-topics.sh --zookeeper 172.18.12.212:2181 --create --topic test1 --replication-factor 3 --partitions 1
    Created topic "test1".
    

    Describe the topic

    #./bin/kafka-topics.sh --zookeeper 172.18.12.212:2181 --describe --topic test1
    Topic:test1 PartitionCount:1 ReplicationFactor:3 Configs:
            Topic: test1 Partition: 0 Leader: 213 Replicas: 213,212,211 Isr: 213,212,211
    

    Note the current state:
    Replicas: 213,212,211
    Isr: 213,212,211
    (Replicas is the full assigned replica set for the partition; Isr is the subset of replicas currently in sync with the leader.)

    Produce a few messages

    #./bin/kafka-console-producer.sh --broker-list 172.18.12.212:9092 --topic test1
    >1
    >2
    >3
    

    kill broker2

    [root@node024212 ~]# ps -ef| grep kafka
    root 17633 1 1 Feb17 ? 00:55:18 /usr/local/java/bin/java -server -Xmx2g - ...
    [root@node024212 ~]# kill -9 17633
    [root@node024212 ~]# ps -ef| grep kafka
    root 21875 21651 0 11:27 pts/2 00:00:00 grep --color=auto kafka
    

    Wait a moment, then describe test1 again

    #./bin/kafka-topics.sh --zookeeper 172.18.12.212:2181 --describe --topic test1
    Topic:test1 PartitionCount:1 ReplicationFactor:3 Configs:
            Topic: test1 Partition: 0 Leader: 213 Replicas: 213,212,211 Isr: 213,211
    

    You can see that the replica assignment is still Replicas: 213,212,211,
    but the ISR has shrunk to Isr: 213,211.
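
    To double-check the cluster state, two quick reads help: list the broker ids still registered in ZooKeeper (212 should be gone), and list any under-replicated partitions. Both use tools shipped with Kafka; this is a sketch using the addresses from the cluster above.

    #./bin/zookeeper-shell.sh 172.18.12.212:2181 ls /brokers/ids
    #./bin/kafka-topics.sh --zookeeper 172.18.12.212:2181 --describe --under-replicated-partitions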

    Start a new broker on 212

    Create a new configuration file and use it to start a new broker.

    # cp server.properties server2.properties 
    # vim server2.properties 
    Only these two parameters are changed:
    broker.id=218
    log.dirs=/DATA21/kafka/kafka-logs,/DATA22/kafka/kafka-logs,/DATA23/kafka/kafka-logs,/DATA24/kafka/kafka-logs
    

    Create the corresponding directories

    mkdir -p /DATA21/kafka/kafka-logs
    mkdir -p /DATA22/kafka/kafka-logs
    mkdir -p /DATA23/kafka/kafka-logs
    mkdir -p /DATA24/kafka/kafka-logs
    

    Start the new broker

    ./bin/kafka-server-start.sh -daemon config/server2.properties 
    

    Wait a moment, then check the state of test1

    #./bin/kafka-topics.sh --zookeeper 172.18.12.212:2181 --describe --topic test1
    Topic:test1 PartitionCount:1 ReplicationFactor:3 Configs:
            Topic: test1 Partition: 0 Leader: 213 Replicas: 213,212,211 Isr: 213,218,211
    

    You can see that the replica assignment of test1 is still Replicas: 213,212,211,
    and the ISR is Isr: 213,218,211. In other words, the missing replica is not automatically migrated to the new broker.

    Reassign the partition with kafka-reassign-partitions.sh

    Remove 212 from the replica list and add 218 (the first broker in the list becomes the preferred leader):

    [root@node024211 12:04:48 /usr/local/kafka]
    #echo '{"version":1,"partitions":[{"topic":"test1","partition":0,"replicas":[211,213,218]}]}' > increase-replication-factor.json
    
    [root@node024211 12:58:30 /usr/local/kafka]
    #./bin/kafka-reassign-partitions.sh --zookeeper 172.18.12.211:2181 --reassignment-json-file increase-replication-factor.json --execute
    Current partition replica assignment
    
    {"version":1,"partitions":[{"topic":"test1","partition":0,"replicas":[213,212,211],"log_dirs":["any","any","any"]}]}
    
    Save this to use as the --reassignment-json-file option during rollback
    Successfully started reassignment of partitions.
    
    [root@node024211 12:58:49 /usr/local/kafka]
    #./bin/kafka-reassign-partitions.sh --zookeeper 172.18.12.211:2181 --reassignment-json-file increase-replication-factor.json --verify
    Status of partition reassignment: 
    Reassignment of partition test1-0 completed successfully
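
    As an aside, instead of writing the JSON by hand, kafka-reassign-partitions.sh can propose an assignment onto a given broker list with --generate. A sketch (topics-to-move.json is a file you create yourself); on larger topics you would normally also pass --throttle to the --execute step to cap replication bandwidth:

    #echo '{"version":1,"topics":[{"topic":"test1"}]}' > topics-to-move.json
    #./bin/kafka-reassign-partitions.sh --zookeeper 172.18.12.211:2181 --topics-to-move-json-file topics-to-move.json --broker-list "211,213,218" --generate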
    

    Check the topic again

    #./bin/kafka-topics.sh --zookeeper 172.18.12.212:2181 --describe --topic test1
    Topic:test1 PartitionCount:1 ReplicationFactor:3 Configs:
            Topic: test1 Partition: 0 Leader: 213 Replicas: 211,213,218 Isr: 213,211,218
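
    Note that the leader is still 213 even though 211 is now first in the replica list (the preferred replica). To move leadership back to the preferred replica, ZooKeeper-based Kafka versions like the one used here ship a preferred-replica election tool; a sketch (with no partition list it triggers the election for every partition):

    #./bin/kafka-preferred-replica-election.sh --zookeeper 172.18.12.212:2181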
    

    Verify that 218 has all the data

    Although 218 now appears in the replica list, does it actually hold the old messages?
    My test: kill 211 and 213, then consume from 218 with --from-beginning. In practice this works:

    #./bin/kafka-console-consumer.sh --bootstrap-server 172.18.12.212:9092 --topic test1 --from-beginning
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    11
    

    The log files on 211 and 218 are also the same size:

    [2019-02-21 13:29:19]#ls -l /DATA22/kafka/kafka-logs/test1-0/
    [2019-02-21 13:29:19]total 8
    [2019-02-21 13:29:19]-rw-r--r--. 1 root root 10485760 Feb 21 12:58 00000000000000000000.index
    [2019-02-21 13:29:19]-rw-r--r--. 1 root root 381 Feb 21 13:00 00000000000000000000.log
    [2019-02-21 13:29:19]-rw-r--r--. 1 root root 10485756 Feb 21 12:58 00000000000000000000.timeindex
    [2019-02-21 13:29:19]-rw-r--r--. 1 root root 16 Feb 21 13:00 leader-epoch-checkpoint
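
    Beyond comparing sizes, the segment contents themselves can be dumped and diffed on both brokers (a sketch using the DumpLogSegments tool that ships with Kafka; the path follows the log.dirs used above):

    #./bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files /DATA22/kafka/kafka-logs/test1-0/00000000000000000000.log --print-data-log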
    

    A simpler way

    Reading the documentation turned up this FAQ entry:
    https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Howtoreplaceafailedbroker

    How to replace a failed broker?
    When a broker fails, Kafka doesn’t automatically re-replicate the data on the failed broker to other brokers. This is because in the common case, one brings down a broker to apply code or config changes, and will bring up the broker quickly afterward. Re-replicating the data in this case will be wasteful. In the rarer case that a broker fails completely, one will need to bring up another broker with the same broker id on a new server. The new broker will automatically replicate the missing data.

    In other words, if the server really is gone for good, just start a new broker with broker.id set to the failed broker's id, and the missing data will be replicated to it automatically.

    I tested this in practice, and the data does indeed get restored.
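
    A minimal sketch of that recovery on a replacement server (the exact paths and zookeeper.connect string are assumptions based on the cluster above): give the new broker the dead broker's id, point it at empty log directories and the same ZooKeeper ensemble, start it, and watch the ISR fill back in.

    # vim config/server.properties
    broker.id=212
    log.dirs=/DATA/kafka/kafka-logs
    zookeeper.connect=172.18.12.211:2181,172.18.12.212:2181,172.18.12.213:2181
    # ./bin/kafka-server-start.sh -daemon config/server.properties
    # ./bin/kafka-topics.sh --zookeeper 172.18.12.211:2181 --describe --topic test1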

    Original post: https://www.cnblogs.com/leffss/p/11294942.html