  • Cassandra node cannot rejoin the cluster after a restart and reports “received an invalid gossip generation for peer /10.168.12.3; local generation = 1527840276, received generation = 1577928397”

    The environment is a 6-node, 2-datacenter Cassandra cluster running version 2.1.9.

    Today, after restarting one machine in the cluster, 10.168.12.3, we found that the node could not rejoin the cluster. The analysis follows.

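    Checking the cluster status from the restarted node, everything appears normal: every node, including the restarted one, is reported as UN.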

    $ nodetool status
    Datacenter: DC-SGM-DR
    =====================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address        Load       Tokens  Owns    Host ID                               Rack
    UN  10.168.50.205  822.91 MB  256     ?       bea84e24-76c8-4070-9c41-d0051d8aba63  RAC-1B
    UN  10.168.50.212  825.43 MB  256     ?       97e92d11-028a-44f6-b6ea-be3992985506  RAC-1B
    UN  10.168.50.213  14.37 GB   256     ?       de47960c-54ab-4ed3-99e7-e3abcb66c014  RAC-1B
    Datacenter: DC-SGM-SH
    =====================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address        Load       Tokens  Owns    Host ID                               Rack
    UN  10.168.11.11   10.17 GB   256     ?       9d016b9f-5655-4899-8652-607bdc24eda3  RAC-1A
    UN  10.168.12.3    831.42 MB  256     ?       57c4d98b-c52c-48bf-b8ee-7d8f22bcc08f  RAC-1A
    UN  10.168.11.6    828.2 MB   256     ?       9cf69121-4dbc-419c-b3a8-e166d83b4177  RAC-1A
    
    Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless

    Next, we log in to another node in the cluster and check the cluster status from there:

    $ nodetool status
    Datacenter: DC-SGM-DR
    =====================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address        Load       Tokens  Owns    Host ID                               Rack
    UN  10.168.50.205  828.16 MB  256     ?       bea84e24-76c8-4070-9c41-d0051d8aba63  RAC-1B
    UN  10.168.50.212  825.43 MB  256     ?       97e92d11-028a-44f6-b6ea-be3992985506  RAC-1B
    UN  10.168.50.213  14.37 GB   256     ?       de47960c-54ab-4ed3-99e7-e3abcb66c014  RAC-1B
    Datacenter: DC-SGM-SH
    =====================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address        Load       Tokens  Owns    Host ID                               Rack
    UN  10.168.11.11   834.48 MB  256     ?       9d016b9f-5655-4899-8652-607bdc24eda3  RAC-1A
    DN  10.168.12.3    831.31 MB  256     ?       57c4d98b-c52c-48bf-b8ee-7d8f22bcc08f  RAC-1A
    UN  10.168.11.6    828.17 MB  256     ?       9cf69121-4dbc-419c-b3a8-e166d83b4177  RAC-1A
    
    Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
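
    The two views above differ only in the state reported for 10.168.12.3. As a minimal sketch of how such a disagreement could be spotted automatically, the Python below parses the node rows of two `nodetool status` outputs and lists the addresses whose state differs. The function names (`node_states`, `disagreements`) are hypothetical, the sample rows are abbreviated from the outputs above, and the regular expression assumes IPv4 addresses.

```python
import re

# One node row of `nodetool status`, e.g. "UN  10.168.12.3  831.42 MB ...".
# First letter: Up/Down; second letter: Normal/Leaving/Joining/Moving.
ROW = re.compile(r"^\s*([UD][NLJM])\s+(\d{1,3}(?:\.\d{1,3}){3})\s")


def node_states(status_output):
    """Map each node address to its two-letter state (UN, DN, ...)."""
    states = {}
    for line in status_output.splitlines():
        match = ROW.match(line)
        if match:
            states[match.group(2)] = match.group(1)
    return states


def disagreements(view_a, view_b):
    """Return {address: (state_in_a, state_in_b)} where the two views differ."""
    a, b = node_states(view_a), node_states(view_b)
    return {
        addr: (a[addr], b[addr])
        for addr in a.keys() & b.keys()
        if a[addr] != b[addr]
    }


# Rows copied from the two outputs above (DC-SGM-SH only, for brevity).
restarted_view = """
UN  10.168.11.11   10.17 GB   256     ?       9d016b9f-5655-4899-8652-607bdc24eda3  RAC-1A
UN  10.168.12.3    831.42 MB  256     ?       57c4d98b-c52c-48bf-b8ee-7d8f22bcc08f  RAC-1A
UN  10.168.11.6    828.2 MB   256     ?       9cf69121-4dbc-419c-b3a8-e166d83b4177  RAC-1A
"""
other_view = """
UN  10.168.11.11   834.48 MB  256     ?       9d016b9f-5655-4899-8652-607bdc24eda3  RAC-1A
DN  10.168.12.3    831.31 MB  256     ?       57c4d98b-c52c-48bf-b8ee-7d8f22bcc08f  RAC-1A
UN  10.168.11.6    828.17 MB  256     ?       9cf69121-4dbc-419c-b3a8-e166d83b4177  RAC-1A
"""

print(disagreements(restarted_view, other_view))
# {'10.168.12.3': ('UN', 'DN')}
```

    Header lines such as "Datacenter: ..." and "--  Address ..." do not match the row pattern, so the full command output can be passed in unchanged.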

    The other nodes show the restarted node in the “DN” state, and the Cassandra system.log on each of them repeatedly logs the following warning:

    ..................................................
    WARN  [GossipStage:1] 2020-01-02 10:07:45,831 Gossiper.java:1105 - received an invalid gossip generation for peer /10.168.12.3; local generation = 1527840276, received generation = 1577928397
    WARN  [GossipStage:1] 2020-01-02 10:07:47,680 Gossiper.java:1105 - received an invalid gossip generation for peer /10.168.12.3; local generation = 1527840276, received generation = 1577928397
    WARN  [GossipStage:1] 2020-01-02 10:07:49,682 Gossiper.java:1105 - received an invalid gossip generation for peer /10.168.12.3; local generation = 1527840276, received generation = 1577928397
    WARN  [GossipStage:1] 2020-01-02 10:07:50,690 Gossiper.java:1105 - received an invalid gossip generation for peer /10.168.12.3; local generation = 1527840276, received generation = 1577928397
    WARN  [GossipStage:1] 2020-01-02 10:07:50,833 Gossiper.java:1105 - received an invalid gossip generation for peer /10.168.12.3; local generation = 1527840276, received generation = 1577928397
    WARN  [GossipStage:1] 2020-01-02 10:07:51,681 Gossiper.java:1105 - received an invalid gossip generation for peer /10.168.12.3; local generation = 1527840276, received generation = 1577928397
    WARN  [GossipStage:1] 2020-01-02 10:07:51,833 Gossiper.java:1105 - received an invalid gossip generation for peer /10.168.12.3; local generation = 1527840276, received generation = 1577928397
    WARN  [GossipStage:1] 2020-01-02 10:07:52,833 Gossiper.java:1105 - received an invalid gossip generation for peer /10.168.12.3; local generation = 1527840276, received generation = 1577928397
    WARN  [GossipStage:1] 2020-01-02 10:07:54,684 Gossiper.java:1105 - received an invalid gossip generation for peer /10.168.12.3; local generation = 1527840276, received generation = 1577928397
    WARN  [GossipStage:1] 2020-01-02 10:07:55,683 Gossiper.java:1105 - received an invalid gossip generation for peer /10.168.12.3; local generation = 1527840276, received generation = 1577928397
    WARN  [GossipStage:1] 2020-01-02 10:07:55,834 Gossiper.java:1105 - received an invalid gossip generation for peer /10.168.12.3; local generation = 1527840276, received generation = 1577928397
    WARN  [GossipStage:1] 2020-01-02 10:07:57,683 Gossiper.java:1105 - received an invalid gossip generation for peer /10.168.12.3; local generation = 1527840276, received generation = 1577928397
    WARN  [GossipStage:1] 2020-01-02 10:07:58,684 Gossiper.java:1105 - received an invalid gossip generation for peer /10.168.12.3; local generation = 1527840276, received generation = 1577928397
    WARN  [GossipStage:1] 2020-01-02 10:08:00,684 Gossiper.java:1105 - received an invalid gossip generation for peer /10.168.12.3; local generation = 1527840276, received generation = 1577928397
    WARN  [GossipStage:1] 2020-01-02 10:08:01,688 Gossiper.java:1105 - received an invalid gossip generation for peer /10.168.12.3; local generation = 1527840276, received generation = 1577928397
    WARN  [GossipStage:1] 2020-01-02 10:08:05,686 Gossiper.java:1105 - received an invalid gossip generation for peer /10.168.12.3; local generation = 1527840276, received generation = 1577928397
    WARN  [GossipStage:1] 2020-01-02 10:08:06,686 Gossiper.java:1105 - received an invalid gossip generation for peer /10.168.12.3; local generation = 1527840276, received generation = 1577928397
    WARN  [GossipStage:1] 2020-01-02 10:08:08,838 Gossiper.java:1105 - received an invalid gossip generation for peer /10.168.12.3; local generation = 1527840276, received generation = 1577928397
    WARN  [GossipStage:1] 2020-01-02 10:08:09,839 Gossiper.java:1105 - received an invalid gossip generation for peer /10.168.12.3; local generation = 1527840276, received generation = 1577928397
    WARN  [GossipStage:1] 2020-01-02 10:08:11,688 Gossiper.java:1105 - received an invalid gossip generation for peer /10.168.12.3; local generation = 1527840276, received generation = 1577928397
    WARN  [GossipStage:1] 2020-01-02 10:08:11,839 Gossiper.java:1105 - received an invalid gossip generation for peer /10.168.12.3; local generation = 1527840276, received generation = 1577928397
    WARN  [GossipStage:1] 2020-01-02 10:08:12,840 Gossiper.java:1105 - received an invalid gossip generation for peer /10.168.12.3; local generation = 1527840276, received generation = 1577928397
    WARN  [GossipStage:1] 2020-01-02 10:08:13,688 Gossiper.java:1105 - received an invalid gossip generation for peer /10.168.12.3; local generation = 1527840276, received generation = 1577928397
    WARN  [GossipStage:1] 2020-01-02 10:08:17,841 Gossiper.java:1105 - received an invalid gossip generation for peer /10.168.12.3; local generation = 1527840276, received generation = 1577928397
    WARN  [GossipStage:1] 2020-01-02 10:08:20,690 Gossiper.java:1105 - received an invalid gossip generation for peer /10.168.12.3; local generation = 1527840276, received generation = 1577928397
    WARN  [GossipStage:1] 2020-01-02 10:08:21,691 Gossiper.java:1105 - received an invalid gossip generation for peer /10.168.12.3; local generation = 1527840276, received generation = 1577928397
    WARN  [GossipStage:1] 2020-01-02 10:08:21,843 Gossiper.java:1105 - received an invalid gossip generation for peer /10.168.12.3; local generation = 1527840276, received generation = 1577928397
    WARN  [GossipStage:1] 2020-01-02 10:08:22,691 Gossiper.java:1105 - received an invalid gossip generation for peer /10.168.12.3; local generation = 1527840276, received generation = 1577928397
    WARN  [GossipStage:1] 2020-01-02 10:08:22,843 Gossiper.java:1105 - received an invalid gossip generation for peer /10.168.12.3; local generation = 1527840276, received generation = 1577928397
    ..................................................
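
    Every warning above carries the same three values: the peer address, the generation the logging node holds locally, and the generation it just received. A minimal sketch, assuming the message format shown above and IPv4 peers, that collapses a log excerpt into the distinct (peer, local, received) combinations; `parse_generation_warnings` is a hypothetical helper name.

```python
import re

# Pattern for the message text shown in the log excerpt above (IPv4 peers only).
WARNING = re.compile(
    r"received an invalid gossip generation for peer /(?P<peer>[\d.]+); "
    r"local generation = (?P<local>\d+), "
    r"received generation = (?P<received>\d+)"
)


def parse_generation_warnings(log_text):
    """Return the set of distinct (peer, local_generation, received_generation)."""
    return {
        (m.group("peer"), int(m.group("local")), int(m.group("received")))
        for m in WARNING.finditer(log_text)
    }


# Two lines copied from the excerpt above.
sample = (
    "WARN  [GossipStage:1] 2020-01-02 10:07:45,831 Gossiper.java:1105 - "
    "received an invalid gossip generation for peer /10.168.12.3; "
    "local generation = 1527840276, received generation = 1577928397\n"
    "WARN  [GossipStage:1] 2020-01-02 10:07:47,680 Gossiper.java:1105 - "
    "received an invalid gossip generation for peer /10.168.12.3; "
    "local generation = 1527840276, received generation = 1577928397\n"
)

print(parse_generation_warnings(sample))
# {('10.168.12.3', 1527840276, 1577928397)}
```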

    We log in to the restarted Cassandra node and check gossipinfo:

    $ nodetool gossipinfo
    .................................
    /10.168.12.3
      generation:1527840276
      heartbeat:22488596
      HOST_ID:57c4d98b-c52c-48bf-b8ee-7d8f22bcc08f
      SCHEMA:54b29ca7-5a9c-345b-be73-437504faf71b
      SEVERITY:0.0
      NET_VERSION:8
      RACK:RAC-1A
      DC:DC-SGM-SH
      RELEASE_VERSION:2.1.9
      STATUS:NORMAL,-101651619030947983
      RPC_ADDRESS:10.168.12.3
      LOAD:8.72963151E8
    .................................
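
    To read the generation out of this output for every endpoint at once, something like the following could be used. It is only a sketch: `gossip_generations` is a hypothetical helper, and it assumes the layout shown above, where each endpoint starts a line with "/" and its attributes follow as indented "name:value" lines.

```python
def gossip_generations(gossipinfo_output):
    """Map each endpoint address to the integer generation listed under it."""
    generations = {}
    endpoint = None
    for raw in gossipinfo_output.splitlines():
        line = raw.strip()
        if line.startswith("/"):
            # Endpoint header line, e.g. "/10.168.12.3".
            endpoint = line[1:]
        elif line.startswith("generation:") and endpoint is not None:
            generations[endpoint] = int(line.split(":", 1)[1])
    return generations


# First lines of the entry shown above.
sample = """\
/10.168.12.3
  generation:1527840276
  heartbeat:22488596
  HOST_ID:57c4d98b-c52c-48bf-b8ee-7d8f22bcc08f
"""

print(gossip_generations(sample))
# {'10.168.12.3': 1527840276}
```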

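    We can see that the other nodes record the restarted node's generation as epoch 1527840276, which converts to Friday, June 1, 2018, 08:04 (UTC). That is when we started Cassandra on that node. Next, we log in to the restarted node and query gossip_generation in the system.local table: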

    cqlsh `hostname` -u cassandra
    cassandra@cqlsh> use system;
    cassandra@cqlsh:system> select key , gossip_generation from local ;
    
     key   | gossip_generation
    -------+-------------------
     local |        1577928397
    
    (1 rows)

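    1577928397 converts to Thursday, January 2, 2020, 01:26 (UTC). The two timestamps are about a year and a half apart; in other words, the previous Cassandra start was still the one on Friday, June 1, 2018, at 08:04. This restart in fact triggered a Cassandra bug: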

    https://issues.apache.org/jira/browse/CASSANDRA-10969

    https://support.datastax.com/hc/en-us/articles/115001096783-Nodes-showing-DN-in-nodetool-status-with-invalid-gossip-generation-warning-in-logs
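
    The arithmetic behind the two timestamps can be checked with a few lines of Python. The conversion itself uses only the values from the post. The one-year window and the comparison in `rejected_by_old_check` are my reading of CASSANDRA-10969 (affected versions compare the received generation against the locally stored one plus a fixed maximum difference) and are not taken from the original post, so treat the constant and the function as assumptions.

```python
from datetime import datetime, timezone

# Assumption: a one-year window in seconds, as described in CASSANDRA-10969.
MAX_GENERATION_DIFFERENCE = 86400 * 365


def as_utc(generation):
    """Interpret a gossip generation (epoch seconds) as a UTC datetime."""
    return datetime.fromtimestamp(generation, tz=timezone.utc)


def rejected_by_old_check(local_generation, received_generation,
                          max_diff=MAX_GENERATION_DIFFERENCE):
    """Assumed pre-fix rule: reject a generation too far ahead of the stored one."""
    return received_generation > local_generation + max_diff


local_generation = 1527840276     # what the other nodes remember
received_generation = 1577928397  # what the restarted node now announces

print(as_utc(local_generation).isoformat())      # 2018-06-01T08:04:36+00:00
print(as_utc(received_generation).isoformat())   # 2020-01-02T01:26:37+00:00
print(received_generation - local_generation)    # 50088121
print(rejected_by_old_check(local_generation, received_generation))  # True
```

    The gap of 50088121 seconds is roughly 580 days, which is more than the assumed 365-day window, so under this reading the other nodes keep discarding the new generation until their own stored state is reset.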

    A detailed write-up by an expert is available in this blog post:

    https://mash213.wordpress.com/2019/07/05/scylla-received-an-invalid-gossip-generation-for-peer-how-to-resolve/

    To resolve the problem, we restarted the nodes in the cluster one by one.

  • Original article: https://www.cnblogs.com/ilifeilong/p/12133529.html