Frequent etcd leader elections
An etcd alert fired in the cluster:
Alert Name: A high number of leader changes within the etcd cluster are happening
Severity: warning
Cluster Name: shdmz-prod-diamond (ID: c-n6wc4)
Namespace: cattle-prometheus
Expression: increase(etcd_server_leader_changes_seen_total[1h])>3
Description: Threshold Crossed: datapoint value 4.067796610169491 was greater than to the threshold (3) for (3m)
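The datapoint value is not an integer because Prometheus's increase() extrapolates the counter delta to the full query window. A minimal sketch of that arithmetic follows, assuming (this is not stated in the alert) a 1-minute scrape interval, so that the samples inside the 1h window span only 59 minutes. The function name and the numbers passed to it are illustrative.

```python
def extrapolated_increase(raw_increase, window_s, covered_s):
    """Scale a raw counter delta to the whole window, roughly as increase() does.

    raw_increase: counter delta between the first and last sample in the window
    window_s:     length of the query window in seconds (1h -> 3600)
    covered_s:    time actually spanned by the samples (assumed 59 min -> 3540)
    """
    return raw_increase * window_s / covered_s


# Assumption: 4 real leader changes, samples one minute apart.
value = extrapolated_increase(4, 3600, 3540)
print(value)       # about 4.0678, in line with the alert's datapoint
print(value > 3)   # the alert expression's threshold check
```

Under these assumptions the alert corresponds to 4 actual leader changes within the hour.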
The etcd logs show the following problem, along with similar messages about heartbeat timeouts:
2020-07-08 11:32:11.730958 W | rafthttp: the clock difference against peer db40725e6f94d8e3 is too high [13.717094955s > 1s] (prober "ROUND_TRIPPER_RAFT_MESSAGE")
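To find which peers are affected, the warning can be extracted from the logs with a small script. This is a sketch: the regular expression is derived only from the single log line above, and the helper name parse_clock_drift is made up for illustration. It handles differences printed in plain seconds only, which is an assumption; other duration formats would need more cases.

```python
import re

# Pattern modelled on the rafthttp warning shown above (seconds only).
CLOCK_DRIFT = re.compile(
    r"clock difference against peer (?P<peer>[0-9a-f]+) is too high "
    r"\[(?P<diff>[0-9.]+)s > (?P<limit>[0-9.]+)s\]"
)


def parse_clock_drift(line):
    """Return (peer_id, difference_seconds, limit_seconds), or None if no match."""
    m = CLOCK_DRIFT.search(line)
    if m is None:
        return None
    return m.group("peer"), float(m.group("diff")), float(m.group("limit"))


line = (
    "2020-07-08 11:32:11.730958 W | rafthttp: the clock difference against peer "
    'db40725e6f94d8e3 is too high [13.717094955s > 1s] (prober "ROUND_TRIPPER_RAFT_MESSAGE")'
)
print(parse_clock_drift(line))  # ('db40725e6f94d8e3', 13.717094955, 1.0)
```

Here the peer's clock is about 13.7 seconds off, far beyond the 1 second that etcd tolerates before warning.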
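Solution
1. Some machines in the cluster have clocks that are out of sync. Resynchronize the clocks on those nodes (for example with NTP).
2. Increase the heartbeat interval and election timeout by passing the following flags to etcd. Both values are in milliseconds; the etcd defaults are 100 and 1000 respectively.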
- --election-timeout=5000
- --heartbeat-interval=500
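Before rolling out new values, it can help to sanity-check them. The sketch below encodes two constraints as I understand them from etcd's behaviour and tuning guidance: the election timeout should be at least 5 times the heartbeat interval, and it should not exceed 50000 ms. Treat both limits as assumptions to verify against the etcd version in use; the function name check_etcd_timing is hypothetical.

```python
def check_etcd_timing(heartbeat_ms, election_ms):
    """Return a list of problems with the given etcd timing values (empty if none)."""
    problems = []
    # Assumed rule: election timeout must be >= 5x the heartbeat interval.
    if election_ms < 5 * heartbeat_ms:
        problems.append(
            f"election timeout {election_ms}ms is less than 5x heartbeat {heartbeat_ms}ms"
        )
    # Assumed upper bound on the election timeout.
    if election_ms > 50000:
        problems.append(f"election timeout {election_ms}ms exceeds 50000ms")
    return problems


print(check_etcd_timing(500, 5000))  # [] for the values used above
```

The values chosen here keep the default 10:1 ratio between election timeout and heartbeat interval. The same values should be applied to every member of the cluster.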