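A developer happened to ask whether service discovery was down. The logs showed "Error: etcdserver: mvcc: database space exceeded": etcd had reached its default backend quota of 2 GB. As a stopgap I deleted the etcd database and restarted the service. That brute-force fix does not address the root cause and the error is certain to come back, so I reproduced the problem locally to test raising the quota and enabling automatic compaction.
First, start etcd with the backend quota set to 16 MB: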
[root@localhost ~]# etcd --quota-backend-bytes=$((16*1024*1024))
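The quota is given in bytes, so it helps to compute it rather than type it by hand. The following is a minimal sketch; `quota_bytes` is a hypothetical helper name, not part of etcd, and it assumes binary units (1 M = 1024*1024 bytes, 1 G = 1024*1024*1024 bytes).

```shell
#!/bin/sh
# quota_bytes SIZE UNIT: convert a size in binary units to bytes.
# UNIT is M (MiB) or G (GiB). Illustrative helper, not an etcd command.
quota_bytes() {
    case "$2" in
        M) echo $(( $1 * 1024 * 1024 )) ;;
        G) echo $(( $1 * 1024 * 1024 * 1024 )) ;;
        *) echo "unknown unit: $2" >&2; return 1 ;;
    esac
}

quota_bytes 16 M   # prints 16777216
quota_bytes 8 G    # prints 8589934592
```

The result can be passed straight to the flag, for example etcd --quota-backend-bytes=$(quota_bytes 16 M).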
Then write 1 MB values in a loop until a put fails:
[root@localhost ~]# while [ 1 ]; do dd if=/dev/urandom bs=1024 count=1024 | ETCDCTL_API=3 etcdctl put key || break; done
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.010793 s, 97.2 MB/s
OK
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.0178137 s, 58.9 MB/s
OK
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.0270086 s, 38.8 MB/s
OK
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.0208378 s, 50.3 MB/s
OK
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.0213647 s, 49.1 MB/s
OK
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.0180274 s, 58.2 MB/s
OK
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.017903 s, 58.6 MB/s
OK
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.00862558 s, 122 MB/s
OK
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.0276305 s, 37.9 MB/s
OK
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.0169549 s, 61.8 MB/s
OK
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.0390035 s, 26.9 MB/s
Error: etcdserver: mvcc: database space exceeded
The space is now full. The endpoint status shows that the DB size has gone past the 16 MB limit:
[root@localhost ~]# ETCDCTL_API=3 etcdctl --endpoints="http://192.168.32.134:2379" --write-out=table endpoint status
+----------------------------+------------------+---------+---------+-----------+-----------+------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------------------+------------------+---------+---------+-----------+-----------+------------+
| http://192.168.32.134:2379 | 8e9e05c52164694d | 3.3.8 | 19 MB | true | 2 | 16 |
+----------------------------+------------------+---------+---------+-----------+-----------+------------+
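For scripting the same check, the JSON output is easier to parse than the table. This is a rough sketch: it assumes the JSON contains a "dbSize" field in bytes, and both function names are hypothetical. Treating "at or above the quota" as the threshold is also an assumption made for illustration.

```shell
#!/bin/sh
# db_size_from_status: read `endpoint status --write-out=json` on stdin
# and print the first dbSize value in bytes (assumes a "dbSize" field).
db_size_from_status() {
    grep -Eo '"dbSize":[0-9]+' | head -n 1 | grep -Eo '[0-9]+'
}

# over_quota DB_BYTES QUOTA_BYTES: exit 0 when the DB size is at or
# above the quota, non-zero otherwise.
over_quota() {
    [ "$1" -ge "$2" ]
}

# Usage on a host with etcdctl available:
#   size=$(ETCDCTL_API=3 etcdctl endpoint status --write-out=json | db_size_from_status)
#   over_quota "$size" $((16*1024*1024)) && echo "backend quota reached"
```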
Trying another write fails with the same error:
[root@localhost ~]# ETCDCTL_API=3 etcdctl put newkey 123
Error: etcdserver: mvcc: database space exceeded
Listing the alarms confirms that NOSPACE has been raised:
[root@localhost ~]# ETCDCTL_API=3 etcdctl alarm list
memberID:10276657743932975437 alarm:NOSPACE
Removing excess keyspace data brings the cluster back under the quota, after which the alarm can be cleared:
# Get the current revision
[root@localhost ~]# rev=$(ETCDCTL_API=3 etcdctl --endpoints=:2379 endpoint status --write-out="json" | egrep -o '"revision":[0-9]*' | egrep -o '[0-9].*')
# Compact away all older revisions
[root@localhost ~]# ETCDCTL_API=3 etcdctl compact $rev
compacted revision 1516
# Defragment to release the freed space
[root@localhost ~]# ETCDCTL_API=3 etcdctl defrag
Finished defragmenting etcd member[127.0.0.1:2379]
# Disarm the alarm
[root@localhost ~]# ETCDCTL_API=3 etcdctl alarm disarm
memberID:10276657743932975437 alarm:NOSPACE
# Verify that put works again
[root@localhost ~]# ETCDCTL_API=3 etcdctl put newkey newvalue
OK
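The recovery steps above can be bundled into one function so they always run in the same order. This is a sketch only: `extract_revision` and `reclaim_space` are hypothetical names, and it assumes a single local endpoint whose JSON status contains a "revision" field, as in the command shown earlier.

```shell
#!/bin/sh
# extract_revision: read `endpoint status --write-out=json` on stdin and
# print the first "revision" value found.
extract_revision() {
    grep -Eo '"revision":[0-9]+' | head -n 1 | grep -Eo '[0-9]+'
}

# reclaim_space: compact up to the current revision, defragment, then
# disarm the alarm. The && chain stops at the first failing step, so the
# alarm is not disarmed unless compaction and defragmentation succeeded.
reclaim_space() {
    rev=$(ETCDCTL_API=3 etcdctl endpoint status --write-out=json | extract_revision)
    [ -n "$rev" ] || { echo "could not read current revision" >&2; return 1; }
    ETCDCTL_API=3 etcdctl compact "$rev" &&
    ETCDCTL_API=3 etcdctl defrag &&
    ETCDCTL_API=3 etcdctl alarm disarm
}

# Usage on the affected node:
#   reclaim_space
```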
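Based on the above, the permanent fix is as follows.
Edit the etcd.service file to enable hourly automatic compaction and raise the space quota to 8 GB: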
[root@localhost ~]# vim /etc/systemd/system/etcd.service
[Unit]
Description=Etcd
After=network.target
Before=flanneld.service
[Service]
User=root
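# --auto-compaction-retention=1: compact automatically, keeping 1 hour of history
# --quota-backend-bytes=8589934592: raise the backend quota to 8 GB (8*1024*1024*1024 bytes)
ExecStart=/usr/local/bin/etcd \
  --name etcd1 \
  --data-dir /var/lib/etcd \
  --advertise-client-urls http://192.168.32.134:2379,http://127.0.0.1:2379 \
  --listen-client-urls http://192.168.32.134:2379,http://127.0.0.1:2379 \
  --auto-compaction-retention=1 \
  --quota-backend-bytes=8589934592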
Restart=on-failure
Type=notify
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
Reload systemd and restart the service:
[root@localhost ~]# systemctl daemon-reload;systemctl restart etcd
Common etcd commands:
# List all keys
ETCDCTL_API=3 etcdctl get --prefix ""
# Look up keys that share a prefix
ETCDCTL_API=3 etcdctl get /test/ok --prefix
# Add a key
ETCDCTL_API=3 etcdctl put /test/ok 11
# Delete a key
ETCDCTL_API=3 etcdctl del /test/ok
# Delete every key under the /test prefix
ETCDCTL_API=3 etcdctl del /test --prefix
# Watch a key
ETCDCTL_API=3 etcdctl watch /test/ok
# Watch all keys under a prefix
ETCDCTL_API=3 etcdctl watch /test/ok --prefix
# Show etcd cluster members
ETCDCTL_API=3 etcdctl --write-out=table --endpoints=localhost:2379 member list
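Since every command above repeats the ETCDCTL_API=3 prefix, a small wrapper can cut down on typing. This is only a convenience sketch; `e3` is a made-up name. Exporting ETCDCTL_API=3 once in the shell session is another option.

```shell
#!/bin/sh
# e3: hypothetical shorthand that runs etcdctl with the v3 API selected
# and passes all arguments through unchanged.
e3() {
    ETCDCTL_API=3 etcdctl "$@"
}

# Usage:
#   e3 get --prefix ""          # list all keys
#   e3 watch /test/ok --prefix  # watch keys under a prefix
```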