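Background: I recently built a vanilla Kubernetes cluster myself on Azure. It was unstable and would sometimes collapse in a cascading failure. Most of what I found online blamed slow disk I/O on the etcd data disk, which makes etcd queries slow. My data disk was indeed an HDD, so I decided to move etcd's data to an SSD and see whether the problem went away. The logs kept showing the warning described in this issue: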
etcdserver: read-only range request took too long with etcd 3.2.24 #70082
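Counting how often the warning appears helps show whether it is frequent enough to explain the instability, and later whether the SSD move made a difference. Below is a minimal sketch. It assumes the log text arrives on stdin, for example from `journalctl -u etcd`, since etcd runs under `etcd.service` on this host. It also assumes each warning line contains "read-only range request" followed later by "took too long". `count_slow_reads` is a hypothetical helper name.

```bash
# Hypothetical helper: count etcd "read-only range request ... took too long"
# warnings in log text read from stdin.
count_slow_reads() {
  # grep -c prints the number of matching lines. With no matches it prints 0
  # but exits 1, so "|| true" keeps the function's exit status at 0.
  grep -c 'read-only range request.*took too long' || true
}
```

Usage: `journalctl -u etcd --since "1 hour ago" | count_slow_reads`. Run it once before the migration and again afterwards over a window of similar length.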
Migration steps:
1. Stop the service and copy the database files to the new directory
[root@node1 ~]# systemctl stop etcd
[root@node1 ~]# cp -ar /data/etcd/ /var/lib/
[root@node1 ~]# ll /var/lib/etcd/
total 0
drwx------. 4 root root 29 Apr 28 05:54 member
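Etcd is stopped first because copying its files while it is still writing can leave an inconsistent copy. Before moving on, it is worth checking that the copy is complete. The sketch below uses a hypothetical `copy_etcd_data` helper. It refuses to write into an existing target and then compares the two trees with `diff -r`. It assumes the service has already been stopped with `systemctl stop etcd` as above. The source and destination paths are passed in as arguments.

```bash
# Hypothetical helper: copy an etcd data directory to a new location and
# check that the copied file contents match the source.
# Run only while etcd is stopped.
copy_etcd_data() {
  local src=$1 dst=$2
  if [ ! -d "$src/member" ]; then
    echo "no member/ directory under $src" >&2
    return 1
  fi
  if [ -e "$dst" ]; then
    # cp -a into an existing directory would nest the copy inside it
    echo "$dst already exists, refusing to overwrite" >&2
    return 1
  fi
  # -a copies recursively and preserves ownership, permissions and timestamps
  cp -a "$src" "$dst" || return 1
  # diff -r exits non-zero if any file differs or is missing
  diff -r "$src" "$dst" >/dev/null
}
```

Usage for the paths in this post: `copy_etcd_data /data/etcd /var/lib/etcd`. The result is the same location that `cp -ar /data/etcd/ /var/lib/` produces.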
2. Change the data directory in etcd.env to the new path
[root@node1 ~]# vim /etc/etcd.env
# Environment file for etcd v3.3.12
ETCD_DATA_DIR=/var/lib/etcd
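You can make the same edit without opening an editor, which is handy when repeating it on each etcd member. Below is a minimal sketch using `sed -i`, assuming GNU sed. `set_etcd_data_dir` is a hypothetical name. It keeps a `.bak` copy of the original file and prints the resulting line so you can check the edit.

```bash
# Hypothetical helper: point ETCD_DATA_DIR in an etcd env file at a new path.
# Assumes GNU sed and a path without '|' or '&' characters.
set_etcd_data_dir() {
  local env_file=$1 new_dir=$2
  if ! grep -q '^ETCD_DATA_DIR=' "$env_file"; then
    echo "ETCD_DATA_DIR not found in $env_file" >&2
    return 1
  fi
  # Replace the whole value in place, saving the original as $env_file.bak;
  # '|' is the delimiter because paths contain '/'
  sed -i.bak "s|^ETCD_DATA_DIR=.*|ETCD_DATA_DIR=${new_dir}|" "$env_file"
  grep '^ETCD_DATA_DIR=' "$env_file"
}
```

Usage: `set_etcd_data_dir /etc/etcd.env /var/lib/etcd`.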
3. Update the startup arguments so the container mounts the new data directory
[root@node1 ~]# vim /usr/local/bin/etcd
#!/bin/bash
# Bind-mount the new host data directory at the same path inside the
# container, which is the path ETCD_DATA_DIR points to
/usr/bin/docker run \
  --restart=on-failure:5 \
  --env-file=/etc/etcd.env \
  --net=host \
  -v /etc/ssl/certs:/etc/ssl/certs:ro \
  -v /etc/ssl/etcd/ssl:/etc/ssl/etcd/ssl:ro \
  -v /var/lib/etcd:/var/lib/etcd:rw \
  --memory=0 \
  --blkio-weight=1000 \
  --name=etcd1 \
  quay.io/coreos/etcd:v3.3.12 \
  /usr/local/bin/etcd \
  "$@"
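Etcd runs inside the container, so ETCD_DATA_DIR is a path inside the container. The `-v /var/lib/etcd:/var/lib/etcd:rw` mount is what maps that path to the new directory on the host. If the two disagree, etcd will not see the copied data. The sketch below is a quick consistency check. `check_data_mount` is a hypothetical helper. It assumes the env-file and wrapper-script layout shown above, with identical host and container paths.

```bash
# Hypothetical helper: verify that the wrapper script bind-mounts the
# directory named by ETCD_DATA_DIR at the same path inside the container.
check_data_mount() {
  local env_file=$1 wrapper=$2 dir
  # Take the value of the last ETCD_DATA_DIR= line in the env file
  dir=$(sed -n 's/^ETCD_DATA_DIR=//p' "$env_file" | tail -n 1)
  if [ -z "$dir" ]; then
    echo "ETCD_DATA_DIR not set in $env_file" >&2
    return 1
  fi
  # Fixed-string search for "-v DIR:DIR:rw"; "--" stops grep from treating
  # the leading '-' of the pattern as an option
  if grep -qF -- "-v ${dir}:${dir}:rw" "$wrapper"; then
    echo "ok: ${dir} is mounted read-write"
  else
    echo "mismatch: no '-v ${dir}:${dir}:rw' in $wrapper" >&2
    return 1
  fi
}
```

Usage: `check_data_mount /etc/etcd.env /usr/local/bin/etcd`.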
4. Start the service, check its status and confirm the cluster is healthy
[root@node1 ~]# systemctl start etcd
[root@node1 ~]# systemctl status etcd
● etcd.service - etcd docker wrapper
   Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2020-04-29 12:40:24 UTC; 5s ago
  Process: 4218 ExecStop=/usr/bin/docker stop etcd1 (code=exited, status=0/SUCCESS)
  Process: 6071 ExecStartPre=/usr/bin/docker rm -f etcd1 (code=exited, status=0/SUCCESS)
 Main PID: 6086 (etcd)
    Tasks: 16
   Memory: 34.4M
   CGroup: /system.slice/etcd.service
           ├─6086 /bin/bash /usr/local/bin/etcd
           └─6088 /usr/bin/docker run --restart=on-failure:5 --env-file=/etc/etcd.env --net=host -v /etc/ssl/certs:/etc/ssl/certs:ro -v /etc/ssl/etcd/ssl:/etc/ssl/etcd/ssl:ro -v /var...
Apr 29 12:40:24 node1 etcd[6086]: 2020-04-29 12:40:24.938238 I | rafthttp: established a TCP streaming connection with peer 1c74700fc9501a08 (stream Message reader)
Apr 29 12:40:24 node1 etcd[6086]: 2020-04-29 12:40:24.938385 I | rafthttp: established a TCP streaming connection with peer bfb5d71282c2db49 (stream MsgApp v2 reader)
Apr 29 12:40:24 node1 etcd[6086]: 2020-04-29 12:40:24.979248 I | etcdserver: 465aba9a8e04dd3f initialzed peer connection; fast-forwarding 3 ticks (election ticks 5) with...ctive peer(s)
Apr 29 12:40:25 node1 etcd[6086]: 2020-04-29 12:40:25.006641 I | mvcc: store.index: compact 1078020
Apr 29 12:40:25 node1 etcd[6086]: 2020-04-29 12:40:25.010060 I | mvcc: finished scheduled compaction at 1078020 (took 2.330859ms)
Apr 29 12:40:25 node1 etcd[6086]: 2020-04-29 12:40:25.016608 I | etcdserver: published {Name:etcd1 ClientURLs:[https://10.10.10.11:2379]} to cluster 4059f5ad1e3ba1cc
Apr 29 12:40:25 node1 etcd[6086]: 2020-04-29 12:40:25.016643 I | embed: ready to serve client requests
Apr 29 12:40:25 node1 etcd[6086]: 2020-04-29 12:40:25.016964 I | embed: ready to serve client requests
Apr 29 12:40:25 node1 etcd[6086]: 2020-04-29 12:40:25.019571 I | embed: serving client requests on 10.10.10.11:2379
Apr 29 12:40:25 node1 etcd[6086]: 2020-04-29 12:40:25.020805 I | embed: serving client requests on 127.0.0.1:2379
Hint: Some lines were ellipsized, use -l to show in full.
[root@node1 ~]# kubectl get cs
NAME                 STATUS    MESSAGE             ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health":"true"}
etcd-1               Healthy   {"health":"true"}
etcd-2               Healthy   {"health":"true"}
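The final check can be scripted, for example when migrating each etcd member in turn. The idea is to scan the `kubectl get cs` output for any component that is not `Healthy`. Below is a minimal sketch with a hypothetical `all_healthy` function that reads the table from stdin. It only looks at the STATUS column in the layout shown above. Note that the ComponentStatus API behind `kubectl get cs` is deprecated in newer Kubernetes releases.

```bash
# Hypothetical helper: read `kubectl get cs` output on stdin and fail if any
# component's STATUS (2nd column) is not "Healthy", or if no rows were read.
all_healthy() {
  awk 'BEGIN { bad = 0 }
       NR > 1 && $2 != "Healthy" { print "unhealthy: " $0; bad = 1 }
       END { if (NR < 2) bad = 1; exit bad }'
}
```

Usage: `kubectl get cs | all_healthy && echo "all components healthy"`.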