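Background
In a 7-node k8s cluster, the disk on one master node is faulty, so etcd on that node frequently hangs on writes and drags down the whole etcd cluster. The plan is therefore to migrate etcd off that node.
As shown in the figure above, node tstr501384 has the faulty disk, and its etcd member is to be moved to node tstr501405a. The first step is to extend the etcd cluster by adding tstr501405a (IP 10.233.130.47) as a member.
Procedure
1. On one of the master nodes (tstr501382 in this example), run etcdctl member add to add tstr501405a to the etcd cluster.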
ETCDCTL_API=3 etcdctl --endpoints=https://10.233.130.15:2379,https://10.233.130.16:2379,https://10.233.130.17:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key member add "tstr501405a" --peer-urls="https://10.233.130.47:2380"
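As shown in the figure above, tstr501405a has been added to the etcd cluster, but its member status is still unstarted.
2. Copy the required files from a master node to the node being added: mainly etcd.yaml, which starts the etcd static pod, and the related certificates.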
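To check the new member's state from a script rather than by eye, the output of etcdctl member list can be filtered. The following is a minimal sketch, assuming the default comma-separated v3 output (ID, status, name, peer URLs, client URLs) and a single peer URL per member. The function name member_status is hypothetical.

```shell
# member_status PEER_URL: read `etcdctl member list` output on stdin and
# print the status field of the member whose peer URL equals PEER_URL.
# Assumes the simple output format: ID, status, name, peerURLs, clientURLs.
member_status() {
  awk -F', *' -v url="$1" '$4 == url { print $2 }'
}

# Usage (same endpoints and certificate flags as in step 1):
#   ETCDCTL_API=3 etcdctl --endpoints=... --cacert ... --cert ... --key ... member list \
#     | member_status https://10.233.130.47:2380
```

Right after member add, the function should print unstarted for the new peer URL. The name field is empty at that point, which is why the match is done on the peer URL.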
scp -rp /etc/kubernetes/manifests/etcd.yaml /etc/kubernetes/pki/ cloudops@10.233.130.47:/home/cloudops/addetcd
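3. On the etcd node being added (in this example, the /home/cloudops/addetcd directory on tstr501405a), use the copied keys and CA to create the certificates the new etcd member needs, adjusting the node-specific fields. The two certificates that must be regenerated are peer.crt and server.crt.
The peer.crt certificate is created as follows:
# Create peer.crt for the node being added. Run this on that node, and change the DNS and IP entries to match it.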
cat <<EOF>peer-ssl.conf
[req]
req_extensions = v3_req
distinguished_name = req_distinguished_name
[req_distinguished_name]
[v3_req]
keyUsage =critical, digitalSignature, keyEncipherment
extendedKeyUsage = TLS Web Server Authentication, TLS Web Client Authentication
subjectAltName = @alt_names
[alt_names]
DNS.1 = tstr501405a
DNS.2 = localhost
IP.1 = 10.233.130.47
IP.2 = 127.0.0.1
EOF
# Use the key and CA copied from the master node to generate peer.crt for the node being added.
openssl req -new -key ./pki/etcd/peer.key -out peer.csr -subj "/CN=$HOSTNAME" -config peer-ssl.conf
openssl x509 -req -in peer.csr -CA ./pki/etcd/ca.crt -CAkey ./pki/etcd/ca.key -CAcreateserial -out peer.crt -days 3650 -extensions v3_req -extfile peer-ssl.conf
The server.crt certificate is created as follows:
# Create server.crt for the node being added. Run this on that node, and change the DNS and IP entries to match it.
cat <<EOF>server-ssl.conf
[req]
req_extensions = v3_req
distinguished_name = req_distinguished_name
[req_distinguished_name]
[v3_req]
keyUsage =critical, digitalSignature, keyEncipherment
extendedKeyUsage = TLS Web Server Authentication, TLS Web Client Authentication
subjectAltName = @alt_names
[alt_names]
DNS.1 = tstr501405a
DNS.2 = localhost
IP.1 = 10.233.130.47
IP.2 = 127.0.0.1
EOF
openssl req -new -key ./pki/etcd/server.key -out server.csr -subj "/CN=$HOSTNAME" -config server-ssl.conf
openssl x509 -req -in server.csr -CA ./pki/etcd/ca.crt -CAkey ./pki/etcd/ca.key -CAcreateserial -out server.crt -days 3650 -extensions v3_req -extfile server-ssl.conf
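The two config files above differ only in their file name, so they can be generated by one helper. This is a minimal sketch, not the author's method: the function name write_san_conf is hypothetical, and it writes serverAuth and clientAuth, the OpenSSL short names for the two extended key usages spelled out in full above.

```shell
# write_san_conf FILE HOSTNAME IP: write an OpenSSL request config whose
# subjectAltName covers HOSTNAME and IP plus localhost and 127.0.0.1.
write_san_conf() {
  cat > "$1" <<EOF
[req]
req_extensions = v3_req
distinguished_name = req_distinguished_name
[req_distinguished_name]
[v3_req]
keyUsage = critical, digitalSignature, keyEncipherment
extendedKeyUsage = serverAuth, clientAuth
subjectAltName = @alt_names
[alt_names]
DNS.1 = $2
DNS.2 = localhost
IP.1 = $3
IP.2 = 127.0.0.1
EOF
}

# Usage:
#   write_san_conf peer-ssl.conf tstr501405a 10.233.130.47
#   write_san_conf server-ssl.conf tstr501405a 10.233.130.47
# After signing, the SANs in a certificate can be inspected with:
#   openssl x509 -in peer.crt -noout -text | grep -A1 'Subject Alternative Name'
```

Checking the SANs before starting etcd is worthwhile because a certificate that still carries the source master's IP would be rejected during the TLS handshake between peers.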
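4. Using the values returned by etcdctl member add in step 1, modify the etcd.yaml copied from the master node. The node-specific flags are the advertise and listen URLs, --name, --initial-cluster and --initial-cluster-state.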
[root@tstr501405a addetcd]# cat etcd.yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://10.233.130.47:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --initial-advertise-peer-urls=https://10.233.130.47:2380
    - --initial-cluster=tstr501383=https://10.233.130.16:2380,tstr501405a=https://10.233.130.47:2380,tstr501384=https://10.233.130.17:2380,tstr501382=https://10.233.130.15:2380
    - --initial-cluster-state=existing
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://10.233.130.47:2379
    - --listen-peer-urls=https://10.233.130.47:2380
    - --name=tstr501405a
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: 10.233.71.70:60080/claas/etcd:3.2.24
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command:
        - /bin/sh
        - -ec
        - ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt
          --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key
          get foo
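The values for --name, --initial-cluster and --initial-cluster-state can be taken directly from what etcdctl member add prints. The sketch below assumes that output contains ETCD_NAME, ETCD_INITIAL_CLUSTER and ETCD_INITIAL_CLUSTER_STATE lines with double-quoted values, and that it was saved to a file. The function name env_to_flags and the file name member-add.out are hypothetical.

```shell
# env_to_flags: read saved `etcdctl member add` output on stdin and print
# the corresponding etcd flags as YAML list items for etcd.yaml.
# Lines other than the three ETCD_* variables are ignored.
env_to_flags() {
  sed -n \
    -e 's/^ETCD_NAME="\(.*\)"$/- --name=\1/p' \
    -e 's/^ETCD_INITIAL_CLUSTER="\(.*\)"$/- --initial-cluster=\1/p' \
    -e 's/^ETCD_INITIAL_CLUSTER_STATE="\(.*\)"$/- --initial-cluster-state=\1/p'
}

# Usage:
#   env_to_flags < member-add.out
```

The printed lines still have to be pasted into the manifest by hand. The URL flags (advertise, listen, initial-advertise-peer) are not part of this output and must be set to the new node's IP separately.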
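5. Working in /home/cloudops/addetcd on tstr501405a, copy the certificates and keys to /etc/kubernetes/pki/etcd/, then copy the modified etcd.yaml to /etc/kubernetes/manifests/. Wait for the etcd static pod to start and confirm that it has joined the etcd cluster.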
cp peer.crt server.crt /etc/kubernetes/pki/etcd/
cp ./pki/etcd/healthcheck-client.crt ./pki/etcd/healthcheck-client.key ./pki/etcd/server.key ./pki/etcd/peer.key ./pki/etcd/ca.crt /etc/kubernetes/pki/etcd/
cp etcd.yaml /etc/kubernetes/manifests/
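The flags and liveness probe in etcd.yaml reference seven files under /etc/kubernetes/pki/etcd/, and the pod cannot start cleanly if one of them is absent. A quick check can be run just before the final cp of etcd.yaml. This is a minimal sketch; the function name missing_files is hypothetical.

```shell
# missing_files DIR FILE...: print each FILE that is absent or empty
# under DIR. Returns 1 if at least one file is missing, 0 otherwise.
missing_files() {
  dir=$1
  shift
  rc=0
  for f in "$@"; do
    if [ ! -s "$dir/$f" ]; then
      echo "$f"
      rc=1
    fi
  done
  return "$rc"
}

# Usage:
#   missing_files /etc/kubernetes/pki/etcd ca.crt server.crt server.key \
#     peer.crt peer.key healthcheck-client.crt healthcheck-client.key \
#     && cp etcd.yaml /etc/kubernetes/manifests/
```

Chaining the manifest copy with && means the static pod is only created once every referenced file is in place.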
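6. Confirm that etcd has started successfully on the newly added node.
As shown in the figure above, the etcd pod on tstr501405a, the new member of the etcd cluster, has started, and its member status has changed to started.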
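The static pod can take a while to pull its image and sync data, so the confirmation is easier with a small polling loop. The sketch below is a generic retry helper. The name retry, the wrapper new_member_healthy, the retry counts, and the assumption that etcdctl endpoint health prints "is healthy" for a healthy endpoint are illustrative choices, not taken from the original procedure.

```shell
# retry TRIES DELAY CMD...: run CMD until it succeeds, at most TRIES times,
# sleeping DELAY seconds between attempts. Returns 0 on success, 1 otherwise.
retry() {
  tries=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$tries" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    if [ "$i" -le "$tries" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Usage:
#   new_member_healthy() {
#     ETCDCTL_API=3 etcdctl --endpoints=https://10.233.130.47:2379 \
#       --cacert /etc/kubernetes/pki/etcd/ca.crt \
#       --cert /etc/kubernetes/pki/etcd/peer.crt \
#       --key /etc/kubernetes/pki/etcd/peer.key \
#       endpoint health 2>&1 | grep -q 'is healthy'
#   }
#   retry 30 10 new_member_healthy && echo "tstr501405a etcd is healthy"
```

Matching on the output text rather than the exit status is deliberate, since it does not rely on how a given etcdctl version reports an unhealthy endpoint through its exit code.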