k8s Version Upgrade
Preface
Note: pick a low-traffic window that suits the business. A game company, for example, cannot take servers down at midnight for an upgrade; 4 or 5 a.m. is a better choice. A bank, on the other hand, can upgrade in the evening.
Check node status
[root@hdss7-21 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
hdss7-21.host.com Ready master,node 13d v1.15.2
hdss7-22.host.com Ready master,node 13d v1.15.2
Check pod status
[root@hdss7-21 ~]# kubectl get pod -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-6b6c4f9648-ttfg8 1/1 Running 0 47h 172.7.21.3 hdss7-21.host.com <none> <none>
kubernetes-dashboard-6d58ccc9fc-jzjj6 1/1 Running 0 23h 172.7.21.5 hdss7-21.host.com <none> <none>
traefik-ingress-fnrbg 1/1 Running 0 35h 172.7.22.4 hdss7-22.host.com <none> <none>
traefik-ingress-kxm8t 1/1 Running 0 35h 172.7.21.4 hdss7-21.host.com <none> <none>
Delete the node
[root@hdss7-21 ~]# kubectl delete node hdss7-21.host.com
node "hdss7-21.host.com" deleted
[root@hdss7-21 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
hdss7-22.host.com Ready master,node 13d v1.15.2
Check pod status again
[root@hdss7-21 ~]# kubectl get pod -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-6b6c4f9648-6vsvq 1/1 Running 0 75s 172.7.22.5 hdss7-22.host.com <none> <none>
kubernetes-dashboard-6d58ccc9fc-ptfn9 1/1 Running 0 75s 172.7.22.6 hdss7-22.host.com <none> <none>
traefik-ingress-fnrbg 1/1 Running 0 35h 172.7.22.4 hdss7-22.host.com <none> <none>
As you can see, the pods that were previously on node 7-21 have automatically migrated to node 7-22.
Check whether services are affected:
[root@hdss7-21 ~]# dig -t A kubernetes.default.svc.cluster.local @192.168.0.2 +short
192.168.0.1
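As an optional cross-check, the answer should match the ClusterIP of the kubernetes service in the default namespace (output below is illustrative):
[root@hdss7-21 ~]# kubectl get svc kubernetes
NAME         TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   192.168.0.1   <none>        443/TCP   13d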
On 7-11, comment out the 10.4.7.21 entries in nginx.conf and od.com.conf, then reload nginx:
[root@hdss7-11 conf.d]# vim /etc/nginx/nginx.conf
# server 10.4.7.21:6443 max_fails=3 fail_timeout=30s;
[root@hdss7-11 conf.d]# vim od.com.conf
# server 10.4.7.21:81 max_fails=3 fail_timeout=10s;
[root@hdss7-11 conf.d]# nginx -t
[root@hdss7-11 conf.d]# nginx -s reload
Upload and extract the package
Upload omitted.
Extract:
[root@hdss7-21 src]# tar -zxvf kubernetes-server-linux-amd64-v1.15.4.tar.gz -C .
[root@hdss7-21 src]# mv kubernetes kubernetes-v1.15.4
[root@hdss7-21 src]# cp -r kubernetes-v1.15.4 /opt/
[root@hdss7-21 src]# cd ..
[root@hdss7-21 opt]# rm -rf kubernetes
[root@hdss7-21 opt]# ln -s kubernetes-v1.15.4/ /opt/kubernetes
[root@hdss7-21 opt]# cd kubernetes
[root@hdss7-21 kubernetes]# rm -rf kubernetes-src.tar.gz
[root@hdss7-21 kubernetes]# cd server/
[root@hdss7-21 server]# cd bin/
[root@hdss7-21 bin]# rm -rf *.tar
[root@hdss7-21 bin]# rm -rf *_tag
[root@hdss7-21 bin]# mkdir conf cert
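A nice side effect of this symlink layout: rolling back is just repointing /opt/kubernetes at the old directory and restarting (a sketch, assuming the v1.15.2 directory is still on disk):
[root@hdss7-21 opt]# ln -sfn kubernetes-v1.15.2/ /opt/kubernetes
[root@hdss7-21 opt]# supervisorctl restart all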
Copy cert and conf
[root@hdss7-21 bin]# cd cert/
[root@hdss7-21 cert]# cp /opt/kubernetes-v1.15.2/server/bin/cert/* .
[root@hdss7-21 cert]# ll
total 32
-rw------- 1 root root 1675 Aug 24 22:08 apiserver-key.pem
-rw-r--r-- 1 root root 1598 Aug 24 22:08 apiserver.pem
-rw------- 1 root root 1679 Aug 24 22:08 ca-key.pem
-rw-r--r-- 1 root root 1346 Aug 24 22:08 ca.pem
-rw------- 1 root root 1675 Aug 24 22:08 client-key.pem
-rw-r--r-- 1 root root 1367 Aug 24 22:08 client.pem
-rw------- 1 root root 1675 Aug 24 22:08 kubelet-key.pem
-rw-r--r-- 1 root root 1468 Aug 24 22:08 kubelet.pem
[root@hdss7-21 cert]# cd ../conf/
[root@hdss7-21 conf]# cp /opt/kubernetes-v1.15.2/server/bin/conf/* .
[root@hdss7-21 conf]# ll
total 24
-rw-r--r-- 1 root root 2224 Aug 24 22:09 audit.yaml
-rw-r--r-- 1 root root 259 Aug 24 22:09 k8s-node.yaml
-rw------- 1 root root 6199 Aug 24 22:09 kubelet.kubeconfig
-rw------- 1 root root 6223 Aug 24 22:09 kube-proxy.kubeconfig
[root@hdss7-21 conf]# cd ..
[root@hdss7-21 bin]# cp /opt/kubernetes-v1.15.2/server/bin/*.sh .
Restart all services on the node with supervisorctl
[root@hdss7-21 bin]# supervisorctl restart all
[root@hdss7-21 bin]# kubectl get node
NAME STATUS ROLES AGE VERSION
hdss7-21.host.com Ready <none> 9m56s v1.15.4
hdss7-22.host.com Ready master,node 14d v1.15.2
On restarting again, however, flanneld failed to start:
[root@hdss7-21 bin]# supervisorctl restart all
flanneld-7-21: ERROR (spawn error)
[root@hdss7-21 bin]# supervisorctl status flanneld-7-21
flanneld-7-21 FATAL Exited too quickly (process log may have details)
The flanneld log shows its listen port is already taken:
I0824 22:12:40.503300 45487 main.go:587] Start healthz server on 0.0.0.0:2401
E0824 22:12:40.503356 45487 main.go:595] Start healthz server error. listen tcp 0.0.0.0:2401: bind: address already in use
panic: listen tcp 0.0.0.0:2401: bind: address already in use
An orphaned flanneld process is still holding port 2401; kill it, then start flanneld again:
[root@hdss7-21 bin]# netstat -luntp | grep flannel
tcp6 0 0 :::2401 :::* LISTEN 6948/./flanneld
[root@hdss7-21 bin]# kill -9 6948
[root@hdss7-21 bin]# supervisorctl start flanneld-7-21
flanneld-7-21: started
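Before moving on, confirm that every supervised process is back to RUNNING:
[root@hdss7-21 bin]# supervisorctl status
All entries (flanneld, kubelet, kube-proxy, apiserver, controller-manager, scheduler) should report RUNNING; the exact process names depend on how supervisord was configured on your hosts.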
Check node status
[root@hdss7-21 bin]# kubectl get node
NAME STATUS ROLES AGE VERSION
hdss7-21.host.com Ready <none> 9m56s v1.15.4
hdss7-22.host.com Ready master,node 14d v1.15.2
[root@hdss7-21 bin]# cd /opt/
Repeat the same steps on 7-22.
Note: on 7-11, edit nginx.conf and od.com.conf again, this time restoring 10.4.7.21 and commenting out 10.4.7.22, then reload nginx:
vim /etc/nginx/nginx.conf
server 10.4.7.21:6443 max_fails=3 fail_timeout=30s;
# server 10.4.7.22:6443 max_fails=3 fail_timeout=30s;
vim /etc/nginx/conf.d/od.com.conf
server 10.4.7.21:81 max_fails=3 fail_timeout=10s;
# server 10.4.7.22:81 max_fails=3 fail_timeout=10s;
nginx -t
nginx -s reload
Check node status
[root@hdss7-22 bin]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
hdss7-21.host.com Ready <none> 31m v1.15.4
hdss7-22.host.com Ready master,node 14d v1.15.4
At this point I noticed I had forgotten to delete the 7-22 node first, yet the upgrade still succeeded. Some research cleared this up: deleting the node beforehand is what makes the upgrade smooth, because it evicts the pods so they can migrate to other nodes. If you upgrade without deleting the node, the pods do not migrate, and the services on that node are simply interrupted while it restarts.
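Incidentally, the standard way to evict pods before node maintenance is kubectl drain rather than deleting the node outright (a sketch; --ignore-daemonsets is needed here because traefik-ingress runs as a DaemonSet):
# Cordon the node and evict its pods; DaemonSet pods are left in place
[root@hdss7-21 ~]# kubectl drain hdss7-22.host.com --ignore-daemonsets
# ... upgrade the node ...
# Make the node schedulable again once the upgrade is done
[root@hdss7-21 ~]# kubectl uncordon hdss7-22.host.com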
Since 7-21 re-registered without its labels (ROLES showed <none> above), restore the master and node roles:
[root@hdss7-21 opt]# kubectl label node hdss7-21.host.com node-role.kubernetes.io/node=
[root@hdss7-21 opt]# kubectl label node hdss7-21.host.com node-role.kubernetes.io/master=
[root@hdss7-21 opt]# kubectl get node
NAME STATUS ROLES AGE VERSION
hdss7-21.host.com Ready master,node 47m v1.15.4
hdss7-22.host.com Ready master,node 14d v1.15.4
Then edit the nginx configuration on 7-11 and uncomment the lines that were commented out earlier.
[root@hdss7-11 conf.d]# vim /etc/nginx/conf.d/od.com.conf
[root@hdss7-11 conf.d]# vim /etc/nginx/nginx.conf
[root@hdss7-11 conf.d]# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
[root@hdss7-11 conf.d]# nginx -s reload
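Finally, repeat the earlier health checks to confirm the cluster behaves the same on v1.15.4:
[root@hdss7-21 ~]# kubectl get pod -n kube-system -o wide
[root@hdss7-21 ~]# dig -t A kubernetes.default.svc.cluster.local @192.168.0.2 +short
192.168.0.1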