kubernetes - Cluster Backup and Restore

I. Backup

Approach:
① Back up the running cluster's etcd data to disk.
② For a cluster built with the kubeasz project, also back up the CA certificate files and the ansible hosts file.

[On the deploy node]
1: Create a directory to hold the backup files

[root@master ~]# mkdir -p /backup/k8s1
     
2: Save a snapshot of the etcd data to the backup directory

[root@master ~]# ETCDCTL_API=3 etcdctl snapshot save /backup/k8s1/snapshot.db
Snapshot saved at /backup/k8s1/snapshot.db
[root@master ~]# du -h /backup/k8s1/snapshot.db
1.6M    /backup/k8s1/snapshot.db
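A single fixed path like /backup/k8s1 means each new snapshot overwrites the last. A minimal sketch of a timestamped wrapper around the same `etcdctl snapshot save` call is below; BACKUP_ROOT, the directory naming scheme, and the retention count KEEP are assumptions for illustration, not part of the original procedure.

```shell
#!/bin/sh
# Sketch: timestamped etcd backups with simple retention.
# BACKUP_ROOT, the k8s-YYYYMMDD-HHMM naming, and KEEP are assumed values.
BACKUP_ROOT=/backup
KEEP=7

snapshot_path() {
    # e.g. /backup/k8s-20191210-2132/snapshot.db
    echo "${BACKUP_ROOT}/k8s-$(date +%Y%m%d-%H%M)/snapshot.db"
}

backup() {
    target=$(snapshot_path)
    mkdir -p "$(dirname "$target")"
    ETCDCTL_API=3 etcdctl snapshot save "$target"
    # delete all but the newest $KEEP backup directories
    ls -1dt "${BACKUP_ROOT}"/k8s-* | tail -n +$((KEEP + 1)) | xargs -r rm -rf
}
```

Calling `backup` on the deploy node (e.g. from cron) then produces one dated directory per run instead of overwriting /backup/k8s1.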

3: Copy the ssl files from the kubernetes directory

[root@master ~]# cp /etc/kubernetes/ssl/* /backup/k8s1/
[root@master ~]# ll /backup/k8s1/
total 1628
-rw-r--r--. 1 root root    1675 Dec 10 21:21 admin-key.pem
-rw-r--r--. 1 root root    1391 Dec 10 21:21 admin.pem
-rw-r--r--. 1 root root     997 Dec 10 21:21 aggregator-proxy.csr
-rw-r--r--. 1 root root     219 Dec 10 21:21 aggregator-proxy-csr.json
-rw-------. 1 root root    1675 Dec 10 21:21 aggregator-proxy-key.pem
-rw-r--r--. 1 root root    1383 Dec 10 21:21 aggregator-proxy.pem
-rw-r--r--. 1 root root     294 Dec 10 21:21 ca-config.json
-rw-r--r--. 1 root root    1675 Dec 10 21:21 ca-key.pem
-rw-r--r--. 1 root root    1350 Dec 10 21:21 ca.pem
-rw-r--r--. 1 root root    1082 Dec 10 21:21 kubelet.csr
-rw-r--r--. 1 root root     283 Dec 10 21:21 kubelet-csr.json
-rw-------. 1 root root    1675 Dec 10 21:21 kubelet-key.pem
-rw-r--r--. 1 root root    1452 Dec 10 21:21 kubelet.pem
-rw-r--r--. 1 root root    1273 Dec 10 21:21 kubernetes.csr
-rw-r--r--. 1 root root     488 Dec 10 21:21 kubernetes-csr.json
-rw-------. 1 root root    1679 Dec 10 21:21 kubernetes-key.pem
-rw-r--r--. 1 root root    1639 Dec 10 21:21 kubernetes.pem
-rw-r--r--. 1 root root 1593376 Dec 10 21:32 snapshot.db

4: Simulate a cluster crash by running the clean.yml playbook

[root@master ~]# cd /etc/ansible/
[root@master ansible]# ansible-playbook 99.clean.yml

II. Restore

[On the deploy node]
1: Restore the CA certificates

[root@master ansible]# mkdir -p /etc/kubernetes/ssl
[root@master ansible]# cp /backup/k8s1/ca* /etc/kubernetes/ssl/
     
2: Rebuild the cluster

[root@master ansible]# ansible-playbook 01.prepare.yml
[root@master ansible]# ansible-playbook 02.etcd.yml
[root@master ansible]# ansible-playbook 03.docker.yml
[root@master ansible]# ansible-playbook 04.kube-master.yml
[root@master ansible]# ansible-playbook 05.kube-node.yml

3: Stop the etcd service

[root@master ansible]# ansible etcd -m service -a 'name=etcd state=stopped'

4: Clear the etcd data

[root@master ansible]# ansible etcd -m file -a 'name=/var/lib/etcd/member/ state=absent'
[DEPRECATION WARNING]: The TRANSFORM_INVALID_GROUP_CHARS settings is set to allow bad characters in group names by default, this will change, but still be user
configurable on deprecation. This feature will be removed in version 2.10. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details

192.168.1.203 | CHANGED => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    },
    "changed": true,
    "path": "/var/lib/etcd/member/",
    "state": "absent"
}
192.168.1.202 | CHANGED => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    },
    "changed": true,
    "path": "/var/lib/etcd/member/",
    "state": "absent"
}
192.168.1.200 | CHANGED => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    },
    "changed": true,
    "path": "/var/lib/etcd/member/",
    "state": "absent"
}

5: Sync the backed-up etcd data to every etcd node

[root@master ansible]# for i in 202 203; do rsync -av /backup/k8s1 192.168.1.$i:/backup/; done
sending incremental file list
created directory /backup
k8s1/
k8s1/admin-key.pem
k8s1/admin.pem
k8s1/aggregator-proxy-csr.json
k8s1/aggregator-proxy-key.pem
k8s1/aggregator-proxy.csr
k8s1/aggregator-proxy.pem
k8s1/ca-config.json
k8s1/ca-key.pem
k8s1/ca.pem
k8s1/kubelet-csr.json
k8s1/kubelet-key.pem
k8s1/kubelet.csr
k8s1/kubelet.pem
k8s1/kubernetes-csr.json
k8s1/kubernetes-key.pem
k8s1/kubernetes.csr
k8s1/kubernetes.pem
k8s1/snapshot.db

sent 1,615,207 bytes  received 392 bytes  646,239.60 bytes/sec
total size is 1,613,606  speedup is 1.00
sending incremental file list
created directory /backup
k8s1/
k8s1/admin-key.pem
k8s1/admin.pem
k8s1/aggregator-proxy-csr.json
k8s1/aggregator-proxy-key.pem
k8s1/aggregator-proxy.csr
k8s1/aggregator-proxy.pem
k8s1/ca-config.json
k8s1/ca-key.pem
k8s1/ca.pem
k8s1/kubelet-csr.json
k8s1/kubelet-key.pem
k8s1/kubelet.csr
k8s1/kubelet.pem
k8s1/kubernetes-csr.json
k8s1/kubernetes-key.pem
k8s1/kubernetes.csr
k8s1/kubernetes.pem
k8s1/snapshot.db

sent 1,615,207 bytes  received 392 bytes  1,077,066.00 bytes/sec
total size is 1,613,606  speedup is 1.00

6: On each etcd node, run the restore below, then restart etcd

## Note: in /etc/systemd/system/etcd.service, find the --initial-cluster etcd1=https://xxxx:2380,etcd2=https://xxxx:2380,etcd3=https://xxxx:2380 setting and use it as the --initial-cluster value in the restore command; set --name to the current etcd node's name, and set --initial-advertise-peer-urls to the current node's IP:2380.

① [On the deploy node]

[root@master ansible]# cd /backup/k8s1/
[root@master k8s1]# ETCDCTL_API=3 etcdctl snapshot restore snapshot.db --name etcd1 --initial-cluster etcd1=https://192.168.1.200:2380,etcd2=https://192.168.1.202:2380,etcd3=https://192.168.1.203:2380 --initial-cluster-token etcd-cluster-0 --initial-advertise-peer-urls https://192.168.1.200:2380
2019-12-10 22:26:50.037127 I | mvcc: restore compact to 46505
2019-12-10 22:26:50.052409 I | etcdserver/membership: added member 12229714d8728d0e [https://192.168.1.200:2380] to cluster b8ef796b710cde7d
2019-12-10 22:26:50.052451 I | etcdserver/membership: added member 552fb05951af50c9 [https://192.168.1.203:2380] to cluster b8ef796b710cde7d
2019-12-10 22:26:50.052474 I | etcdserver/membership: added member 8b4f4a6559bf7c2c [https://192.168.1.202:2380] to cluster b8ef796b710cde7d

This step generates a [node-name].etcd directory under the current directory:

[root@master k8s1]# tree etcd1.etcd/
etcd1.etcd/
└── member
    ├── snap
    │   ├── 0000000000000001-0000000000000003.snap
    │   └── db
    └── wal
        └── 0000000000000000-0000000000000000.wal
[root@master k8s1]# cp -r etcd1.etcd/member /var/lib/etcd/
[root@master k8s1]# systemctl restart etcd

② [On the etcd2 node]

[root@node1 ~]# cd /backup/k8s1/
[root@node1 k8s1]# ETCDCTL_API=3 etcdctl snapshot restore snapshot.db --name etcd2 --initial-cluster etcd1=https://192.168.1.200:2380,etcd2=https://192.168.1.202:2380,etcd3=https://192.168.1.203:2380 --initial-cluster-token etcd-cluster-0 --initial-advertise-peer-urls https://192.168.1.202:2380
2019-12-10 22:28:35.175032 I | mvcc: restore compact to 46505
2019-12-10 22:28:35.232386 I | etcdserver/membership: added member 12229714d8728d0e [https://192.168.1.200:2380] to cluster b8ef796b710cde7d
2019-12-10 22:28:35.232507 I | etcdserver/membership: added member 552fb05951af50c9 [https://192.168.1.203:2380] to cluster b8ef796b710cde7d
2019-12-10 22:28:35.232541 I | etcdserver/membership: added member 8b4f4a6559bf7c2c [https://192.168.1.202:2380] to cluster b8ef796b710cde7d
[root@node1 k8s1]# tree etcd2.etcd/
etcd2.etcd/
└── member
    ├── snap
    │   ├── 0000000000000001-0000000000000003.snap
    │   └── db
    └── wal
        └── 0000000000000000-0000000000000000.wal
[root@node1 k8s1]# cp -r etcd2.etcd/member /var/lib/etcd/
[root@node1 k8s1]# systemctl restart etcd

③ [On the etcd3 node]

[root@node2 ~]# cd /backup/k8s1/
[root@node2 k8s1]# ETCDCTL_API=3 etcdctl snapshot restore snapshot.db --name etcd3 --initial-cluster etcd1=https://192.168.1.200:2380,etcd2=https://192.168.1.202:2380,etcd3=https://192.168.1.203:2380 --initial-cluster-token etcd-cluster-0 --initial-advertise-peer-urls https://192.168.1.203:2380
2019-12-10 22:28:55.943364 I | mvcc: restore compact to 46505
2019-12-10 22:28:55.988674 I | etcdserver/membership: added member 12229714d8728d0e [https://192.168.1.200:2380] to cluster b8ef796b710cde7d
2019-12-10 22:28:55.988726 I | etcdserver/membership: added member 552fb05951af50c9 [https://192.168.1.203:2380] to cluster b8ef796b710cde7d
2019-12-10 22:28:55.988754 I | etcdserver/membership: added member 8b4f4a6559bf7c2c [https://192.168.1.202:2380] to cluster b8ef796b710cde7d
[root@node2 k8s1]# tree etcd3.etcd/
etcd3.etcd/
└── member
    ├── snap
    │   ├── 0000000000000001-0000000000000003.snap
    │   └── db
    └── wal
        └── 0000000000000000-0000000000000000.wal
[root@node2 k8s1]# cp -r etcd3.etcd/member /var/lib/etcd/
[root@node2 k8s1]# systemctl restart etcd
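The three restores above are identical except for --name and the advertise URL, which makes them easy to get wrong when copy-pasting. A sketch of deriving both from a single member list is below; the names and IPs are the ones used in this example cluster, and the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: drive the per-node etcd restore from one member list, so only the
# node name varies. Member names/IPs below are this example cluster's.
INITIAL_CLUSTER="etcd1=https://192.168.1.200:2380,etcd2=https://192.168.1.202:2380,etcd3=https://192.168.1.203:2380"

# Look up a member's peer URL in $INITIAL_CLUSTER,
# e.g. peer_url_for etcd2 -> https://192.168.1.202:2380
peer_url_for() {
    echo "$INITIAL_CLUSTER" | tr ',' '\n' | sed -n "s/^$1=//p"
}

# Run on the node whose member name is $1, in the directory holding snapshot.db.
restore_node() {
    ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \
        --name "$1" \
        --initial-cluster "$INITIAL_CLUSTER" \
        --initial-cluster-token etcd-cluster-0 \
        --initial-advertise-peer-urls "$(peer_url_for "$1")"
}
```

For example, `restore_node etcd2` on node1 reproduces the second command above exactly.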

7: On the deploy node, rebuild the cluster network

[root@master ansible]# cd /etc/ansible/
[root@master ansible]# ansible-playbook tools/change_k8s_network.yml

8: Check that the pods and services have been restored

[root@master ansible]# kubectl get svc
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
kubernetes   ClusterIP   10.68.0.1       <none>        443/TCP    5d5h
nginx        ClusterIP   10.68.241.175   <none>        80/TCP     5d4h
tomcat       ClusterIP   10.68.235.35    <none>        8080/TCP   76m
[root@master ansible]# kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
nginx-7c45b84548-4998z   1/1     Running   0          5d4h
tomcat-8fc9f5995-9kl5b   1/1     Running   0          77m
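Besides checking pods and services, it is worth confirming that every etcd member is healthy after the restart. The sketch below wraps the checks in functions; the endpoint list matches this example cluster, and the client certificate paths are assumptions based on the files backed up earlier under /etc/kubernetes/ssl (adjust them to wherever your etcd client certs actually live).

```shell
#!/bin/sh
# Sketch: post-restore health checks. Endpoints match this example cluster;
# the certificate paths are assumed, not taken from the original article.
ENDPOINTS=https://192.168.1.200:2379,https://192.168.1.202:2379,https://192.168.1.203:2379

check_etcd() {
    ETCDCTL_API=3 etcdctl --endpoints "$ENDPOINTS" \
        --cacert /etc/kubernetes/ssl/ca.pem \
        --cert /etc/kubernetes/ssl/admin.pem \
        --key /etc/kubernetes/ssl/admin-key.pem \
        endpoint health
}

check_k8s() {
    kubectl get nodes && kubectl get pods --all-namespaces
}
```

Run `check_etcd` on the deploy node; every endpoint should report "is healthy" before you trust the restored cluster.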

III. Automated Backup and Restore

1: One-key backup

[root@master ansible]# ansible-playbook /etc/ansible/23.backup.yml
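Once the one-key backup works, it can be scheduled. The cron entry below is a sketch; the schedule, the /etc/cron.d file name, and the log path are assumptions, not part of the kubeasz project.

```shell
# /etc/cron.d/k8s-backup -- run the kubeasz one-key backup nightly at 02:00
# (assumed schedule and log path; playbook path as used in this article)
0 2 * * * root ansible-playbook /etc/ansible/23.backup.yml >> /var/log/k8s-backup.log 2>&1
```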

2: Simulate a failure

[root@master ansible]# ansible-playbook /etc/ansible/99.clean.yml

Edit /etc/ansible/roles/cluster-restore/defaults/main.yml to specify which etcd snapshot backup to restore from; if left unchanged, the most recent one is used.

3: Run the automated restore

[root@master ansible]# ansible-playbook /etc/ansible/24.restore.yml
[root@master ansible]# ansible-playbook /etc/ansible/tools/change_k8s_network.yml

     
Original: https://www.cnblogs.com/douyi/p/12019807.html