Masters still expired after renewing certificates with kubeadm renew

Background

    kubernetes: 1.16.3

masters: 3

The cluster was deployed with kubeadm. With more than 30 days left before the certificates expired, we ran kubeadm alpha certs renew all to renew all of them and assumed we were safe, but when the original expiration date arrived, API error alerts fired anyway.
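For reference, this is the renewal we ran, followed by the check that the on-disk certificates were actually extended (a minimal sketch; kubeadm alpha certs check-expiration is available on this 1.16 cluster):

    $ kubeadm alpha certs renew all
    # verify the new expiration dates of the on-disk certificates
    $ kubeadm alpha certs check-expiration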

Problem

api-server logs (the aggregated metrics APIs were being rejected with 401s):

    E0721 08:09:28.129981       1 available_controller.go:416] v1beta1.custom.metrics.k8s.io failed with: failing or missing response from https://10.244.69.146:6443/apis/custom.metrics.k8s.io/v1beta1: bad status from https://10.244.69.146:6443/apis/custom.metrics.k8s.io/v1beta1: 401
    E0721 08:09:28.133091       1 available_controller.go:416] v1beta1.custom.metrics.k8s.io failed with: failing or missing response from https://10.244.69.146:6443/apis/custom.metrics.k8s.io/v1beta1: bad status from https://10.244.69.146:6443/apis/custom.metrics.k8s.io/v1beta1: 401
    E0721 08:09:28.133460       1 available_controller.go:416] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.244.8.165:4443/apis/metrics.k8s.io/v1beta1: bad status from https://10.244.8.165:4443/apis/metrics.k8s.io/v1beta1: 401
    E0721 08:09:28.135093       1 available_controller.go:416] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.244.8.165:4443/apis/metrics.k8s.io/v1beta1: bad status from https://10.244.8.165:4443/apis/metrics.k8s.io/v1beta1: 401
    E0721 08:09:28.139986       1 available_controller.go:416] v1beta1.custom.metrics.k8s.io failed with: failing or missing response from https://10.244.69.146:6443/apis/custom.metrics.k8s.io/v1beta1: bad status from https://10.244.69.146:6443/apis/custom.metrics.k8s.io/v1beta1: 401
    E0721 08:09:28.141188       1 available_controller.go:416] v1beta1.custom.metrics.k8s.io failed with: failing or missing response from https://10.244.69.146:6443/apis/custom.metrics.k8s.io/v1beta1: bad status from https://10.244.69.146:6443/apis/custom.metrics.k8s.io/v1beta1: 401
    E0721 08:09:28.143084       1 available_controller.go:416] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.244.8.165:4443/apis/metrics.k8s.io/v1beta1: bad status from https://10.244.8.165:4443/apis/metrics.k8s.io/v1beta1: 401
    

Troubleshooting

1. kubectl was unusable, reporting that the certificate had expired or was invalid:

    $ kubectl get node
    Unable to connect to the server: x509: certificate has expired or is not yet valid
    

2. We then copied /etc/kubernetes/admin.conf to ~/.kube/config, but the commands still returned the error from step 1, which made us realize the problem was on the api-server side.
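What step 2 amounted to (a minimal sketch):

    $ mkdir -p ~/.kube
    $ cp /etc/kubernetes/admin.conf ~/.kube/config
    $ kubectl get node
    # still fails with the same x509 error, pointing at the server side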

3. Check the expiration time of every certificate under the pki directory:

    $ cd /etc/kubernetes/pki
    $ for i in $(ls *.crt); do echo "===== $i ====="; openssl x509 -in $i -text -noout | grep -A 3 'Validity' ; done
    ===== apiserver.crt =====
            Validity
                Not Before: Jul 21 08:08:33 2020 GMT
                Not After : Apr 14 05:58:54 2022 GMT
            Subject: CN=kube-apiserver
    ===== apiserver-kubelet-client.crt =====
            Validity
                Not Before: Jul 21 08:08:33 2020 GMT
                Not After : Apr 14 06:00:03 2022 GMT
            Subject: O=system:masters, CN=kube-apiserver-kubelet-client
    ===== ca.crt =====
            Validity
                Not Before: Jul 21 08:08:33 2020 GMT
                Not After : Jul 19 08:08:33 2030 GMT
            Subject: CN=kubernetes
    ===== front-proxy-ca.crt =====
            Validity
                Not Before: Jul 21 08:08:34 2020 GMT
                Not After : Jul 19 08:08:34 2030 GMT
            Subject: CN=front-proxy-ca
    ===== front-proxy-client.crt =====
            Validity
                Not Before: Jul 21 08:08:34 2020 GMT
                Not After : Apr 14 06:00:40 2022 GMT
            Subject: CN=front-proxy-client
    

4. None of the certificates under pki had expired, so the problem clearly lay with the running containers, and restarting kubelet alone could not fix it.

5. Restarting kubelet does not recreate the containers, so they have to be torn down manually:

    systemctl stop kubelet
    # stop all running containers
    docker ps -q | xargs docker stop
    # unmount the container/volume mounts left behind under docker and kubelet
    df -Th | grep "docker" | awk '{print $NF}' | xargs umount
    df -Th | grep "kubelet" | awk '{print $NF}' | xargs umount
    # remove the stopped containers so kubelet recreates them from the static pod manifests
    docker ps -a -q | xargs docker rm
    systemctl restart kubelet
    

6. After the restart, all master nodes recovered except the primary control node (the first server where kubeadm init was run).

7. Checking the error messages, kubelet on the primary control node could not start:

    Jul 21 17:53:03 master001 kubelet[23047]: E0721 17:53:03.545713   23047 bootstrap.go:250] unable to load TLS configuration from existing bootstrap client config: tls: private key does not match public key
    Jul 21 17:53:03 master001 kubelet[23047]: F0721 17:53:03.545749   23047 server.go:271] failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory
    

8. Comparing with our other clusters, none of them had an /etc/kubernetes/bootstrap-kubelet.conf file either.

9. So we inspected the kubelet configuration:

    $ cat /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
    # Note: This dropin only works with kubeadm and kubelet v1.11+
    [Service]
    Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
    

On startup, kubelet reads kubelet.conf first and only falls back to bootstrap-kubelet.conf when kubelet.conf is unusable.
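To see which credential kubelet.conf is actually carrying, you can decode the embedded client certificate, if any (a minimal sketch; it prints nothing when the file references kubelet-client-current.pem instead of embedding the data):

    $ grep 'client-certificate-data' /etc/kubernetes/kubelet.conf \
        | awk '{print $2}' | base64 -d \
        | openssl x509 -noout -subject -enddate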

10. Comparing kubelet.conf on the primary control node against the other masters:
The primary control node:

    users:
    - name: system:node:master001
      user:
        client-certificate-data: *****
        client-key-data: *****
    

(***** stands for the inline base64-encoded certificate and key data.)
The other master nodes:

    users:
    - name: default-auth
      user:
        client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem
        client-key: /var/lib/kubelet/pki/kubelet-client-current.pem
    

You can see that the other masters reference the certificate file directly. When the first master node runs kubeadm init, kubelet has not joined the cluster yet, so that file cannot be generated; the certificate data is instead embedded inline (base64-encoded). We only need to change this by hand, as shown below.
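A sketch of the manual fix (it assumes the rotated client certificate exists under /var/lib/kubelet/pki, as it did on the other masters; back the file up first):

    $ cp /etc/kubernetes/kubelet.conf /etc/kubernetes/kubelet.conf.bak
    # replace the embedded client-certificate-data/client-key-data of the user
    # entry with file references, matching the other masters:
    #     client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem
    #     client-key: /var/lib/kubelet/pki/kubelet-client-current.pem
    $ vi /etc/kubernetes/kubelet.conf
    $ systemctl restart kubelet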

11. After restarting the first master, the entire cluster returned to normal.

Summary

1. After kubeadm renew updates the certificates, the running api-server, controller-manager, and scheduler do not reload the certificate files; their containers need to be recreated (see the sketch after this list).
2. kubelet.conf on the primary control node embeds the certificate data inline instead of referencing a file, and kubeadm renew does not update this file, so it has to be fixed manually.
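If a full teardown like step 5 is too disruptive, killing just the control-plane containers and letting kubelet recreate them from the static pod manifests should also pick up the renewed certificates (a sketch, assuming a Docker runtime, where kubelet-managed container names start with k8s_):

    $ docker ps | grep -E 'k8s_kube-apiserver|k8s_kube-controller|k8s_kube-scheduler' \
        | awk '{print $1}' | xargs -r docker kill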

Learn a little every day; progress comes from accumulation!
Original article: https://www.cnblogs.com/GXLo/p/15042913.html