一、添加node节点,报错1
注:可以提前在各node节点上修改好(无报错无需执行此项)
yang@node2:~$ sudo kubeadm join 192.168.1.101:6443 --token 6131nu.8ohxo1ttgwiqlwmp --discovery-token-ca-cert-hash sha256:9bece23d1089b6753a42ce4dab3fa5ac7d2d4feb260a0f682bfb06ccf1eb4fe2 [preflight] Running pre-flight checks [preflight] Reading configuration from the cluster... [preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml' [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml" [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env" [kubelet-start] Starting the kubelet [kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap... [kubelet-check] Initial timeout of 40s passed. [kubelet-check] It seems like the kubelet isn't running or healthy. [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused. [kubelet-check] It seems like the kubelet isn't running or healthy. [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused. [kubelet-check] It seems like the kubelet isn't running or healthy. [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused. [kubelet-check] It seems like the kubelet isn't running or healthy. [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused. [kubelet-check] It seems like the kubelet isn't running or healthy. [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused. error execution phase kubelet-start: error uploading crisocket: timed out waiting for the condition To see the stack trace of this error execute with --v=5 or higher
原 因:
1.因为docker默认的Cgroup Driver是cgroupfs ,cgroupfs是cgroup为给用户提供的操作接口而开发的虚拟文件系统类型
它和sysfs,proc类似,可以向用户展示cgroup的hierarchy,通知kernel用户对cgroup改动
对cgroup的查询和修改只能通过cgroupfs文件系统来进行
2.Kubernetes 推荐使用 systemd
来代替 cgroupfs
因为
systemd是Kubernetes自带的cgroup管理器, 负责为每个进程分配cgroups,
但docker的cgroup driver默认是cgroupfs,这样就同时运行有两个cgroup控制管理器,
当资源有压力的情况时,有可能出现不稳定的情况
解决办法:
①.创建daemon.json文件,加入以下内容: ls /etc/docker/daemon.json {"exec-opts": ["native.cgroupdriver=systemd"]} ②.重启docker sudo systemctl restart docker ③.重启kubelet sudo systemctl restart kubelet sudo systemctl status kubelet ④.重新执行加入集群命令 首先清除缓存 sudo kubeadm reset ⑤.加入集群 sudo kubeadm join 192.168.1.101:6443 --token 6131nu.8ohxo1ttgwiqlwmp --discovery-token-ca-cert-hash sha256:9bece23d1089b6753a42ce4dab3fa5ac7d2d4feb260a0f682bfb06ccf1eb4fe2
二、报错2
注:各node上需要有kube-proxy镜像
问题:
Normal BackOff 106s (x249 over 62m) kubelet Back-off pulling image "k8s.gcr.io/kube-proxy:v1.22.4"
原 因:
缺少k8s.gcr.io/kube-proxy:v1.22.4镜像
解决办法:
① 上传镜像
sudo docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.22.4
② 修改镜像tag
sudo docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.22.4 k8s.gcr.io/kube-proxy:v1.22.4
三、报错3
注:各node上需要有pause:3.5镜像
问题:
Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 17m default-scheduler Successfully assigned kube-system/kube-proxy-dv2sw to node2 Warning FailedCreatePodSandBox 4m48s (x26 over 16m) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.5": Error response from daemon: Get "https://k8s.gcr.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) Warning FailedCreatePodSandBox 93s (x2 over 16m) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.5": Error response from daemon: Get "https://k8s.gcr.io/v2/": dial tcp 74.125.195.82:443: i/o timeout
原 因:
缺少k8s.gcr.io/pause:3.5镜像
解决办法:
① 上传镜像
sudo docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.5
② 修改镜像tag
sudo docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.5 k8s.gcr.io/pause:3.5
四、报错4
注:各node上需要有coredns:v1.8.4镜像
问题:
Warning Failed 28m (x11 over 73m) kubelet Failed to pull image "k8s.gcr.io/coredns/coredns:v1.8.4" : rpc error: code = Unknown desc = Error response from daemon: Get "https://k8s.gcr.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
这里是一个坑,镜像的tag是v1.8.4,而不是1.8.4,所以导致下载超时
查看官网images列表
yang@master:/etc/docker$ kubeadm config images list k8s.gcr.io/kube-apiserver:v1.22.4 k8s.gcr.io/kube-controller-manager:v1.22.4 k8s.gcr.io/kube-scheduler:v1.22.4 k8s.gcr.io/kube-proxy:v1.22.4 k8s.gcr.io/pause:3.5 k8s.gcr.io/etcd:3.5.0-0 k8s.gcr.io/coredns/coredns:v1.8.4
解决办法:
① 上传镜像
sudo docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:v1.8.4
② 修改镜像tag
sudo docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:v1.8.4 k8s.gcr.io/coredns/coredns:v1.8.4