Attempted solutions:
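- Upgraded Docker. The machines in the cluster turned out not to be running exactly the same Docker version, so I upgraded it and then restarted the Docker daemon.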
- Inspected the pod with kubectl describe, which gave the following output:
State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Mon, 23 Mar 2020 16:24:15 +0800
Finished: Mon, 23 Mar 2020 16:24:27 +0800
Ready: False
Restart Count: 29
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 48m (x43134 over 17h) kubelet, master Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-flannel-ds-amd64-d7xxk": Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:303: getting the final child's pid from pipe caused "read init-p: connection reset by peer"": unknown
Warning FailedCreatePodSandBox 13m (x8967 over 16h) kubelet, master Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-flannel-ds-amd64-d7xxk": Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:299: copying bootstrap data to pipe caused "write init-p: broken pipe"": unknown
Normal SandboxChanged 3m40s (x54265 over 22h) kubelet, master Pod sandbox changed, it will be killed and re-created.
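The two checks in the list above can be scripted roughly as follows. This is a minimal sketch, not what was actually run: runtime_version_summary and termination_reasons are hypothetical helper names, and termination_reasons assumes the usual layout of kubectl describe pod output, where a Reason: line follows a State: or Last State: line.

```shell
# Hypothetical helper for the Docker-version check: reads "node<TAB>runtime" lines
# on stdin and prints each distinct runtime version with its node count.
# More than one output line means the nodes are not all on the same version.
runtime_version_summary() {
  awk -F '\t' '{ count[$2]++ } END { for (v in count) print v, count[v] }'
}

# Hypothetical helper for the describe check: reads "kubectl describe pod" output
# on stdin and prints the Reason of every State/Last State block that is Terminated.
termination_reasons() {
  awk '
    # Each "State:" or "Last State:" line opens a new block; note whether it is Terminated.
    /^[ \t]*(Last )?State:/ { in_term = ($0 ~ /Terminated/); next }
    # The first "Reason:" line inside a Terminated block holds the reason, e.g. OOMKilled.
    in_term && /^[ \t]*Reason:/ { print $2; in_term = 0 }
  '
}

# Usage:
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}' \
#     | runtime_version_summary
#   kubectl -n kube-system describe pod kube-flannel-ds-amd64-d7xxk | termination_reasons
# (kubectl get nodes -o wide also shows a CONTAINER-RUNTIME column, and
#  -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}' on the pod
#  reads the termination reason straight from the API object.)
```

Parsing the human-readable describe output is fragile; the jsonpath form in the last comment is the sturdier option when scripting against a live cluster.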
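OOMKilled: is the container running out of memory? Only the flannel pod on the master shows this error; the flannel pods on the worker nodes show nothing similar, even though they have the same memory and CPU limits.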
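For context, exit code 137 is 128 + 9, meaning the process was ended by SIGKILL; together with OOMKilled this usually means the kernel OOM killer fired because the container's memory cgroup reached its limit. Since every pod of the DaemonSet gets the same limits from the same template, one way to dig further is to compare the kernel log on the master with that of a worker node. A minimal sketch, assuming a hypothetical helper named oom_kernel_lines; the grep patterns are a best guess, because the exact wording of these kernel messages varies between kernel versions.

```shell
# Hypothetical helper: keep only kernel log lines that look related to a
# memory-cgroup OOM kill (the kill itself and the cgroup usage/limit report).
oom_kernel_lines() {
  grep -Ei 'memory cgroup out of memory|oom-kill|killed process|memory: usage'
}

# Usage, on the master and then on a worker node for comparison:
#   dmesg | oom_kernel_lines            # may require root
#   journalctl -k | oom_kernel_lines    # on systemd hosts
# If metrics-server is installed, current usage per pod can be compared with:
#   kubectl top pod -n kube-system
```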
kubectl patch ds -n=kube-system kube-flannel-ds-amd64 -p '{"spec": {"template":{"spec":{"containers": [{"name":"kube-flannel", "resources": {"limits": {"cpu": "250m","memory": "550Mi"},"requests": {"cpu": "100m","memory": "100Mi"}}}]}}}}'
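Even so, I used the command above to raise the flannel container's memory and CPU somewhat (limits: 250m CPU / 550Mi memory; requests: 100m CPU / 100Mi memory) and will check later whether the error comes back. If it doesn't, the resource limits were the problem.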
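If the limits have to be tuned more than once, the patch body can be generated instead of hand-typed, which avoids miscounting the nested braces. A minimal sketch: flannel_resources_patch is a hypothetical helper that prints the same JSON as the command above, with the four values passed as arguments. Whether the running pods are replaced automatically after patching depends on the DaemonSet's updateStrategy (RollingUpdate replaces them, OnDelete waits until each pod is deleted).

```shell
# Hypothetical helper: print the strategic-merge patch for the kube-flannel container.
#   $1 = CPU limit, $2 = memory limit, $3 = CPU request, $4 = memory request
flannel_resources_patch() {
  printf '{"spec":{"template":{"spec":{"containers":[{"name":"kube-flannel","resources":{"limits":{"cpu":"%s","memory":"%s"},"requests":{"cpu":"%s","memory":"%s"}}}]}}}}' \
    "$1" "$2" "$3" "$4"
}

# Usage: apply the same values as above, confirm the template picked them up,
# then watch whether the restart count of the master's flannel pod keeps rising.
#   kubectl -n kube-system patch ds kube-flannel-ds-amd64 -p "$(flannel_resources_patch 250m 550Mi 100m 100Mi)"
#   kubectl -n kube-system get ds kube-flannel-ds-amd64 \
#     -o jsonpath='{.spec.template.spec.containers[?(@.name=="kube-flannel")].resources}'
#   kubectl -n kube-system get pods -o wide -w | grep kube-flannel
# Equivalent without writing JSON:
#   kubectl -n kube-system set resources ds kube-flannel-ds-amd64 -c kube-flannel \
#     --limits=cpu=250m,memory=550Mi --requests=cpu=100m,memory=100Mi
```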