Kubernetes deploys and starts normally, but kube-dns keeps restarting
Run the command
kubectl get pods --all-namespaces
The output shows that kube-dns-c7d85897f-jmntw is restarting over and over.
Run the command
kubectl describe pod kube-dns-c7d85897f-jmntw -n kube-system
The result:
Name:           kube-dns-c7d85897f-jmntw
Namespace:      kube-system
Node:           172.18.196.2/172.18.196.2
Start Time:     Tue, 05 Jun 2018 15:28:18 +0800
Labels:         k8s-app=kube-dns
                pod-template-hash=738414539
Annotations:    scheduler.alpha.kubernetes.io/critical-pod=
Status:         Running
IP:             172.20.1.9
Controlled By:  ReplicaSet/kube-dns-c7d85897f
Containers:
  kubedns:
    Container ID:  docker://516c137ece876a83fc16d26a4fb2c526d8daa75423d1f2371b0b2142bfd2e00a
    Image:         mirrorgooglecontainers/k8s-dns-kube-dns-amd64:1.14.9
    Image ID:      docker-pullable://mirrorgooglecontainers/k8s-dns-kube-dns-amd64@sha256:956ac5f14a388ab9887ae07f36e770852f3f51dcac9e0d193ce8f62cbf066b13
    Ports:         10053/UDP, 10053/TCP, 10055/TCP
    Args:
      --domain=cluster.local.
      --dns-port=10053
      --config-dir=/kube-dns-config
      --v=2
    State:          Running
      Started:      Tue, 05 Jun 2018 15:28:27 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  170Mi
    Requests:
      cpu:      100m
      memory:   70Mi
    Liveness:   http-get http://:10054/healthcheck/kubedns delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:  http-get http://:8081/readiness delay=3s timeout=5s period=10s #success=1 #failure=3
    Environment:
      PROMETHEUS_PORT:  10055
    Mounts:
      /kube-dns-config from kube-dns-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-2ndrd (ro)
  dnsmasq:
    Container ID:  docker://5871fe23f088d23dd342fa7a891be0b5b9f3f879a0902e6633baaa418b2a920f
    Image:         mirrorgooglecontainers/k8s-dns-dnsmasq-nanny-amd64:1.14.9
    Image ID:      docker-pullable://mirrorgooglecontainers/k8s-dns-dnsmasq-nanny-amd64@sha256:38f69fab59a32a490c8c62b035f6aa8dbf9a320686537225adaee16a07856d17
    Ports:         53/UDP, 53/TCP
    Args:
      -v=2
      -logtostderr
      -configDir=/etc/k8s/dns/dnsmasq-nanny
      -restartDnsmasq=true
      --
      -k
      --cache-size=1000
      --log-facility=-
      --server=/cluster.local./127.0.0.1#10053
      --server=/in-addr.arpa/127.0.0.1#10053
      --server=/ip6.arpa/127.0.0.1#10053
    State:          Running
      Started:      Tue, 05 Jun 2018 16:53:08 +0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Tue, 05 Jun 2018 16:43:08 +0800
      Finished:     Tue, 05 Jun 2018 16:53:08 +0800
    Ready:          True
    Restart Count:  9
    Requests:
      cpu:        150m
      memory:     20Mi
    Liveness:     http-get http://:10054/healthcheck/dnsmasq delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:  <none>
    Mounts:
      /etc/k8s/dns/dnsmasq-nanny from kube-dns-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-2ndrd (ro)
  sidecar:
    Container ID:  docker://bffdb2ace942a0608c2a35e34098d0b43519cce8778371fd96ac549300bf9897
    Image:         mirrorgooglecontainers/k8s-dns-sidecar-amd64:1.14.9
    Image ID:      docker-pullable://mirrorgooglecontainers/k8s-dns-sidecar-amd64@sha256:7caad6678b148c0c74f8b84efa93ddde84e742fa37b25d20ecfdbd43fba74360
    Port:          10054/TCP
    Args:
      --v=2
      --logtostderr
      --probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local.,5,A
      --probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local.,5,A
    State:          Running
      Started:      Tue, 05 Jun 2018 16:53:30 +0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Tue, 05 Jun 2018 16:43:28 +0800
      Finished:     Tue, 05 Jun 2018 16:53:09 +0800
    Ready:          True
    Restart Count:  9
    Requests:
      cpu:        10m
      memory:     20Mi
    Liveness:     http-get http://:10054/metrics delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-2ndrd (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          True
  PodScheduled   True
Volumes:
  kube-dns-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-dns
    Optional:  true
  kube-dns-token-2ndrd:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kube-dns-token-2ndrd
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     CriticalAddonsOnly
Events:
  Type     Reason     Age               From                   Message
  ----     ------     ----              ----                   -------
  Warning  Unhealthy  8m (x41 over 1h)  kubelet, 172.18.196.2  Liveness probe failed: HTTP probe failed with statuscode: 503
  Warning  Unhealthy  7m (x15 over 1h)  kubelet, 172.18.196.2  Liveness probe failed: Get http://172.20.1.9:10054/healthcheck/kubedns: dial tcp 172.20.1.9:10054: getsockopt: connection refused
There are two warnings here: the kubelet's liveness probes keep failing, first with HTTP 503 from the health check and later with a plain connection refused. Since the probes allow only 5 consecutive failures, the kubelet kills and restarts the containers each time, which explains the restart loop. At this point I still didn't know the underlying cause.
Run the command
kubectl logs -n kube-system kube-dns-c7d85897f-jmntw -c dnsmasq
The result:
I0605 09:13:08.863881       1 main.go:74] opts: {{/usr/sbin/dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local./127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053] true} /etc/k8s/dns/dnsmasq-nanny 10000000000}
I0605 09:13:08.863997       1 nanny.go:94] Starting dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local./127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053]
I0605 09:13:09.049758       1 nanny.go:119]
W0605 09:13:09.049779       1 nanny.go:120] Got EOF from stdout
I0605 09:13:09.049789       1 nanny.go:116] dnsmasq[17]: started, version 2.78 cachesize 1000
I0605 09:13:09.049795       1 nanny.go:116] dnsmasq[17]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
I0605 09:13:09.049800       1 nanny.go:116] dnsmasq[17]: using nameserver 127.0.0.1#10053 for domain ip6.arpa
I0605 09:13:09.049803       1 nanny.go:116] dnsmasq[17]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0605 09:13:09.049807       1 nanny.go:116] dnsmasq[17]: using nameserver 127.0.0.1#10053 for domain cluster.local
I0605 09:13:09.049811       1 nanny.go:116] dnsmasq[17]: reading /etc/resolv.conf
I0605 09:13:09.049815       1 nanny.go:116] dnsmasq[17]: using nameserver 127.0.0.1#10053 for domain ip6.arpa
I0605 09:13:09.049819       1 nanny.go:116] dnsmasq[17]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0605 09:13:09.049823       1 nanny.go:116] dnsmasq[17]: using nameserver 127.0.0.1#10053 for domain cluster.local
I0605 09:13:09.049827       1 nanny.go:116] dnsmasq[17]: using nameserver 127.0.1.1#53
I0605 09:13:09.049836       1 nanny.go:116] dnsmasq[17]: read /etc/hosts - 7 addresses
I0605 09:21:50.451300       1 nanny.go:116] dnsmasq[17]: Maximum number of concurrent DNS queries reached (max: 150)
I0605 09:22:00.464414       1 nanny.go:116] dnsmasq[17]: Maximum number of concurrent DNS queries reached (max: 150)
The log shows that the only upstream nameserver dnsmasq picked up from /etc/resolv.conf is 127.0.1.1#53, the node's local loopback stub. Inside the pod that address points back at dnsmasq itself, so forwarded queries loop and pile up until "Maximum number of concurrent DNS queries reached (max: 150)" is hit and the health checks start failing. In other words, the real upstream nameserver is missing: I had simply forgotten to fix the nameserver on the node.
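You can confirm this directly on the node, since the kube-dns pod inherits the node's resolver configuration. On an unfixed node the file contains only the loopback stub:
cat /etc/resolv.conf
nameserver 127.0.1.1
A loopback address like this is useless as an upstream for kube-dns.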
Fix: edit the nameserver entry in /etc/resolv.conf,
changing it to the campus DNS server. Note that this must be done on every node, because there is no telling which node the DNS pod will be scheduled onto. An example follows.
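After the change, each node's /etc/resolv.conf should look roughly like this (10.8.8.8 is our campus DNS server; substitute your own upstream):
nameserver 10.8.8.8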
Then restart the kube-dns pod by deleting it; its ReplicaSet recreates it immediately:
kubectl delete pod -n kube-system kube-dns-69bf9d5cc9-c68mw
Once the dnsmasq logs show it using nameserver 10.8.8.8, the problem is solved.
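To double-check that cluster DNS now works end to end, a common smoke test (not part of the original steps) is to resolve the kubernetes service from a throwaway pod; busybox:1.28 is just a convenient image that ships nslookup:
kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default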
However, a cluster usually has many nodes, and editing them one by one is slow. Here is an ansible one-liner that fixes the nameserver on every node in the cluster:
root@ht-1:/etc/ansible# ansible all -m lineinfile -a "dest=/etc/resolv.conf regexp='nameserver 127.0.1.1' line='nameserver 10.8.8.8'"
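The lineinfile module rewrites the line matching the regexp in place on every host. To verify that the change landed everywhere, an ad-hoc run of the command module will do:
root@ht-1:/etc/ansible# ansible all -m command -a "cat /etc/resolv.conf"
Then delete the kube-dns pod once more, as above, so dnsmasq re-reads the file.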