Reference documents
Cluster setup reference: https://www.kubernetes.org.cn/4291.html
Calico troubleshooting reference: http://blog.51cto.com/newfly/2062210?utm_source=oschina-app
1. Modify the Calico network in the existing k8s cluster. By default it runs in IPIP mode: a tunl0 interface is created on every node, and this tunnel carries traffic between the container networks of all nodes (the official docs recommend IPIP when nodes sit in different IP subnets, for example hosts in different AWS regions).
Switch it to BGP mode. calico-node is installed as a DaemonSet on all nodes; each host runs a bird process (the BGP client) that advertises the pod IP block allocated to each node to the other hosts in the cluster, and traffic is then forwarded directly over the host NIC (eth0 or ens160) instead of through a tunnel.
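Before changing anything, you can confirm the current mode from the IP pool, using the same in-cluster calicoctl pod that is used later in this article (anything other than Never in the IPIPMODE column means IPIP is still active):

kubectl exec -it -n kube-system calicoctl -- /calicoctl get ippool -o wide
# check the IPIPMODE column; after the switch to BGP it should read "Never" (see the output further below)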
Edit the cluster, whose Calico install defaults to IPIP mode:
kubectl edit -n kube-system daemonset.extensions/calico-node    # edit the calico-node DaemonSet
Change the following two environment variables:
- name: CALICO_IPV4POOL_IPIP      # turn IPIP mode off
  value: "off"
- name: FELIX_IPINIPENABLED       # disable IPIP in Felix
  value: "false"
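If you prefer a non-interactive change, the same edit can be made with kubectl set env (a sketch, assuming your kubectl version supports set env on a DaemonSet; the container and variable names are exactly the ones shown above):

kubectl -n kube-system set env daemonset/calico-node -c calico-node \
    CALICO_IPV4POOL_IPIP=off FELIX_IPINIPENABLED=false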
After the change, the cluster picks it up automatically as the DaemonSet pods are rolled.
The existing tunl0 interface disappears after the host is rebooted (leaving it in place does not affect the result):
tunl0     Link encap:IPIP Tunnel  HWaddr
          inet addr:10.244.0.1  Mask:255.255.255.255
          UP RUNNING NOARP  MTU:1440  Metric:1
          RX packets:6025 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5633 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:5916925 (5.9 MB)  TX bytes:1600038 (1.6 MB)
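If you do not want to wait for a reboot, the leftover interface can usually be removed by unloading the ipip kernel module (an assumption: no other ipip tunnels are in use on the host; skip this step if unsure):

ip link set tunl0 down     # bring the leftover tunnel down
modprobe -r ipip           # unloading the module removes tunl0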
Check the Calico network:
root@ub1604-k8s231:~# ip route | grep bird
10.244.0.0/24 via 10.96.141.233 dev ens160 proto bird    # pod networks of the other nodes
blackhole 10.244.1.0/24 proto bird                       # pod network allocated to this node
10.244.2.0/24 via 10.96.141.232 dev ens160 proto bird
10.244.3.0/24 via 10.96.141.234 dev ens160 proto bird
10.244.4.0/24 via 10.96.141.235 dev ens160 proto bird
root@ub1604-k8s232:~# ip route | grep bird
10.244.0.0/24 via 10.96.141.233 dev ens160 proto bird
10.244.1.0/24 via 10.96.141.231 dev ens160 proto bird
blackhole 10.244.2.0/24 proto bird
10.244.3.0/24 via 10.96.141.234 dev ens160 proto bird
10.244.4.0/24 via 10.96.141.235 dev ens160 proto bird
root@ub1604-k8s235:~# ip route | grep bird
10.244.0.0/24 via 10.96.141.233 dev ens160 proto bird
10.244.1.0/24 via 10.96.141.231 dev ens160 proto bird
10.244.2.0/24 via 10.96.141.232 dev ens160 proto bird
10.244.3.0/24 via 10.96.141.234 dev ens160 proto bird
blackhole 10.244.4.0/24 proto bird
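A quick way to confirm cross-node pod connectivity is to ping a pod on another node from a host, for example one of the CoreDNS pod IPs listed in the WorkloadEndpoint output later in this article (substitute a pod IP from your own cluster):

root@ub1604-k8s231:~# ping -c 3 10.244.4.2    # CoreDNS pod running on ub1604-k8s235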
All container networks are reachable from anywhere inside the k8s cluster. Because the company's switches do not speak BGP, hosts on other subnets can only reach container services if static routes for each pod subnet are added on the core switch, pointing at the corresponding node; they can then access the containers directly.
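As a rough illustration of those static routes (Linux ip route syntax, one route per node pod CIDR taken from the bird routing tables above; the real configuration depends on your switch vendor's CLI):

ip route add 10.244.0.0/24 via 10.96.141.233
ip route add 10.244.1.0/24 via 10.96.141.231
ip route add 10.244.2.0/24 via 10.96.141.232
ip route add 10.244.3.0/24 via 10.96.141.234
ip route add 10.244.4.0/24 via 10.96.141.235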
root@ub1604-k8s231:/etc/cni/net.d# kubectl get ds,deploy -n kube-system
NAME                               DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.extensions/calico-node   5         5         5       5            5           <none>          4h
daemonset.extensions/kube-proxy    5         5         5       5            5           <none>          1d

NAME                                         DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.extensions/calico-typha           0         0         0            0           4h
deployment.extensions/coredns                2         2         2            2           1d
deployment.extensions/kubernetes-dashboard   1         1         1            1           1d
Note:
The calico-typha Deployment has zero replicas; the recommended replica count depends on the number of nodes in the cluster.
Reference manifest:
https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/calico
# To enable Typha, set this to "calico-typha" *and* set a non-zero value for Typha replicas
# below.  We recommend using Typha if you have more than 50 nodes.  Above 100 nodes it is
# essential.
typha_service_name: "none"
# Number of Typha replicas.  To enable Typha, set this to a non-zero value *and* set the
# typha_service_name variable in the calico-config ConfigMap above.
#
# We recommend using Typha if you have more than 50 nodes.  Above 100 nodes it is essential
# (when using the Kubernetes datastore).  Use one replica for every 100-200 nodes.  In
# production, we recommend running at least 3 replicas to reduce the impact of rolling upgrade.
replicas: 0
Official recommendations:
For production clusters with up to 50 nodes, keep the defaults:
typha_service_name: "none"
replicas: 0
For 100-200 nodes:
In the ConfigMap named calico-config, locate typha_service_name, delete the none value, and replace it with calico-typha. Modify the replica count in the Deployment named calico-typha to the desired number of replicas.
typha_service_name: "calico-typha"
replicas: 3
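On a running cluster, those two edits can be applied, for instance, like this (a sketch of the procedure quoted above):

kubectl -n kube-system edit configmap calico-config                 # set typha_service_name: "calico-typha"
kubectl -n kube-system scale deployment calico-typha --replicas=3   # then scale the Typha Deployment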
Scaling beyond that (roughly one Typha replica per 200 nodes):
We recommend at least one replica for every 200 nodes and no more than 20 replicas. In production, we recommend a minimum of three replicas to reduce the impact of rolling upgrades and failures.
Warning: If you set typha_service_name without increasing the replica count from its default of 0 Felix will try to connect to Typha, find no Typha instances to connect to, and fail to start.
calico-node error messages:
root@ub1604-k8s231:/etc/cni/net.d# kubectl get pods -n kube-system
NAME                                    READY     STATUS    RESTARTS   AGE
calico-node-877k8                       2/2       Running   0          4h
calico-node-d7lfd                       2/2       Running   0          4h
calico-node-lq8f9                       2/2       Running   0          4h
calico-node-qsv66                       2/2       Running   0          4h
calico-node-wfskg                       2/2       Running   2          4h
calicoctl                               1/1       Running   0          4h
coredns-6fd6cb9656-cnn5g                1/1       Running   0          1d
coredns-6fd6cb9656-rj76h                1/1       Running   0          1d
etcd-ub1604-k8s231                      1/1       Running   1          1d
etcd-ub1604-k8s232                      1/1       Running   0          1d
etcd-ub1604-k8s233                      1/1       Running   7          1d
kube-apiserver-ub1604-k8s231            1/1       Running   4          1d
kube-apiserver-ub1604-k8s232            1/1       Running   0          1d
kube-apiserver-ub1604-k8s233            1/1       Running   0          1d
kube-controller-manager-ub1604-k8s231   1/1       Running   1          1d
kube-controller-manager-ub1604-k8s232   1/1       Running   0          1d
kube-controller-manager-ub1604-k8s233   1/1       Running   0          1d
kube-haproxy-ub1604-k8s231              1/1       Running   1          1d
kube-haproxy-ub1604-k8s232              1/1       Running   0          1d
kube-haproxy-ub1604-k8s233              1/1       Running   0          1d
kube-keepalived-ub1604-k8s231           1/1       Running   1          1d
kube-keepalived-ub1604-k8s232           1/1       Running   1          1d
kube-keepalived-ub1604-k8s233           1/1       Running   0          1d
kube-proxy-h7nsf                        1/1       Running   0          1d
kube-proxy-j6nt5                        1/1       Running   0          1d
kube-proxy-p6tvt                        1/1       Running   1          1d
kube-proxy-vkb75                        1/1       Running   0          1d
kube-proxy-w8sdf                        1/1       Running   0          1d
kube-scheduler-ub1604-k8s231            1/1       Running   1          1d
kube-scheduler-ub1604-k8s232            1/1       Running   0          1d
kube-scheduler-ub1604-k8s233            1/1       Running   0          1d
kubernetes-dashboard-6948bdb78-jcbp8    1/1       Running   0          1d
root@ub1604-k8s231:/etc/cni/net.d# kubectl -n kube-system logs -f calico-node-877k8
Error from server (BadRequest): a container name must be specified for pod calico-node-877k8, choose one of: [calico-node install-cni]
root@ub1604-k8s231:/etc/cni/net.d# kubectl -n kube-system logs -f calico-node-877k8 calico-node
2018-08-03 05:01:58.813 [INFO][81] watcher.go 85: Kubernetes watcher/converter stopped, closing result channel resource="FelixConfiguration (custom)"
2018-08-03 05:01:58.813 [INFO][81] watchercache.go 156: Starting watch sync/resync processing ListRoot="/calico/resources/v3/projectcalico.org/felixconfigurations"
2018-08-03 05:01:58.813 [INFO][81] watchercache.go 256: Stopping previous watcher ListRoot="/calico/resources/v3/projectcalico.org/felixconfigurations"
2018-08-03 05:01:58.814 [INFO][81] watchersyncer.go 196: Error received in main syncer event processing loop error=watch terminated (closedByRemote:true): terminating error event from Kubernetes watcher: closed by remote
2018-08-03 05:01:58.816 [INFO][81] watcher.go 83: Kubernetes watcher/converter started resource="FelixConfiguration (custom)"
2018-08-03 05:02:02.889 [INFO][81] watcher.go 124: Watch event indicates a terminated watcher resource="ClusterInformation (custom)"
2018-08-03 05:02:02.889 [INFO][81] watcher.go 85: Kubernetes watcher/converter stopped, closing result channel resource="ClusterInformation (custom)"
2018-08-03 05:02:02.889 [INFO][81] watchercache.go 156: Starting watch sync/resync processing ListRoot="/calico/resources/v3/projectcalico.org/clusterinformations"
2018-08-03 05:02:02.889 [INFO][81] watchercache.go 256: Stopping previous watcher ListRoot="/calico/resources/v3/projectcalico.org/clusterinformations"
2018-08-03 05:02:02.889 [INFO][81] watchersyncer.go 196: Error received in main syncer event processing loop error=watch terminated (closedByRemote:true): terminating error event from Kubernetes watcher: closed by remote
2018-08-03 05:02:02.893 [INFO][81] watcher.go 83: Kubernetes watcher/converter started resource="ClusterInformation (custom)"
2018-08-03 05:02:03.092 [INFO][81] int_dataplane.go 733: Applying dataplane updates
2018-08-03 05:02:03.092 [INFO][81] table.go 717: Invalidating dataplane cache ipVersion=0x4 reason="refresh timer" table="mangle"
2018-08-03 05:02:03.092 [INFO][81] table.go 438: Loading current iptables state and checking it is correct. ipVersion=0x4 table="mangle"
2018-08-03 05:02:03.095 [INFO][81] int_dataplane.go 747: Finished applying updates to dataplane. msecToApply=3.051123
2018-08-03 05:02:03.310 [INFO][81] int_dataplane.go 733: Applying dataplane updates
2018-08-03 05:02:03.311 [INFO][81] table.go 717: Invalidating dataplane cache ipVersion=0x4 reason="refresh timer" table="nat"
2018-08-03 05:02:03.311 [INFO][81] table.go 717: Invalidating dataplane cache ipVersion=0x4 reason="refresh timer" table="raw"
2018-08-03 05:02:03.311 [INFO][81] table.go 438: Loading current iptables state and checking it is correct. ipVersion=0x4 table="nat"
2018-08-03 05:02:03.311 [INFO][81] table.go 438: Loading current iptables state and checking it is correct. ipVersion=0x4 table="raw"
2018-08-03 05:02:03.319 [INFO][81] int_dataplane.go 747: Finished applying updates to dataplane. msecToApply=8.962677000000001
2018-08-03 05:02:03.669 [INFO][81] health.go 150: Overall health summary=&health.HealthReport{Live:true, Ready:true}
2018-08-03 05:02:03.815 [INFO][81] watcher.go 124: Watch event indicates a terminated watcher resource="HostEndpoint (custom)"
2018-08-03 05:02:03.815 [INFO][81] watcher.go 85: Kubernetes watcher/converter stopped, closing result channel resource="HostEndpoint (custom)"
2018-08-03 05:02:03.815 [INFO][81] watchercache.go 156: Starting watch sync/resync processing ListRoot="/calico/resources/v3/projectcalico.org/hostendpoints"
2018-08-03 05:02:03.815 [INFO][81] watchercache.go 256: Stopping previous watcher ListRoot="/calico/resources/v3/projectcalico.org/hostendpoints"
2018-08-03 05:02:03.815 [INFO][81] watchersyncer.go 196: Error received in main syncer event processing loop error=watch terminated (closedByRemote:true): terminating error event from Kubernetes watcher: closed by remote
According to the Calico maintainers, this "error" is a benign informational message; they plan to be more careful with the error keyword in future releases.
GitHub issue: https://github.com/projectcalico/libcalico-go/issues/695
To inspect Calico's state, use the calicoctl tool.
See also http://ibash.cc/frontend/article/102/
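In this cluster calicoctl runs as a pod named calicoctl in kube-system (visible in the pod list above), so every command is wrapped in kubectl exec; a shell alias makes this less verbose (a small convenience sketch):

alias calicoctl='kubectl exec -i -n kube-system calicoctl -- /calicoctl'
calicoctl get node -o wide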
root@ub1604-k8s231:~# kubectl exec -ti -n kube-system calicoctl -- calicoctl get profiles -o wide
NAME                LABELS
kns.default         map[]
kns.external-dns    map[]
kns.ingress-nginx   map[]
kns.kube-public     map[]
kns.kube-system     map[]

root@ub1604-k8s231:~# kubectl exec -ti -n kube-system calicoctl -- calicoctl get node -o wide
NAME            ASN         IPV4               IPV6
ub1604-k8s231   (unknown)   10.96.141.231/24
ub1604-k8s232   (unknown)   10.96.141.232/24
ub1604-k8s233   (unknown)   10.96.141.233/24
ub1604-k8s234   (unknown)   10.96.141.234/24
ub1604-k8s235   (unknown)   10.96.141.235/24
Check the IP pool:
root@ub1604-k8s231:~/k8s-manual-files/cni/calico/v3.1# kubectl exec -it -n kube-system calicoctl -- /calicoctl get ippool -o wide
NAME                  CIDR            NAT    IPIPMODE   DISABLED
default-ipv4-ippool   10.244.0.0/16   true   Never      false
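The IPIPMODE of Never confirms the DaemonSet change made earlier. The same setting can also be managed declaratively on the IP pool itself (a sketch using the projectcalico.org/v3 IPPool schema, applied through the same calicoctl pod):

cat <<EOF | kubectl exec -i -n kube-system calicoctl -- /calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 10.244.0.0/16
  ipipMode: Never
  natOutgoing: true
EOF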
List the IP addresses already allocated to containers (the WorkloadEndpoint resource):
root@ub1604-k8s231:~/k8s-manual-files/cni/calico/v3.1# kubectl exec -it -n kube-system calicoctl -- /calicoctl get wep --all-namespaces
NAMESPACE     WORKLOAD                               NODE            NETWORKS        INTERFACE
kube-system   coredns-6fd6cb9656-cnn5g               ub1604-k8s235   10.244.4.2/32   cali35e2aa0e177
kube-system   coredns-6fd6cb9656-rj76h               ub1604-k8s235   10.244.4.3/32   cali426ad3252da
kube-system   kubernetes-dashboard-6948bdb78-jcbp8   ub1604-k8s234   10.244.3.7/32   cali4fe503bc457
Enabling IPVS in Kubernetes
Calico has beta-level support for kube-proxy’s ipvs proxy mode. Calico ipvs support is activated automatically if Calico detects that kube-proxy is running in that mode.
ipvs mode promises greater scale and performance vs iptables mode. However, it comes with some limitations. In IPVS mode:
kube-proxy has a known issue affecting hosts with host interfaces that are not named using the pattern ethN.
Calico requires additional iptables packet mark bits in order to track packets as they pass through IPVS.
Calico needs to be configured with the port range that is assigned to Kubernetes NodePorts. If services do use NodePorts outside Calico’s expected range, Calico will treat traffic to those ports as host traffic instead of pod traffic.
Calico does not yet support Kubernetes services that make use of a locally-assigned ExternalIP. Calico does support ExternalIPs that are implemented via an external load balancer.
Calico has not yet been scale tested with ipvs.
Calico will detect if you change kube-proxy’s proxy mode after Calico has been deployed. Any Kubernetes ipvs-specific configuration needs to be configured before changing the kube-proxy proxy mode to ipvs.
Note on the NodePort point above: plan the Kubernetes NodePort port range in advance, so that the range Calico is configured with matches the one kube-proxy actually uses (see the sketch below).
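A minimal sketch of that configuration, assuming kube-proxy was given a non-default NodePort range (for example 20000-22767) and that the corresponding Felix setting is exposed on the calico-node container via the env var below (verify the variable name and value format against the Felix configuration reference for your Calico version):

# extra env var on the calico-node container in the DaemonSet
- name: FELIX_KUBENODEPORTRANGES
  value: "20000:22767"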