  • Introduction to Kubernetes Network Components

    Source: https://blog.csdn.net/kjh2007abc/article/details/86751730

    The Kubernetes network model assumes that all Pods live in one flat network space that is directly reachable end to end. This is because Kubernetes came out of Google, and on GCE that network model is provided as part of the infrastructure, so Kubernetes simply assumes the network already exists. When building a Kubernetes cluster on private infrastructure, however, no such network can be assumed. We have to implement it ourselves: first make the Docker containers on different nodes able to reach each other, and then run Kubernetes on top of that.

    Several open-source components already implement this container network model. This section introduces a few common ones and how to install and configure them: Flannel, Open vSwitch, direct routing, and Calico.

    1. Flannel
    1.1 How Flannel communication works
    Flannel can provide the underlying network that Kubernetes depends on because it does two things:
    (1) It helps Kubernetes assign non-conflicting IP addresses to the Docker containers on every Node.
    (2) It builds an overlay network between those IP addresses and uses it to deliver packets, unmodified, to the target container.

    The figure below shows how Flannel achieves these two points.

    As shown, Flannel first creates a network interface named flannel0; one side connects to the docker0 bridge, the other side connects to a service process called flanneld.

    The flanneld process is the key piece:
    flanneld first connects to etcd and uses it to manage the pool of assignable IP address ranges; it also watches the actual address of every Pod in etcd and builds a Pod routing table in memory.
    flanneld then sits between docker0 and the physical network. Using the in-memory Pod routing table, it encapsulates the packets that docker0 hands to it and delivers them over the physical network to the flanneld on the target Node, which completes direct Pod-to-Pod communication.
    There are many choices for the underlying transport between Flannel instances, including UDP, VxLAN, and AWS VPC; anything that can reach the Flannel on the other side will do. The source flanneld encapsulates, the destination flanneld decapsulates, and docker0 ultimately sees the original data, so the whole process is transparent and the containers never notice Flannel in the middle. UDP is the most commonly used backend.
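    The backend is selected in the same etcd network configuration record that is written in section 1.2 below. A sketch of choosing the VxLAN backend instead of the default UDP (key and subnet follow the values used later in this article):
    etcdctl set /coreos.com/network/config '{ "Network": "172.16.0.0/16", "Backend": {"Type": "vxlan"} }'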

    How does Flannel assign IPs to Pods on different Nodes without conflicts? Because Flannel uses a central etcd service to manage the address resources, every allocated address range comes from the same shared pool, which makes coordination easy and conflicts impossible. Once Flannel has allocated an address range, the rest of the work is handed over to Docker: Flannel passes the assigned range to Docker by modifying its startup parameters, for example:
    --bip=172.17.18.1/24

    Through these steps Flannel controls the docker0 address range on every Node, ensuring that all Pod IP addresses sit in one flat network without conflicts.
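    For reference, a sketch of what this hand-off looks like on a Node (the values are examples and differ per Node): after obtaining its lease, flanneld writes it to /run/flannel/subnet.env, and the mk-docker-opts.sh helper shipped with Flannel (used again in section 1.2 below) turns those variables into Docker options such as --bip and --mtu.
    # more /run/flannel/subnet.env
    FLANNEL_NETWORK=172.16.0.0/16
    FLANNEL_SUBNET=172.16.70.1/24
    FLANNEL_MTU=1472
    FLANNEL_IPMASQ=false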

    Flannel solves the Kubernetes networking requirement nicely, but it introduces extra components into the data path: traffic has to pass through the flannel0 interface and then the user-space flanneld process, and go through the reverse of that path on the receiving side, so it adds some network latency.

    In addition, Flannel's default transport is UDP. Since UDP itself is unreliable, heavy-traffic, high-concurrency scenarios should be tested thoroughly to make sure there are no problems.

    1.2 Installing and configuring Flannel
    1) Install etcd
    Flannel uses etcd as its datastore, so etcd must be installed beforehand; that is not covered here.

    2) Install Flannel
    Flannel must be installed on every Node. It can be downloaded from https://github.com/coreos/flannel/releases . Unpack the downloaded flannel-<version>-linux-amd64.tar.gz and copy the binaries flanneld and mk-docker-opts.sh into /usr/bin to complete the installation.
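    A minimal sketch of those steps (the version number below is only an example; substitute the release you actually downloaded):
    # wget https://github.com/coreos/flannel/releases/download/v0.10.0/flannel-v0.10.0-linux-amd64.tar.gz
    # mkdir -p /tmp/flannel && tar -zxvf flannel-v0.10.0-linux-amd64.tar.gz -C /tmp/flannel
    # cp /tmp/flannel/flanneld /tmp/flannel/mk-docker-opts.sh /usr/bin/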

    3) Configure Flannel
    Here we configure the flanneld service on a systemd-based system.
    Edit the service unit file /usr/lib/systemd/system/flanneld.service:
    [root@k8s-node1 sysconfig]# more /usr/lib/systemd/system/flanneld.service
    [Unit]
    Description=flanneld overlay address etcd agent
    After=network.target
    Before=docker.service

    [Service]
    Type=notify
    EnvironmentFile=/etc/sysconfig/flannel
    ExecStart=/usr/bin/flanneld -etcd-endpoints=http://10.0.2.15:2379 $FLANNEL_OPTIONS

    [Install]
    RequiredBy=docker.service
    WantedBy=multi-user.target

    Edit the configuration file /etc/sysconfig/flannel and set the etcd URL:
    [root@k8s-node2 sysconfig]# more flannel
    # flanneld configuration options
    # etcd url location. Point this to the server where etcd runs
    FLANNEL_ETCD="http://10.0.2.15:2379"

    # etcd config key. This is the configuration key that flannel queries
    # For address range assignment
    FLANNEL_ETCD_KEY="/coreos.com/network"

    Before starting the flanneld service, a network configuration record must be added to etcd; flanneld uses it to allocate the virtual IP range given to Docker on each Node.
    [root@k8s-master ~]# etcdctl set /coreos.com/network/config '{ "Network": "172.16.0.0/16" }'
    { "Network": "172.16.0.0/16" }
    Because Flannel takes over the docker0 bridge, stop the Docker service first if it is already running.

    4) Start the Flannel service
    systemctl daemon-reload
    systemctl restart flanneld

    5) Restart the Docker service
    systemctl daemon-reload
    systemctl restart docker

    6) Set the IP address of the docker0 bridge
    mk-docker-opts.sh -i
    source /run/flannel/subnet.env
    ifconfig docker0 ${FLANNEL_SUBNET}
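    Note that these two commands reconfigure docker0 only for the current boot. A common way to make it persistent (a sketch, assuming the stock mk-docker-opts.sh from the Flannel release and a systemd-managed Docker; adjust file paths to your environment) is to let flanneld generate a Docker options file and reference it from the docker unit:
    # In flanneld.service, generate /run/flannel/docker after flanneld starts:
    #   ExecStartPost=/usr/bin/mk-docker-opts.sh -k DOCKER_NETWORK_OPTIONS -d /run/flannel/docker
    # In docker.service, load that file and pass the options to dockerd:
    #   EnvironmentFile=-/run/flannel/docker
    #   ExecStart=/usr/bin/dockerd $DOCKER_NETWORK_OPTIONS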

    Afterwards, confirm that the IP address of the docker0 interface belongs to the flannel0 subnet:
    [root@k8s-node1 system]# ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
        link/ether 08:00:27:9f:89:14 brd ff:ff:ff:ff:ff:ff
        inet 10.0.2.4/24 brd 10.0.2.255 scope global noprefixroute dynamic enp0s3
           valid_lft 993sec preferred_lft 993sec
        inet6 fe80::a00:27ff:fe9f:8914/64 scope link
           valid_lft forever preferred_lft forever
    3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
        link/ether 02:42:c9:52:3d:15 brd ff:ff:ff:ff:ff:ff
        inet 172.16.70.1/24 brd 172.16.70.255 scope global docker0
           valid_lft forever preferred_lft forever
        inet6 fe80::42:c9ff:fe52:3d15/64 scope link
           valid_lft forever preferred_lft forever
    6: flannel0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1472 qdisc pfifo_fast state UNKNOWN group default qlen 500
        link/none
        inet 172.16.70.0/16 scope global flannel0
           valid_lft forever preferred_lft forever
        inet6 fe80::4b31:c92f:8cc9:3a22/64 scope link flags 800
           valid_lft forever preferred_lft forever
    [root@k8s-node1 system]#

    At this point the Flannel overlay network is set up.

    Use ping to verify that the docker0 bridges on different Nodes can reach each other. For example, from Node1 (docker0 IP=172.16.70.1) ping Node2's docker0 (docker0 IP=172.16.13.1); through Flannel we can successfully reach the Docker network on the other physical machine:
    [root@k8s-node1 system]# ifconfig flannel0
    flannel0: flags=4305<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST>  mtu 1472
            inet 172.16.70.0  netmask 255.255.0.0  destination 172.16.70.0
            inet6 fe80::524a:4b9c:3391:7514  prefixlen 64  scopeid 0x20<link>
            unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  txqueuelen 500  (UNSPEC)
            RX packets 5  bytes 420 (420.0 B)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 8  bytes 564 (564.0 B)
            TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

    [root@k8s-node1 system]# ifconfig docker0
    docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
            inet 172.16.70.1  netmask 255.255.255.0  broadcast 172.16.70.255
            inet6 fe80::42:c9ff:fe52:3d15  prefixlen 64  scopeid 0x20<link>
            ether 02:42:c9:52:3d:15  txqueuelen 0  (Ethernet)
            RX packets 0  bytes 0 (0.0 B)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 8  bytes 648 (648.0 B)
            TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

    [root@k8s-node1 system]# ping 172.16.13.1
    PING 172.16.13.1 (172.16.13.1) 56(84) bytes of data.
    64 bytes from 172.16.13.1: icmp_seq=1 ttl=62 time=1.63 ms
    64 bytes from 172.16.13.1: icmp_seq=2 ttl=62 time=1.55 ms
    ^C
    --- 172.16.13.1 ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 1002ms
    rtt min/avg/max/mdev = 1.554/1.595/1.637/0.057 ms

    In etcd we can see the mapping Flannel maintains between flannel0 subnets and the physical machines' IP addresses:
    [root@k8s-master etcd]# etcdctl ls /coreos.com/network/subnets
    /coreos.com/network/subnets/172.16.70.0-24
    /coreos.com/network/subnets/172.16.13.0-24

    [root@k8s-master etcd]# etcdctl get /coreos.com/network/subnets/172.16.70.0-24
    {"PublicIP":"10.0.2.4"}
    [root@k8s-master etcd]# etcdctl get /coreos.com/network/subnets/172.16.13.0-24
    {"PublicIP":"10.0.2.5"}

    2 Open vSwitch
    2.1 Basic principle
    Open vSwitch (OVS) is an open-source virtual switch. It resembles the Linux bridge but is far more capable. An OVS bridge can directly establish several kinds of communication channels (tunnels), for example GRE or VxLAN, and these tunnels are easy to set up with OVS configuration commands. In the Kubernetes/Docker scenario we mainly build L3-to-L3 tunnels, resulting in a network architecture like the one below.

    First, to avoid conflicts between the docker0 addresses Docker creates, manually configure a distinct docker0 address range on each Node.
    Second, create an OVS bridge, then use ovs-vsctl to add a GRE port to it; when adding the GRE port, set the remote IP to the address of the peer Node. This has to be repeated for every peer IP (for larger deployments this is usually scripted; a small automation sketch appears at the end of section 2.4).
    Finally, attach the OVS bridge as a network interface to the Docker bridge, bring both bridges up, and add a route for the Docker address range via the Docker bridge; the containers on the two Nodes can then reach each other.

    2.2 How traffic flows
    When an application inside a container accesses the address of a container on another Node, the packet follows the container's default route to the docker0 bridge. Because the OVS bridge is attached as a port of docker0, the packet is handed to the OVS bridge. The OVS bridge has already been configured with GRE/VxLAN tunnels to the OVS bridges on the other Nodes, so it delivers the packet to the peer Node, where it is forwarded on to docker0 and the Pod.
    Thanks to the added route, traffic originating from applications on the Node itself is also routed to the docker0 bridge and follows the same path, so it too can reach Pods on other Nodes.

    2.3 Characteristics of OVS with GRE/VxLAN
    The strength of OVS is that, as open-source virtual switch software, it is relatively mature and stable, supports a wide range of tunneling protocols, and has been battle-tested in projects such as OpenStack.
    On the other hand, Flannel not only builds an overlay network for Pod-to-Pod communication but is also tightly integrated with the Kubernetes and Docker architecture: it is aware of Kubernetes Services, maintains its routing tables dynamically, and uses etcd to help Docker allocate the docker0 subnets across the whole cluster. With OVS, much of that work has to be done by hand.
    Also, whether you use OVS or Flannel, building an overlay network for Pod-to-Pod communication introduces some extra overhead. For applications that depend heavily on the network, evaluate the impact on your workload.

    2.4 Installing and configuring Open vSwitch
    Using two Nodes as an example, the target network topology is shown in the figure below.

    1) Install OVS on both Nodes
    Make sure SELinux is disabled on the Nodes (see the short sketch after the install command below).
    On both Nodes, install the package:
    yum -y install openvswitch
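    A minimal sketch of disabling SELinux as mentioned above (assuming a CentOS/RHEL Node; setenforce 0 takes effect immediately, the config edit survives reboots):
    # setenforce 0
    # sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config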

    Check the Open vSwitch service status; both the ovsdb-server and ovs-vswitchd processes should be running.
    [root@k8s-node2 system]# systemctl start openvswitch
    [root@k8s-node2 system]# systemctl status openvswitch
    ● openvswitch.service - Open vSwitch
       Loaded: loaded (/usr/lib/systemd/system/openvswitch.service; disabled; vendor preset: disabled)
       Active: active (exited) since Sun 2018-06-10 17:06:40 CST; 6s ago
      Process: 8368 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
    Main PID: 8368 (code=exited, status=0/SUCCESS)

    Jun 10 17:06:40 k8s-node2.test.com systemd[1]: Starting Open vSwitch...
    Jun 10 17:06:40 k8s-node2.test.com systemd[1]: Started Open vSwitch.
    [root@k8s-node2 system]# ps -ef|grep ovs
    root      8352     1  0 17:06 ?        00:00:00 ovsdb-server: monitoring pid 8353 (healthy)
    root      8353  8352  0 17:06 ?        00:00:00 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach --monitor
    root      8364     1  0 17:06 ?        00:00:00 ovs-vswitchd: monitoring pid 8365 (healthy)
    root      8365  8364  0 17:06 ?        00:00:00 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach --monitor

    2) Create the bridge and the GRE tunnel
    Next, on each Node create an OVS bridge br0, then create a GRE port on it that connects to the peer's bridge, and finally attach br0 as a port to the docker0 Linux bridge.
    After that, the docker0 subnets on the two machines can talk to each other.
    Taking Node1 as an example, the steps are as follows:
    (1) Create the OVS bridge
    [root@k8s-node1 system]# ovs-vsctl add-br br0
    (2) Create the GRE port that connects to the peer; remote_ip is the IP address of the peer Node's physical NIC
    [root@k8s-node1 system]# ovs-vsctl add-port br0 gre1 -- set interface gre1 type=gre option:remote_ip=10.0.2.5
    (3) Attach br0 to the local docker0 so that container traffic flows through the OVS tunnel
    [root@k8s-node1 system]# brctl addif docker0 br0
    (4) Bring up the br0 and docker0 bridges
    [root@k8s-node1 system]# ip link set dev br0 up
    [root@k8s-node1 system]# ip link set dev docker0 up
    (5) Add the route
    The docker0 subnets on 10.0.2.5 and 10.0.2.4 are 172.16.20.0/24 and 172.16.10.0/24 respectively. Both subnets must be routed through the local docker0 bridge, and one of the two /24s is reached through the OVS GRE tunnel on the far side. Therefore add a route for 172.16.0.0/16 via docker0 on each Node:
    [root@k8s-node1 system]# ip route add 172.16.0.0/16 dev docker0
    (6) Flush the iptables rules created by Docker as well as the Linux default rules; the latter contain a rule that rejects ICMP packets
    [root@k8s-node1 system]# iptables -t nat -F
    [root@k8s-node1 system]# iptables -F

    After completing the steps above on Node1, perform the same configuration on Node2.

    Once configuration is done, the relevant information on Node1 (its IP address, the docker0 address, routes, and so on) looks like this:
    [root@k8s-node1 system]# ip addr
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
        link/ether 08:00:27:9f:89:14 brd ff:ff:ff:ff:ff:ff
        inet 10.0.2.4/24 brd 10.0.2.255 scope global noprefixroute dynamic enp0s3
           valid_lft 842sec preferred_lft 842sec
        inet6 fe80::a00:27ff:fe9f:8914/64 scope link
           valid_lft forever preferred_lft forever
    3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
        link/ether 02:42:c9:52:3d:15 brd ff:ff:ff:ff:ff:ff
        inet 172.16.10.1/24 brd 172.16.10.255 scope global docker0
           valid_lft forever preferred_lft forever
        inet6 fe80::42:c9ff:fe52:3d15/64 scope link
           valid_lft forever preferred_lft forever
    10: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
        link/ether 5e:a9:02:75:aa:98 brd ff:ff:ff:ff:ff:ff
    11: br0: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UNKNOWN group default qlen 1000
        link/ether 82:e3:9a:29:3c:46 brd ff:ff:ff:ff:ff:ff
        inet6 fe80::a8de:24ff:fef4:f8ec/64 scope link
           valid_lft forever preferred_lft forever
    12: gre0@NONE: <NOARP> mtu 1476 qdisc noop state DOWN group default qlen 1000
        link/gre 0.0.0.0 brd 0.0.0.0
    13: gretap0@NONE: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN group default qlen 1000
        link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
    14: gre_system@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65490 qdisc pfifo_fast master ovs-system state UNKNOWN group default qlen 1000
        link/ether 76:53:6f:11:e0:f8 brd ff:ff:ff:ff:ff:ff
        inet6 fe80::7453:6fff:fe11:e0f8/64 scope link
           valid_lft forever preferred_lft forever
    [root@k8s-node1 system]#

    [root@k8s-node1 system]# ip route
    default via 10.0.2.1 dev enp0s3 proto dhcp metric 100
    10.0.2.0/24 dev enp0s3 proto kernel scope link src 10.0.2.4 metric 100
    172.16.0.0/16 dev docker0 scope link
    172.16.10.0/24 dev docker0 proto kernel scope link src 172.16.10.1

    3) Testing connectivity between containers on the two Nodes
    [root@k8s-node1 system]# ping 172.16.20.1
    PING 172.16.20.1 (172.16.20.1) 56(84) bytes of data.
    64 bytes from 172.16.20.1: icmp_seq=1 ttl=64 time=2.39 ms
    64 bytes from 172.16.20.1: icmp_seq=2 ttl=64 time=3.36 ms
    ^C
    --- 172.16.20.1 ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 1004ms
    rtt min/avg/max/mdev = 2.398/2.882/3.366/0.484 ms
    [root@k8s-node1 system]#

    Capture packets on Node2 to observe:
    [root@k8s-node2 system]# tcpdump -i docker0 -nnn
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on docker0, link-type EN10MB (Ethernet), capture size 262144 bytes
    23:43:59.020039 IP 172.16.10.1 > 172.16.20.1: ICMP echo request, id 20831, seq 26, length 64
    23:43:59.020096 IP 172.16.20.1 > 172.16.10.1: ICMP echo reply, id 20831, seq 26, length 64
    23:44:00.020899 IP 172.16.10.1 > 172.16.20.1: ICMP echo request, id 20831, seq 27, length 64
    23:44:00.020939 IP 172.16.20.1 > 172.16.10.1: ICMP echo reply, id 20831, seq 27, length 64
    23:44:01.021706 IP 172.16.10.1 > 172.16.20.1: ICMP echo request, id 20831, seq 28, length 64
    23:44:01.021750 IP 172.16.20.1 > 172.16.10.1: ICMP echo reply, id 20831, seq 28, length 64

    Next, we reuse a ReplicationController definition from an earlier experiment that creates two replicas, and actually start two containers to test network communication between the two Pods:
    [root@k8s-master ~]# more frontend-rc.yaml
    apiVersion: v1
    kind: ReplicationController
    metadata:
      name: frontend
      labels:
        name: frontend
    spec:
       replicas: 2
       selector:
         name: frontend
       template:
         metadata:
           labels:
             name: frontend
         spec:
           containers:
           - name: php-redis
             image: kubeguide/guestbook-php-frontend
             ports:
             - containerPort: 80
               hostPort: 80
             env:
             - name: GET_HOSTS_FROM
               value: env
    [root@k8s-master ~]#

    Create it and check the result:
    [root@k8s-master ~]# kubectl get rc
    NAME       DESIRED   CURRENT   READY     AGE
    frontend   2         2         2         33m
    [root@k8s-master ~]# kubectl get pods -o wide
    NAME             READY     STATUS    RESTARTS   AGE       IP            NODE
    frontend-b6krg   1/1       Running   1          33m       172.16.20.2   10.0.2.5
    frontend-qk6zc   1/1       Running   0          33m       172.16.10.2   10.0.2.4

    Now log into the container running on Node1:
    [root@k8s-master ~]# kubectl exec -it frontend-qk6zc -c php-redis /bin/bash
    root@frontend-qk6zc:/var/www/html# ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
    2: gre0@NONE: <NOARP> mtu 1476 qdisc noop state DOWN group default qlen 1000
        link/gre 0.0.0.0 brd 0.0.0.0
    3: gretap0@NONE: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN group default qlen 1000
        link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
    22: eth0@if23: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
        link/ether 02:42:ac:10:0a:02 brd ff:ff:ff:ff:ff:ff
        inet 172.16.10.2/24 brd 172.16.10.255 scope global eth0
           valid_lft forever preferred_lft forever
    root@frontend-qk6zc:/var/www/html#
    From the Pod running on Node1, ping the address of a Pod running on Node2:
    root@frontend-qk6zc:/var/www/html# ping 172.16.20.2
    PING 172.16.20.2 (172.16.20.2): 56 data bytes
    64 bytes from 172.16.20.2: icmp_seq=0 ttl=63 time=2017.587 ms
    64 bytes from 172.16.20.2: icmp_seq=1 ttl=63 time=1014.193 ms
    64 bytes from 172.16.20.2: icmp_seq=2 ttl=63 time=13.232 ms
    64 bytes from 172.16.20.2: icmp_seq=3 ttl=63 time=1.122 ms
    64 bytes from 172.16.20.2: icmp_seq=4 ttl=63 time=1.379 ms
    64 bytes from 172.16.20.2: icmp_seq=5 ttl=63 time=1.474 ms
    64 bytes from 172.16.20.2: icmp_seq=6 ttl=63 time=1.371 ms
    64 bytes from 172.16.20.2: icmp_seq=7 ttl=63 time=1.583 ms
    ^C--- 172.16.20.2 ping statistics ---
    8 packets transmitted, 8 packets received, 0% packet loss
    round-trip min/avg/max/stddev = 1.122/381.493/2017.587/701.350 ms
    root@frontend-qk6zc:/var/www/html#
    Packet capture on Node2 shows the exchange:
    [root@k8s-node2 system]# tcpdump -i docker0 -nnn
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on docker0, link-type EN10MB (Ethernet), capture size 262144 bytes
    00:13:18.601908 IP 172.16.10.2 > 172.16.20.2: ICMP echo request, id 38, seq 4, length 64
    00:13:18.601947 IP 172.16.20.2 > 172.16.10.2: ICMP echo reply, id 38, seq 4, length 64
    00:13:18.601956 IP 172.16.20.2 > 172.16.10.2: ICMP echo reply, id 38, seq 4, length 64
    00:13:28.609109 IP 172.16.10.2 > 172.16.20.2: ICMP echo request, id 38, seq 5, length 64
    00:13:28.609165 IP 172.16.20.2 > 172.16.10.2: ICMP echo reply, id 38, seq 5, length 64
    00:13:28.609179 IP 172.16.20.2 > 172.16.10.2: ICMP echo reply, id 38, seq 5, length 64
    00:13:29.612564 IP 172.16.10.2 > 172.16.20.2: ICMP echo request, id 38, seq 6, length 64
    Note: if any of the connectivity tests above fails, check the firewalld configuration on the Nodes.
    At this point the OVS-based network is working. Because GRE is a point-to-point tunnel, with more than two Nodes you need N*(N-1) GRE tunnels, i.e. a full mesh of all Nodes, to achieve full connectivity.
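    A small sketch of automating the per-peer GRE ports mentioned in 2.1 (run on every Node; PEERS is an assumed variable listing the physical IPs of all the other Nodes):
    PEERS="10.0.2.5 10.0.2.6"    # example values: the eth0/enp0s3 IPs of every other Node
    i=1
    for ip in $PEERS; do
        ovs-vsctl add-port br0 gre$i -- set interface gre$i type=gre option:remote_ip=$ip
        i=$((i+1))
    done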
    3 Direct routing
    In earlier experiments we already tested inter-Node communication by writing routes by hand, so the configuration itself is not repeated here. The problem with direct routing is that whenever the cluster's nodes change, the routing table on every Node has to be maintained by hand, which does not scale. To manage these dynamically changing routes and make every other Node aware of changes automatically, a dynamic routing protocol is needed to propagate them.
    Among the open-source implementations of dynamic routing protocols, Quagga and Zebra are commonly used.
    The configuration steps and caveats are briefly described below.
    (1) You still need to manually assign the Docker bridge address range on each Node
    Whether you change the address range used by the default docker0 or create another bridge and point Docker at it with --bridge=XX, make sure the ranges used by the Docker bridges on different Nodes do not overlap (a sketch follows).
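    A minimal sketch of pinning per-Node docker0 ranges (assuming Docker reads /etc/docker/daemon.json and no conflicting --bip flag is already passed on the dockerd command line; the subnets are example values and must differ on every Node):
    # /etc/docker/daemon.json on Node1
    { "bip": "172.16.10.1/24" }
    # /etc/docker/daemon.json on Node2
    { "bip": "172.16.20.1/24" }
    # then on each Node: systemctl restart docker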
    (2) Then run Quagga on every Node
    You can either install and start the Quagga package on each server or run Quagga in a container. Pull the Docker image on every Node:
    # docker pull georce/router
    Start the Quagga container on every Node. Note that Quagga must run in --privileged mode and with --net=host, i.e. it uses the physical host's network directly:
    # docker run -itd --name=router --privileged --net=host georce/router

    Once started, the Quagga instances on the Nodes learn routes from each other and add the routes to the docker0 subnets of the other machines.
    At this point the docker0 bridges on all Nodes can reach one another.
    Note: for clusters with thousands of Nodes or more, test and evaluate the efficiency of the routing tables.

    4 Calico container networking and network policy
    4.1 Calico overview
    Calico is another solution for container networking. Its biggest difference from other virtual networks is that it does not use an overlay network to forward packets; it provides a pure layer-3 network model. In this model every container communicates directly over IP, and routing decides how to reach the other side; the node a container runs on acts much like a traditional router, providing route lookup. For routing to work, every virtual router (i.e. each host node) must somehow know the routes for the whole cluster, and Calico uses the BGP routing protocol (Border Gateway Protocol) for this. Besides the Kubernetes container platform and public clouds such as AWS and GCE, Calico also integrates easily with IaaS platforms such as OpenStack.

    On every compute node Calico uses the Linux kernel to implement an efficient vRouter that forwards the traffic. Each vRouter advertises the routes of the containers running on its node to the whole Calico network via BGP, and automatically installs forwarding rules for reaching the other nodes. Calico ensures that all traffic between containers is exchanged purely via IP routing. A Calico network can be built directly on the data center's existing network fabric (L2 or L3) without extra NAT, tunnels, or an overlay network; because there is no encapsulation or decapsulation, it saves CPU cycles and improves network efficiency. The structure of a Calico packet is illustrated below.

    In small clusters Calico nodes can peer with each other directly; in large clusters an additional BGP route reflector is used.

    In addition, Calico provides rich network policies based on iptables; it implements the Kubernetes Network Policy API and can restrict network reachability between containers.

    Calico's main components are:
    Felix: the Calico agent. It runs on every Node and configures network resources for containers (IP addresses, routes, iptables rules, etc.) so that containers and hosts can all reach each other.
    etcd: the datastore used by Calico.
    BGP Client (BIRD): advertises the routes that Felix programs on each Node to the rest of the Calico network via BGP.
    BGP Route Reflector (BIRD): one or more route reflectors used to distribute routes hierarchically in large clusters.
    calicoctl: the Calico command-line management tool.
    4.2 Deploying Calico
    Deploying Calico in Kubernetes involves two main parts.
    4.2.1 Modify the Kubernetes service startup parameters and restart the services
    On the Master, set the kube-apiserver startup parameter --allow-privileged=true (calico-node has to run in privileged mode on each Node).
    On every Node, set the kubelet startup parameters --network-plugin=cni (use the CNI network plugin) and --allow-privileged=true.
    The Kubernetes cluster in this example has two Nodes: Node1 (10.0.2.4) and Node2 (10.0.2.5). A sketch of applying the kubelet change follows.
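    A minimal sketch of the kubelet change (assumption: kubelet was installed from RPM packages and reads its arguments from /etc/kubernetes/kubelet via the systemd unit; the file name and variable differ between distributions):
    # In /etc/kubernetes/kubelet, append the flags to the existing argument variable, e.g.:
    #   KUBELET_ARGS="--network-plugin=cni --allow-privileged=true ..."
    # Then reload and restart on every Node:
    # systemctl daemon-reload && systemctl restart kubelet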

    4.2.2 Create the Calico services, mainly calico-node and the Calico policy controller
    The following resource objects need to be created:
    A ConfigMap, calico-config, containing the configuration parameters Calico needs.
    A Secret, calico-etcd-secrets, used for connecting to etcd over TLS.
    The calico/node container running on every Node, deployed as a DaemonSet.
    The Calico CNI binaries and network configuration installed on every Node (done by the install-cni container).
    A Deployment named calico/kube-policy-controller, which hooks into the Network Policies defined for Pods in the Kubernetes cluster.
    4.2.3 Calico installation and configuration in detail
    Download the Calico YAML manifest from the Calico website at https://docs.projectcalico.org/v2.1/getting-started/kubernetes/installation/hosted/calico.yaml .
    This file contains the definitions of all the resource objects needed to start Calico. They are described one by one below.
    (1) The configuration Calico needs is created as a ConfigMap, as shown below
    # Calico Version v2.1.5
    # https://docs.projectcalico.org/v2.1/releases#v2.1.5
    # This manifest includes the following component versions:
    #   calico/node:v1.1.3
    #   calico/cni:v1.8.0
    #   calico/kube-policy-controller:v0.5.4

    # This ConfigMap is used to configure a self-hosted Calico installation.
    kind: ConfigMap
    apiVersion: v1
    metadata:
      name: calico-config
      namespace: kube-system
    data:
      # Configure this with the location of your etcd cluster.
      etcd_endpoints: "http://10.0.2.15:2379"

      # Configure the Calico backend to use.
      calico_backend: "bird"

      # The CNI network configuration to install on each node.
      cni_network_config: |-
        {
            "name": "k8s-pod-network",
            "type": "calico",
            "etcd_endpoints": "__ETCD_ENDPOINTS__",
            "etcd_key_file": "__ETCD_KEY_FILE__",
            "etcd_cert_file": "__ETCD_CERT_FILE__",
            "etcd_ca_cert_file": "__ETCD_CA_CERT_FILE__",
            "log_level": "info",
            "ipam": {
                "type": "calico-ipam"
            },
            "policy": {
                "type": "k8s",
                "k8s_api_root": "https://__KUBERNETES_SERVICE_HOST__:__KUBERNETES_SERVICE_PORT__",
                "k8s_auth_token": "__SERVICEACCOUNT_TOKEN__"
            },
            "kubernetes": {
                "kubeconfig": "__KUBECONFIG_FILEPATH__"
            }
        }

      # If you're using TLS enabled etcd uncomment the following.
      # You must also populate the Secret below with these files.
      etcd_ca: ""   # "/calico-secrets/etcd-ca"
      etcd_cert: "" # "/calico-secrets/etcd-cert"
      etcd_key: ""  # "/calico-secrets/etcd-key"

    The main parameters are:
    etcd_endpoints: Calico stores the network topology and state in etcd; this parameter specifies the etcd address. The etcd used by the Kubernetes Master can be reused, or a separate one can be set up.
    calico_backend: the Calico backend; the default is bird.
    cni_network_config: a CNI-compliant network configuration. Here type=calico means that kubelet will look for an executable named "calico" in /opt/cni/bin and invoke it to set up the container network. In the ipam section, type=calico-ipam means kubelet will look for an executable named "calico-ipam" in /opt/cni/bin to allocate container IP addresses.
    If etcd has TLS enabled, the corresponding ca, cert, and key files must also be specified.

    (2) The Secret used to access etcd; for an etcd service without TLS, leave the data empty
    # The following contains k8s Secrets for use with a TLS enabled etcd cluster.
    # For information on populating Secrets, see http://kubernetes.io/docs/user-guide/secrets/
    apiVersion: v1
    kind: Secret
    type: Opaque
    metadata:
      name: calico-etcd-secrets
      namespace: kube-system
    data:
      # Populate the following files with etcd TLS configuration if desired, but leave blank if
      # not using TLS for etcd.
      # This self-hosted install expects three files with the following names.  The values
      # should be base64 encoded strings of the entire contents of each file.
      # etcd-key: null
      # etcd-cert: null
      # etcd-ca: null

    (3) calico-node, a DaemonSet that runs a calico-node container and an install-cni container on every Node
    # This manifest installs the calico/node container, as well
    # as the Calico CNI plugins and network config on
    # each master and worker node in a Kubernetes cluster.
    kind: DaemonSet
    apiVersion: extensions/v1beta1
    metadata:
      name: calico-node
      namespace: kube-system
      labels:
        k8s-app: calico-node
    spec:
      selector:
        matchLabels:
          k8s-app: calico-node
      template:
        metadata:
          labels:
            k8s-app: calico-node
          annotations:
            scheduler.alpha.kubernetes.io/critical-pod: ''
            scheduler.alpha.kubernetes.io/tolerations: |
              [{"key": "dedicated", "value": "master", "effect": "NoSchedule" },
               {"key":"CriticalAddonsOnly", "operator":"Exists"}]
        spec:
          hostNetwork: true
          containers:
            # Runs calico/node container on each Kubernetes node.  This
            # container programs network policy and routes on each
            # host.
            - name: calico-node
              image: quay.io/calico/node:v1.1.3
              env:
                # The location of the Calico etcd cluster.
                - name: ETCD_ENDPOINTS
                  valueFrom:
                    configMapKeyRef:
                      name: calico-config
                      key: etcd_endpoints
                # Choose the backend to use.
                - name: CALICO_NETWORKING_BACKEND
                  valueFrom:
                    configMapKeyRef:
                      name: calico-config
                      key: calico_backend
                # Disable file logging so `kubectl logs` works.
                - name: CALICO_DISABLE_FILE_LOGGING
                  value: "true"
                # Set Felix endpoint to host default action to ACCEPT.
                - name: FELIX_DEFAULTENDPOINTTOHOSTACTION
                  value: "ACCEPT"
                # Configure the IP Pool from which Pod IPs will be chosen.
                - name: CALICO_IPV4POOL_CIDR
                  value: "192.168.0.0/16"
                - name: CALICO_IPV4POOL_IPIP
                  value: "always"
                # Disable IPv6 on Kubernetes.
                - name: FELIX_IPV6SUPPORT
                  value: "false"
                # Set Felix logging to "info"
                - name: FELIX_LOGSEVERITYSCREEN
                  value: "info"
                # Location of the CA certificate for etcd.
                - name: ETCD_CA_CERT_FILE
                  valueFrom:
                    configMapKeyRef:
                      name: calico-config
                      key: etcd_ca
                # Location of the client key for etcd.
                - name: ETCD_KEY_FILE
                  valueFrom:
                    configMapKeyRef:
                      name: calico-config
                      key: etcd_key
                # Location of the client certificate for etcd.
                - name: ETCD_CERT_FILE
                  valueFrom:
                    configMapKeyRef:
                      name: calico-config
                      key: etcd_cert
                # Auto-detect the BGP IP address.
                - name: IP
                  value: ""
              securityContext:
                privileged: true
              resources:
                requests:
                  cpu: 250m
              volumeMounts:
                - mountPath: /lib/modules
                  name: lib-modules
                  readOnly: true
                - mountPath: /var/run/calico
                  name: var-run-calico
                  readOnly: false
                - mountPath: /calico-secrets
                  name: etcd-certs
            # This container installs the Calico CNI binaries
            # and CNI network config file on each node.
            - name: install-cni
              image: quay.io/calico/cni:v1.8.0
              command: ["/install-cni.sh"]
              env:
                # The location of the Calico etcd cluster.
                - name: ETCD_ENDPOINTS
                  valueFrom:
                    configMapKeyRef:
                      name: calico-config
                      key: etcd_endpoints
                # The CNI network config to install on each node.
                - name: CNI_NETWORK_CONFIG
                  valueFrom:
                    configMapKeyRef:
                      name: calico-config
                      key: cni_network_config
              volumeMounts:
                - mountPath: /host/opt/cni/bin
                  name: cni-bin-dir
                - mountPath: /host/etc/cni/net.d
                  name: cni-net-dir
                - mountPath: /calico-secrets
                  name: etcd-certs
          volumes:
            # Used by calico/node.
            - name: lib-modules
              hostPath:
                path: /lib/modules
            - name: var-run-calico
              hostPath:
                path: /var/run/calico
            # Used to install CNI.
            - name: cni-bin-dir
              hostPath:
                path: /opt/cni/bin
            - name: cni-net-dir
              hostPath:
                path: /etc/cni/net.d
            # Mount in the etcd TLS secrets.
            - name: etcd-certs
              secret:
                secretName: calico-etcd-secrets
    This Pod contains the following two containers:
    calico-node: the Calico service process, which sets up network resources for Pods and ensures that Pod traffic can reach every Node. It runs with hostNetwork, i.e. directly on the host's network.
    install-cni: installs the CNI binaries into /opt/cni/bin and the corresponding network configuration file into /etc/cni/net.d on every Node.
    The main parameters of the calico-node service are:
    CALICO_IPV4POOL_CIDR: the Calico IPAM IP pool; Pod IP addresses are allocated from this pool.
    CALICO_IPV4POOL_IPIP: whether to enable IPIP mode. With IPIP enabled, Calico creates a virtual tunnel interface named "tunl0" on each Node.
    FELIX_IPV6SUPPORT: whether to enable IPv6.
    FELIX_LOGSEVERITYSCREEN: the log level.
    The IP pool can operate in one of two modes: BGP or IPIP.
    To use IPIP mode, set CALICO_IPV4POOL_IPIP="always"; to disable it, set CALICO_IPV4POOL_IPIP="off", in which case BGP mode is used.

    IPIP mode builds a tunnel between the Nodes' routes and connects the two networks through it. With IPIP enabled, Calico creates a virtual network interface named "tunl0" on each Node, as shown in the figure below.

    BGP mode uses the physical machine itself as the virtual router (vRouter) and does not create any extra tunnel.

    (4) The calico-policy-controller container
    It hooks into the Network Policies defined for Pods in the Kubernetes cluster.
    # This manifest deploys the Calico policy controller on Kubernetes.
    # See https://github.com/projectcalico/k8s-policy
    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: calico-policy-controller
      namespace: kube-system
      labels:
        k8s-app: calico-policy
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
        scheduler.alpha.kubernetes.io/tolerations: |
          [{"key": "dedicated", "value": "master", "effect": "NoSchedule" },
           {"key":"CriticalAddonsOnly", "operator":"Exists"}]
    spec:
      # The policy controller can only have a single active instance.
      replicas: 1
      strategy:
        type: Recreate
      template:
        metadata:
          name: calico-policy-controller
          namespace: kube-system
          labels:
            k8s-app: calico-policy
        spec:
          # The policy controller must run in the host network namespace so that
          # it isn't governed by policy that would prevent it from working.
          hostNetwork: true
          containers:
            - name: calico-policy-controller
              image: quay.io/calico/kube-policy-controller:v0.5.4
              env:
                # The location of the Calico etcd cluster.
                - name: ETCD_ENDPOINTS
                  valueFrom:
                    configMapKeyRef:
                      name: calico-config
                      key: etcd_endpoints
                # Location of the CA certificate for etcd.
                - name: ETCD_CA_CERT_FILE
                  valueFrom:
                    configMapKeyRef:
                      name: calico-config
                      key: etcd_ca
                # Location of the client key for etcd.
                - name: ETCD_KEY_FILE
                  valueFrom:
                    configMapKeyRef:
                      name: calico-config
                      key: etcd_key
                # Location of the client certificate for etcd.
                - name: ETCD_CERT_FILE
                  valueFrom:
                    configMapKeyRef:
                      name: calico-config
                      key: etcd_cert
                # The location of the Kubernetes API.  Use the default Kubernetes
                # service for API access.
                - name: K8S_API
                  value: "https://kubernetes.default:443"
                # Since we're running in the host namespace and might not have KubeDNS
                # access, configure the container's /etc/hosts to resolve
                # kubernetes.default to the correct service clusterIP.
                - name: CONFIGURE_ETC_HOSTS
                  value: "true"
              volumeMounts:
                # Mount in the etcd TLS secrets.
                - mountPath: /calico-secrets
                  name: etcd-certs
          volumes:
            # Mount in the etcd TLS secrets.
            - name: etcd-certs
              secret:
                secretName: calico-etcd-secrets
    After a user defines a Network Policy for Pods in the Kubernetes cluster, calico-policy-controller notifies the calico-node service on each Node, which then programs the corresponding iptables rules on the host to enforce the access policy between Pods.
    With the configuration files prepared, we can create the Calico resource objects:
    [root@k8s-master ~]# kubectl create -f calico.yaml
    configmap "calico-config" created
    secret "calico-etcd-secrets" created
    daemonset "calico-node" created
    deployment "calico-policy-controller" created
    [root@k8s-master ~]#
    Make sure all services are running correctly:
    [root@k8s-master ~]# kubectl get pods --namespace=kube-system -o wide
    NAME                                        READY     STATUS    RESTARTS   AGE       IP         NODE
    calico-node-59n9j                           2/2       Running   1          9h        10.0.2.5   10.0.2.5
    calico-node-cksq5                           2/2       Running   1          9h        10.0.2.4   10.0.2.4
    calico-policy-controller-54dbfcd7c7-ctxzz   1/1       Running   0          9h        10.0.2.5   10.0.2.5
    [root@k8s-master ~]#

    [root@k8s-master ~]# kubectl get rs --namespace=kube-system
    NAME                                  DESIRED   CURRENT   READY     AGE
    calico-policy-controller-54dbfcd7c7   1         1         1         9h
    [root@k8s-master ~]#
    [root@k8s-master ~]# kubectl get deployment --namespace=kube-system
    NAME                       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
    calico-policy-controller   1         1         1            1           9h
    [root@k8s-master ~]# kubectl get secret --namespace=kube-system
    NAME                  TYPE      DATA      AGE
    calico-etcd-secrets   Opaque    0         9h
    [root@k8s-master ~]# kubectl get configmap --namespace=kube-system
    NAME            DATA      AGE
    calico-config   6         9h
    [root@k8s-master ~]#

    Let's look at Node1:
    [root@k8s-node1 ~]# docker ps
    CONTAINER ID        IMAGE                                      COMMAND             CREATED             STATUS              PORTS               NAMES
    dd431155ed2d        quay.io/calico/cni                         "/install-cni.sh"   8 hours ago         Up 8 hours                              k8s_install-cni_calico-node-cksq5_kube-system_e3ed0d80-6fe9-11e8-8a4a-080027800835_0
    e7f20b684fc2        quay.io/calico/node                        "start_runit"       8 hours ago         Up 8 hours                              k8s_calico-node_calico-node-cksq5_kube-system_e3ed0d80-6fe9-11e8-8a4a-080027800835_1
    1c9010e4b661        gcr.io/google_containers/pause-amd64:3.0   "/pause"            8 hours ago         Up 8 hours                              k8s_POD_calico-node-cksq5_kube-system_e3ed0d80-6fe9-11e8-8a4a-080027800835_1
    [root@k8s-node1 ~]#
    [root@k8s-node1 ~]# docker images
    REPOSITORY                             TAG                 IMAGE ID            CREATED             SIZE
    cloudnil/pause-amd64                   3.0                 66c684b679d2        11 months ago       747kB
    gcr.io/google_containers/pause-amd64   3.0                 66c684b679d2        11 months ago       747kB
    quay.io/calico/cni                     v1.8.0              8de7b24bd7ec        13 months ago       67MB
    quay.io/calico/node                    v1.1.3              573ddcad1ff5        13 months ago       217MB
    kubeguide/guestbook-php-frontend       latest              47ee16830e89        23 months ago       510MB
    Node2 additionally runs the calico-policy-controller Pod:
    [root@k8s-node2 ~]# docker ps
    CONTAINER ID        IMAGE                                      COMMAND              CREATED             STATUS              PORTS               NAMES
    ff4dbcd77892        quay.io/calico/kube-policy-controller      "/dist/controller"   8 hours ago         Up 8 hours                              k8s_calico-policy-controller_calico-policy-controller-54dbfcd7c7-ctxzz_kube-system_e3f067be-6fe9-11e8-8a4a-080027800835_0
    60439cfbde00        quay.io/calico/cni                         "/install-cni.sh"    8 hours ago         Up 8 hours                              k8s_install-cni_calico-node-59n9j_kube-system_e3efa53c-6fe9-11e8-8a4a-080027800835_1
    c55f279ef3c1        quay.io/calico/node                        "start_runit"        8 hours ago         Up 8 hours                              k8s_calico-node_calico-node-59n9j_kube-system_e3efa53c-6fe9-11e8-8a4a-080027800835_0
    17d08ed5fd86        gcr.io/google_containers/pause-amd64:3.0   "/pause"             8 hours ago         Up 8 hours                              k8s_POD_calico-node-59n9j_kube-system_e3efa53c-6fe9-11e8-8a4a-080027800835_1
    aa85ee06190f        gcr.io/google_containers/pause-amd64:3.0   "/pause"             8 hours ago         Up 8 hours                              k8s_POD_calico-policy-controller-54dbfcd7c7-ctxzz_kube-system_e3f067be-6fe9-11e8-8a4a-080027800835_0
    [root@k8s-node2 ~]#
    [root@k8s-node2 ~]#
    [root@k8s-node2 ~]# docker images
    REPOSITORY                              TAG                 IMAGE ID            CREATED             SIZE
    cloudnil/pause-amd64                    3.0                 66c684b679d2        11 months ago       747kB
    gcr.io/google_containers/pause-amd64    3.0                 66c684b679d2        11 months ago       747kB
    quay.io/calico/cni                      v1.8.0              8de7b24bd7ec        13 months ago       67MB
    quay.io/calico/node                     v1.1.3              573ddcad1ff5        13 months ago       217MB
    quay.io/calico/kube-policy-controller   v0.5.4              ac66b6e8f19e        14 months ago       22.6MB
    kubeguide/guestbook-php-frontend        latest              47ee16830e89        23 months ago       510MB
    georce/router                           latest              f3074d9a8369        3 years ago         190MB
    [root@k8s-node2 ~]#

    Once calico-node is running, it generates the following files and directories under /etc/cni/net.d/ according to the CNI specification, and installs the calico and calico-ipam binaries into /opt/cni/bin for kubelet to call.
    10-calico.conf: the CNI-compliant network configuration, where type=calico means the plugin binary is named calico.
    calico-kubeconfig: the kubeconfig file Calico needs.
    calico-tls directory: the files used to connect to etcd over TLS.

    [root@k8s-node1 ~]# cd /etc/cni/net.d/
    [root@k8s-node1 net.d]# ls
    10-calico.conf  calico-kubeconfig  calico-tls
    [root@k8s-node1 net.d]#
    [root@k8s-node1 net.d]# ls /opt/cni/bin
    calico  calico-ipam  flannel  host-local  loopback
    [root@k8s-node1 net.d]#
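    For reference, after install-cni substitutes the placeholders from the ConfigMap shown earlier, the rendered 10-calico.conf should look roughly like this (a sketch assuming the etcd endpoint http://10.0.2.15:2379, no etcd TLS, and a kubernetes Service ClusterIP of 10.254.0.1; the actual values and the service-account token will differ):
    {
        "name": "k8s-pod-network",
        "type": "calico",
        "etcd_endpoints": "http://10.0.2.15:2379",
        "etcd_key_file": "",
        "etcd_cert_file": "",
        "etcd_ca_cert_file": "",
        "log_level": "info",
        "ipam": {
            "type": "calico-ipam"
        },
        "policy": {
            "type": "k8s",
            "k8s_api_root": "https://10.254.0.1:443",
            "k8s_auth_token": "<serviceaccount token>"
        },
        "kubernetes": {
            "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
        }
    }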

    Looking at the network interfaces on k8s node1, we can see a new interface named "tunl0" with the address 192.168.196.128:
    [root@k8s-node1 net.d]# ifconfig tunl0
    tunl0: flags=193<UP,RUNNING,NOARP>  mtu 1440
            inet 192.168.196.128  netmask 255.255.255.255
            tunnel   txqueuelen 1000  (IPIP Tunnel)
            RX packets 0  bytes 0 (0.0 B)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 0  bytes 0 (0.0 B)
            TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

    Looking at the network interfaces on k8s node2, we can likewise see a new "tunl0" interface, with the address 192.168.19.192:
    [root@k8s-node2 ~]# ifconfig tunl0
    tunl0: flags=193<UP,RUNNING,NOARP>  mtu 1440
            inet 192.168.19.192  netmask 255.255.255.255
            tunnel   txqueuelen 1000  (IPIP Tunnel)
            RX packets 0  bytes 0 (0.0 B)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 0  bytes 0 (0.0 B)
            TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
    Both subnets are allocated from calico-node's IP pool 192.168.0.0/16. From now on, docker0 no longer plays any role in assigning Pod IP addresses in Kubernetes.

    Look at the routing tables of the two hosts. Node1 has a forwarding rule for Node2's subnet 192.168.19.192/26:
    [root@k8s-node1 net.d]# ip route
    default via 10.0.2.1 dev enp0s3 proto dhcp metric 100
    10.0.2.0/24 dev enp0s3 proto kernel scope link src 10.0.2.4 metric 100
    172.16.10.0/24 dev docker0 proto kernel scope link src 172.16.10.1
    192.168.19.192/26 via 10.0.2.5 dev tunl0 proto bird onlink
    blackhole 192.168.196.128/26 proto bird
    [root@k8s-node1 net.d]#

    Node2's routing table likewise contains a forwarding rule for Node1's subnet 192.168.196.128/26:
    [root@k8s-node2 ~]# ip route
    default via 10.0.2.1 dev enp0s3 proto dhcp metric 100
    10.0.2.0/24 dev enp0s3 proto kernel scope link src 10.0.2.5 metric 100
    172.16.20.0/24 dev docker0 proto kernel scope link src 172.16.20.1
    blackhole 192.168.19.192/26 proto bird
    192.168.196.128/26 via 10.0.2.4 dev tunl0 proto bird onlink

    With that, Calico has set up the inter-Node container network. When Pods are created later, kubelet will call Calico through the CNI interface to configure each Pod's network, including its IP address, routes, and iptables rules.

    If CALICO_IPV4POOL_IPIP="off" is set, i.e. IPIP mode is not used, Calico does not create the tunl0 interface, and the routes forward directly through the physical NIC acting as the router.
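    A sketch of what the Node1 route above would then look like (an assumption based on the addresses used in this example, not captured output): the next hop stays the same, but the route goes out of the physical interface instead of tunl0.
    192.168.19.192/26 via 10.0.2.5 dev enp0s3 proto bird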

    4.3 Using network policy to control access between Pods
    Calico supports access policies between Pods; the basic principle is shown in the figure below.

    The following example uses an Nginx Pod that provides a service and gives two client Pods different access rights: Pods carrying the Label "role=nginxclient" are allowed to access the Nginx container, while containers without that Label are denied.
    Step 1:
    First annotate the Namespace that needs network isolation. In this example all Pods are in the default Namespace, so switch it to default-deny isolation:
    # kubectl annotate ns default \
      "net.beta.kubernetes.io/network-policy={\"ingress\": {\"isolation\": \"DefaultDeny\"}}"
    After this, Pods inside default can no longer reach each other.

    Step 2: Create the Nginx Pod and add the Label "app=nginx"
    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx

    Step 3: Set the ingress access policy for Nginx
    networkpolicy-allow-nginxclient.yaml
    kind: NetworkPolicy
    apiVersion: extensions/v1beta1
    metadata:
      name: allow-nginxclient
    spec:
      podSelector:
        matchLabels:
          app: nginx
      ingress:
        - from:
          - podSelector:
              matchLabels:
                role: nginxclient
          ports:
          - protocol: TCP
            port: 80

    The target Pods must carry the Label "app=nginx"; client Pods carrying the Label "role=nginxclient" are allowed to access port 80 of the Nginx container.

    Create the NetworkPolicy resource object:
    # kubectl create -f networkpolicy-allow-nginxclient.yaml

    Step 4: Create two client Pods, one with the Label "role=nginxclient" and one without. Then exec into each Pod and try to access the Nginx container to verify the effect of the network policy.
    client1.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: client1
      labels:
        role: nginxclient
    spec:
      containers:
      - name: client1
        image: busybox
        command: [ "sleep", "3600" ]

    client2.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: client2
    spec:
      containers:
      - name: client2
        image: busybox
        command: [ "sleep", "3600" ]

    Create the two Pods above, then exec into each container and verify access to the service, as sketched below.
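    A minimal sketch of the verification (assumption: the Nginx Pod's IP is 192.168.196.130; look it up first with "kubectl get pod nginx -o wide"; busybox wget uses -T for the timeout in seconds):
    # kubectl exec -it client1 -- wget -qO- -T 5 http://192.168.196.130    # succeeds: client1 carries role=nginxclient
    # kubectl exec -it client2 -- wget -qO- -T 5 http://192.168.196.130    # times out: client2 is not allowed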

    The network policy in the example above is enforced by calico-policy-controller: it continuously watches the NetworkPolicy definitions in Kubernetes, correlates them with Pods via their Labels, and pushes the resulting allow/deny rules to the calico-node service on each Node.
    calico-node then programs the hosts so that network access between Pods follows the policy, isolating the applications on the network.

    References:
    https://blog.csdn.net/watermelonbig/article/details/80720378
    http://cizixs.com/2017/10/19/docker-calico-network
