zoukankan      html  css  js  c++  java
  • 记一次网络故障——pod间无法通信

    一、背景

    1. 集群是二进制部署
    2. 部署完成后一起正常,各种资源对象均可正常创建、
    3. 部署应用后发现无法跨节点通信,且pod的ip都是172.17.0.0段的

    二、排查过程层

    1. 查看节点路由,发现docker0网卡居然是172.17.0.0段(what?)
    2. 查找如下资料:基于docker的CNM部署flanel时,需要将/run/flannel/subnet.env作为docker的环境变量,且启动时指定flannel的网段信息

    三、解决方案(修改配置文件:/usr/lib/systemd/system/docker.service)

    [Unit]
    Description=Docker Application Container Engine
    Documentation=https://docs.docker.com
    BindsTo=containerd.service
    After=network-online.target firewalld.service containerd.service
    Wants=network-online.target
    Requires=docker.socket
    
    [Service]
    Type=notify
    # the default is not to use systemd for cgroups because the delegate issues still
    # exists and systemd currently does not support the cgroup feature set required
    # for containers run by docker
    EnvironmentFile=/run/flannel/subnet.env
    ExecStart=/usr/bin/dockerd $DOCKER_NETWORK_OPTIONS  -H fd:// --containerd=/run/containerd/containerd.sock
    ExecReload=/bin/kill -s HUP $MAINPID
    TimeoutSec=0
    RestartSec=2
    Restart=always
    
    # Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
    # Both the old, and new location are accepted by systemd 229 and up, so using the old location
    # to make them work for either version of systemd.
    StartLimitBurst=3
    
    # Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
    # Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
    # this option work for either version of systemd.
    StartLimitInterval=60s
    
    # Having non-zero Limit*s causes performance problems due to accounting overhead
    # in the kernel. We recommend using cgroups to do container-local accounting.
    LimitNOFILE=infinity
    LimitNPROC=infinity
    LimitCORE=infinity
    
    # Comment TasksMax if your systemd version does not supports it.
    # Only systemd 226 and above support this option.
    TasksMax=infinity
    
    # set delegate yes so that systemd does not reset the cgroups of docker containers
    Delegate=yes
    
    # kill only the docker process, not all processes in the cgroup
    KillMode=process
    
    [Install]
    WantedBy=multi-user.target

    调用/run/flannel/subnet.env中的DOCKER_NETWORK_OPTIONS指定pod的网段信息

    四、补充

    1. CNI中,docker0的ip与Pod无关,Pod总是生成的时候才去动态的申请自己的IP
    2. CNM模式下,Pod的网段在docker engine启动时就已经决定
    3. 推荐使用CNI模式

    参考地址:https://jiayi.space/post/kubernetescong-ru-men-dao-fang-qi-3-wang-luo-yuan-li

  • 相关阅读:
    RPC细谈
    RPC浅谈
    动态规划
    libco 的定时器实现: 时间轮
    一次HTTP请求的完整过程——协议篇(DNS、TCP、HTTP)
    多个CPU、多核CPU以及超线程(Hyper-Threading)
    Linux下which、whereis、locate、find命令的区别
    warning C4819: 该文件包含不能在当前代码页(936)中表示的字符。请将该文件保存为 Unicode 格式以防止数据丢失
    使用OutputDebugString输出调试信息
    VS或windows用代码产生GUID
  • 原文地址:https://www.cnblogs.com/jayce9102/p/12075362.html
Copyright © 2011-2022 走看看