  • Source code analysis of hyper container networking

    I. Network initialization

    1. hyperd/daemon/daemon.go

    func NewDaemon(cfg *apitypes.HyperConfig) (*Daemon, error)

    This function calls daemon.initNetworks(cfg) directly.

    2. hyperd/daemon/daemon.go

    func (daemon *Daemon) initNetworks(c *apitypes.HyperConfig) error

    This function simply calls hypervisor.InitNetwork(c.Bridge, c.BridgeIP, c.DisableIptables), so all of the actual network setup is done in runv.

    3. runv/hypervisor/hypervisor.go

    func InitNetwork(bIface, bIP string, disableIptables bool) error

    If HDriver.BuildinNetwork() returns true, it returns HDriver.InitNetwork(bIface, bIP, disableIptables)  // false for QEMU

    Otherwise, it returns network.InitNetwork(bIface, bIP, disableIptables)

    4. runv/hypervisor/network/network_linux.go

    func InitNetwork(bIface, bIP string, disable bool) error

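    (1) First, BridgeIface and BridgeIP are set; BridgeIface defaults to "hyper0" and BridgeIP defaults to "192.168.123.0/24". disableIptables is set to disable.

    (2) It calls addr, err := GetIfaceAddr(BridgeIface). If err is not nil, the bridge does not exist and has to be created; otherwise the bridge exists, but its configuration still has to be checked against the requested settings.

    (3) If the bridge does not exist, it calls configureBridge(BridgeIP, BridgeIface) to create one, calls addr, err = GetIfaceAddr(BridgeIface) again to fetch the bridge's address, and then sets BridgeIPv4Net = addr.(*net.IPNet)

    (4) It calls setupIPTables(addr)

    (5) It calls setupIPForwarding()

    (6) Finally, it calls IpAllocator.RequestIP(BridgeIPv4Net, BridgeIPv4Net.IP)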

    // Return the first IPv4 address for the specified network interface

    5. runv/hypervisor/network/network_linux.go

    func GetIfaceAddr(name string) (net.Addr, error)

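    (1) It first calls iface, err := net.InterfaceByName(name) and addrs, err := iface.Addrs() to obtain the address information

    (2) It declares var addr4 []net.Addr, fills it with the IPv4 addresses parsed from addrs, and finally returns addr4[0]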

    // create and setup network bridge

    6. runv/hypervisor/network/network_linux.go

    func configureBridge(bridgeIP, bridgeIface string) error

    (1) It checks bridgeIP and assigns it to ifaceAddr

    (2) It calls CreateBridgeIface(bridgeIface), ignoring an "exists" error

    (3) It calls iface, err := net.InterfaceByName(bridgeIface) to get the interface

    (4) It calls ipAddr, ipNet, err := net.ParseCIDR(ifaceAddr) (Note: for example, ParseCIDR("198.51.100.1/24") returns the IP address 198.51.100.1 and the network 198.51.100.0/24.)

    (5) If ipAddr.Equal(ipNet.IP), it calls ipAddr, err = IpAllocator.RequestIP(ipNet, nil)

    Otherwise it calls ipAddr, err = IpAllocator.RequestIP(ipNet, ipAddr)

    (6) It calls NetworkLinkAddIp(iface, ipAddr, ipNet)

    (7) It calls NetworkLinkUp(iface)   ---> both ultimately make low-level syscall.Syscall() calls
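
    The branch in step (5) depends on whether the configured CIDR names a concrete host address or only the network address. A small Go sketch of that test, using only net.ParseCIDR and IP.Equal (the helper name wantsSpecificIP is hypothetical):

```go
package main

import (
	"fmt"
	"net"
)

// wantsSpecificIP reports whether a bridge CIDR names a concrete host address
// (true) or only the network address (false). In the latter case the caller
// would ask the allocator for the next free address.
func wantsSpecificIP(cidr string) (bool, net.IP, error) {
	ipAddr, ipNet, err := net.ParseCIDR(cidr)
	if err != nil {
		return false, nil, err
	}
	if ipAddr.Equal(ipNet.IP) {
		return false, nil, nil
	}
	return true, ipAddr, nil
}

func main() {
	for _, cidr := range []string{"192.168.123.0/24", "192.168.123.1/24"} {
		specific, ip, err := wantsSpecificIP(cidr)
		if err != nil {
			fmt.Println("bad CIDR:", err)
			continue
		}
		if specific {
			fmt.Printf("%s: request exactly %s\n", cidr, ip)
		} else {
			fmt.Printf("%s: request the next free address\n", cidr)
		}
	}
}
```

    With the default "192.168.123.0/24" the address equals the network address, so the bridge receives whatever address the allocator hands out first.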

    // Create the actual bridge device. This is more backward-compatible than netlink and works on RHEL 6.

    7. runv/hypervisor/network/network_linux.go

    func CreateBridgeIface(name string) error

    This function makes the lowest-level syscall.Syscall(...) calls to create the bridge

    The IPAllocator struct is defined as follows:

    type IPAllocator struct {
      allocatedIPs  networkSet
      mutex     sync.Mutex
    }
    

      

    networkSet is defined as follows:

    type networkSet  map[string]*allocatedMap
    

      

    The allocatedMap struct is defined as follows:

    type allocatedMap struct {
      p      map[string]struct{}
      last   *big.Int
      begin   *big.Int
      end    *big.Int
    }
    

      

    // When ip is nil, returns the next available IP address in network; when ip is not nil, verifies that the given ip is valid

    8. runv/hypervisor/network/ipallocator/ipallocator.go

    func (a *IPAllocator) RequestIP(network *net.IPNet, ip net.IP) (net.IP, error)

    (1) It calls key := network.String() to get the string form of the network, then looks it up with allocated, ok := a.allocatedIPs[key]

    (2) If the network is not present, it calls allocated = newAllocatedMap(network) to create an entry and stores it with a.allocatedIPs[key] = allocated

    (3) If ip == nil, it returns allocated.getNextIP(); otherwise it calls allocated.checkIP(ip)

    // This function is identical to: ip addr add $ip/$ipNet dev $iface

    9. runv/hypervisor/network/network_linux.go

    func NetworkLinkAddIp(iface *net.Interface, ip net.IP, ipNet *net.IPNet) error

    (1) This function directly returns networkLinkIpAction(syscall.RTM_NEWADDR, syscall.NLM_F_CREAT|syscall.NLM_F_EXCL|syscall.NLM_F_ACK, IfAddr{iface, ip, ipNet})

    networkLinkIpAction(...) itself simply uses netlink to carry out the request

    10. runv/hypervisor/network/network_linux.go

    func setupIPTables(addr net.Addr) error

    (1) Enable NAT:

    `iptables -t nat -I POSTROUTING -s 192.168.123.0/24 ! -o hyper0 -j MASQUERADE` applies SNAT to container traffic that enters the host and is not destined for another local container

    (2) Create HYPER iptables Chain

    (3) Goto HYPER chain

    `iptables -t filter -I FORWARD -o hyper0 -j HYPER` hands traffic forwarded to hyper0 over to the HYPER chain

    (4) Accept all outgoing packets

    `iptables -t filter -I FORWARD -i hyper0 -j ACCEPT` accepts all traffic coming in from hyper0

    (5) Accept incoming packets for existing connections

    `iptables -t filter -I FORWARD -o hyper0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT`

    (6) Create HYPER iptables Chain in the nat table

    `iptables -t nat -N HYPER`

    (7) Goto HYPER chain

    `iptables -t nat -I OUTPUT -m addrtype --dst-type LOCAL ! -d 127.0.0.1/8 -j HYPER`

    `iptables -t nat -I PREROUTING -m addrtype --dst-type LOCAL -j HYPER`

    11. runv/hypervisor/network/network_linux.go

    func setupIPForwarding() error

    (1) Get current IPv4 forward setup

    (2) Enable IPv4 forwarding only if it is not already enabled

    II. Network configuration in hyperd

    // hyperd/daemon/pod/provision.go

    1. func CreateXPod(factory *PodFactory, spec *apitypes.UserPod) (*XPod, error)

    ...

    (1) It calls p.initResources(spec, true)

    (2) It calls err = p.prepareResources()

    (3) It calls err = p.addResourcesToSandbox()

    ....

    // hyperd/daemon/pod/provision.go

    2. func (p *XPod) initResources(spec *apitypes.UserPod, allowCreate bool) error

    ....

    (1) When len(spec.Interfaces) == 0, it calls spec.Interfaces = append(spec.Interfaces, &apitypes.UserInterface{})

    (2) It iterates over spec.Interfaces, calling inf := newInterface(p, nspec) and p.interfaces[nspec.Ifname] = inf

    newInterface() simply returns an Interface{} struct; if spec.Ifname is "", it is set to "eth-default"

    ....

    The Interface{} data structure is as follows:

    type Interface struct {
      p      *XPod
      spec    *apitypes.UserInterface
      descript *runv.InterfaceDescription
    }
    

    The apitypes.UserInterface struct is defined as follows:

    type UserInterface struct {
      Bridge    string
      Ip      string
      Ifname    string
      Mac     string
      Gateway   string
      Tap       string
    } 
    

      

    // hyperd/daemon/pod/provision.go  

    3. func (p *XPod) prepareResources() error

    ....

    (1) It iterates over p.interfaces, calling inf.prepare()

    ....

    // hyperd/daemon/pod/networks.go

    4. func (inf *Interface) prepare() error

    (1) If inf.spec.Ip == "" and inf.spec.Bridge != "", it reports an error --> if configured a bridge, must specify the IP address

    (2) If inf.spec.Ip == "", it calls setting, err := network.AllocateAddr(""), fills a &runv.InterfaceDescription{} from setting, and assigns it to inf.descript

    Otherwise, it fills the &runv.InterfaceDescription{} directly from inf and assigns it to inf.descript

    // runv/hypervisor/network/network_linux.go

    5. func AllocateAddr(requestedIP string) (*Settings, error)

    (1) It calls ip, err := IpAllocator.RequestIP(BridgeIPv4Net, net.ParseIP(requestedIP))

    (2) It calls maskSize, _ := BridgeIPv4Net.Mask.Size() and mac, err := GenRandomMac()

    (3) It returns &Settings{...}

    // hyperd/daemon/pod/provision.go

    // addResourcesToSandbox() add resources to sandbox parallelly, it issues runV API parallelly to send the

    // NIC, Vols, and Containers to sandbox

    6. func (p *XPod) addResourcesToSandbox() error

    ...

    (1) It calls future := utils.NewFutureSet()

    (2) It calls future.Add("addInterface", func() error {}); inside that func it loops with for _, inf := range p.interfaces and calls err := inf.add()

    After iterating over p.interfaces, it calls p.sandbox.AddRoute()
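
    The future-set pattern runs named tasks in parallel and collects their errors. The Go sketch below shows the idea with a WaitGroup; futureSet here is a hypothetical stand-in and does not reproduce the API of utils.NewFutureSet() beyond the Add(name, func() error) shape mentioned above. The "addVolumes" task name and the string slice of interfaces are likewise illustrative.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// futureSet is a hypothetical stand-in for the value returned by
// utils.NewFutureSet(): it runs named tasks concurrently.
type futureSet struct {
	wg   sync.WaitGroup
	mu   sync.Mutex
	errs map[string]error
}

func newFutureSet() *futureSet { return &futureSet{errs: map[string]error{}} }

// Add starts task in its own goroutine and records its result under name.
func (f *futureSet) Add(name string, task func() error) {
	f.wg.Add(1)
	go func() {
		defer f.wg.Done()
		err := task()
		f.mu.Lock()
		f.errs[name] = err
		f.mu.Unlock()
	}()
}

// Wait blocks until every task has finished and returns the per-task results.
func (f *futureSet) Wait() map[string]error {
	f.wg.Wait()
	return f.errs
}

// addResources mimics the described flow: the NIC task walks every interface
// in order, and only then would the route step run.
func addResources(interfaces []string) map[string]error {
	future := newFutureSet()
	future.Add("addInterface", func() error {
		for _, ip := range interfaces {
			if ip == "" {
				return errors.New("interface has no IP address")
			}
		}
		return nil // the AddRoute step would follow here
	})
	future.Add("addVolumes", func() error { return nil })
	return future.Wait()
}

func main() {
	for name, err := range addResources([]string{"192.168.123.2"}) {
		fmt.Println(name, "->", err)
	}
}
```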

    ...

    // hyperd/daemon/pod/networks.go

    7. func (inf *Interface) add() error

    (1) If inf.descript == nil or inf.descript.Ip is "", it reports an error

    (2) It calls inf.p.sandbox.AddNic(inf.descript)

    III. Network configuration in runv

    (1) Adding a NIC

    The NIC data structure is as follows:

    type InterfaceDescription struct {
      Id      string
      Lo      bool
      Bridge    string
      Ip       string
      Mac      string
      Gw      string
      TapName   string
      Options   string
    }

    // runv/hypervisor/vm.go

    1. func (vm *Vm) AddNic(info *api.InterfaceDescription)

    (1) It creates client := make(chan api.Result, 1) for synchronization

    (2) It calls vm.ctx.AddInterface(info, client)

    (3) It waits on ev, ok := <-client until the NIC has been created

    (4) It returns vm.ctx.updateInterface(info.Id)
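
    AddNic turns an asynchronous operation into a synchronous call by waiting on a buffered channel. A minimal Go sketch of that pattern follows; the result type and both function names are hypothetical stand-ins for api.Result, AddInterface and AddNic.

```go
package main

import (
	"errors"
	"fmt"
)

// result is a hypothetical stand-in for api.Result.
type result struct {
	id      string
	success bool
}

// addInterfaceAsync stands in for the asynchronous AddInterface call: the
// work happens in a goroutine and the outcome is delivered on out.
func addInterfaceAsync(id string, succeed bool, out chan<- result) {
	go func() {
		out <- result{id: id, success: succeed}
	}()
}

// addNic blocks until the asynchronous operation has reported its outcome.
func addNic(id string, succeed bool) error {
	// A buffer of one lets the sender finish even if nobody is receiving yet.
	client := make(chan result, 1)
	addInterfaceAsync(id, succeed, client)

	ev, ok := <-client
	if !ok {
		return errors.New("result channel was closed")
	}
	if !ev.success {
		return errors.New("adding interface " + ev.id + " failed")
	}
	return nil // the follow-up interface update would go here
}

func main() {
	fmt.Println("eth0:", addNic("eth0", true))
	fmt.Println("eth1:", addNic("eth1", false))
}
```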

    // runv/hypervisor/context.go

    2. func (ctx *VmContext) AddInterface(inf *api.InterfaceDescription, result chan api.Result)

    (1) When ctx.current != StateRunning, it reports an error by sending result <- NewNotReadyError(ctx.Id)

    (2) It calls ctx.networks.addInterface(inf, result)

    // runv/hypervisor/network.go

    2. func (nc *NetworkContext) addInterface(inf *api.InterfaceDescription, result chan api.Result)

    (1) When inf.Lo is true, it fills in i := &InterfaceCreated{...}, sets nc.lo[inf.Ip] = i and nc.idMap[inf.Id] = i, and returns successfully

    (2) It starts a goroutine that calls idx := nc.applySlot() to obtain the ethernet slot for the interface, and then calls

    nc.configureInterface(idx, nc.sandbox.netPciAddr(), fmt.Sprintf("eth%d", idx), inf, devChan)

    (3) It starts a goroutine that waits for the device-inserted event and reports through result whether the NIC was inserted successfully

    // runv/hypervisor/network.go

    3. func (nc *NetworkContext) configureInterface(index, pciAddr int, name string, inf *api.InterfaceDescription, result chan<- VmEvent)

    (1) It calls settings, err = network.Configure(nc.sandbox.Id, "", false, inf)

    (2) It calls created, err := interfaceGot(inf.Id, index, pciAddr, name, settings)

    (3) It fills h := &HostNicInfo{} and g := &GuestNicInfo{} from created

    (4) It sets nc.eth[index] = created and nc.idMap[created.Id] = created

    (5) Finally, it calls nc.sandbox.DCtx.AddNic(nc.sandbox, h, g, result)

    The HostNicInfo struct is defined as follows:

    type HostNicInfo struct {
      Id    string
      Fd     uint64
      Device  string
      Mac    string
      Bridge  string
      Gateway string
    }
    

    The GuestNicInfo struct is defined as follows:

    type GuestNicInfo struct {
      Device    string
      Ipaddr     string
      Index      int
      Busaddr    int
    }  
    

       

    // runv/hypervisor/network/network_linux.go

    4. func Configure(vmId, requestIP string, addrOnly bool, inf *api.InterfaceDescription) (*Settings, error)

    (1) It calls ip, mask, err := ipParser(inf.Ip) to get the configured IP, then calls maskSize, _ := mask.Size() to get the mask length

    (2) It sets mac := inf.Mac; if mac is "", it calls mac, err := GenRandomMac() to generate one

    (3) If addrOnly is true, it returns &Settings{...} with Device set to inf.TapName and File set to nil

    (4) Otherwise it calls device, tapFile, err := GetTapFd(inf.TapName, inf.Bridge, inf.Options); GetTapFd creates a tap device, attaches it to the bridge, and brings it up

    Finally it returns &Settings{...} with Device set to device and File set to tapFile

    The Settings struct is defined as follows:

    type Settings struct {
      Mac       string
      IPAddress    string
      IPPrefixLen  int
      Gateway    string
      Bridge     string
      Device     string
      File      *os.File
      Automatic   bool
    }
    

      

    // runv/hypervisor/network.go

    5. func interfaceGot(id string, index int, pciAddr int, name string, inf *network.Settings) (*InterfaceCreated, error)

    (1) It calls ip, nw, err := net.ParseCIDR(fmt.Sprintf("%s/%d", inf.IPAddress, inf.IPPrefixLen))

    (2) It creates rt := []*RouteRule{}. If this interface is the first one and inf.Automatic is true (the default is false), or if a gateway is configured and inf.Automatic is false, it creates the corresponding RouteRule by calling:

    rt = append(rt, &RouteRule{Destination: "0.0.0.0/0", Gateway: inf.Gateway, ViaThis: true,})

    (3) Finally, it returns &InterfaceCreated{}, nil

    The InterfaceCreated struct is defined as follows:

    type InterfaceCreated struct {
      Id        string
      Index       int
      PCIAddr     int
      Fd         *os.File
      Bridge      string
      HostDevice     string
      DeviceName    string
      MacAddr     string
      IpAddr      string
      NetMask     string
      RouteTable   []*RouteRule
    }
    

    The RouteRule struct is defined as follows:

    type RouteRule struct {
      Destination  string
      Gateway   string
      ViaThis    bool
    }  
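
    The default-route condition in interfaceGot can be isolated into a small function. The Go sketch below restates it with a local copy of RouteRule; treating index == 0 as "the first interface" is an assumption, and the helper name defaultRoutes is hypothetical.

```go
package main

import "fmt"

// RouteRule is a local copy of the struct shown above.
type RouteRule struct {
	Destination string
	Gateway     string
	ViaThis     bool
}

// defaultRoutes returns a default route when either the interface is the
// first one and was configured automatically, or it was configured manually
// and a gateway was supplied. index == 0 meaning "first" is an assumption.
func defaultRoutes(index int, automatic bool, gateway string) []*RouteRule {
	rt := []*RouteRule{}
	if (index == 0 && automatic) || (gateway != "" && !automatic) {
		rt = append(rt, &RouteRule{
			Destination: "0.0.0.0/0",
			Gateway:     gateway,
			ViaThis:     true,
		})
	}
	return rt
}

func main() {
	cases := []struct {
		index     int
		automatic bool
		gateway   string
	}{
		{0, true, "192.168.123.1"},
		{1, true, "192.168.123.1"},
		{1, false, "10.0.0.1"},
		{0, false, ""},
	}
	for _, c := range cases {
		rt := defaultRoutes(c.index, c.automatic, c.gateway)
		fmt.Printf("index=%d automatic=%v gateway=%q -> %d route(s)\n",
			c.index, c.automatic, c.gateway, len(rt))
	}
}
```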
    

    // When the VM driver is QEMU

    // runv/hypervisor/qemu/qemu.go

    6. func (qc *QemuContext) AddNic(ctx *hypervisor.VmContext, host *hypervisor.HostNicInfo, guest *hypervisor.GuestNicInfo, result chan <- hypervisor.VmEvent)

    This function directly calls newNetworkAddSession(ctx, qc, host.Id, host.Fd, guest.Device, host.Mac, guest.Index, guest.Busaddr, result)

    // runv/hypervisor/qemu/qmp_wrapper_amd64.go

    7. func newNetworkAddSession(ctx *hypervisor.VmContext, qc *QemuContext, id string, fd uint64, device, mac string, index, addr int, result chan<- hypervisor.VmEvent)

    (1) It first creates three QmpCommand values: "getfd", "netdev_add" and "device_add"

    (2) It then bundles the three commands into a QmpSession and sends it to QEMU

    // runv/hypervisor/vm_states.go

    8. func (ctx *VmContext) updateInterface(id string) error

    (1) It first calls inf := ctx.networks.getInterface(id) to fetch the created interface

    (2) If inf is not nil, it calls ctx.hyperstart.UpdateInterface(inf.DeviceName, inf.IpAddr, inf.NetMask)

    // runv/hyperstart/libhyperstart/json.go

    9. func (h *jsonBasedHyperstart) UpdateInterface(dev, ip, mask string) error

    This function directly returns h.hyperstartCommand(hyperstartapi.INIT_SETUPINTERFACE, hyperstartapi.NetworkInf{Device: dev, IpAddress: ip, NetMask: mask})

    hyperstartCommand() in turn calls hyperstartCommandWithRetMsg(), which finally builds a hyperstartCmd{} and sends the request to the QEMU guest to carry out the command

    (2) Adding routes

    // runv/hypervisor/vm.go

    1. func (vm *Vm) AddRoute()

    (1) It calls routes := vm.ctx.networks.getRoutes() to get the route information

    (2) It then returns vm.ctx.hyperstart.AddRoute(routes) to add the routes

    2. func (h *jsonBasedHyperstart) AddRoute(r []hyperstartapi.Route) error

    This function simply returns h.hyperstartCommand(hyperstartapi.INIT_SETUPROUTE, hyperstartapi.Routes{Routes: r}); the mechanics are the same as when updating the NIC information

    IV. Network configuration in hyperstart

    // hyperstart/src/init.c

    1. static int hyper_setup_pod(struct hyper_pod *pod)

    ...

    (1) It calls hyper_setup_network(pod)

    ...

    // hyperstart/src/net.c

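    2. int hyper_setup_network(struct hyper_pod *pod)

    (1) It first calls hyper_rescan()

    (2) It declares struct rtnl_handle rth and calls netlink_open(&rth)

    (3) It loops over pod->iface[], calling ret = hyper_setup_interface(&rth, iface, pod) to configure each NIC

    (4) It calls ret = hyper_up_nic(&rth, 1) to bring up lo

    (5) It loops over pod->rt[], calling ret = hyper_setup_route(&rth, rt, pod) to create each route

    Note: NICs and routes can be set up when the pod is created, or separately by sending a cmd directly to hyperstart

    In the latter case hyperstart handles them in hyper_cmd_setup_interface and hyper_cmd_setup_route (which in turn call hyper_setup_interface and hyper_setup_route directly)

    The hyper_interface struct is defined as follows: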

    struct hyper_interface {
      char    *device;
      struct list_head  ipaddresses;
      char    *new_device_name;
      unsigned int  mtu;
    };
    

      

    The hyper_route struct is defined as follows:

    struct hyper_route {
      char    *dst;
      char    *gw;
      char    *device;
    };
    

      

    // hyperstart/src/net.c

    3. static int hyper_setup_interface(struct rtnl_handle *rth, struct hyper_interface *iface, struct hyper_pod *pod)

    (1) It builds a netlink request and calls ifindex = hyper_get_ifindex(iface->device, pod) to get the NIC's index  ---> hyper_get_ifindex obtains the index by reading /sys/class/net/$NIC/ifindex

    (2) It iterates over iface->ipaddresses, setting each IP address on the NIC

    (3) If iface->new_device_name is not empty and differs from iface->device, it calls hyper_set_interface_name() to rename the NIC

    (4) If iface->mtu is greater than 0, it calls hyper_set_interface_mtu to set the NIC's MTU

    (5) It calls hyper_up_nic(rth, ifindex) to bring up the NIC

    // hyperstart/src/net.c

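    4. static int hyper_setup_route(struct rtnl_handle *rth, struct hyper_route *rt, struct hyper_pod *pod)

    (1) It builds a netlink request

    (2) If rt->gw is not NULL, it first calls get_addr_ipv4(...) to parse the gateway and then sets it with addattr_l(...)

    (3) If rt->device is not NULL, it first calls hyper_get_ifindex(...) to get the NIC's index and then sets the outgoing network device with addattr_l(...)

    (4) If rt->dst is not "default", "any" or "all", the route is not the default route, so it first calls char *slash = strchr(rt->dst, '/')

    It then calls get_addr_ipv4(...) to get the IP address of dst and adds it with addattr_l(...). Next, if slash is not NULL, it calls get_netmask(...) to get the subnet mask

    Finally, it calls rtnl_talk(...) to set the route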
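
    The destination handling in step (4) amounts to a small parser: recognize the default-route keywords, otherwise split on '/' into an address and an optional prefix length. A Go sketch of that logic follows (hyperstart implements it in C). The routeDst type and parseRouteDst are hypothetical, and treating a destination without a slash as a /32 host route is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strconv"
	"strings"
)

// routeDst is a hypothetical parsed form of a route destination.
type routeDst struct {
	Default   bool
	IP        net.IP
	PrefixLen int
}

// parseRouteDst accepts "default", "any" or "all" for the default route, or
// an IPv4 address with an optional "/prefix" suffix.
func parseRouteDst(dst string) (routeDst, error) {
	switch dst {
	case "default", "any", "all":
		return routeDst{Default: true}, nil
	}
	addr, prefix := dst, 32 // no slash: assume a host route
	if slash := strings.IndexByte(dst, '/'); slash >= 0 {
		addr = dst[:slash]
		n, err := strconv.Atoi(dst[slash+1:])
		if err != nil || n < 0 || n > 32 {
			return routeDst{}, errors.New("invalid prefix length: " + dst[slash+1:])
		}
		prefix = n
	}
	ip := net.ParseIP(addr).To4()
	if ip == nil {
		return routeDst{}, errors.New("invalid IPv4 address: " + addr)
	}
	return routeDst{IP: ip, PrefixLen: prefix}, nil
}

func main() {
	for _, dst := range []string{"default", "10.0.0.0/8", "192.168.123.5", "bogus"} {
		parsed, err := parseRouteDst(dst)
		if err != nil {
			fmt.Printf("%-16s error: %v\n", dst, err)
			continue
		}
		fmt.Printf("%-16s default=%v ip=%v prefix=%d\n", dst, parsed.Default, parsed.IP, parsed.PrefixLen)
	}
}
```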

  • Original article: https://www.cnblogs.com/YaoDD/p/6226465.html