
    Container isolation with Kata and gVisor in Docker

    Overview

    Containers are an efficient way to build and distribute workloads free of (most) host and OS dependencies, but they come at the cost of reduced security and isolation compared to virtual machines. This is why public cloud services spin up virtual machines per customer and deploy container-based workloads inside them. OK, maybe there is also a commercial reason behind it, allowing them to charge for a virtual machine with a fixed CPU core and memory allocation, but that's beside the point of this blog post.

    What if you could give each container its own virtual machine? Well, that's the purpose of Kata Containers, an Apache 2 licensed project of the OpenStack Foundation which emerged from Intel's Clear Containers project back in 2015:

    "Kata Containers is an open source community working to build a secure container runtime with lightweight virtual machines that feel and perform like containers, but provide stronger workload isolation using hardware virtualization technology as a second layer of defense."

    Google looked at the problem of container isolation differently and came up with gVisor, an open-source and OCI-compatible sandbox runtime that provides a virtualized container environment:

    "gVisor integrates with Docker, containerd and Kubernetes, making it easier to improve the security isolation of your containers while still using familiar tooling. Additionally, gVisor supports a variety of underlying mechanisms for intercepting application calls, allowing it to run in diverse host environments, including cloud-hosted virtual machines."

    I wanted to explore the Kata Containers and gVisor runtimes alongside Docker's default runc-based runtime, ideally all on a single server. Turns out this is actually pretty simple to achieve. In this blog post I go through the installation steps, then launch various containers using all three runtimes via docker run and try out a few things, starting with basic connectivity (works), then what each runtime reports for CPU and memory, reading and writing from volumes (works too), and finally running in privileged mode and trying out simple XDP code (didn't work).

    Overall Kata and gVisor are very easy to use and will be sufficient for many basic container workloads. More exploration and performance testing will be needed before taking the plunge, but if container isolation is important, e.g. because multiple tenants must share the same host, then both offer great and simple solutions.

    Installation

    I'm using a baremetal server running Ubuntu 18.04.4 on kernel 4.15.18-custom:

    mwiget@jcrpd:~$ uname -a
    Linux jcrpd 4.15.18-custom #2 SMP Thu Feb 13 22:49:55 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
    mwiget@jcrpd:~$ lsb_release -a
    No LSB modules are available.
    Distributor ID: Ubuntu
    Description:    Ubuntu 18.04.4 LTS
    Release:        18.04
    Codename:       bionic
    
    $ lscpu |grep CPU
    CPU op-mode(s):      32-bit, 64-bit
    CPU(s):              16
    On-line CPU(s) list: 0-15
    CPU family:          6
    Model name:          Intel(R) Xeon(R) D-2146NT CPU @ 2.30GHz
    CPU MHz:             1000.021
    CPU max MHz:         2301.0000
    CPU min MHz:         1000.0000
    NUMA node0 CPU(s):   0-15
    

    Docker-ce

    I followed the recommended installation path for docker-ce on Ubuntu:

    sudo apt-get update
    sudo apt-get install -y \
        apt-transport-https \
        ca-certificates \
        curl \
        gnupg-agent \
        software-properties-common
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
    sudo add-apt-repository \
       "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
       $(lsb_release -cs) \
       stable"
    sudo apt-get update
    sudo apt-get install -y docker-ce docker-ce-cli containerd.io
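
    As a quick sanity check (the usual hello-world smoke test, not part of the official steps above), you can confirm the daemon is up:

    sudo docker run --rm hello-world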
    

    This installs /usr/bin/runc as the Docker runtime:

    $ which runc
    /usr/bin/runc
    
    $ runc --version
    runc version 1.0.0-rc10
    commit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
    spec: 1.0.1-dev
    
    $ dpkg -S /usr/bin/runc
    containerd.io: /usr/bin/runc
    

    Kata Containers

    Installation instructions can be found in the kata-containers documentation repo.

    I followed the ubuntu-installation-guide.md for Ubuntu 18.04:

    ARCH=$(arch)
    BRANCH="${BRANCH:-master}"
    sudo sh -c "echo 'deb http://download.opensuse.org/repositories/home:/katacontainers:/releases:/${ARCH}:/${BRANCH}/xUbuntu_$(lsb_release -rs)/ /' > /etc/apt/sources.list.d/kata-containers.list"
    curl -sL  http://download.opensuse.org/repositories/home:/katacontainers:/releases:/${ARCH}:/${BRANCH}/xUbuntu_$(lsb_release -rs)/Release.key | sudo apt-key add -
    sudo -E apt-get update
    sudo -E apt-get -y install kata-runtime kata-proxy kata-shim
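
    Before wiring Kata into Docker, the kata-runtime binary installed above ships a host self-check that verifies KVM and CPU virtualization support; worth a quick run (command names as of the 1.x packages):

    # host capability check shipped with kata-runtime 1.x
    sudo kata-runtime kata-check
    kata-runtime --version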
    

    Then I followed the instructions to install Docker for Kata Containers on Ubuntu via /etc/docker/daemon.json. This will allow me to define all three Docker runtimes in one place later. The file doesn't exist yet, so I create it, registering kata-runtime under the name kata and making kata the default for now. Docker's built-in runc runtime doesn't need to be declared in this file:

    cat | sudo tee /etc/docker/daemon.json
    {
        "default-runtime": "kata",
        "runtimes": {
            "kata": {
                "path": "/usr/bin/kata-runtime"
            }
        }
    }
    ^D
    

    Activate the new settings by restarting the Docker systemd service:

    sudo systemctl daemon-reload
    sudo systemctl restart docker
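
    A quick look at docker info confirms the registered runtimes and the new default (just a sanity check):

    docker info | grep -i runtime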
    

    Now let's run an alpine container executing uname -a to display the kernel it sees, with kata and then with runc. First, let's verify the kernel on the host itself:

    ~$ uname -a
    Linux jcrpd 4.15.18-custom #2 SMP Thu Feb 13 22:49:55 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
    

    Now using kata, first by specifying the runtime kata directly, then using the default runtime (which is also kata, as defined in /etc/docker/daemon.json):

    $ docker run -ti --runtime kata alpine uname -a
    Linux 6ac4d4664e64 5.4.15-45.container #1 SMP Mon Feb 24 19:34:15 UTC 2020 x86_64 Linux
    
    $ docker run -ti alpine uname -a
    Linux 5393c2ec9b64 5.4.15-45.container #1 SMP Mon Feb 24 19:34:15 UTC 2020 x86_64 Linux
    

    Clearly, the alpine container reports a much more recent kernel version than what is running on the host itself!

    Now running the same container using runc:

    $ docker run -ti --runtime runc alpine uname -a
    Linux 85d13cf9bc8b 4.15.18-custom #2 SMP Thu Feb 13 22:49:55 UTC 2020 x86_64 Linux
    

    As expected, runc reports the host kernel. More exploration of Kata follows further down, but first, let's add the gVisor runtime.

    gVisor

    Following the gVisor Installation Guide, I picked the "install from an apt repository" steps, starting with the dependencies:

    sudo apt-get update && \
    sudo apt-get install -y \
        apt-transport-https \
        ca-certificates \
        curl \
        gnupg-agent \
        software-properties-common
    

    Next, the key used to sign archives should be added to your apt keychain:

    curl -fsSL https://gvisor.dev/archive.key | sudo apt-key add -
    

    Now adding the repository for the latest release:

    sudo add-apt-repository "deb https://storage.googleapis.com/gvisor/releases release main"
    

    Now the runsc package can be installed:

    sudo apt-get update && sudo apt-get install -y runsc
    

    This added the binary /usr/bin/runsc and automatically updated our /etc/docker/daemon.json file:

    $ cat /etc/docker/daemon.json
    {
        "default-runtime": "kata",
        "runtimes": {
            "kata": {
                "path": "/usr/bin/kata-runtime"
            },
            "runsc": {
                "path": "/usr/bin/runsc"
            }
        }
    }
    

    Quick check that the gVisor runtime is working: run the alpine container via runsc:

    $ docker run -ti --runtime runsc alpine uname -a
    Linux d75634707851 4.4.0 #1 SMP Sun Jan 10 15:06:54 PST 2016 x86_64 Linux
    

    Yes. Working. And uname -a reports yet another kernel from within the alpine container running on the same baremetal host, cool!

    At this point, I have runc, kata and runsc (gVisor) runtimes at my disposal. Now let's explore them …

    Exploration

    Launch one alpine container per runtime

    Time to have fun! First, let's start 3 alpine containers, each with a different container runtime:

    docker run -ti --runtime runc -d --rm --name r-runc --hostname r-runc alpine
    docker run -ti --runtime runsc -d --rm --name r-gvisor --hostname r-gvisor alpine
    docker run -ti --runtime kata -d --rm --name r-kata --hostname r-kata alpine
    

    Now show the running containers:

    kata-gvisor-docker$ docker ps
    CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
    020c88d6f2ae        alpine              "/bin/sh"           About an hour ago   Up About an hour                        r-kata
    7fe37b9d46cb        alpine              "/bin/sh"           About an hour ago   Up About an hour                        r-gvisor
    605e747f595c        alpine              "/bin/sh"           About an hour ago   Up About an hour                        r-runc
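
    If you ever need to confirm which runtime a running container was started with, docker inspect exposes it via the HostConfig.Runtime field; a quick check against the three containers above:

    # prints one "<name>: <runtime>" line per container
    docker inspect --format '{{ .Name }}: {{ .HostConfig.Runtime }}' r-kata r-gvisor r-runc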
    

    To simplify executing the same command on all containers, I use the following helper script:

    $ cat exec.sh
    #!/bin/sh
    for runtime in kata gvisor runc; do
      echo docker exec r-$runtime $@ ...
      docker exec r-$runtime $@
      echo ""
    done
    

    Let's try it by showing the assigned IP address of each container:

    $ ./exec.sh ip addr show eth0 
    docker exec r-kata ip addr show eth0 ...
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq state UP qlen 1000
        link/ether 02:42:ac:11:00:04 brd ff:ff:ff:ff:ff:ff
        inet 172.17.0.4/16 brd 172.17.255.255 scope global eth0
           valid_lft forever preferred_lft forever
        inet6 fe80::42:acff:fe11:4/64 scope link 
           valid_lft forever preferred_lft forever
    
    docker exec r-gvisor ip addr show eth0 ...
    2: eth0: <UP,LOWER_UP> mtu 1500 
        link/generic 02:42:ac:11:00:03 brd ff:ff:ff:ff:ff:ff
        inet 172.17.0.3/32 scope global dynamic 
    
    docker exec r-runc ip addr show eth0 ...
    31: eth0@if32: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP 
        link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff
        inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
           valid_lft forever preferred_lft forever
    

    All containers share the same Docker network bridge, which is 172.17.0.0/16 in my case. You can use docker network inspect bridge to explore it, but it produces a lot of output, including the IPs of all containers. What's interesting here is that the gVisor runtime assigns a /32 address, whereas kata and the default runc stick to the /16.
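
    If you only care about the addresses, a Go template keeps the inspect output short; a small sketch (field names as seen on my Docker version):

    docker network inspect bridge --format '{{range .Containers}}{{.Name}} {{.IPv4Address}}{{println}}{{end}}'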

    Connectivity

    Pinging a public DNS server from each container works perfectly:

    $ ./exec.sh ping -c3 1.1.1.1
    docker exec r-kata ping -c3 1.1.1.1 ...
    PING 1.1.1.1 (1.1.1.1): 56 data bytes
    64 bytes from 1.1.1.1: seq=0 ttl=55 time=2.279 ms
    64 bytes from 1.1.1.1: seq=1 ttl=55 time=2.261 ms
    64 bytes from 1.1.1.1: seq=2 ttl=55 time=1.682 ms
    
    --- 1.1.1.1 ping statistics ---
    3 packets transmitted, 3 packets received, 0% packet loss
    round-trip min/avg/max = 1.682/2.074/2.279 ms
    
    docker exec r-gvisor ping -c3 1.1.1.1 ...
    PING 1.1.1.1 (1.1.1.1): 56 data bytes
    64 bytes from 1.1.1.1: seq=0 ttl=42 time=4.357 ms
    64 bytes from 1.1.1.1: seq=1 ttl=42 time=1.775 ms
    64 bytes from 1.1.1.1: seq=2 ttl=42 time=1.683 ms
    
    --- 1.1.1.1 ping statistics ---
    3 packets transmitted, 3 packets received, 0% packet loss
    round-trip min/avg/max = 1.683/2.605/4.357 ms
    
    docker exec r-runc ping -c3 1.1.1.1 ...
    PING 1.1.1.1 (1.1.1.1): 56 data bytes
    64 bytes from 1.1.1.1: seq=0 ttl=55 time=1.824 ms
    64 bytes from 1.1.1.1: seq=1 ttl=55 time=1.401 ms
    64 bytes from 1.1.1.1: seq=2 ttl=55 time=1.451 ms
    
    --- 1.1.1.1 ping statistics ---
    3 packets transmitted, 3 packets received, 0% packet loss
    round-trip min/avg/max = 1.401/1.558/1.824 ms
    

    What about connectivity between the containers? That seems to work too. I check this by pinging the gVisor container's IP (172.17.0.3) from each container, including itself:

    $ ./exec.sh ping -c 3 172.17.0.3
    docker exec r-kata ping -c 3 172.17.0.3 ...
    PING 172.17.0.3 (172.17.0.3): 56 data bytes
    64 bytes from 172.17.0.3: seq=0 ttl=64 time=0.352 ms
    64 bytes from 172.17.0.3: seq=1 ttl=64 time=0.405 ms
    64 bytes from 172.17.0.3: seq=2 ttl=64 time=0.348 ms
    
    --- 172.17.0.3 ping statistics ---
    3 packets transmitted, 3 packets received, 0% packet loss
    round-trip min/avg/max = 0.348/0.368/0.405 ms
    
    docker exec r-gvisor ping -c 3 172.17.0.3 ...
    PING 172.17.0.3 (172.17.0.3): 56 data bytes
    64 bytes from 172.17.0.3: seq=0 ttl=42 time=0.400 ms
    64 bytes from 172.17.0.3: seq=1 ttl=42 time=0.414 ms
    64 bytes from 172.17.0.3: seq=2 ttl=42 time=0.408 ms
    
    --- 172.17.0.3 ping statistics ---
    3 packets transmitted, 3 packets received, 0% packet loss
    round-trip min/avg/max = 0.400/0.407/0.414 ms
    
    docker exec r-runc ping -c 3 172.17.0.3 ...
    PING 172.17.0.3 (172.17.0.3): 56 data bytes
    64 bytes from 172.17.0.3: seq=0 ttl=64 time=0.302 ms
    64 bytes from 172.17.0.3: seq=1 ttl=64 time=0.221 ms
    64 bytes from 172.17.0.3: seq=2 ttl=64 time=0.216 ms
    
    --- 172.17.0.3 ping statistics ---
    3 packets transmitted, 3 packets received, 0% packet loss
    round-trip min/avg/max = 0.216/0.246/0.302 ms
    

    Cool! Connectivity between containers and the Internet works despite using different runtimes.

    CPU

    More interesting is what CPU type, how many cores, and how much memory each container reports. Starting with lscpu, which is part of the Alpine package util-linux that we first have to install:

    $ ./exec.sh apk add util-linux
    

    Now run lscpu. This is interesting: Kata reports a single CPU (I'm sure this can be tuned; after all, it runs within qemu-kvm), gVisor reports 16 CPUs but doesn't distinguish between actual cores and threads, and runc reports the actual host CPU capabilities.

    mwiget@jcrpd:~/kata-gvisor-docker$ ./exec.sh lscpu
    docker exec r-kata lscpu ...
    Architecture:                    x86_64
    CPU op-mode(s):                  32-bit, 64-bit
    Byte Order:                      Little Endian
    Address sizes:                   40 bits physical, 48 bits virtual
    CPU(s):                          1
    On-line CPU(s) list:             0
    Thread(s) per core:              1
    Core(s) per socket:              1
    Socket(s):                       1
    Vendor ID:                       GenuineIntel
    CPU family:                      6
    Model:                           85
    Model name:                      Intel(R) Xeon(R) D-2146NT CPU @ 2.30GHz
    Stepping:                        4
    CPU MHz:                         2294.604
    BogoMIPS:                        4589.20
    Hypervisor vendor:               KVM
    Virtualization type:             full
    L1d cache:                       32 KiB
    L1i cache:                       32 KiB
    L2 cache:                        4 MiB
    L3 cache:                        16 MiB
    Vulnerability Itlb multihit:     Processor vulnerable
    Vulnerability L1tf:              Mitigation; PTE Inversion
    Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT Host state unknown
    Vulnerability Meltdown:          Mitigation; PTI
    Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
    Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
    Vulnerability Spectre v2:        Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP disabled, RSB filling
    Vulnerability Tsx async abort:   Mitigation; Clear CPU buffers; SMT Host state unknown
    Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat pku md_clear
    
    docker exec r-gvisor lscpu ...
    Architecture:        x86_64
    CPU op-mode(s):      32-bit, 64-bit
    Byte Order:          Little Endian
    Address sizes:       46 bits physical, 48 bits virtual
    CPU(s):              16
    On-line CPU(s) list: 0-15
    Vendor ID:           GenuineIntel
    CPU family:          6
    Model:               85
    Model name:          unknown
    Stepping:            unknown
    CPU MHz:             1248.188
    BogoMIPS:            1248.19
    Virtualization:      VT-x
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves pku ospke
    
    docker exec r-runc lscpu ...
    Architecture:                    x86_64
    CPU op-mode(s):                  32-bit, 64-bit
    Byte Order:                      Little Endian
    Address sizes:                   46 bits physical, 48 bits virtual
    CPU(s):                          16
    On-line CPU(s) list:             0-15
    Thread(s) per core:              2
    Core(s) per socket:              8
    Socket(s):                       1
    NUMA node(s):                    1
    Vendor ID:                       GenuineIntel
    CPU family:                      6
    Model:                           85
    Model name:                      Intel(R) Xeon(R) D-2146NT CPU @ 2.30GHz
    Stepping:                        4
    Frequency boost:                 enabled
    CPU MHz:                         1096.745
    CPU max MHz:                     2301.0000
    CPU min MHz:                     1000.0000
    BogoMIPS:                        4600.00
    Virtualization:                  VT-x
    L1d cache:                       256 KiB
    L1i cache:                       256 KiB
    L2 cache:                        8 MiB
    L3 cache:                        11 MiB
    NUMA node0 CPU(s):               0-15
    Vulnerability Itlb multihit:     KVM: Mitigation: Split huge pages
    Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
    Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT vulnerable
    Vulnerability Meltdown:          Mitigation; PTI
    Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
    Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
    Vulnerability Spectre v2:        Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
    Vulnerability Tsx async abort:   Mitigation; Clear CPU buffers; SMT vulnerable
    Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke md_clear flush_l1d
    

    Looking at the qemu process backing the Kata container, we can see how the VM is sized: 2048M of memory and a single vCPU at boot, hot-pluggable up to the host's 16 (-smp 1,cores=1,threads=1,sockets=16,maxcpus=16):

    $ ps ax|grep qemu
    17233 ?        Sl     0:09 /usr/bin/qemu-vanilla-system-x86_64 -name sandbox-020c88d6f2ae38b1b33788bf11d570d58931e7588a1b2412bf510c5ce2cd5650 -uuid b61941a4-1faf-4b9e-be8a-2c0b440730ca -machine pc,accel=kvm,kernel_irqchip,nvdimm -cpu host -qmp unix:/run/vc/vm/020c88d6f2ae38b1b33788bf11d570d58931e7588a1b2412bf510c5ce2cd5650/qmp.sock,server,nowait -m 2048M,slots=10,maxmem=65110M -device pci-bridge,bus=pci.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= -device virtio-serial-pci,disable-modern=false,id=serial0,romfile= -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/vc/vm/020c88d6f2ae38b1b33788bf11d570d58931e7588a1b2412bf510c5ce2cd5650/console.sock,server,nowait -device nvdimm,id=nv0,memdev=mem0 -object memory-backend-file,id=mem0,mem-path=/usr/share/kata-containers/kata-containers-image_clearlinux_1.11.0-alpha0_agent_d26a505efd.img,size=134217728 -device virtio-scsi-pci,id=scsi0,disable-modern=false,romfile= -object rng-random,id=rng0,filename=/dev/urandom -device virtio-rng,rng=rng0,romfile= -device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 -chardev socket,id=charch0,path=/run/vc/vm/020c88d6f2ae38b1b33788bf11d570d58931e7588a1b2412bf510c5ce2cd5650/kata.sock,server,nowait -device virtio-9p-pci,disable-modern=false,fsdev=extra-9p-kataShared,mount_tag=kataShared,romfile= -fsdev local,id=extra-9p-kataShared,path=/run/kata-containers/shared/sandboxes/020c88d6f2ae38b1b33788bf11d570d58931e7588a1b2412bf510c5ce2cd5650,security_model=none -netdev tap,id=network-0,vhost=on,vhostfds=3,fds=4 -device driver=virtio-net-pci,netdev=network-0,mac=02:42:ac:11:00:04,disable-modern=false,mq=on,vectors=4,romfile= -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic -daemonize -object memory-backend-ram,id=dimm1,size=2048M -numa node,memdev=dimm1 -kernel /usr/share/kata-containers/vmlinuz-5.4.15.66-45.container -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro ro rootfstype=ext4 quiet systemd.show_status=false panic=1 nr_cpus=16 agent.use_vsock=false systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket -pidfile /run/vc/vm/020c88d6f2ae38b1b33788bf11d570d58931e7588a1b2412bf510c5ce2cd5650/pid -smp 1,cores=1,threads=1,sockets=16,maxcpus=16
    

    Memory

    $ ./exec.sh grep Mem /proc/meminfo
    docker exec r-kata grep Mem /proc/meminfo ...
    MemTotal:        2043288 kB
    MemFree:         2008312 kB
    MemAvailable:    1992656 kB
    
    docker exec r-gvisor grep Mem /proc/meminfo ...
    MemTotal:        2097152 kB
    MemFree:         2095112 kB
    MemAvailable:    2095112 kB
    
    docker exec r-runc grep Mem /proc/meminfo ...
    MemTotal:       65625052 kB
    MemFree:        50505940 kB
    MemAvailable:   51660856 kB
    

    Only runc reports the actual available physical memory; gVisor and Kata report roughly 2 GB each. Again, I'm sure this can be defined per container (a sketch follows below).

    docker stats, however, still reports the actual memory usage of the containers:

    $ docker stats --no-stream
    CONTAINER ID        NAME                CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
    f8de0ffbee4b        r-kata              0.00%               6.656MiB / 62.58GiB   0.01%               4.75kB / 2.27kB     0B / 0B             9
    0a3024704e82        r-gvisor            0.03%               15.53MiB / 62.58GiB   0.02%               7.08kB / 0B         0B / 0B             158
    db036e69cf1a        r-runc              0.00%               4.348MiB / 62.58GiB   0.01%               7.3kB / 0B          0B / 0B             1
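
    For Kata at least, both the single vCPU and the roughly 2 GB of memory should be adjustable. My assumption, not verified here, is that the defaults come from Kata's configuration.toml and that the usual Docker resource flags request more per container:

    # assumed config location for the kata 1.x packages; adjust to your install
    grep -E '^default_(vcpus|memory)' /usr/share/defaults/kata-containers/configuration.toml

    # standard Docker resource flags; Kata should size the VM accordingly
    docker run --rm --runtime kata --cpus 4 -m 4g alpine grep -c ^processor /proc/cpuinfo
    docker run --rm --runtime kata --cpus 4 -m 4g alpine grep MemTotal /proc/meminfo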
    

    Attaching a network at runtime

    Create a new virtual network:

    $ docker network create my-net
    4b77bec0a4e4198841dde0823d370ead5e0cbaaf06afad62cfe471cd222bcdf7
    

    And attaching it to all 3 containers:

    docker network connect my-net r-kata
    docker network connect my-net r-runc
    docker network connect my-net r-gvisor
    

    Cool, no errors are reported. Let's check whether the new network shows up within each container: this only worked with runc, not with Kata or gVisor. Looks like a trade-off made in the name of isolation; additional networks must be specified at launch (see the sketch after the output below).

    $ ./exec.sh ip addr show 
    docker exec r-kata ip addr show ...
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host 
           valid_lft forever preferred_lft forever
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq state UP qlen 1000
        link/ether 02:42:ac:11:00:04 brd ff:ff:ff:ff:ff:ff
        inet 172.17.0.4/16 brd 172.17.255.255 scope global eth0
           valid_lft forever preferred_lft forever
        inet6 fe80::42:acff:fe11:4/64 scope link 
           valid_lft forever preferred_lft forever
    
    docker exec r-gvisor ip addr show ...
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/32 scope global dynamic 
    2: eth0: <UP,LOWER_UP> mtu 1500 
        link/generic 02:42:ac:11:00:03 brd ff:ff:ff:ff:ff:ff
        inet 172.17.0.3/32 scope global dynamic 
    
    docker exec r-runc ip addr show ...
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
    31: eth0@if32: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP 
        link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff
        inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
           valid_lft forever preferred_lft forever
    40: eth1@if41: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP 
        link/ether 02:42:ac:12:00:03 brd ff:ff:ff:ff:ff:ff
        inet 172.18.0.3/16 brd 172.18.255.255 scope global eth1
           valid_lft forever preferred_lft forever
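
    For Kata and gVisor the extra network apparently has to be given at launch instead. A sketch of that (untested here; container names are just examples) using the standard --network flag:

    docker run -ti --runtime kata -d --rm --network my-net --name r-kata-net --hostname r-kata-net alpine
    docker run -ti --runtime runsc -d --rm --network my-net --name r-gvisor-net --hostname r-gvisor-net alpine
    docker exec r-kata-net ip addr show eth0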
    

    Volumes

    To test volume mounts, I destroyed the containers and relaunched them with a host folder mounted as a volume:

    $ cat run-vol.sh 
    #!/bin/bash
    docker run -ti --runtime runc -v $PWD:/u -d --rm --name r-runc --hostname r-runc alpine
    docker run -ti --runtime runsc -v $PWD:/u -d --rm --name r-gvisor --hostname r-gvisor alpine
    docker run -ti --runtime kata -v $PWD:/u -d --rm --name r-kata --hostname r-kata alpine
    

    All containers successfully mounted the current folder:

    $ ./exec.sh ls /u
    docker exec r-kata ls /u ...
    exec.sh
    install-kata.sh
    install.sh
    run-vol.sh
    run.sh
    
    docker exec r-gvisor ls /u ...
    exec.sh
    install-kata.sh
    install.sh
    run-vol.sh
    run.sh
    
    docker exec r-runc ls /u ...
    exec.sh
    install-kata.sh
    install.sh
    run-vol.sh
    run.sh
    

    What about writing to the folder? For this test to work, I had to modify the exec.sh script slightly, so I can use redirects within the container:

    $ cat exec.sh 
    #!/bin/sh
    for runtime in kata gvisor runc; do
      echo docker exec r-$runtime sh -c "$@" ...
      docker exec r-$runtime sh -c "$@"
      echo ""
    done
    

    Now for the test itself:

    $ ./exec.sh 'cat /etc/hostname >> /u/hostnames.txt'
    docker exec r-kata sh -c cat /etc/hostname >> /u/hostnames.txt ...
    
    docker exec r-gvisor sh -c cat /etc/hostname >> /u/hostnames.txt ...
    
    docker exec r-runc sh -c cat /etc/hostname >> /u/hostnames.txt ...
    
    mwiget@jcrpd:~/kata-gvisor-docker$ cat hostnames.txt 
    r-kata
    r-gvisor
    r-runc
    

    Bingo, writes work too. Every container appended its hostname to the file hostnames.txt on the host via the read-write mount point.

    Don't trust my shell script ;-)? Let's try this one-liner:

    kata-gvisor-docker$ ./exec.sh 'touch /u/`hostname`'
    docker exec r-kata sh -c touch /u/`hostname` ...
    
    docker exec r-gvisor sh -c touch /u/`hostname` ...
    
    docker exec r-runc sh -c touch /u/`hostname` ...
    
    $ ls
    exec.sh  hostnames.txt  install-kata.sh  install.sh  jcrpd  r-gvisor  r-kata  r-runc  run.sh  run-vol.sh
    

    As expected, we see files named after each container's hostname.

    XDP

    I gave my XDP drop test container a go using gVisor and Kata, but both failed to load XDP code via ip link set dev eth0 xdp obj /xdp-drop.o sec drop_icmp:

    Kata reported this:

    Installing xdp-drop.o app on eth0 ...
    mkdir /sys/fs/bpf failed: Operation not permitted
    Continuing without mounted eBPF fs. Too old kernel?
    
    Prog section 'drop_icmp' rejected: Function not implemented (38)!
     - Type:         6
     - Instructions: 11 (0 over limit)
     - License:      GPL
    

    While gVisor runsc reported this:

    Installing xdp-drop.o app on eth0 ...
    Error: either "dev" is duplicate, or "xdp" is a garbage.
    Makefile:16: recipe for target 'run-gvisor' failed
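
    For reference, the sequence the test container boils down to is the standard iproute2 way of attaching, verifying and detaching an XDP program (shown only to clarify what fails inside the sandboxes):

    ip link set dev eth0 xdp obj /xdp-drop.o sec drop_icmp   # attach
    ip link show dev eth0                                    # verify: look for "xdp" and a prog id
    ip link set dev eth0 xdp off                             # detach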
    
