  • Linux Container From Scratch

    See more container information

    Minimal Implementation of a Container

    Note: the environment is Ubuntu 16.
    Install some useful tools:

    sudo apt-get install vim screen lftp busybox-static systemd systemd-container yum qemu-utils aufs-tools pbzip2 htop

    Create a directory named nimi_container and a file named yum.conf inside it. Under nimi_container, run

    mkdir -p {minimal,minimal/usr}/{bin,sbin,etc}

    This creates the directory tree minimal/{bin,sbin,etc} plus minimal/usr/{bin,sbin,etc}.
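
    The brace expansion in the mkdir command is a bash feature. As a minimal sketch, assuming a plain POSIX sh and a scratch directory (demo_dir is a hypothetical name, not part of the original walkthrough), the same tree can be built with two explicit loops and then listed:

```shell
#!/bin/sh
# Build the same skeleton without relying on bash brace expansion.
# demo_dir is a hypothetical scratch location, not a path from the article.
demo_dir=$(mktemp -d)

for prefix in minimal minimal/usr; do
    for leaf in bin sbin etc; do
        mkdir -p "$demo_dir/$prefix/$leaf"
    done
done

# List every directory that now exists under minimal/.
tree_listing=$(cd "$demo_dir" && find minimal -type d | sort)
printf '%s\n' "$tree_listing"
```

    The listing should contain eight directories: minimal, minimal/usr, and bin, sbin and etc under each of them.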

    Next we need busybox, which bundles more than 100 of the most commonly used Linux commands and tools, such as ls, cat and echo, into a single binary. To see which tools it contains, run busybox --list-full. We then create a symbolic link for every busybox applet inside the container, each pointing at /bin/sh (the loop is listed in the summary of steps further down).

    Then check what was created in minimal/bin/.

    Next, copy the busybox binary:

    cp -f /bin/busybox minimal/bin/sh

    Here -f means "force the copy by removing the destination file if needed". Then

    touch minimal/etc/os-release
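
    The symlink, copy and touch steps above can be sketched as one script. This is only an illustration under stated assumptions: demo_root is a hypothetical scratch directory standing in for ./minimal, the short fallback applet list is made up for hosts without busybox, and skipping bin/sh in the loop is a choice of this sketch (so the later cp never has to replace a symlink), not something the original steps do.

```shell
#!/bin/sh
# Populate a container root with busybox applet symlinks.
# demo_root is a hypothetical scratch directory; the article uses ./minimal.
demo_root=$(mktemp -d)/minimal
mkdir -p "$demo_root/bin" "$demo_root/sbin" "$demo_root/etc" \
         "$demo_root/usr/bin" "$demo_root/usr/sbin"

# Print applet paths such as "bin/ls", one per line.
# The fallback list is an assumption for hosts without busybox installed.
list_applets() {
    if command -v busybox >/dev/null 2>&1; then
        busybox --list-full
    else
        printf '%s\n' bin/ls bin/cat bin/echo bin/sh
    fi
}

# Create one symlink per applet, each pointing at /bin/sh inside the root.
link_applets() {
    root=$1
    while IFS= read -r applet; do
        # Skip bin/sh itself: the real binary is copied there afterwards.
        [ "$applet" = "bin/sh" ] && continue
        mkdir -p "$root/$(dirname "$applet")"
        ln -sf /bin/sh "$root/$applet"
    done
}

list_applets | link_applets "$demo_root"

# Copy the real binary last, if one is available on this host.
if [ -x /bin/busybox ]; then
    cp -f /bin/busybox "$demo_root/bin/sh"
fi
touch "$demo_root/etc/os-release"

# Show one of the resulting links.
ls -l "$demo_root/bin/ls"
```

    Because every link targets the absolute path /bin/sh, the links only resolve to busybox once the directory is used as the root file system, for example through chroot.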

    Now let's get into busybox

    sudo chroot minimal /bin/sh

    The output shows that we are now inside busybox.

    Inside busybox we can run commands much as we would in a normal terminal.

    Is this a Linux container? There is no objective standard for judging that, so you could argue "sure, it is a container!". But if we inspect the environment from inside,

    we find that there is no /proc here, so nothing about processes or a process namespace is visible, whereas the same check in a normal terminal works. So we can say it is only sort of like a container.

    Why is this? See "from chroot to systemd-nspawn", which explains why systemd-nspawn is better than chroot. One point worth mentioning: chroot does not mount /proc and /sys, but systemd-nspawn does. Run

    systemd-nspawn -Dminimal /bin/sh
    # -Dminimal: the directory to use as the container root
    # /bin/sh: the process to launch inside it

    This puts us in busybox again, and ps ax now works because /proc is mounted.

    If we check the network with ip a, in either chroot or systemd-nspawn, we actually see the host system's network interfaces.

    But a container needs its own private network, so we add one:

    systemd-nspawn --private-network -Dminimal /bin/sh

    Checking the network again, we see only the loopback interface.

    That is the minimal implementation of a container!

    To sum up how we built it, here are the steps:

    #sudo apt-get install vim screen lftp busybox-static systemd systemd-container yum qemu-utils aufs-tools pbzip2 htop
    
    #mkdir -p {minimal,minimal/usr}/{bin,sbin,etc}
    
    #for x in $(busybox --list-full); do
    >ln -s /bin/sh minimal/$x;done
    
    #cp -f /bin/busybox minimal/bin/sh
    
    #touch minimal/etc/os-release

    From the experience above, we know two ways to run the container: chroot and systemd-nspawn.

    We can see that it is better to use systemd-nspawn than chroot. Finally, if we run ps ax in the host terminal (not in busybox), we can see the processes that belong to the container.

    There are two processes at the system level, but inside the container /bin/sh has PID 1, even though it is the exact same process. To see the process tree, use

    pstree

    Build a Container Image with cpio

    What we built above is not portable, so let's make it portable. We use cpio here, though other similar tools would also work. cpio is a general file archiver that copies files into or out of a cpio or tar archive. First, run

    find minimal -print | cpio -o | pbzip2 -c > minimal.cpio.bz2
    # cpio reads the list of file names produced by find
    # -o selects copy-out mode, which creates an archive
    # pbzip2 is a parallel bzip2 compressor; -c writes to stdout
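
    To check such an image, it helps to list its contents again. The following is a minimal sketch, assuming cpio is installed and falling back to plain bzip2 when pbzip2 is absent (both handle the same format); work, pack_image and list_image are hypothetical names, and the tiny stand-in tree replaces the real minimal directory.

```shell
#!/bin/sh
# Pack a directory into a compressed cpio image and list it again.
# work is a hypothetical scratch directory with a tiny stand-in tree.
work=$(mktemp -d)
mkdir -p "$work/minimal/bin" "$work/minimal/etc"
touch "$work/minimal/etc/os-release"

# Prefer pbzip2 as in the article; fall back to bzip2 (an assumption
# for hosts without pbzip2), which reads and writes the same format.
if command -v pbzip2 >/dev/null 2>&1; then
    compressor=pbzip2
elif command -v bzip2 >/dev/null 2>&1; then
    compressor=bzip2
else
    compressor=none
fi

# pack_image DIR IMAGE: archive DIR (as relative paths) into IMAGE.
pack_image() {
    (cd "$(dirname "$1")" && find "$(basename "$1")" -print | cpio -o 2>/dev/null) \
        | "$compressor" -c > "$2"
}

# list_image IMAGE: print the paths stored in IMAGE without extracting.
list_image() {
    "$compressor" -dc "$1" | cpio -t 2>/dev/null
}

if command -v cpio >/dev/null 2>&1 && [ "$compressor" != none ]; then
    pack_image "$work/minimal" "$work/minimal.cpio.bz2"
    list_image "$work/minimal.cpio.bz2"
else
    echo "cpio or a bzip2-compatible compressor is missing; skipping" >&2
fi
```

    On a target machine the image can be unpacked with pbzip2 -dc minimal.cpio.bz2 | cpio -id, where -i selects copy-in (extract) mode and -d creates leading directories as needed.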

    The only reason for using pbzip2 is to compress the cpio output and save storage. Check the result.

    This file is the container image; you can ship it around or put an app in it. That is the most minimal container image you could make.

    Limiting CPU Access with cgroups

    Cgroups are the foundation of resource management in IaaS virtualization (such as KVM and LXC) and in PaaS containers (such as Docker). See also a comparison of popular resource isolation technologies.

    Before creating a cgroup, we first time a pbzip2 command. First, let's create the data file that pbzip2 will compress:

    dd if=/dev/urandom of=datafile bs=1M count=100
    # writes random data into datafile: 100 blocks of 1M each
    # if: input file; of: output file
    # bs: block size; count: number of blocks

    Then check how long it takes:

    time pbzip2 -k -9 datafile
    # time: measures how long the command takes
    # -k: keep the original file
    # -9: sets the BWT block size in units of 100k; -1 compresses fastest but has the lowest compression ratio. The default is 900k

    Now let's create a cgroup.

    sudo mkdir /sys/fs/cgroup/cpuset/my_cpuset

    Interestingly, once the directory my_cpuset is created, several files appear in it automatically.

    Next, limit the resources for the cgroup. A cpuset cgroup has two minimal required settings: the CPU cores the group may use and the memory nodes.
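
    The two settings live in the files cpuset.cpus and cpuset.mems inside the cgroup directory. The original commands and values are not preserved here, so the following is only a sketch under assumptions: core 0 and memory node 0 are assumed values, configure_cpuset is a hypothetical helper, and by default it writes into a scratch directory so it can be tried without root.

```shell
#!/bin/sh
# Write the two required cpuset settings into a cgroup directory.
# By default this runs against a scratch directory so it is safe to try;
# set CPUSET_DIR=/sys/fs/cgroup/cpuset/my_cpuset and run as root to apply it.
cpuset_dir=${CPUSET_DIR:-$(mktemp -d)}

# configure_cpuset DIR CPUS MEMS
#   CPUS: CPU list such as "0" or "0-1"; MEMS: memory node list such as "0".
configure_cpuset() {
    printf '%s\n' "$2" > "$1/cpuset.cpus"
    printf '%s\n' "$3" > "$1/cpuset.mems"
}

# Assumed values: restrict the group to CPU core 0 and memory node 0.
configure_cpuset "$cpuset_dir" 0 0

echo "cpus: $(cat "$cpuset_dir/cpuset.cpus")"
echo "mems: $(cat "$cpuset_dir/cpuset.mems")"
```

    Both files must be set before any task can be added to the group; on a single-socket machine there is usually only memory node 0.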

    Now we have a cgroup, but there are no tasks in it. We can add the current shell to the cgroup like this:

    echo $$ | sudo tee /sys/fs/cgroup/cpuset/my_cpuset/tasks
    # echo $$ prints the process ID of the current shell
    # tee is handy here: it reads from the pipe, prints to the screen, and writes to the file at the same time

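    Then re-run time pbzip2 -k -9 datafile (remove datafile.bz2 from the first run beforehand, or add -f, since pbzip2 will not overwrite an existing output file) and you can see that the time nearly doubles.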

    Connect a Container to the Network

    As shown above, when we run a container with systemd-nspawn --private-network -Dminimal /bin/sh, its network has only a loopback interface, so there is no way to reach the outside world. Let's first look at network namespaces. Run

    ip netns list
    #netns means network namespace

    It shows nothing, but that is not the whole truth: network namespaces do exist, they are just unnamed. So let's create a named one.

    sudo ip netns add minimal #create a network namespace named minimal

    Now add a virtual Ethernet cable (a veth pair with the ends veth1 and eth1).

    Now move the newly created eth1 end into the namespace.

    We can see that eth1 has gone into the network namespace minimal and is no longer visible on the host, because a network device can belong to only one network namespace at a time. Now bring veth1 up.

    One feature of the ip command is that it lets you execute a process within a namespace. Here the process is chroot.

    Now we are in busybox, inside the network namespace minimal and inside the chroot. Next we give eth1 an address and bring it up.

    On the host side, veth1 is the peer of eth1. To talk to eth1 in the container, we add an address to veth1 (there is no need to bring veth1 up because that was already done above). Then ping eth1, and it works!
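
    The wiring described in this section can be sketched as a dry-run script. The exact commands and addresses from the original are not preserved, so treat this as an assumption-laden illustration: the 10.0.0.x addresses are made up, run and wire_container_network are hypothetical helpers, and eth1 is configured from the host through ip netns exec rather than from a shell inside the chroot. The script assumes the namespace minimal already exists.

```shell
#!/bin/sh
# Dry-run sketch of the veth wiring. With DRY_RUN=1 (the default) each
# command is only printed; set DRY_RUN=0 and run as root to execute them.
DRY_RUN=${DRY_RUN:-1}
HOST_ADDR=10.0.0.1/24       # assumed address for veth1 on the host
CONTAINER_ADDR=10.0.0.2/24  # assumed address for eth1 in the namespace

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

wire_container_network() {
    # Create the pair: veth1 stays on the host, eth1 goes to the container.
    run ip link add veth1 type veth peer name eth1
    run ip link set eth1 netns minimal
    run ip link set veth1 up
    # Configure eth1 from the host via "ip netns exec"; the article does the
    # same from a chroot shell started inside the namespace.
    run ip netns exec minimal ip addr add "$CONTAINER_ADDR" dev eth1
    run ip netns exec minimal ip link set eth1 up
    run ip addr add "$HOST_ADDR" dev veth1
    run ping -c 3 "${CONTAINER_ADDR%/*}"
}

plan=$(wire_container_network)
printf '%s\n' "$plan"
```

    Printing the plan first makes it easy to review the commands before running them as root; the interactive variant used in this article would start the shell with ip netns exec minimal chroot minimal /bin/sh.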

    Demo: Splitting a Container Image into Layers with aufs

  • Original article: https://www.cnblogs.com/chaseblack/p/6049349.html