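I have recently been upgrading the Apollo docker image to nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04. It is a real headache, but a process like this usually teaches a lot: what could be done better, and how to make the image easier to maintain and easier to upgrade.
The plan has three phases; I am currently in the second: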
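1. Build the image: the biggest lesson in this phase is about build speed, which mostly comes down to the build cache. Use the cache mechanism correctly and deliberately, but watch out for bad (stale) cache.
- Put the steps that are stable, system-level and rarely change near the top of the Dockerfile, so that later edits do not invalidate their cached layers.
- Also, the scripts previously used to build the image ran apt-get update all over the place; almost every apt call was preceded by an update, which is very time-consuming. It is worth understanding what this command actually does and when it is really needed.
- On bad cache: I still need to study the cache mechanism. Because of the project's size, most of the image's Dockerfiles simply do RUN xxx.sh, and caching around these steps seems to behave a little oddly.
- Tip: a full build takes a long time, so when it fails you can start a container from the last layer that built successfully and debug inside it, which is very convenient.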
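2. Compile code in the new image: the new image ultimately hosts the product code, and the product code interacts with the image mainly through header files and library files.
- Once the new image is built, there are still many places that may be incompatible with the product code, especially after an OS-level upgrade, which automatically upgrades many bundled libraries along with it, e.g. boost, vtk.xx==>vtk.xx, etc. These bring large changes to header and library files, as well as interface changes, and the changes often ripple further into dependent code.
- Another issue is that the old image used many prepackaged .so files. They were built against the previous environment, or rather depend heavily on the environment they were built in; for example, a .so that bundles boost.54.xx, while the new environment has most likely upgraded boost. This makes large version jumps hard. So, as long as the extra compile time is acceptable, avoid downloading prebuilt .so files and build from source instead.
3. Test code: the work above only operates at the symbol level; we still need to verify that the symbol-level changes actually produce equivalent behavior.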
References:
https://www.joyfulbikeshedding.com/blog/2019-08-27-debugging-docker-builds.html
https://vsupalov.com/debug-docker-container/
$ docker run -it --entrypoint /bin/bash $IMAGE_NAME -s
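The "debug from the last successful layer" tip can be scripted. A minimal sketch, assuming the classic (non-BuildKit) builder, e.g. `DOCKER_BUILDKIT=0 docker build .`, which prints a ` ---> <image id>` line after each step that succeeds; recent Docker releases default to BuildKit, which does not print these intermediate IDs. The function name `last_good_layer` and the log file `build.log` are hypothetical.

```shell
#!/bin/bash
# Print the ID of the last intermediate image that the classic builder
# reported in a saved `docker build` log. Only lines of the form
# " ---> 1a2b3c4d5e6f" carry an image ID; "---> Running in ..." and
# "---> Using cache" lines do not match the pattern.
last_good_layer() {
  grep -oE -- '---> [0-9a-f]{12}$' "$1" | tail -n 1 | cut -d' ' -f2
}

# Usage (hypothetical image name and log file):
#   DOCKER_BUILDKIT=0 docker build -t "$IMAGE_NAME" . 2>&1 | tee build.log
#   docker run -it --rm --entrypoint /bin/bash "$(last_good_layer build.log)"
```

Overriding the entrypoint, as in the command above, keeps an image-defined ENTRYPOINT from swallowing the shell.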
Docker private registry, list the tags of an image: http://192.168.1.101:5000/v2/mooncar/mooncar/tags/list
Some handy tricks:
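#!/bin/bash
# hgrep DIR PATTERN: search C/C++ headers under DIR
hgrep()
{
find "$1" \( -name "*.h" -o -name "*.hpp" \) -print0 | xargs -0 -r grep -nH -- "$2"
}
# cgrep DIR PATTERN: search C/C++ sources under DIR
cgrep()
{
find "$1" \( -name "*.c" -o -name "*.cpp" \) -print0 | xargs -0 -r grep -nH -- "$2"
}
# cmgrep DIR PATTERN: search CMake files under DIR
cmgrep()
{
find "$1" -name "CMakeLists.txt" -print0 | xargs -0 -r grep -nH -- "$2"
}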
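Source this file from .bashrc; the helpers are quick and handy.
aptitude show libboost-dev        # show the package version and details
apt-cache madison libboost-dev    # list the candidate versions and the sources.list repositories that provide them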
patch -p0 < xx.patch
Prepend a string, e.g. "HEAD", to every line: sed 's/^/HEAD&/g' test.file
Append a string, e.g. "TAIL", to every line: sed 's/$/&TAIL/g' test.file
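A quick check of the two sed commands above, wrapped as small helpers. The `&` in the replacement stands for the matched text, which is empty for `^` and `$`, so it can be dropped; `g` is unnecessary too, since each anchor matches once per line. The function names are hypothetical, and because the string is spliced into a sed expression it must not contain `/`, `&` or `\`.

```shell
#!/bin/bash
# Prefix or suffix every line of a file (or of stdin when no file is given;
# "-" tells sed to read stdin).
prefix_lines() { sed "s/^/$1/" "${2:--}"; }
suffix_lines() { sed "s/\$/$1/" "${2:--}"; }

printf 'alpha\nbeta\n' | prefix_lines HEAD   # HEADalpha, HEADbeta
printf 'alpha\nbeta\n' | suffix_lines TAIL   # alphaTAIL, betaTAIL
```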
ldd -r: perform relocations and report any missing objects or functions; see https://blog.csdn.net/xihuanzhi1854/article/details/89523247
nm -C xxx.so |grep "yyy" --color=auto
Remove -Werror from every Makefile.in under the current directory:
grep "Werror" . -R |grep Makefile.in |awk -F ":" '{print $1}'|xargs sed -i "s/ -Werror//g"
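A tighter variant of that pipeline, as a sketch: `grep -l` prints each matching file only once and `--include` limits the search to files named Makefile.in, so the `grep Makefile.in | awk` stage is no longer needed; `-Z` with `xargs -0` keeps paths containing spaces intact. GNU grep, xargs and sed are assumed, and the function name is hypothetical.

```shell
#!/bin/bash
# Remove " -Werror" from every Makefile.in below directory $1 (default: .).
strip_werror() {
  grep -rlZ --include=Makefile.in -e '-Werror' "${1:-.}" \
    | xargs -0 -r sed -i 's/ -Werror//g'
}
```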
readelf -d libadolc.so |grep SONAME
objdump -TC libleveldb.so
Looking at symbols in object files (C vs. C++ name mangling):
moonx@moonx:/usr/download/apue/ttt$ g++ -c main.c
moonx@moonx:/usr/download/apue/ttt$ nm -C main.o
0000000000000000 T main
U hello(char const*)
moonx@moonx:/usr/download/apue/ttt$ gcc -c hello.c
moonx@moonx:/usr/download/apue/ttt$ nm -C hello.o
0000000000000000 T hello
U printf
moonx@moonx:/usr/download/apue/ttt$ g++ -c hello.c
moonx@moonx:/usr/download/apue/ttt$ nm -C hello.o
U printf
0000000000000000 T hello(char const*)
moonx@moonx:/usr/download/apue/ttt$ gcc -c main.c
moonx@moonx:/usr/download/apue/ttt$ nm -C main.o
U hello
0000000000000000 T main
Making a static library and a dynamic (shared) library:
ABI: http://litaotju.github.io/c++/2019/02/24/Why-we-need-D_GLIBCXX_USE_CXX11_ABI=0/
objdump -T -C libfoo.so
- -T shows the dynamic symbol table
- -C demangles C++ symbol names into human-readable form
apollo@in_dev_docker:/apollo/ttt/abi$ objdump -TC libmy.so |grep print
0000000000000945 g DF .text 0000000000000068 Base print_string(std::string const&)
apollo@in_dev_docker:/apollo/ttt/abi$ g++ -fPIC mylib.cpp -shared -o libmy.so -D_GLIBCXX_USE_CXX11_ABI=0
apollo@in_dev_docker:/apollo/ttt/abi$ objdump -TC libmy.so |grep print
0000000000000945 g DF .text 0000000000000068 Base print_string(std::string const&)
apollo@in_dev_docker:/apollo/ttt/abi$ g++ -fPIC mylib.cpp -shared -o libmy.so
apollo@in_dev_docker:/apollo/ttt/abi$ objdump -TC libmy.so |grep print
00000000000009b5 g DF .text 0000000000000068 Base print_string(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
apollo@in_dev_docker:/apollo/ttt/abi$ objdump -TC libmy.so |grep string
0000000000000000 DF *UND* 0000000000000000 GLIBCXX_3.4.21 std::basic_ostream<char, std::char_traits<char> >& std::operator<< <char, std::char_traits<char>, std::allocator<char> >(std::basic_ostream<char, std::char_traits<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
00000000000009b5 g DF .text 0000000000000068 Base print_string(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
apollo@in_dev_docker:/apollo/ttt/abi$ nm libmy.so |grep print_string
00000000000009b5 T _Z12print_stringRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
apollo@in_dev_docker:/apollo/ttt/abi$ nm /usr/lib/x86_64-linux-gnu/libleveldb.so |grep open
nm: /usr/lib/x86_64-linux-gnu/libleveldb.so: no symbols
apollo@in_dev_docker:/apollo/ttt/abi$ nm -D /usr/lib/x86_64-linux-gnu/libleveldb.so |grep open
U fopen
0000000000013080 T leveldb_open
0000000000013f60 T leveldb_options_set_max_open_files
U open
U opendir
apollo@in_dev_docker:/apollo/ttt/abi$ nm -DC /usr/lib/x86_64-linux-gnu/libleveldb.so |grep open
U fopen
0000000000013080 T leveldb_open
0000000000013f60 T leveldb_options_set_max_open_files
U open
U opendir
apollo@in_dev_docker:/apollo/ttt/abi$ g++ -fPIC mylib.cpp -shared -o libmy.so -D_GLIBCXX_USE_CXX11_ABI=0
apollo@in_dev_docker:/apollo/ttt/abi$ g++ myapp.cpp -lmy -L./ -o myapp
/tmp/ccU6YIvW.o: In function `main':
myapp.cpp:(.text+0x43): undefined reference to `print_string(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
collect2: error: ld returned 1 exit status
apollo@in_dev_docker:/apollo/ttt/abi$ nm libmy.so |grep print_string
0000000000000945 T _Z12print_stringRKSs
Linker-related:
$ dpkg -l|grep boost
echo "/usr/local/mysql/lib" | sudo tee -a /etc/ld.so.conf
sudo ldconfig -v | grep mysql # check whether the mysql libraries are found
Per-package file lists and maintainer scripts that apt/dpkg keep: /var/lib/dpkg/info
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/mysql/lib
LIBRARY_PATH
is used by gcc at link time to search directories containing static and shared libraries that need to be linked into your program.
LD_LIBRARY_PATH
is used by the dynamic loader at run time to search directories containing shared libraries, after your program has been successfully compiled and linked.
Libraries can be static or shared. If a library is static, its code is copied into your program, so nothing needs to be searched for after the program is compiled and linked. If the library is shared, it is linked dynamically when the program runs, and that is when LD_LIBRARY_PATH comes into play.
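# directories searched for executable programs
export PATH=$PATH:$HOME/bin
# directories the dynamic loader searches for shared libraries at run time
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/MyLib
export LD_LIBRARY_PATH
# directories gcc searches for libraries (static or shared) at link time
LIBRARY_PATH=$LIBRARY_PATH:/MyLib
export LIBRARY_PATH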
gcc -L / -l option flags:
gcc -lfoo links against the library libfoo (libfoo.so, or libfoo.a if no shared version is found).
gcc -L dir adds dir to the directories searched for -l libraries.
gdb source path: https://alex.dzyoba.com/blog/gdb-source-path/
Header-related:
# header search path for gcc (C)
C_INCLUDE_PATH=$C_INCLUDE_PATH:/usr/include/libxml2:/MyLib
export C_INCLUDE_PATH
# header search path for g++ (C++)
CPLUS_INCLUDE_PATH=$CPLUS_INCLUDE_PATH:/usr/include/libxml2:/MyLib
export CPLUS_INCLUDE_PATH
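cpp -Iheaders -v /dev/null -o /dev/null     # print the include search list that gcc -Iheaders source.c would use
cpp -iquote hdr1 -v /dev/null -o /dev/null  # -iquote dirs are searched only for #include "..."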
diff -ruNa s1 s2 > s1.patch
patch -p0 < s1.patch    # run from the directory that contains s1; applies the changes to s1
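A round trip of the diff/patch pair on two throwaway trees, as a sketch (GNU diff and patch assumed; the tree and file names are hypothetical). The diff is written with `>` so a rerun does not append a second copy, and running `patch -p1` from inside s1 strips the leading `s1/` / `s2/` path component, so every hunk, including files that exist only in s2 (turned into creation hunks by `-N`), lands in s1.

```shell
#!/bin/bash
work=$(mktemp -d)
cd "$work" || exit 1

# Two trees: s1 is the original, s2 the modified copy.
mkdir s1 s2
printf 'line 1\nline 2\n' > s1/a.txt
printf 'line 1\nline 2 changed\n' > s2/a.txt
printf 'new file\n' > s2/b.txt   # only in s2

# -r recurse, -u unified format, -N treat absent files as empty, -a text.
diff -ruNa s1 s2 > s1.patch

# Apply inside s1: -p1 strips the first path component of each header.
(cd s1 && patch -p1 < ../s1.patch)

diff -r s1 s2 && echo "s1 now matches s2"
```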
cp -a    # copy while preserving the original files' attributes (also copies recursively)
cp -r dirname destdir    # copy a directory recursively
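Shared libraries (the equivalent of DLLs on Windows) end in .so on Linux and are called shared object libraries. They are ELF (Executable and Linkable Format) files with two symbol tables, ".symtab" and ".dynsym". ".dynsym" keeps only the global symbols from ".symtab" that dynamic linking needs. The strip command removes ".symtab" from an ELF file but leaves ".dynsym" in place. The shared objects in /lib report "no symbols" under plain nm because they have been stripped, so you need to look at the dynamic symbol table ".dynsym" instead by adding -D:
usr@usrpc:~$ nm -Do /lib/*.so.*
Similar commands: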
readelf --symbols *.so.*
objdump -TC *.so.*
Remount the root filesystem read-write:
$ mount -o remount,rw /
Update the NVIDIA driver: http://www.linuxandubuntu.com/home/how-to-install-latest-nvidia-drivers-in-linux
1. sudo apt-get purge 'nvidia*'
2. Add the graphics-drivers PPA:
- sudo add-apt-repository ppa:graphics-drivers/ppa
- Then update the package index:
- sudo apt-get update
3. Install (and activate) the latest NVIDIA driver supported by your graphics card, for example:
- sudo apt-get install nvidia-370
Install nvidia-docker2: https://codepyre.com/2019/01/installing-nvidia-docker2-on-ubuntu-18.0.4/
Install the 4.15.0-128 kernel with its headers and modules:
sudo apt update
apt-cache search linux | grep linux- | grep 4.15.0-128
sudo apt install linux-headers-4.15.0-128 linux-headers-4.15.0-128-generic linux-image-4.15.0-128-generic linux-modules-4.15.0-128-generic
Switch kernel:
1. Run: sudo gedit /etc/default/grub
2. Find GRUB_HIDDEN_TIMEOUT, change the number to 10, and save.
3. Set the boolean on the line below it to false.
4. In a terminal, run: sudo update-grub
To choose which kernel boots by default:
1. grep menuentry /boot/grub/grub.cfg
This lists the boot entries in order, for example:
menuentry 'Ubuntu, with Linux 3.2.17experimental' --class ubuntu --class gnu-linux --class gnu --class os {
menuentry 'Ubuntu, with Linux 3.2.17experimental (recovery mode)' --class ubuntu --class gnu-linux --class gnu --class os {
menuentry 'Ubuntu, with Linux 3.2.17-chipsee' --class ubuntu --class gnu-linux --class gnu --class os {
menuentry 'Ubuntu, with Linux 3.2.17-chipsee (recovery mode)' --class ubuntu --class gnu-linux --class gnu --class os {
menuentry 'Ubuntu, with Linux 3.2.0-23-generic' --class ubuntu --class gnu-linux --class gnu --class os {
menuentry 'Ubuntu, with Linux 3.2.0-23-generic (recovery mode)' --class ubuntu --class gnu-linux --class gnu --class os {
menuentry "Memory test (memtest86+)" {
menuentry "Memory test (memtest86+, serial console 115200)"
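2. To boot the 3.2.17-chipsee kernel (the third entry, index 2 counting from 0), change GRUB_DEFAULT=0 to GRUB_DEFAULT=2 in /etc/default/grub and save.
On 18.04, the older kernels are listed inside the "Advanced options for Ubuntu" submenu, so GRUB_DEFAULT has to name the submenu path, e.g. in the "submenu title>entry title" form that update-grub recommends in the session below.
3. Then run sudo update-grub.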
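Because update-grub expects the `submenu title>entry title` form for nested entries, a small helper can print every entry in exactly that form, ready to paste into GRUB_DEFAULT. A sketch: the function name is hypothetical, and it assumes the usual layout of an update-grub generated grub.cfg (single-quoted titles, top-level entries and their closing braces at column 0, submenu entries indented).

```shell
#!/bin/bash
# Print GRUB menu titles; entries inside a submenu are printed as
# "<submenu title>><entry title>", the format GRUB_DEFAULT accepts.
grub_default_titles() {
  awk -F"'" '
    /^submenu /          { parent = $2; next }
    /^[ \t]+menuentry /  { print parent ">" $2; next }
    /^menuentry /        { print $2; next }
    /^}/                 { parent = "" }
  ' "${1:-/boot/grub/grub.cfg}"
}

# Usage:
#   grub_default_titles | grep 4.15.0-128
# then put the printed line into GRUB_DEFAULT="..." in /etc/default/grub
# and run sudo update-grub.
```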
sudo apt-get install --reinstall nvidia-410
sudo vi /var/lib/dkms/nvidia-410/410.78/build/Kbuild
cp /var/lib/dkms/nvidia-410/410.78/build/Kbuild .
sudo cp Kbuild /var/lib/dkms/nvidia-410/410.78/build/Kbuild
dpkg -l | grep nvidia
nvidia-smi
sder@sder-kvm-yangpeng:~$ sudo update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.15.0-129-generic
Found initrd image: /boot/initrd.img-4.15.0-129-generic
Found linux image: /boot/vmlinuz-4.15.0-128-generic
Found initrd image: /boot/initrd.img-4.15.0-128-generic
Warning: Please don't use old title `Ubuntu, with Linux 4.15.0-128-generic' for GRUB_DEFAULT, use `Advanced options for Ubuntu>Ubuntu, with Linux 4.15.0-128-generic' (for versions before 2.00) or `gnulinux-advanced-648cad3b-7e55-4d6d-b38b-9247483aecb4>gnulinux-4.15.0-128-generic-advanced-648cad3b-7e55-4d6d-b38b-9247483aecb4' (for 2.00 or later)
Found memtest86+ image: /boot/memtest86+.elf
Found memtest86+ image: /boot/memtest86+.bin
done
sder@sder-kvm-yangpeng:~$ sudo vi /etc/default/grub
sder@sder-kvm-yangpeng:~$ sudo update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.15.0-129-generic
Found initrd image: /boot/initrd.img-4.15.0-129-generic
Found linux image: /boot/vmlinuz-4.15.0-128-generic
Found initrd image: /boot/initrd.img-4.15.0-128-generic
Found memtest86+ image: /boot/memtest86+.elf
Found memtest86+ image: /boot/memtest86+.bin
done
sder@sder-kvm-yangpeng:~$ cat /etc/default/grub
# If you change this file, run 'update-grub' afterwards to update
# /boot/grub/grub.cfg.
# For full documentation of the options in this file, see:
# info -f grub -n 'Simple configuration'
GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 4.15.0-128-generic"
#GRUB_HIDDEN_TIMEOUT=0
#GRUB_HIDDEN_TIMEOUT_QUIET=true
GRUB_TIMEOUT=10
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
GRUB_CMDLINE_LINUX=""
# Uncomment to enable BadRAM filtering, modify to suit your needs
# This works with Linux (no patch required) and with any kernel that obtains
# the memory map information from GRUB (GNU Mach, kernel of FreeBSD ...)
#GRUB_BADRAM="0x01234567,0xfefefefe,0x89abcdef,0xefefefef"
# Uncomment to disable graphical terminal (grub-pc only)
#GRUB_TERMINAL=console
# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command `vbeinfo'
#GRUB_GFXMODE=640x480
# Uncomment if you don't want GRUB to pass "root=UUID=xxx" parameter to Linux
#GRUB_DISABLE_LINUX_UUID=true
# Uncomment to disable generation of recovery mode menu entries
#GRUB_DISABLE_RECOVERY="true"
# Uncomment to get a beep at grub start
#GRUB_INIT_TUNE="480 440 1"
List the services enabled at boot: systemctl list-unit-files --type=service | grep enable
Samba share section (smb.conf):
[sambashare]
comment = Samba on Ubuntu
path = /home/corenoc/leax
read only = no
browsable = yes
guest ok = yes
Disable automatic kernel updates on Ubuntu 18.04
Ubuntu updates the kernel automatically by default. To avoid rebooting into a kernel that fails to boot, you can put kernel updates on hold and stay on the current kernel.
Run:
root@linux:~# sudo apt-mark hold linux-image-generic linux-headers-generic
linux-image-generic set on hold.
linux-headers-generic set on hold.
To re-enable kernel updates:
root@linux:~# sudo apt-mark unhold linux-image-generic linux-headers-generic
What apt-get install does under the hood: https://askubuntu.com/questions/540937/what-does-apt-get-install-do-under-the-hood
apollo@in_dev_docker:/apollo/bazel-bin/third_party/portable_file_dialogs$ update-alternatives --config gcc
There are 2 choices for the alternative gcc (providing /usr/bin/gcc).
Selection Path Priority Status
------------------------------------------------------------
* 0 /usr/bin/gcc-8 60 auto mode
1 /usr/bin/gcc-4.8 10 manual mode
2 /usr/bin/gcc-8 60 manual mode
Press <enter> to keep the current choice[*], or type selection number: q
W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 6ED91CA3AC1160CD
A: import the key that the NO_PUBKEY message names: sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 6ED91CA3AC1160CD
or fetch NVIDIA's published key: curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | sudo apt-key add -
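When apt reports `NO_PUBKEY`, the key to import is the ID printed in that message. A small sketch that pulls the IDs out of apt's output so they can be handed to `apt-key adv --recv-keys`; the function name is hypothetical, and `apt-key` itself is deprecated on newer Ubuntu releases (it still works on 18.04).

```shell
#!/bin/bash
# Read apt output on stdin and print each missing key ID once.
missing_apt_keys() {
  grep -oE 'NO_PUBKEY [0-9A-F]+' | awk '{print $2}' | sort -u
}

# Usage (as root):
#   apt-get update 2>&1 | missing_apt_keys \
#     | xargs -r apt-key adv --keyserver keyserver.ubuntu.com --recv-keys
```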
yangpeng@mx:/etc$ cat ./systemd/system/docker.service.d/override.conf
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime