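Background
Let multiple users share the server's GPU resources without interfering with one another, while keeping system resource allocation flexible.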
Server configuration
CPU
48 Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz
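2 physical CPUs (sockets) with 24 logical processors each, 48 in total (the Xeon Silver 4116 has 12 cores / 24 threads per socket)
(Commands:
cat /proc/cpuinfo | grep name | cut -f2 -d: | uniq -c
cat /proc/cpuinfo | grep "physical id" | sort | uniq -c)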
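The two commands above can be folded into one small helper that reports the socket count and the logical processor count together. This is only a sketch: cpu_summary is a hypothetical name, and it assumes an x86-style /proc/cpuinfo that contains "processor" and "physical id" lines.

```shell
# Summarise CPU topology from a cpuinfo-style file (default: /proc/cpuinfo).
# cpu_summary is a hypothetical helper name, not a standard tool.
cpu_summary() {
    cpuinfo_file="${1:-/proc/cpuinfo}"
    # Distinct "physical id" values = number of sockets.
    cpu_sockets=$(grep '^physical id' "$cpuinfo_file" | sort -u | wc -l | tr -d ' ')
    # One "processor" entry per logical processor (hardware thread).
    cpu_logical=$(grep -c '^processor' "$cpuinfo_file")
    echo "sockets=${cpu_sockets} logical=${cpu_logical}"
}
```

Calling cpu_summary with no argument reads /proc/cpuinfo; passing a path lets you inspect a saved copy from another machine.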
Install the GPU driver
cd to the directory containing the .run file
sudo apt-get purge 'nvidia*'
sudo vim /etc/modprobe.d/blacklist-nouveau.conf
Add the following lines:
blacklist nouveau
options nouveau modeset=0
sudo update-initramfs -u
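The nouveau blacklist file can also be written without opening an editor. A minimal sketch, assuming a root shell; write_nouveau_blacklist is a hypothetical helper name, and its default path is the same file edited above.

```shell
# Write the nouveau blacklist file non-interactively.
# write_nouveau_blacklist is a hypothetical helper; pass a path to override the default.
write_nouveau_blacklist() {
    blacklist_path="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
    # Quoted heredoc delimiter: the two lines are written literally.
    cat > "$blacklist_path" <<'EOF'
blacklist nouveau
options nouveau modeset=0
EOF
}
```

After calling it as root, run sudo update-initramfs -u as before so the change is picked up on the next boot.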
sudo apt-get install build-essential freeglut3-dev libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev
sudo chmod +x NVIDIA-Linux-x86_64-410.104.run
sudo ./NVIDIA-Linux-x86_64-410.104.run --no-opengl-files --no-x-check
Install Docker CE and nvidia-docker
Follow https://www.cnblogs.com/journeyonmyway/p/10318624.html
If the Docker installation went wrong, uninstall Docker:
sudo apt-get purge docker
sudo apt-get purge docker-ce
sudo apt-get remove -y 'docker-*'
sudo rm -rf /var/lib/docker
Verify the installation: docker --version
Create the container
docker pull nvidia/cuda:10.0-cudnn7-runtime-ubuntu18.04
(Available Ubuntu and CUDA versions: https://hub.docker.com/r/nvidia/cuda/tags)
nvidia-docker run -dit --net host --name=cuda1 -h=LAB_VM nvidia/cuda:10.0-cudnn7-runtime-ubuntu18.04
docker exec -it cuda1 /bin/bash
apt-get update
apt-get install net-tools -y
apt-get install inetutils-ping
apt-get install vim
cp /etc/apt/sources.list /etc/apt/sources.list.bak
rm /etc/apt/sources.list
vim /etc/apt/sources.list
Add the Tsinghua (TUNA) mirror entries from https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu/
apt-get update
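The mirror file can also be generated instead of pasted by hand. This is a sketch under stated assumptions: the container image is Ubuntu 18.04, so the suite names are the bionic ones; plain http is used on the assumption that a fresh container may not have CA certificates installed yet; write_tuna_sources is a hypothetical helper name. Check the TUNA help page linked above for the authoritative entries.

```shell
# Generate a sources.list pointing at the TUNA mirror for Ubuntu 18.04 (bionic).
# write_tuna_sources is a hypothetical helper; pass a path to override the default.
write_tuna_sources() {
    sources_path="${1:-/etc/apt/sources.list}"
    # Assumption: http rather than https, in case ca-certificates is not installed.
    tuna_mirror="http://mirrors.tuna.tsinghua.edu.cn/ubuntu/"
    for suite in bionic bionic-updates bionic-backports bionic-security; do
        echo "deb ${tuna_mirror} ${suite} main restricted universe multiverse"
    done > "$sources_path"
}
```

Run it inside the container after backing up the original file, then run apt-get update again.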
apt-get install openssh-server
In /etc/ssh/sshd_config, change #PermitRootLogin prohibit-password to PermitRootLogin yes
passwd root
service ssh start
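The PermitRootLogin edit can be scripted with sed rather than done in an editor. A minimal sketch: enable_root_login is a hypothetical helper name, and it assumes the config file has a single PermitRootLogin line, commented out or not.

```shell
# Set "PermitRootLogin yes" in an sshd_config file (default: /etc/ssh/sshd_config).
# enable_root_login is a hypothetical helper name.
enable_root_login() {
    sshd_config="${1:-/etc/ssh/sshd_config}"
    sshd_tmp=$(mktemp)
    # Match the directive with or without a leading '#', and replace the whole line.
    sed 's/^#\{0,1\}[[:space:]]*PermitRootLogin[[:space:]].*/PermitRootLogin yes/' \
        "$sshd_config" > "$sshd_tmp"
    # Write back through cat so the original file keeps its owner and mode.
    cat "$sshd_tmp" > "$sshd_config"
    rm -f "$sshd_tmp"
}
```

If sshd is already running when the file is changed, service ssh restart makes it reread the configuration.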
cd /home
vim startup.sh
#!/bin/bash
service ssh start
/bin/bash
chmod 777 startup.sh
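The startup script can likewise be created without vim. This sketch writes the same three lines shown above; write_startup_script is a hypothetical helper name, and it uses mode 755 instead of 777 on the assumption that only root needs to modify the script.

```shell
# Create the container startup script non-interactively.
# write_startup_script is a hypothetical helper; pass a path to override the default.
write_startup_script() {
    startup_path="${1:-/home/startup.sh}"
    # Quoted heredoc delimiter: the script body is written literally.
    cat > "$startup_path" <<'EOF'
#!/bin/bash
service ssh start
/bin/bash
EOF
    # Assumption: 755 (owner-writable, world-executable) is sufficient here.
    chmod 755 "$startup_path"
}
```

The trailing /bin/bash in the script keeps a foreground process alive, so a container started with this script as its command does not exit right after sshd is launched.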
exit
Package as an image
Reference: