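The startup error below is caused by the NVIDIA driver having dropped out: nvidia-container-cli finds no CUDA-capable device, and nvidia-smi cannot communicate with the driver.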
ubuntu@yufeichang1:/data/pigfarm/packages/pigfarm-deploy-packages$ sudo docker-compose up -d
Creating nginx ... done
Creating pigfarm-app ... error
ERROR: for pigfarm-app Cannot start service pigfarm: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused "process_linux.go:385: running prestart hook 1 caused \"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --utility --video --require=cuda>=10.0 brand=tesla,driver>=384,driver<385 brand=tesla,driver>=410,driver<411 --pid=5073 /var/lib/docker/overlay2/5b35e5dd0b1a3da05239a368e03113cb04c1a34c29d725baac7f6b5535f4f703/merged]\\nnvidia-container-cli: initialization error: cuda error: no cuda-capable device is detected\\n\""": unknown
ERROR: for pigfarm Cannot start service pigfarm: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused "process_linux.go:385: running prestart hook 1 caused \"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --utility --video --require=cuda>=10.0 brand=tesla,driver>=384,driver<385 brand=tesla,driver>=410,driver<411 --pid=5073 /var/lib/docker/overlay2/5b35e5dd0b1a3da05239a368e03113cb04c1a34c29d725baac7f6b5535f4f703/merged]\\nnvidia-container-cli: initialization error: cuda error: no cuda-capable device is detected\\n\""": unknown
ERROR: Encountered errors while bringing up the project.
ubuntu@yufeichang1:/data/pigfarm/packages/pigfarm-deploy-packages$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
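
A quick way to confirm this diagnosis is to check whether the nvidia kernel module is loaded at all. The following is a minimal sketch, not part of the original deployment. The function names `nvidia_module_loaded` and `diagnose_nvidia_driver` are hypothetical, and it assumes a Linux host where /proc/modules lists loaded modules (the same data that `lsmod` prints).

```shell
#!/bin/sh
# Hypothetical helper (not from the original note): succeeds if a module
# named exactly "nvidia" appears in the given module list.
# Defaults to /proc/modules, which is the data source behind `lsmod`.
nvidia_module_loaded() {
    modules_file="${1:-/proc/modules}"
    # Each line of /proc/modules starts with the module name followed by a space.
    grep -q '^nvidia ' "$modules_file" 2>/dev/null
}

# Hypothetical helper: print a one-line diagnosis for the same module list.
diagnose_nvidia_driver() {
    if nvidia_module_loaded "$@"; then
        echo "nvidia kernel module is loaded"
    else
        echo "nvidia kernel module is NOT loaded"
    fi
}

# Check the running system.
diagnose_nvidia_driver
```

If the module turns out not to be loaded, `dkms status` and `sudo modprobe nvidia` are common next steps for finding out why, although the cause on this particular host is not stated in the log above.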