System:
CentOS 7.8 (virtual machine)
Problems encountered
1. nouveau: failed to create kernel channel, -22
Disable nouveau
vi /etc/modprobe.d/blacklist-nouveau.conf
Press i to enter insert mode and add these two lines:
blacklist nouveau
options nouveau modeset=0
Press Esc, then type :wq to save and quit.
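The same file can also be written without opening an editor. A minimal sketch, assuming a POSIX shell and root privileges; the function name write_nouveau_blacklist and its optional path argument are made up for illustration:

```shell
# Hypothetical helper: write the nouveau blacklist file non-interactively.
# The path argument defaults to the file edited above.
write_nouveau_blacklist() {
    conf="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
    # One directive per line; any existing file at this path is overwritten.
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$conf"
}
```

Run write_nouveau_blacklist as root, then cat the file to confirm both lines are present.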
Rebuild the initramfs image
Back up the current image:
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
Rebuild it:
dracut /boot/initramfs-$(uname -r).img $(uname -r)
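Moving the image away leaves /boot without an initramfs for the running kernel if dracut fails. A more defensive variant copies the image instead and lets dracut overwrite it. This is only a sketch, assuming root privileges and dracut's --force option; the function name, its arguments, and the DRACUT override variable are hypothetical:

```shell
# Hypothetical helper: back up and rebuild the initramfs for one kernel.
# Uses cp instead of mv so the original image stays in place if dracut fails;
# dracut then needs --force to overwrite the existing file.
# DRACUT can be overridden (for example for a dry run); default is dracut.
rebuild_initramfs() {
    kver="${1:-$(uname -r)}"
    boot_dir="${2:-/boot}"
    img="$boot_dir/initramfs-$kver.img"
    [ -f "$img" ] || { echo "no such image: $img" >&2; return 1; }
    cp -p "$img" "$img.bak" || return 1
    "${DRACUT:-dracut}" --force "$img" "$kver"
}
```

Called with no arguments as root, it works on /boot/initramfs-$(uname -r).img, the same file as the commands above.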
Reboot the system
reboot
Check whether nouveau is now disabled; no output means it is
lsmod | grep nouveau
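An empty result is easy to misread in a script, so the check can be wrapped to print an explicit message. A small sketch, assuming the usual lsmod layout with the module name in the first column; the function name nouveau_disabled is invented here:

```shell
# Hypothetical helper: reads lsmod output on stdin and reports whether
# the nouveau module is loaded. Returns 0 when nouveau is absent.
nouveau_disabled() {
    # Match only lines whose first column is exactly "nouveau".
    if grep -q '^nouveau[[:space:]]'; then
        echo "nouveau is still loaded"
        return 1
    fi
    echo "nouveau is disabled"
}
```

Usage: lsmod | nouveau_disabled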
2. NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running
This error can be caused by installing the wrong driver version. Here is how to find out which driver version the card needs.
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
yum install nvidia-detect
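In my testing, the default driver version offered on the official download page (https://www.nvidia.com/Download/index.aspx?lang=en-us) may be lower than the version nvidia-detect reports.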
[root@10-64-2-16 ~]# nvidia-detect -v
Probing for supported NVIDIA devices...
[10de:1eb8] NVIDIA Corporation TU104GL [Tesla T4]
This device requires the current 470.86 NVIDIA driver kmod-nvidia
[1013:00b8] Cirrus Logic GD 5446
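To reuse the detected version in later commands, it can be pulled out of that output. A sketch that assumes the message keeps the exact wording shown above ("requires the current ... NVIDIA driver"); other nvidia-detect versions or legacy cards may print something different, and the function name is hypothetical:

```shell
# Hypothetical helper: reads `nvidia-detect -v` output on stdin and prints
# the driver version from the "requires the current ... NVIDIA driver" line.
# Prints nothing if no such line is found.
recommended_driver_version() {
    sed -n 's/.*requires the current \([0-9][0-9.]*\) NVIDIA driver.*/\1/p' | head -n 1
}
```

Usage: nvidia-detect -v | recommended_driver_version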
Install the kernel-devel package
yum install -y kernel-devel
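3.10.0-1160.49.1.el7.x86_64 is the kernel-devel version I installed. The running kernel is 3.10.0-1127.el7.x86_64, so the build link under /lib/modules/$(uname -r) is repointed to the new source tree below.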
[root@10-64-2-16 ~]# ls /usr/src/kernels/
3.10.0-1160.49.1.el7.x86_64 3.10.0-1160.49.1.el7.x86_64.debug
[root@10-64-2-16 modules]# cd /lib/modules/$(uname -r)
[root@10-64-2-16 3.10.0-1127.el7.x86_64]# rm -f build
[root@10-64-2-16 3.10.0-1127.el7.x86_64]# ln -s /usr/src/kernels/3.10.0-1160.49.1.el7.x86_64 build
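A mistyped target leaves build as a dangling link, and ln -s does not complain about that. The link can be checked before running the installer. A minimal sketch, assuming readlink is available; the function name and its optional directory argument are hypothetical:

```shell
# Hypothetical helper: check that the build link under a modules directory
# points at an existing kernel source tree.
check_build_link() {
    mod_dir="${1:-/lib/modules/$(uname -r)}"
    # -d follows the symlink, so a dangling link fails this test.
    if [ -d "$mod_dir/build/" ]; then
        echo "ok: $mod_dir/build -> $(readlink "$mod_dir/build")"
    else
        echo "broken or missing: $mod_dir/build" >&2
        return 1
    fi
}
```

With no argument it inspects /lib/modules/$(uname -r)/build, the link created above.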
Install the driver
sh NVIDIA-Linux-x86_64-470.86.run --kernel-source-path=/usr/src/kernels/3.10.0-1160.49.1.el7.x86_64/build
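After the installer finishes, the loaded driver version can be compared with the one that was installed. A sketch only: it assumes nvidia-smi is on the PATH and supports --query-gpu=driver_version with --format=csv,noheader; the function name driver_version_is is invented for this example:

```shell
# Hypothetical helper: compare the driver version read from stdin with the
# expected one. Only the first line is used (one line is printed per GPU).
driver_version_is() {
    expected="$1"
    actual=$(head -n 1 | tr -d '[:space:]')
    if [ "$actual" = "$expected" ]; then
        echo "driver $actual is loaded"
    else
        echo "expected driver $expected, got '${actual:-nothing}'" >&2
        return 1
    fi
}
```

Usage: nvidia-smi --query-gpu=driver_version --format=csv,noheader | driver_version_is 470.86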