Checking GPU information on a server
## Install lspci (provided by the pciutils package)
yum -y install pciutils-3.5.1-3.el7.x86_64
View the graphics card information and GPU model on Linux:
lspci | grep -i vga
17:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
65:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
lspci -v -s 17:00.0
17:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1) (prog-if 00 [VGA controller])
Subsystem: ZOTAC International (MCO) Ltd. Device 2503
Flags: bus master, fast devsel, latency 0, IRQ 68, NUMA node 0
Memory at b4000000 (32-bit, non-prefetchable) [size=16M]
Memory at 380060000000 (64-bit, prefetchable) [size=256M]
Memory at 380070000000 (64-bit, prefetchable) [size=32M]
I/O ports at 7000 [size=128]
[virtual] Expansion ROM at b5000000 [disabled] [size=512K]
Capabilities: <access denied>
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
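The "Kernel driver in use" line shows which of the listed kernel modules is actually bound to the card (here nvidia rather than nouveau). A minimal Python sketch that pulls that value out of captured lspci -v text; the function name kernel_driver_in_use and the shortened sample are made up for illustration:

```python
import re

def kernel_driver_in_use(lspci_v_text):
    """Return the driver named on the 'Kernel driver in use:' line, or None."""
    match = re.search(r"^\s*Kernel driver in use:\s*(\S+)", lspci_v_text, re.MULTILINE)
    return match.group(1) if match else None

# Shortened sample based on the output above
sample = """17:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
\tKernel driver in use: nvidia
\tKernel modules: nvidiafb, nouveau, nvidia_drm, nvidia"""

print(kernel_driver_in_use(sample))
```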
For NVIDIA GPUs you can use:
lspci | grep -i nvidia
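Driver version (may be inaccurate and may not match nvidia-smi; dpkg applies to Debian/Ubuntu, on yum-based systems use rpm -qa | grep -i nvidia):
dpkg --list | grep -i nvidia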
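Because the package list can disagree with the driver that is actually loaded, it may be safer to ask nvidia-smi directly. The sketch below assumes nvidia-smi is on PATH and uses its --query-gpu and --format=csv,noheader options; the helper names parse_smi_csv and query_gpus are illustrative only.

```python
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=driver_version,name", "--format=csv,noheader"]

def parse_smi_csv(text):
    """Turn 'version, name' lines into a list of (version, name) tuples."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            version, name = [field.strip() for field in line.split(",", 1)]
            rows.append((version, name))
    return rows

def query_gpus():
    """Run nvidia-smi and parse its output (needs the NVIDIA driver installed)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_smi_csv(result.stdout)
```

Calling query_gpus() on a machine with a working driver returns one tuple per GPU.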
Look up the GPU model by its PCI device ID
lspci | grep -i vga
17:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
65:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
http://pci-ids.ucw.cz/mods/PC/10de?action=help?help=pci
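lspci prints "Device 1e04" instead of a model name when the local pci.ids database does not know the ID, which is why the ID has to be looked up by hand. A small Python sketch that collects the slot and device ID from such lines so they can be searched under NVIDIA's vendor ID 10de; the function name unnamed_devices is hypothetical:

```python
import re

# Matches lines such as:
# 17:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
LSPCI_LINE = re.compile(
    r"^(?P<slot>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) [^:]+: .+? Device (?P<device>[0-9a-f]{4})\b",
    re.MULTILINE,
)

def unnamed_devices(lspci_text):
    """Return (slot, device_id) for every device lspci could not name."""
    return [(m.group("slot"), m.group("device")) for m in LSPCI_LINE.finditer(lspci_text)]

sample = """17:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
65:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)"""

for slot, device in unnamed_devices(sample):
    print(slot, "10de:" + device)
```

Running lspci -nn prints the numeric vendor:device pair directly, and update-pciids refreshes the local database so that names may appear without a manual lookup.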
NVIDIA driver
https://download.nvidia.com/XFree86/Linux-x86_64/435.21/
CUDA version that matches the driver
https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
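The release notes list a minimum Linux driver version for each CUDA toolkit. The sketch below shows how such a table can be checked against an installed driver. The three minimum versions are assumptions recalled from that table (only 410.48 for CUDA 10.0 also appears in these notes), so verify them against the page above before relying on them.

```python
# Minimum Linux driver per CUDA toolkit. ASSUMED values, check the release notes.
MIN_LINUX_DRIVER = {
    "10.0": (410, 48),
    "10.1": (418, 39),
    "10.2": (440, 33),
}

def parse_version(text):
    """'435.21' -> (435, 21)"""
    return tuple(int(part) for part in text.split("."))

def supported_cuda(driver_version):
    """List the toolkits in the table whose minimum driver is satisfied."""
    driver = parse_version(driver_version)
    return [cuda for cuda, minimum in MIN_LINUX_DRIVER.items() if driver >= minimum]

print(supported_cuda("435.21"))
```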
Download CUDA and cuDNN
CUDA
https://developer.nvidia.com/cuda-toolkit-archive
cuDNN
https://developer.download.nvidia.cn/compute/machine-learning/repos/
CUDA and cuDNN versions
cat /usr/local/cuda/version.txt
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
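The grep above prints the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL defines on separate lines. A short Python sketch that joins them into one version string; the function name cudnn_version is made up, and the sample header text is an abbreviated stand-in for the real file:

```python
import re

def cudnn_version(header_text):
    """Build 'major.minor.patch' from the #define lines in cudnn.h, or None."""
    found = dict(
        re.findall(
            r"^#define\s+CUDNN_(MAJOR|MINOR|PATCHLEVEL)\s+(\d+)",
            header_text,
            re.MULTILINE,
        )
    )
    if set(found) != {"MAJOR", "MINOR", "PATCHLEVEL"}:
        return None
    return "{MAJOR}.{MINOR}.{PATCHLEVEL}".format(**found)

sample_header = """#define CUDNN_MAJOR 7
#define CUDNN_MINOR 4
#define CUDNN_PATCHLEVEL 2
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)"""

print(cudnn_version(sample_header))
```

To use it on a real system, pass in the text of /usr/local/cuda/include/cudnn.h. In cuDNN 8 and later these defines live in cudnn_version.h instead.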
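Test CUDA:
- Build the samples
  Go to the Samples installation directory, run make in a terminal there, and wait about ten minutes.
- Test after the build completes
  In the Samples directory, find bin/x86_64/linux/release/ and change to it.
  Run the deviceQuery program: sudo ./deviceQuery
  Check the output, paying particular attention to the last line; PASS means the test passed.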
Test the GPU in TensorFlow (python3):
import tensorflow as tf
# tf.Session and tf.ConfigProto are TensorFlow 1.x APIs
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print('tensorflow version: %s\n' % tf.__version__)
print('tensorflow path: %s\n' % tf.__path__)
print("GPU Available: %s\n" % tf.test.is_gpu_available())
Uninstalling CUDA and cuDNN
If installed from deb packages
sudo apt-get remove --auto-remove nvidia-cuda-toolkit
sudo apt-get remove --auto-remove 'cudnn*'
Uninstall cuDNN
sudo rm -rf /usr/local/cuda/include/cudnn.h
sudo rm -rf /usr/local/cuda/lib64/libcudnn*
If installed from a .run file
sudo /usr/local/cuda-8.0/bin/uninstall_cuda_8.0.pl
sudo rm -rf /usr/local/cuda-8.0/
Installing from cudaxxxxx.run
(Whether to accept the license terms; you must accept to continue the installation)
accept/decline/quit: accept
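(Do not install the driver here, because the latest driver is already installed; otherwise an older graphics driver may be installed, which can cause a login loop)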
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 410.48?
(y)es/(n)o/(q)uit: n
Install the CUDA 10.0 Toolkit? (whether to install CUDA 10; this must be installed)
(y)es/(n)o/(q)uit: y
Enter Toolkit Location (installation path; keep the default and just press Enter)
[ default is /usr/local/cuda-10.0 ]:
Do you want to install a symbolic link at /usr/local/cuda? (agree to create the symbolic link)
(y)es/(n)o/(q)uit: y
Install the CUDA 10.0 Samples? (no need to install the samples; they are already present)
(y)es/(n)o/(q)uit: n
Installing the CUDA Toolkit in /usr/local/cuda-10.0 ... (installation starts)
After installation, configure the environment variables: open ~/.bashrc (for example with vim ~/.bashrc) and append the following at the end:
export CUDA_HOME=/usr/local/cuda-10.0
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export PATH=${CUDA_HOME}/bin:${PATH}
Finally, run source ~/.bashrc to apply the changes.
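When adding the CUDA directories it is worth keeping whatever PATH and LD_LIBRARY_PATH already contain, and avoiding duplicate entries if the file is sourced more than once. A Python sketch of that prepend logic, assuming a POSIX ":" separator; the function name cuda_env is hypothetical:

```python
def cuda_env(cuda_home, env):
    """Return the three variables with the CUDA directories placed first."""
    def prepend(directory, current):
        # Drop empty entries and any earlier copy of the same directory
        parts = [p for p in current.split(":") if p and p != directory]
        return ":".join([directory] + parts)

    return {
        "CUDA_HOME": cuda_home,
        "PATH": prepend(cuda_home + "/bin", env.get("PATH", "")),
        "LD_LIBRARY_PATH": prepend(cuda_home + "/lib64", env.get("LD_LIBRARY_PATH", "")),
    }

example = cuda_env(
    "/usr/local/cuda-10.0",
    {"PATH": "/usr/bin:/bin", "LD_LIBRARY_PATH": "/opt/lib"},
)
for name, value in example.items():
    print("export %s=%s" % (name, value))
```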
Use nvcc -V to check the installed version:
test@test:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
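If the version is needed in a script, the last line of this output is the one to parse. A minimal sketch, with nvcc_release as an illustrative name and the sample text copied from the output above:

```python
import re

def nvcc_release(nvcc_output):
    """Return (release, full_version) from 'nvcc -V' output, or None."""
    match = re.search(r"release (\d+\.\d+), V(\d+\.\d+\.\d+)", nvcc_output)
    return match.groups() if match else None

sample = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130"""

print(nvcc_release(sample))
```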
Test whether the installation succeeded
Run the following commands:
cd /usr/local/cuda-10.0/samples/1_Utilities/deviceQuery
make
./deviceQuery
Normally the output is:
./deviceQuery Starting...
cuDNN
Download cudnn-10.0-linux-x64-v7.4.2.24.tgz
Then extract it with the following command:
tar -zxvf cudnn-10.0-linux-x64-v7.4.2.24.tgz
Extraction yields the following files:
cuda/include/cudnn.h
cuda/NVIDIA_SLA_cuDNN_Support.txt
cuda/lib64/libcudnn.so
cuda/lib64/libcudnn.so.7
cuda/lib64/libcudnn.so.7.4.2
cuda/lib64/libcudnn_static.a
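Copy these files into the CUDA directory with the following two commands (writing under /usr/local normally needs root; -P keeps the libcudnn.so symbolic links as links):
sudo cp -P cuda/lib64/* /usr/local/cuda-10.0/lib64/
sudo cp cuda/include/* /usr/local/cuda-10.0/include/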
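To preview where each extracted file ends up before copying, the mapping can be computed from the archive listing. This is only a sketch: copy_plan is a made-up helper, and the member list is the one shown above (tarfile.open(path).getnames() would return it from the real archive).

```python
import posixpath

def copy_plan(members, cuda_root="/usr/local/cuda-10.0"):
    """Map cuda/lib64/* and cuda/include/* members to their destination paths."""
    plan = []
    for member in members:
        parts = member.split("/")
        if len(parts) == 3 and parts[0] == "cuda" and parts[1] in ("lib64", "include"):
            plan.append((member, posixpath.join(cuda_root, parts[1], parts[2])))
    return plan

members = [
    "cuda/include/cudnn.h",
    "cuda/NVIDIA_SLA_cuDNN_Support.txt",
    "cuda/lib64/libcudnn.so",
    "cuda/lib64/libcudnn.so.7",
    "cuda/lib64/libcudnn.so.7.4.2",
    "cuda/lib64/libcudnn_static.a",
]

for source, destination in copy_plan(members):
    print(source, "->", destination)
```

The license text is skipped because it sits directly under cuda/ rather than in lib64 or include.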
After copying, check the cuDNN version with the following command:
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
https://cloud.tencent.com/developer/article/1382703
Testing after CUDA installation
cd /usr/local/cuda/samples
sudo make
cd /usr/local/cuda/samples/bin/x86_64/linux/release
sudo ./deviceQuery
Result = PASS
sudo ./bandwidthTest
Result = PASS
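Both samples report success with a "Result = PASS" line, so a script can check for that line instead of reading the output by eye. A minimal sketch; sample_passed is a hypothetical name, and the sample outputs are shortened placeholders rather than real program output:

```python
def sample_passed(output):
    """True if any line of a CUDA sample's output is exactly 'Result = PASS'."""
    return any(line.strip() == "Result = PASS" for line in output.splitlines())

# Placeholder outputs for illustration only
ok_output = "./deviceQuery Starting...\n(device details omitted)\nResult = PASS\n"
bad_output = "./deviceQuery Starting...\n(error details omitted)\nResult = FAIL\n"

print(sample_passed(ok_output), sample_passed(bad_output))
```

Searching every line rather than only the last one keeps the check working if a program prints extra notes after the result.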
Check the CUDA version
nvcc --version # or
nvcc -V # or
cat /usr/local/cuda/version.txt
cuDNN
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
Full walkthrough for setting up a deep learning environment: CUDA, cuDNN, and NVIDIA driver installation
https://www.linuxidc.com/Linux/2017-12/149577.htm