1 Install the NVIDIA driver
$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd00001C03sv00001B4Csd000011D7bc03sc00i00
vendor : NVIDIA Corporation
model : GP106 [GeForce GTX 1060 6GB]
driver : nvidia-driver-450-server - distro non-free
driver : nvidia-driver-450 - distro non-free
driver : nvidia-driver-390 - distro non-free
driver : nvidia-driver-460 - distro non-free recommended
driver : nvidia-driver-418-server - distro non-free
driver : xserver-xorg-video-nouveau - distro free builtin
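The listing above can also be read programmatically. The following is a minimal sketch, not part of the original walkthrough: it assumes the `driver : <package> - <flags>` line format shown above, and the helper names `list_drivers` and `recommended_driver` are made up for illustration.

```python
import re

# Driver lines copied from the `ubuntu-drivers devices` output above.
SAMPLE = """\
driver   : nvidia-driver-450-server - distro non-free
driver   : nvidia-driver-450 - distro non-free
driver   : nvidia-driver-390 - distro non-free
driver   : nvidia-driver-460 - distro non-free recommended
driver   : nvidia-driver-418-server - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin
"""


def list_drivers(text):
    """Return (package, is_recommended) pairs for every `driver :` line."""
    drivers = []
    for line in text.splitlines():
        match = re.match(r"\s*driver\s*:\s*(\S+)\s+-\s+(.*)$", line)
        if match:
            flags = match.group(2).split()
            drivers.append((match.group(1), "recommended" in flags))
    return drivers


def recommended_driver(text):
    """Return the package flagged `recommended`, or None if there is none."""
    for package, is_recommended in list_drivers(text):
        if is_recommended:
            return package
    return None


print(recommended_driver(SAMPLE))
```

On a live system the same functions could be fed the real output, for example from `subprocess.run(["ubuntu-drivers", "devices"], capture_output=True, text=True).stdout`.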
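Install a specific driver version. The recommended version is usually the right choice; here I install version 450.
$ sudo apt install nvidia-driver-450
Reboot after the installation:
$ sudo reboot
Once the system is back up, run nvidia-smi to view basic information about the GPU and confirm that this driver version was installed successfully: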
$ nvidia-smi
Sun Feb 21 16:58:51 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   57C    P8    10W / 120W |    567MiB /  6075MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       875      G   /usr/lib/xorg/Xorg                194MiB |
|    0   N/A  N/A      1188      G   /usr/bin/kwin_x11                 116MiB |
|    0   N/A  N/A      1190      G   /usr/bin/plasmashell               41MiB |
|    0   N/A  N/A      1492      G   /usr/bin/plasma-discover           16MiB |
|    0   N/A  N/A      3595      G   /usr/lib/firefox/firefox            1MiB |
|    0   N/A  N/A      3719      G   /usr/lib/firefox/firefox            1MiB |
|    0   N/A  N/A      4053      G   ...gAAAAAAAAA --shared-files      188MiB |
+-----------------------------------------------------------------------------+
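For scripted checks, nvidia-smi also has a machine-readable mode: `nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader`. The sketch below parses one line of that CSV. The sample line is an assumption assembled from the values in the table above rather than captured output, and `parse_gpu_query` and `driver_major` are hypothetical helper names.

```python
import csv
import io

# Assumed sample of the CSV query output, built from the values shown above.
SAMPLE = "GeForce GTX 1060 6GB, 450.102.04, 6075 MiB\n"


def parse_gpu_query(text):
    """Parse `name, driver_version, memory.total` rows into dictionaries."""
    rows = []
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or unexpected lines
        name, driver, memory = (field.strip() for field in row)
        rows.append(
            {"name": name, "driver_version": driver, "memory_total": memory}
        )
    return rows


def driver_major(version):
    """Return the major driver branch, e.g. 450 for '450.102.04'."""
    return int(version.split(".")[0])


for gpu in parse_gpu_query(SAMPLE):
    print(gpu["name"], driver_major(gpu["driver_version"]))
```

Comparing `driver_major(...)` with the branch that was installed (450 here) is a quick way to confirm that the reboot picked up the intended driver.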
2 Install CUDA 10.1
Install the toolkit from the Ubuntu repositories, then check the compiler version:
$ sudo apt install nvidia-cuda-toolkit
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
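If the version check needs to be automated, the release number can be pulled out of the `nvcc -V` text. This is a small sketch under the assumption that the last line keeps the `release X.Y, VX.Y.Z` shape shown above; `cuda_release` is a name invented for this example.

```python
import re

# Output of `nvcc -V` copied from above.
SAMPLE = """\
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
"""


def cuda_release(text):
    """Return (major, minor, full_version) from `nvcc -V` output, or None."""
    match = re.search(r"release (\d+)\.(\d+), V(\S+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), match.group(3)


major, minor, full = cuda_release(SAMPLE)
print(f"CUDA {major}.{minor} (toolkit build {full})")
```

The tuple form makes it easy to assert that the installed toolkit is exactly the 10.1 release this guide targets before moving on to cuDNN.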
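Note that on Ubuntu 20.04 the apt package does not install CUDA to the usual /usr/local/cuda location; it ends up under /usr/lib/cuda instead.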
$ whereis cuda
cuda: /usr/lib/cuda /usr/include/cuda.h
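3 Install a cuDNN version compatible with CUDA 10.1
Download the archive cudnn-10.1-linux-x64-v7.6.5.32.tgz from:
https://developer.nvidia.com/rdp/cudnn-archive
The download requires logging in to an NVIDIA account. Select cuDNN 7.6.5; other versions may not work (I tried 8.0.5, and TensorFlow failed to run).
Extract the archive, which unpacks into a cuda/ directory, then copy the header and libraries into the CUDA directory and make them readable:
$ tar -xzvf cudnn-10.1-linux-x64-v7.6.5.32.tgz
$ sudo cp cuda/include/cudnn.h /usr/lib/cuda/include/
$ sudo cp cuda/lib64/libcudnn* /usr/lib/cuda/lib64/
$ sudo chmod a+r /usr/lib/cuda/include/cudnn.h /usr/lib/cuda/lib64/libcudnn*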
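Since the cuDNN version matters here, it can be worth confirming which version actually landed in /usr/lib/cuda/include. In cuDNN 7.x the version is defined in cudnn.h through the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL macros. The sketch below reads them with a regular expression; the header fragment is an illustrative sample rather than the full file, and `cudnn_version` is a hypothetical helper.

```python
import re

# Illustrative fragment in the style of a cuDNN 7.x cudnn.h header.
SAMPLE_HEADER = """\
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5

#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
"""


def cudnn_version(header_text):
    """Return 'major.minor.patch' from cudnn.h text, or None if incomplete."""
    pattern = r"^\s*#define\s+CUDNN_(MAJOR|MINOR|PATCHLEVEL)\s+(\d+)"
    found = dict(re.findall(pattern, header_text, flags=re.MULTILINE))
    try:
        return f"{found['MAJOR']}.{found['MINOR']}.{found['PATCHLEVEL']}"
    except KeyError:
        return None


print(cudnn_version(SAMPLE_HEADER))
```

To check the installed copy, pass the contents of /usr/lib/cuda/include/cudnn.h, for example `cudnn_version(open("/usr/lib/cuda/include/cudnn.h").read())`.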
4 Set the CUDA environment variables
$ echo 'export LD_LIBRARY_PATH=/usr/lib/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
$ echo 'export LD_LIBRARY_PATH=/usr/lib/cuda/include:$LD_LIBRARY_PATH' >> ~/.bashrc
$ source ~/.bashrc
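A quick way to confirm that a new shell picked up the setting is to inspect LD_LIBRARY_PATH from Python. This is a minimal sketch; `split_search_path` and `has_dir` are names chosen for this example, and the colon separator assumes a Linux shell.

```python
import os
import posixpath


def split_search_path(value):
    """Split a colon-separated search path into normalized directories."""
    return [posixpath.normpath(part) for part in value.split(":") if part]


def has_dir(value, directory):
    """Report whether `directory` appears in the colon-separated `value`."""
    return posixpath.normpath(directory) in split_search_path(value)


current = os.environ.get("LD_LIBRARY_PATH", "")
print("lib64 on LD_LIBRARY_PATH:", has_dir(current, "/usr/lib/cuda/lib64"))
```

Normalizing both sides means a trailing slash in ~/.bashrc does not cause a false negative.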
5 Verify the installation
$ python3
Python 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.config.list_physical_devices("GPU")
2021-02-21 17:43:50.205210: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-02-21 17:43:50.234635: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-21 17:43:50.234911: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1060 6GB computeCapability: 6.1
coreClock: 1.7335GHz coreCount: 10 deviceMemorySize: 5.93GiB deviceMemoryBand 178.99GiB/s
2021-02-21 17:43:50.235095: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-02-21 17:43:50.236187: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-02-21 17:43:50.237281: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-02-21 17:43:50.237489: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-02-21 17:43:50.238605: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-02-21 17:43:50.239236: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-02-21 17:43:50.241550: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-02-21 17:43:50.241657: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-21 17:43:50.241960: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-21 17:43:50.242156: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
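The "Successfully opened dynamic library" lines in the log above show which CUDA libraries TensorFlow managed to load. The sketch below scans saved log text for them and reports anything missing. The REQUIRED list is simply the set of libraries that appear in the log above, not an official TensorFlow requirement list, and the helper names are hypothetical.

```python
import re

# Library base names taken from the log above (an assumption, not an official list).
REQUIRED = [
    "libcuda", "libcudart", "libcublas", "libcufft",
    "libcurand", "libcusolver", "libcusparse", "libcudnn",
]

# Two lines copied from the log above, used as a small sample.
SAMPLE_LOG = """\
2021-02-21 17:43:50.235095: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-02-21 17:43:50.241550: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
"""


def loaded_libraries(log_text):
    """Return the library file names reported as successfully opened."""
    return re.findall(r"Successfully opened dynamic library (\S+)", log_text)


def missing_libraries(log_text, required=REQUIRED):
    """Return required base names that never show up as loaded."""
    loaded = {name.split(".so")[0] for name in loaded_libraries(log_text)}
    return [lib for lib in required if lib not in loaded]


print("loaded: ", loaded_libraries(SAMPLE_LOG))
print("missing:", missing_libraries(SAMPLE_LOG))
```

Run against the full log above, `missing_libraries` would return an empty list. A missing libcudnn entry would point back to the cuDNN copy step or to LD_LIBRARY_PATH.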
References:
https://towardsdatascience.com/installing-tensorflow-gpu-in-ubuntu-20-04-4ee3ca4cb75d
https://cyfeng.science/2020/05/02/ubuntu-install-nvidia-driver-cuda-cudnn-suits/