  • [ubuntu 18.04 + RTX 2070] Anaconda3

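    The RTX 2070 also works with Ubuntu 16.04 + CUDA 9.0. Ubuntu 18.04 may only support CUDA 10.0, and running open-source code on it can produce odd errors, so Ubuntu 16.04 + CUDA 9.0 is the recommended setup. This post nevertheless uses Ubuntu 18.04 + CUDA 10.0 as its example; the steps for Ubuntu 16.04 + CUDA 9.0 are much the same.

    If you installed CUDA 9.0, you can install TensorFlow-GPU directly with pip: install Anaconda, virtualenv, CUDA and cuDNN, then pip install tensorflow-gpu.

    If you installed any other CUDA version, you need to build from source: install items 1, 2, 3 and 4 below (5 is optional), then build tensorflow-gpu from source, adjusting the configure options to match the versions you installed.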
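The choice between the two routes above can be sketched as a tiny shell helper. The function name `tf_install_route` is hypothetical, and the rule it encodes (prebuilt pip package only for CUDA 9.0) is simply the one stated in this post, not something checked against TensorFlow itself.

```shell
# Hypothetical helper: pick the install route described above from a CUDA version string.
tf_install_route() {
    local cuda_version="$1"
    if [ "$cuda_version" = "9.0" ]; then
        echo "pip"      # the prebuilt tensorflow-gpu package can be used
    else
        echo "source"   # tensorflow-gpu has to be built from source
    fi
}

tf_install_route "10.0"   # prints: source
```

For the setup in this post (CUDA 10.0) the helper prints `source`, which is why items 4 and 5 of the list below matter.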

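    Although the CUDA website does not list a release specifically for RTX 20-series GPUs, CUDA 10.0 supports Ubuntu 18.04 with a GeForce RTX 2070. The setup used here for later study and research is: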

    1. Anaconda3 5.2.0
    2. CUDA 10.0
    3. cuDNN 7.4.1
    4. Bazel 0.17
    5. TensorRT 5
    6. Tensorflow-gpu

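    (What follows is my own setup procedure. I hit errors and retried along the way, so corrections are welcome.)

    (Every version listed above supports Ubuntu 18.04 and the RTX 2070, so you can also just follow the list and install them yourself.)

    (To install the NVIDIA driver, see https://blog.csdn.net/ghw15221836342/article/details/79571559 and, in its Method 1, replace 390 with 410, the driver version for the RTX 2070.)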

    ----------------------------------------------------------------------------------

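    Installing Anaconda3 5.2.0 on Ubuntu 18

    TensorFlow supports Python 3.4, 3.5 and 3.6 but may not support Python 3.7 yet (at the time of writing the latest Python is 3.7.1, and the latest Anaconda3 ships Python 3.7.0). For convenience I installed Anaconda3 5.2.0, which ships Python 3.6.5. With Anaconda installed there is no need to install Python and its libraries separately.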

    Archive of all Anaconda versions:

    https://repo.anaconda.com/archive/

    Download Anaconda3-5.2.0-Linux-x86_64.sh

    then go to the download directory and run

    bash Anaconda3-5.2.0-Linux-x86_64.sh

    To verify, run

    python --version

    If it prints

    Python 3.6.5 :: Anaconda, Inc.

    the installation succeeded.
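The version check above can also be scripted. The sketch below assumes the supported range is the one named in this post (Python 3.4 to 3.6); the function name `python_ok_for_tf` is hypothetical.

```shell
# Hypothetical helper: succeed if a `python --version` line reports Python 3.4, 3.5 or 3.6.
python_ok_for_tf() {
    local line="$1"
    local mm
    # Keep only "major.minor", e.g. "Python 3.6.5 :: Anaconda, Inc." -> "3.6"
    mm=$(printf '%s\n' "$line" | sed -n 's/^Python \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p')
    case "$mm" in
        3.4|3.5|3.6) return 0 ;;
        *)           return 1 ;;
    esac
}

# Older interpreters print the version to stderr, so capture both streams.
if python_ok_for_tf "$(python --version 2>&1)"; then
    echo "python version is in the range named in this post"
else
    echo "python is missing or outside 3.4-3.6"
fi
```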

    Check the pip version:

    $ pip --version
    pip 10.0.1 from /home/lsy/anaconda3/lib/python3.6/site-packages/pip (python 3.6)

    --------------(If the steps above succeeded, the Python installation steps below are not needed)--------------------------------------------

    Installing Python 3.6 on Ubuntu 18

    sudo add-apt-repository ppa:jonathonf/python-3.6

    Installing Python 3.7.1 on Ubuntu 18

    For the installation procedure, see:

    https://blog.csdn.net/jaket5219999/article/details/80894517

    wget https://www.python.org/ftp/python/3.7.1/Python-3.7.1.tar.xz && 
        tar -xvf Python-3.7.1.tar.xz && 
        cd Python-3.7.1 && 
        ./configure && make && sudo make altinstall

    Alternatively, download the source from https://www.python.org/downloads/release/python-370/

    extract it, cd into the extracted directory, and run

    ./configure && make && sudo make altinstall

    If this fails with zipimport.ZipImportError: can't decompress data; zlib not available

    the fix is to install the build dependencies and run the build again:

    sudo apt-get install -y make build-essential libssl-dev zlib1g-dev libbz2-dev \
    libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev \
    xz-utils tk-dev

    Switching between python2 and python3

    Reference: https://stackoverflow.com/questions/43743509/how-to-make-python3-command-run-python-3-6-instead-of-3-5

    # Make python point to python3.6
    sudo rm /usr/bin/python
    sudo ln -s /usr/bin/python3.6 /usr/bin/python
    
    # Make python2 point to python2.7
    sudo rm /usr/bin/python2
    sudo ln -s /usr/bin/python2.7 /usr/bin/python2
    
    # Create an alias (put this line in ~/.bash_aliases to make it permanent)
    alias python='/usr/bin/python3.6'

    Installing pip

    sudo apt-get install python3-pip

    Use python3-pip here; otherwise the default python2 is matched.

    --------------------------------------------------------------------------------------------------------------------------------

    CUDA 10.0

    Reference:

    https://medium.com/@vitali.usau/install-cuda-10-0-cudnn-7-3-and-build-tensorflow-gpu-from-source-on-ubuntu-18-04-3daf720b83fe

    1. Download the CUDA Toolkit: Linux / x86_64 / Ubuntu / 18.04 / deb (local)

    https://developer.nvidia.com/cuda-downloads

    2. Install

    sudo dpkg -i cuda-repo-ubuntu1804-10-0-local-10.0.130-410.48_1.0-1_amd64.deb
    sudo apt-key add /var/cuda-repo-10-0-local-10.0.130-410.48/7fa2af80.pub
    sudo apt-get update
    sudo apt-get install cuda

    3. Add the environment variable

    nano ~/.bashrc

    Append the following as the last line, then save and exit.

    export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
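The `${PATH:+:${PATH}}` part of the export line appends `:$PATH` only when `PATH` is already non-empty, so no stray colon is left behind. A minimal sketch of the idiom, using a hypothetical helper named `prepend_dir`:

```shell
# Hypothetical helper showing the ${VAR:+:${VAR}} idiom used in the export line above:
# the ":old_value" suffix is only appended when the old value is non-empty.
prepend_dir() {
    local dir="$1" current="$2"
    echo "${dir}${current:+:${current}}"
}

prepend_dir /usr/local/cuda-10.0/bin "/usr/bin:/bin"   # /usr/local/cuda-10.0/bin:/usr/bin:/bin
prepend_dir /usr/local/cuda-10.0/lib64 ""              # /usr/local/cuda-10.0/lib64

# The same pattern applied to the library search path (assumes the default install prefix).
export LD_LIBRARY_PATH="$(prepend_dir /usr/local/cuda-10.0/lib64 "${LD_LIBRARY_PATH:-}")"
```

The last line is optional here; this post sets `LD_LIBRARY_PATH` later, in the pip section.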

    4. Check the driver version and the CUDA toolkit

    cat /proc/driver/nvidia/version
    nvcc -V
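To use the toolkit version in a script, the release number can be pulled out of the `nvcc -V` output. This sketch assumes the output contains a line of the form `Cuda compilation tools, release 10.0, V10.0.130`; the function name `cuda_release` is hypothetical.

```shell
# Hypothetical helper: read `nvcc -V` output on stdin and print the release, e.g. "10.0".
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

if command -v nvcc >/dev/null 2>&1; then
    echo "CUDA toolkit release: $(nvcc -V | cuda_release)"
else
    echo "nvcc not found on PATH"
fi
```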

    5. (Optional) Build the CUDA samples and run them.

    cd /usr/local/cuda-10.0/samples
    sudo make

    This takes a while. When it finishes, go to the build output directory and run the samples to see the results.

    cd /usr/local/cuda-10.0/samples/bin/x86_64/linux/release
    ./deviceQuery
    ./bandwidthTest

    ------------------------------------------------------------------

    cuDNN v7.4.1 for CUDA 10.0

    1. Download cuDNN Library for Linux

    https://developer.nvidia.com/rdp/cudnn-download

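    (Downloading requires a registered NVIDIA developer account: https://developer.nvidia.com/)

    2. Extract the downloaded archive; the extracted cuDNN folder is named cuda.

    3. Copy the cuDNN files into the CUDA installation, i.e. copy the contents of the extracted cuda folder into the CUDA directory under /usr/local.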

    $ sudo cp cuda/include/cudnn.h    /usr/local/cuda/include
    $ sudo cp cuda/lib64/libcudnn*    /usr/local/cuda/lib64
    $ sudo chmod a+r /usr/local/cuda/include/cudnn.h   /usr/local/cuda/lib64/libcudnn*

    (This method follows https://blog.csdn.net/u010801439/article/details/80483036)
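After copying the files, the installed cuDNN version can be read back from the header. This sketch assumes `cudnn.h` defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` on separate `#define` lines, as cuDNN 7 headers do; the function name `cudnn_version` is hypothetical.

```shell
# Hypothetical helper: print "major.minor.patch" from the #define lines of a cudnn.h file.
cudnn_version() {
    local header="$1"
    local major minor patch
    major=$(awk '$1 == "#define" && $2 == "CUDNN_MAJOR" {print $3}' "$header")
    minor=$(awk '$1 == "#define" && $2 == "CUDNN_MINOR" {print $3}' "$header")
    patch=$(awk '$1 == "#define" && $2 == "CUDNN_PATCHLEVEL" {print $3}' "$header")
    echo "${major}.${minor}.${patch}"
}

header=/usr/local/cuda/include/cudnn.h
if [ -r "$header" ]; then
    echo "cuDNN version: $(cudnn_version "$header")"
else
    echo "$header not found"
fi
```

With the version used in this post, the expected result is 7.4.1.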

    ------------------------------------------------------------------------

    NCCL v2.3.7

    Only needed when building TensorFlow from source; skip this if you install with pip.

    Installation reference: https://blog.csdn.net/zuyuhuo6777/article/details/81450258

    1. Download

    https://developer.nvidia.com/nccl/nccl-download

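    Under Local installers (x86), choose Local installer for Ubuntu 18.04.

    2. Install

    Go to the download directory, install the local NCCL repository, update the APT database, and install the libnccl2 package with APT. If you also need to compile applications against NCCL, install the libnccl-dev package as well.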

    $ sudo dpkg -i nccl-repo-ubuntu1804-2.3.7-ga-cuda10.0_1-1_amd64.deb 
    $ sudo apt update
    $ sudo apt install libnccl2 libnccl-dev

    ------------------------------------------------------------------------

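    For convenience, install Bazel 0.17 directly.

    (I had installed 0.19 first. --config=cuda does not support versions above 0.17, and I was not sure whether 0.19 would affect later steps, so I removed 0.19 and installed 0.17 instead. To uninstall: run whereis bazel to find the Bazel directory, then rm -rf <path>.)

    Bazel 0.19.2

    Only needed when building TensorFlow from source; skip this if you install with pip.

    The official site offers several installation methods:

    https://docs.bazel.build/versions/master/install-ubuntu.html#install-with-installer-ubuntu

    The steps below use the "Installing using binary installer" method.

    1. Install the required packages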

    $ sudo apt-get install pkg-config zip g++ zlib1g-dev unzip python

    2. Download Bazel

    https://github.com/bazelbuild/bazel/releases

    I chose bazel-0.19.2-installer-linux-x86_64.sh at the time; for 0.17, pick the matching installer from the same page.

    3. Run the installer

    $ chmod +x bazel-<version>-installer-linux-x86_64.sh
    $ ./bazel-<version>-installer-linux-x86_64.sh --user

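    4. Set up the environment

    $ nano ~/.bashrc

    Append the following as the last line, then save and exit:

    export PATH="$PATH:$HOME/bin"

    Reload the file so the change takes effect:

    $ source ~/.bashrc

    5. Check that the installation succeeded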

    $ bazel version
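Since this post settles on Bazel 0.17 or older, the check can be automated. The sketch below assumes `bazel version` prints a line starting with `Build label:` and that GNU `sort -V` is available; the 0.17 ceiling is this post's observation, and the helper names are hypothetical.

```shell
# Hypothetical helper: read `bazel version` output on stdin and print the build label.
bazel_label() {
    sed -n 's/^Build label: *//p'
}

# True when version $1 <= version $2, using GNU sort's version ordering.
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# True when the label's major.minor part is at most 0.17.
bazel_at_most_0_17() {
    local label="$1"
    [ -n "$label" ] && version_le "$(printf '%s\n' "$label" | cut -d. -f1,2)" "0.17"
}

if command -v bazel >/dev/null 2>&1; then
    label=$(bazel version 2>/dev/null | bazel_label)
    if bazel_at_most_0_17 "$label"; then
        echo "Bazel $label is within the range used in this post"
    else
        echo "Bazel $label is newer than 0.17 (or its version could not be read)"
    fi
else
    echo "bazel not found on PATH"
fi
```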

     --------------------------------------------

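    TensorRT 5.0.2.6

    Only needed when building TensorFlow from source; skip this if you install with pip. Even for a source build it is optional, depending on your needs. If you do install it, remember to choose the matching options when running configure for the source build.

    for Ubuntu 1804 and CUDA 10.0

    1. Download

    https://developer.nvidia.com/nvidia-tensorrt-5x-download

    Under Debian and RPM Install Package, I chose:

    TensorRT 5.0.2.6 GA for Ubuntu 1804 and CUDA 10.0 DEB local repo packages

    2. Install, following the official documentation: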

    https://docs.nvidia.com/deeplearning/sdk/tensorrt-install-guide/index.html#downloading

    $ sudo dpkg -i nv-tensorrt-repo-ubuntu1804-cuda10.0-trt5.0.2.6-ga-20181009_1-1_amd64.deb 
    $ sudo apt-key add /var/nv-tensorrt-repo-cuda10.0-trt5.0.2.6-ga-20181009/7fa2af80.pub 
    $ sudo apt-get update
    $ sudo apt-get install tensorrt

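    Since the Anaconda3 Python is version 3.6, I simply used the python package name below rather than python3. (NVIDIA's install guide lists python-libnvinfer-dev for Python 2.7 and python3-libnvinfer-dev for Python 3.)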

    $ sudo apt-get install python-libnvinfer-dev

    After installation the output shows:

    Setting up python-libnvinfer-dev (5.0.2-1+cuda10.0) ...

    If you plan to use TensorRT through TensorFlow:

    $ sudo apt-get install uff-converter-tf

    After installation the output shows:

    Setting up graphsurgeon-tf (5.0.2-1+cuda10.0) ...
    Setting up uff-converter-tf (5.0.2-1+cuda10.0) ...

    3. Check the result of the installation:

    $ dpkg -l | grep TensorRT
    ii  graphsurgeon-tf                                             5.0.2-1+cuda10.0                    amd64        GraphSurgeon for TensorRT package
    ii  libnvinfer-dev                                              5.0.2-1+cuda10.0                    amd64        TensorRT development libraries and headers
    ii  libnvinfer-samples                                          5.0.2-1+cuda10.0                    all          TensorRT samples and documentation
    ii  libnvinfer5                                                 5.0.2-1+cuda10.0                    amd64        TensorRT runtime libraries
    ii  python-libnvinfer                                           5.0.2-1+cuda10.0                    amd64        Python bindings for TensorRT
    ii  python-libnvinfer-dev                                       5.0.2-1+cuda10.0                    amd64        Python development package for TensorRT
    ii  tensorrt                                                    5.0.2.6-1+cuda10.0                  amd64        Meta package of TensorRT
    ii  uff-converter-tf                                            5.0.2-1+cuda10.0                    amd64        UFF converter for TensorRT package

    --------------------------------------------------------

    Tensorflow

    Two installation methods are recommended: 1. in Docker; 2. in a virtualenv. The second is more common.

    (1) In Docker:

    1. Install Docker:

    https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-on-ubuntu-18-04

    2. Install nvidia-docker:

    https://github.com/NVIDIA/nvidia-docker

    3. Downloads TensorFlow release images to your machine:

    $ docker pull tensorflow/tensorflow:latest-devel-gpu

    (2) In a virtualenv:

    sudo apt update
    sudo apt install python-dev python-pip
    sudo pip install -U virtualenv  # system-wide install
    virtualenv --system-site-packages -p python3 ./venv
    source ./venv/bin/activate
    (venv) $ pip install --upgrade pip
    (venv) $ pip list

    Continue installing TensorFlow inside (venv).

    (1) Install with pip: if you installed CUDA 9.0 you can install directly with pip; otherwise build from source, see (2).

    pip install tensorflow-gpu==1.12

    If importing TensorFlow then fails with

    ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory

    Solution: add the following to .bashrc

    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64/

    (2) Else: Build from source

    Note that when running ./configure the default CUDA version is 9.0; change it to 10.0.
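One way to avoid answering the CUDA questions interactively is to export the answers before running ./configure. This is only a sketch: the variable names are assumed to match configure.py in the TensorFlow 1.12 source tree and should be checked against your checkout, the paths assume the default install locations used in this post, and the compute capability 7.5 is the value reported for the RTX 2070 in the Docker test log later in this post. The function name `set_tf_cuda_env` is hypothetical.

```shell
# Sketch: pre-answer the CUDA-related ./configure questions through environment variables.
# Variable names are assumptions based on TensorFlow 1.12's configure.py; verify before use.
set_tf_cuda_env() {
    export TF_NEED_CUDA=1
    export TF_CUDA_VERSION=10.0                    # instead of the 9.0 default
    export CUDA_TOOLKIT_PATH=/usr/local/cuda-10.0
    export TF_CUDNN_VERSION=7
    export CUDNN_INSTALL_PATH=/usr/local/cuda      # cuDNN was copied into the CUDA tree
    export TF_CUDA_COMPUTE_CAPABILITIES=7.5        # RTX 2070
}

set_tf_cuda_env
echo "configure will see CUDA ${TF_CUDA_VERSION}, cuDNN ${TF_CUDNN_VERSION}"

# Then, inside the TensorFlow source directory (not executed here):
#   ./configure
#   bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
```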

    When you are done, you can leave the venv:

    (venv) $ deactivate # don't exit until you're done using TensorFlow

    ------------------------------------------------------------------------------

    Test whether tensorflow-gpu runs correctly in Docker:

    $ sudo docker run --runtime=nvidia -it --rm tensorflow/tensorflow:latest-gpu \
    >    python -c "import tensorflow as tf; tf.enable_eager_execution(); print(tf.reduce_sum(tf.random_normal([1000, 1000])))"
    [sudo] password for lsy: 
    Unable to find image 'tensorflow/tensorflow:latest-gpu' locally
    latest-gpu: Pulling from tensorflow/tensorflow
    18d680d61657: Already exists 
    0addb6fece63: Already exists 
    78e58219b215: Already exists 
    eb6959a66df2: Already exists 
    e3eb30fe4844: Already exists 
    852c9b7a4425: Already exists 
    0a298bf31111: Already exists 
    4b34ad03a386: Pull complete 
    ea4e8d636cf7: Pull complete 
    e641906af026: Pull complete 
    af41a77e326c: Pull complete 
    56234dc44f16: Pull complete 
    33999852f515: Pull complete 
    11679b84da5e: Pull complete 
    231eb8ba046b: Pull complete 
    7d894676fbc1: Pull complete 
    Digest: sha256:847690afb29977920dbdbcf64a8669a2aaa0a202844fe80ea5cb524ede9f0a0b
    Status: Downloaded newer image for tensorflow/tensorflow:latest-gpu
    2018-11-26 05:48:05.315151: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    2018-11-26 05:48:05.490068: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2018-11-26 05:48:05.490510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
    name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.725
    pciBusID: 0000:01:00.0
    totalMemory: 7.76GiB freeMemory: 7.09GiB
    2018-11-26 05:48:05.490528: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
    2018-11-26 05:48:05.727215: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
    2018-11-26 05:48:05.727251: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
    2018-11-26 05:48:05.727257: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
    2018-11-26 05:48:05.727423: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6817 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
    tf.Tensor(-568.0144, shape=(), dtype=float32)
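To confirm from a script that the run really used the GPU, the "Created TensorFlow device" line can be extracted from the log. This sketch assumes the log line format shown above; the function name `tf_gpu_devices` is hypothetical.

```shell
# Hypothetical helper: read a TensorFlow log on stdin and print "name|compute capability"
# for each GPU device TensorFlow reports having created.
tf_gpu_devices() {
    sed -n 's/.*Created TensorFlow device.*name: \([^,]*\), pci bus id: [^,]*, compute capability: \([0-9.]*\)).*/\1|\2/p'
}

# Demonstration on the log line shown above.
sample='2018-11-26 05:48:05.727423: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6817 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)'
printf '%s\n' "$sample" | tf_gpu_devices   # GeForce RTX 2070|7.5
```

With a real run, the log (including stderr) would be piped into `tf_gpu_devices`; an empty result suggests no GPU device was created.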

     ---------------------------------------------------------


  • Original article: https://www.cnblogs.com/shiyublog/p/10011803.html