  • [Deep Learning] Installing TensorFlow-GPU

    1. Windows

    Preparation

    A clean system with no existing Python installation; if one is present, uninstall it first.

    My system also has VS2015 and VS2017 installed (I am not sure whether these are required).

    TensorFlow, CUDA, and cuDNN are updated frequently, so this guide uses close to the latest versions as of November 19, 2018.

    Installation

    1. Install Anaconda

    Details omitted here. One note: check all the installer options, including the one that adds Anaconda to PATH.

    2. Install the graphics driver

    Accept the defaults.

    3. Install CUDA 9.0

    Accept the defaults.

    4. Install cuDNN 7.x

    Extract the archive into C:\ProgramData\NVIDIA GPU Computing Toolkit\v9.0.

    Then add C:\ProgramData\NVIDIA GPU Computing Toolkit\v9.0\bin to the PATH environment variable.
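    A quick way to confirm the PATH change took effect is to check the entry from Python. A minimal sketch (the helper name is my own, and the cuDNN directory is the one used above; adjust it to your install):

```python
import os

def dir_on_path(directory, path=None, sep=os.pathsep):
    """Return True if `directory` appears as an entry in a PATH-style string."""
    raw = path if path is not None else os.environ.get("PATH", "")
    target = os.path.normcase(os.path.normpath(directory))
    return any(os.path.normcase(os.path.normpath(p)) == target
               for p in raw.split(sep) if p)

# Deterministic demo with an explicit PATH string (semicolon-separated, as on Windows)
demo_path = r"C:\Windows;C:\ProgramData\NVIDIA GPU Computing Toolkit\v9.0\bin"
print(dir_on_path(r"C:\ProgramData\NVIDIA GPU Computing Toolkit\v9.0\bin",
                  path=demo_path, sep=";"))  # True
```

    Run it in a fresh shell: PATH edits only take effect in consoles opened after the change.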

    Verification

    1. Launch Anaconda Prompt

    Run

    conda env list

    It should list a single environment, named base (or root).

    2. Change Anaconda's package source

    Run

    conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
    conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
    conda config --set show_channel_urls yes

    This switches Anaconda's download source to the Tsinghua TUNA mirror.
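    For reference, those commands end up writing a ~/.condarc roughly like the output below. This helper is only an illustrative sketch of that layout (conda manages the real file); the one subtlety it captures is that `conda config --add` prepends, so the channel added last lands first:

```python
def condarc_text(channels):
    """Sketch the ~/.condarc produced by the `conda config` commands above.

    `conda config --add channels URL` prepends, so channels added later
    appear earlier in the file; `defaults` stays at the bottom.
    """
    lines = ["channels:"]
    for url in reversed(channels):          # last --add wins the top spot
        lines.append("  - " + url)
    lines.append("  - defaults")
    lines.append("show_channel_urls: true")
    return "\n".join(lines) + "\n"

print(condarc_text([
    "https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/",
    "https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/",
]))
```

    You can compare this against the actual file with `conda config --show channels`.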

    3. Create a Python environment for TensorFlow

    conda create -n tf-gpu-py3.5 python=3.5

    Example:

    D:\Users\zyb>conda create -n tf-gpu-py3.5 python=3.5
    Solving environment: done
    
    ## Package Plan ##
    
      environment location: C:\anaconda35\envs\tf-gpu-py3.5
    
      added / updated specs:
        - python=3.5
    
    
    The following NEW packages will be INSTALLED:
    
        certifi:        2018.8.24-py35_1001 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
        pip:            18.0-py35_1001      https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
        python:         3.5.5-he025d50_2    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
        setuptools:     40.4.3-py35_0       https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
        vc:             14.1-h21ff451_1     https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/peterjc123
        vs2017_runtime: 15.4.27004.2010-1   https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/peterjc123
        wheel:          0.32.0-py35_1000    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
        wincertstore:   0.2-py35_1002       https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
    
    Proceed ([y]/n)? y
    Preparing transaction: done
    Verifying transaction: done
    Executing transaction: done
    #
    # To activate this environment, use
    #
    #     $ conda activate tf-gpu-py3.5
    #
    # To deactivate an active environment, use
    #
    #     $ conda deactivate

    4. Activate the environment you just created

    conda activate tf-gpu-py3.5

    5. Install the GPU build of TensorFlow

    conda install tensorflow-gpu

    6. Verify with code

    Start python

    and enter

    import tensorflow as tf

    Check whether this raises an error.

    If it does, install the missing package with conda install <package name> (for example numpy).

    If it does not, continue with

    a = tf.constant([1.0,2.0,3.0,4.0,5.0,6.0],shape=[2,3],name='a')
    b = tf.constant([1.0,2.0,3.0,4.0,5.0,6.0],shape=[3,2],name='b')
    c = tf.matmul(a,b)
    sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
    # After this step a warning appears:
    # Device mapping: no known devices.
    # 2018-11-19 22:18:15.899459: I T:\src\github\tensorflow\tensorflow\core\common_runtime\direct_session.cc:288] Device mapping:
    # Ignore it and continue with the next step
    print(sess.run(c))
    # Output:
    # MatMul: (MatMul): /job:localhost/replica:0/task:0/device:CPU:0
    # 2018-11-19 22:18:23.059234: I T:\src\github\tensorflow\tensorflow\core\common_runtime\placer.cc:935] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:CPU:0
    # a: (Const): /job:localhost/replica:0/task:0/device:CPU:0
    # 2018-11-19 22:18:23.064109: I T:\src\github\tensorflow\tensorflow\core\common_runtime\placer.cc:935] a: (Const)/job:localhost/replica:0/task:0/device:CPU:0
    # b: (Const): /job:localhost/replica:0/task:0/device:CPU:0
    # 2018-11-19 22:18:23.069134: I T:\src\github\tensorflow\tensorflow\core\common_runtime\placer.cc:935] b: (Const)/job:localhost/replica:0/task:0/device:CPU:0
    # [[22. 28.]
    #  [49. 64.]]
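    The final [[22. 28.] [49. 64.]] matrix is easy to sanity-check without TensorFlow; a minimal pure-Python sketch of the same product:

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # shape [2, 3], same values as the constant 'a'
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # shape [3, 2], same values as the constant 'b'
print(matmul(a, b))  # [[22.0, 28.0], [49.0, 64.0]]
```

    If the session prints anything else, the installation (not the math) is the problem.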

    Verification succeeded.

    2. Installing GPU TensorFlow on Ubuntu

    Preparation

    1. Anaconda, Linux build: download it from the Tsinghua TUNA mirror

    2. Graphics driver: download it from the NVIDIA website


    3. CUDA 9.0: download the Linux build from the NVIDIA website

    4. cuDNN 7.x: download the Linux build from the NVIDIA website (registration and joining the developer program required)

    Installation

    1. Install Anaconda

    Note: just install it under your own home directory.

    2. Install the graphics driver

    Download the driver from the NVIDIA website, then install it with sudo.

    The first step of the installer asks you to read the license agreement; press q to exit the pager.

    3. Install CUDA 9.0

    Accept the defaults.

    The first step of the installer asks you to read the license agreement; press q to exit the pager.

    CUDA 9.0 ships one base installer plus four patch packages, all numbered in order.

    Make the five .run files executable with sudo chmod +x *.run,

    then install them one by one.

    Afterwards, add the install paths to your environment variables:

    export PATH=/usr/local/cuda-9.0/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:$LD_LIBRARY_PATH
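    If these exports go into ~/.bashrc and the file gets sourced repeatedly, duplicate entries pile up. A small illustrative sketch of idempotent prepending (the helper name is my own, not from the post):

```python
def prepend_path(entry, path, sep=":"):
    """Prepend `entry` to a PATH-style string, dropping any existing copy first."""
    parts = [p for p in path.split(sep) if p and p != entry]
    return sep.join([entry] + parts)

# Re-prepending does not duplicate the CUDA entry:
print(prepend_path("/usr/local/cuda-9.0/bin", "/usr/bin:/usr/local/cuda-9.0/bin:/bin"))
# /usr/local/cuda-9.0/bin:/usr/bin:/bin
```

    The same logic applies to LD_LIBRARY_PATH with /usr/local/cuda-9.0/lib64.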

    4. Install cuDNN

    The archive extracts to two folders, include and lib64.

    a. Using sudo, copy cudnn.h from include into /usr/local/cuda-9.0/include.

    b. Using sudo, copy libcudnn.so.7.3.1 and libcudnn_static.a from lib64 into /usr/local/cuda-9.0/lib64.

    c. Create two symlinks: cd into /usr/local/cuda-9.0/lib64 and run:

    sudo ln -s libcudnn.so.7.3.1 libcudnn.so
    sudo ln -s libcudnn.so.7.3.1 libcudnn.so.7
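    The two links generalize to any cuDNN patch release: the unversioned and major-version names both point at the fully versioned file. A small sketch (the function name is my own) computing the (link, target) pairs for a given version:

```python
def cudnn_symlinks(version):
    """Return (link, target) pairs for a cuDNN shared library version like '7.3.1'.

    Mirrors the two `ln -s` commands above: the unversioned and the
    major-version names both point at the fully versioned file.
    """
    major = version.split(".")[0]
    target = "libcudnn.so." + version
    return [("libcudnn.so", target), ("libcudnn.so." + major, target)]

print(cudnn_symlinks("7.3.1"))
# [('libcudnn.so', 'libcudnn.so.7.3.1'), ('libcudnn.so.7', 'libcudnn.so.7.3.1')]
```

    If you later install, say, 7.4.x, the same two links just need to be re-pointed.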

    Verification

    0. CUDA verification

    # enter the samples directory
    cd ~/NVIDIA_CUDA-9.0_Samples
    # build the samples
    make -j8
    # enter the directory holding the built executables
    cd bin/x86_64/linux/release
    # run the device test program
    ./deviceQuery
    # output:
    ./deviceQuery Starting...
    
     CUDA Device Query (Runtime API) version (CUDART static linking)
    
    Detected 1 CUDA Capable device(s)
    
    Device 0: "GeForce GTX 1070"
      CUDA Driver Version / Runtime Version          10.0 / 9.0
      CUDA Capability Major/Minor version number:    6.1
      Total amount of global memory:                 8116 MBytes (8510701568 bytes)
      (15) Multiprocessors, (128) CUDA Cores/MP:     1920 CUDA Cores
      GPU Max Clock rate:                            1683 MHz (1.68 GHz)
      Memory Clock rate:                             4004 Mhz
      Memory Bus Width:                              256-bit
      L2 Cache Size:                                 2097152 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
      Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
      Run time limit on kernels:                     Yes
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Disabled
      Device supports Unified Addressing (UVA):      Yes
      Supports Cooperative Kernel Launch:            Yes
      Supports MultiDevice Co-op Kernel Launch:      Yes
      Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    
    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.0, CUDA Runtime Version = 9.0, NumDevs = 1
    Result = PASS
    # after seeing PASS, run the bandwidth test
    ./bandwidthTest
    # output:
    [CUDA Bandwidth Test] - Starting...
    Running on...
    
     Device 0: GeForce GTX 1070
     Quick Mode
    
     Host to Device Bandwidth, 1 Device(s)
     PINNED Memory Transfers
       Transfer Size (Bytes)    Bandwidth(MB/s)
       33554432            12758.2
    
     Device to Host Bandwidth, 1 Device(s)
     PINNED Memory Transfers
       Transfer Size (Bytes)    Bandwidth(MB/s)
       33554432            12867.2
    
     Device to Device Bandwidth, 1 Device(s)
     PINNED Memory Transfers
       Transfer Size (Bytes)    Bandwidth(MB/s)
       33554432            191582.5
    
    Result = PASS
    
    NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
    # PASS means the test succeeded; on FAIL, reboot and run it again.
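    When scripting this check, the `Result = PASS` line is the part to look for. A minimal sketch (the helper name is my own) that scans sample output for it:

```python
def samples_passed(output):
    """Return True if a CUDA sample's output reports `Result = PASS`."""
    for line in output.splitlines():
        if line.strip().startswith("Result = "):
            return line.strip() == "Result = PASS"
    return False  # no Result line at all: treat as a failure

print(samples_passed("deviceQuery Starting...\nResult = PASS\n"))  # True
print(samples_passed("Result = FAIL\n"))                           # False
```

    Both deviceQuery and bandwidthTest end with such a line, so one helper covers both.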

    1. Create the anaconda environment (same as on Windows)

    conda create -n tf-gpu-py3.5 python=3.5
    #
    # To activate this environment, use
    #
    #     $ conda activate tf-gpu-py3.5
    #
    # To deactivate an active environment, use
    #
    #     $ conda deactivate

    2. Activate tf-gpu-py3.5

    conda activate tf-gpu-py3.5

    3. Install tensorflow-gpu

    conda install tensorflow-gpu

    4. Verify with code

    (tf-gpu-py3.5) tf@lolita-ThinkStation-P318:~/anaconda3/envs$ python
    Python 3.5.6 |Anaconda, Inc.| (default, Aug 26 2018, 21:41:56) 
    [GCC 7.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tensorflow as tf
    >>> a = tf.constant([1.0,2.0,3.0,4.0,5.0,6.0],shape=[2,3],name='a')
    >>> b = tf.constant([1.0,2.0,3.0,4.0,5.0,6.0],shape=[3,2],name='b')
    >>> c = tf.matmul(a,b)
    >>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
    2018-11-19 22:43:27.732910: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
    2018-11-19 22:43:27.824810: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:897] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2018-11-19 22:43:27.825419: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties: 
    name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.683
    pciBusID: 0000:01:00.0
    totalMemory: 7.93GiB freeMemory: 7.64GiB
    2018-11-19 22:43:27.825445: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
    2018-11-19 22:43:27.995777: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
    2018-11-19 22:43:27.995806: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0 
    2018-11-19 22:43:27.995826: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N 
    2018-11-19 22:43:27.996035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7377 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
    Device mapping:
    /job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1
    2018-11-19 22:43:28.026839: I tensorflow/core/common_runtime/direct_session.cc:288] Device mapping:
    /job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1
    
    >>> print(sess.run(c))
    MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
    2018-11-19 22:44:23.662448: I tensorflow/core/common_runtime/placer.cc:935] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0
    a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
    2018-11-19 22:44:23.662561: I tensorflow/core/common_runtime/placer.cc:935] a: (Const)/job:localhost/replica:0/task:0/device:GPU:0
    b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
    2018-11-19 22:44:23.662589: I tensorflow/core/common_runtime/placer.cc:935] b: (Const)/job:localhost/replica:0/task:0/device:GPU:0
    [[22. 28.]
     [49. 64.]]

    Verification complete.

  • Original post: https://www.cnblogs.com/dhu121/p/10006905.html