zoukankan      html  css  js  c++  java
  • y450 archlinux cuda6.5

    y450 archlinux cuda6.5

    January 28, 2018 4:11 PM

    archlinux是最新更新版本,gcc版本到了7.几,太新了。

    [qiangge@lqspc ~]$ gcc --version
    gcc (GCC) 7.2.1 20180116
    Copyright © 2017 Free Software Foundation, Inc.
    本程序是自由软件;请参看源代码的版权声明。本软件没有任何担保;
    包括没有适销性和某一专用目的下的适用性担保。
    

    这系统对中文翻译的不太习惯哈。

    总体步骤

    1. 确认安装的archlinux比较新,不想降级gcc等。
    2. 确认y450的笔记本显卡型号,g 110M。
    3. 确定可以安装的cuda版本。这个地方走过弯路,开始直接pa cuda,结果就给我装了个9.1的版本。反复测试发现安装失败。经过查询显卡型号(上一步)支持的计算能力(compute capability?希望没拼错)只是支持1.2以下,后来安装完发现是1.1.而1.2以下的最多安装cuda-6.5以前的版本。
    4. yaourt cuda找到相关版本安装(上一步),安装过程中遇到/tmp不够用,新建个目录挂载到/tmp,冲掉了内存挂载的/tmp,这样可以充分利用硬盘空间来操作。之所以不够用因为内存只有8G,这样默认/tmp就只有4G,废话了。
    5. 安装完后测试/opt/cuda/samples的devicequery例子,最好拷贝到自己的/home目录吧。
    6. 开始不能编译任何例子,有两个错误。主要参考cuda社区解决。
    (1)Here is a patch to /usr/include/bits/floatn.h for avoiding __FLOAT128 only when compiling via NVCC
    (2)Here is how to use other GCC compiing via NVCC
    
    1. 第一个错误是floatn.h错误。参考论坛解决,本质上是判断条件里面添加一个条件,就是不编译cuda代码的意思。
    2. 第二个错误是默认的gcc版本太新了,cuda65不支持,那就采用5试试看(参考下一步方法),发现这只能编译devicequery。于是经过google,知道必须4.7左右。本机yaourt编译4.7失败,当然依然要/tmp,编译个编译器真的很容易失败,浪费了好几天的电费哈。上海电费蛮贵的,尤其是租房,呜呜。那么总有解决办法吧,参考资料在archlinux的yaourt源里面。作者提到了要动态库加上软连接,
    sudo ln -s /usr/lib/libisl.so /usr/lib/libisl.so.10 && sudo ldconfig
    

    不然会失败,当然作为折腾专家,我必须先不加看看效果,果然不行

    /usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.4/cc1plus: error while loading shared libraries: libisl.so.10: cannot open shared object file: No such file or directory
    make: *** [Makefile:196:bandwidthTest.o] 错误 1
    

    加上还提示另外一个错误,这个是作者没考虑的吧,哈哈

    /usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.4/cc1plus: error while loading shared libraries: libmpfr.so.4: cannot open shared object file: No such file or directory
    

    解决办法是相同的思路,相似的代码,读者自行思考哈。
    9. 解决gcc问题的方法有两个,本质是一个事情,请看参考1参考2。最后的效果

    [qiangge@lqspc ~]$ ll /opt/cuda/
    bin/                          jre/                          libnvvp/                      samples/
    doc/                          lib/                          NVIDIA_SLA_cuDNN_Support.txt  share/
    extras/                       lib64/                        nvvm/                         src/
    include/                      libnsight/                    open64/                       tools/
    [qiangge@lqspc ~]$ ll /opt/cuda/bin/gcc/
    总用量 8.0K
    drwxr-xr-x 2 root 4.0K 1月  28 22:52 .
    lrwxrwxrwx 1 root   16 1月  28 22:52 gcc -> /usr/bin/gcc-4.7
    lrwxrwxrwx 1 root   16 1月  28 22:52 cpp -> /usr/bin/cpp-4.7
    lrwxrwxrwx 1 root   16 1月  28 22:52 g++ -> /usr/bin/g++-4.7
    drwxr-xr-x 4 root 4.0K 1月  22 09:45 ..
    [qiangge@lqspc ~]$ 
    
    [qiangge@lqspc 1_Utilities]$ cd bandwidthTest/
    [qiangge@lqspc bandwidthTest]$ nvidia-smi
    Mon Jan 29 00:01:22 2018       
    +------------------------------------------------------+                       
    | NVIDIA-SMI 340.106    Driver Version: 340.106        |                       
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce G 110M      Off  | 0000:01:00.0     N/A |                  N/A |
    | N/A   52C   P12    N/A /  N/A |     50MiB /   255MiB |     N/A      Default |
    +-------------------------------+----------------------+----------------------+
                                                                                   
    +-----------------------------------------------------------------------------+
    | Compute processes:                                               GPU Memory |
    |  GPU       PID  Process name                                     Usage      |
    |=============================================================================|
    |    0            Not Supported                                               |
    +-----------------------------------------------------------------------------+
    [qiangge@lqspc bandwidthTest]$ 
    
    [qiangge@lqspc bandwidthTest]$ ./bandwidthTest 
    [CUDA Bandwidth Test] - Starting...
    Running on...
    
     Device 0: GeForce G 110M
     Quick Mode
    
     Host to Device Bandwidth, 1 Device(s)
     PINNED Memory Transfers
       Transfer Size (Bytes)	Bandwidth(MB/s)
       33554432			2551.5
    
     Device to Host Bandwidth, 1 Device(s)
     PINNED Memory Transfers
       Transfer Size (Bytes)	Bandwidth(MB/s)
       33554432			1675.0
    
     Device to Device Bandwidth, 1 Device(s)
     PINNED Memory Transfers
       Transfer Size (Bytes)	Bandwidth(MB/s)
       33554432			6319.8
    
    Result = PASS
    [qiangge@lqspc bandwidthTest]$ 
    
    [qiangge@lqspc 1_Utilities]$ cd deviceQuery
    [qiangge@lqspc deviceQuery]$ ls
    deviceQuery  deviceQuery.cpp  deviceQuery.o  Makefile  NsightEclipse.xml  readme.txt
    [qiangge@lqspc deviceQuery]$ ./deviceQuery 
    ./deviceQuery Starting...
    
     CUDA Device Query (Runtime API) version (CUDART static linking)
    
    Detected 1 CUDA Capable device(s)
    
    Device 0: "GeForce G 110M"
      CUDA Driver Version / Runtime Version          6.5 / 6.5
      CUDA Capability Major/Minor version number:    1.1
      Total amount of global memory:                 256 MBytes (268107776 bytes)
      ( 2) Multiprocessors, (  8) CUDA Cores/MP:     16 CUDA Cores
      GPU Clock rate:                                1000 MHz (1.00 GHz)
      Memory Clock rate:                             700 Mhz
      Memory Bus Width:                              64-bit
      Maximum Texture Dimension Size (x,y,z)         1D=(8192), 2D=(65536, 32768), 3D=(2048, 2048, 2048)
      Maximum Layered 1D Texture Size, (num) layers  1D=(8192), 512 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(8192, 8192), 512 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       16384 bytes
      Total number of registers available per block: 8192
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  768
      Maximum number of threads per block:           512
      Max dimension size of a thread block (x,y,z): (512, 512, 64)
      Max dimension size of a grid size    (x,y,z): (65535, 65535, 1)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             256 bytes
      Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
      Run time limit on kernels:                     Yes
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Disabled
      Device supports Unified Addressing (UVA):      No
      Device PCI Bus ID / PCI location ID:           1 / 0
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    
    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.5, CUDA Runtime Version = 6.5, NumDevs = 1, Device0 = GeForce G 110M
    Result = PASS
    [qiangge@lqspc deviceQuery]$ 
    

    配置虽然低,学习可能够用吧,不行就去买个新点的台式二手显卡?二手是不是抠门了呢?的确是,但是其实自己不用买,公司有1080TI显卡,可以加班学习用就行了。这里只是想自己安装一次,并且可以简单用来学习、练习和测试。同时帮朋友解决了y550上cuda65,那个显卡是g 240m的样子,最多也是1.2的计算能力。但是他用的Ubuntu。臃肿的Ubuntu还不是我的菜。之后又发现自己硬盘快满了,原来是需要pacman -Sc一下了。回头考虑配置一下自动清除不安装的包吧。

  • 相关阅读:
    idea 导入maven项目各种红
    基于 Spring Security 的开源统一角色访问控制系统 URACS
    MySQL读写分离又一好办法 使用 com.mysql.jdbc.ReplicationDriver
    [转]CVS SVN VSS 使用对比
    推荐一个免费开源的虚拟机VBox(VirtualBox)
    JavaScript的对象属性的反射
    [转]需求分析的目的
    尝鲜安装iOS6及新特性
    EXP00003: 未找到段 (4,571) 的存储定义
    QQ邮箱里可以定阅博客园的文章了
  • 原文地址:https://www.cnblogs.com/liq07lzucn/p/8371375.html
Copyright © 2011-2022 走看看