  • HPC cuda install

    This install uses the latest production release from the NVIDIA CUDA Toolkit website.

    Part 1: Install the CUDA driver

     Ref link: http://superuser.com/questions/484991/nvidia-graphics-driver-in-ubuntu-12-04

    1. Blacklist nouveau

    Blacklisting the nouveau driver in particular got rid of the annoying i2c warnings for me.
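
    One common way to blacklist nouveau is a small modprobe configuration file. The sketch below is an assumption, not the exact steps from the ref link: the file name blacklist-nouveau.conf is arbitrary, and the modeset=0 option and the update-initramfs step are the usual Ubuntu companions to the blacklist line. It writes the file locally so that it can be reviewed before being copied into place as root.

```shell
#!/bin/sh
# Sketch: generate a modprobe blacklist file for the nouveau driver.
# The file name is an assumption; modprobe reads any *.conf file
# placed in /etc/modprobe.d/.
OUT="${OUT:-./blacklist-nouveau.conf}"

cat > "$OUT" <<'EOF'
blacklist nouveau
options nouveau modeset=0
EOF

# Installing the file needs root, so the commands are only printed here.
echo "Wrote $OUT. After reviewing it, install it as root, for example:"
echo "  sudo cp $OUT /etc/modprobe.d/"
echo "  sudo update-initramfs -u"
```

    A reboot is usually needed afterwards so that the kernel comes up without nouveau loaded.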

    2. Stop lightdm

    Ubuntu has switched from gdm to lightdm, so you have to stop lightdm (and with it the X session) before installing the driver.

    There are two ways to do it, both as root: service lightdm stop, or stop lightdm.

    3. Normal setup

    I accepted all the default installation directories.

    Part 2: Make all for the samples

     Before using 5.0.35, which is the latest release, I first installed 5.0.24, the release candidate. All the samples compiled without trouble, but ./deviceQuery failed to show any output. Also, with 5.0.24 you do not need to run sudo ldconfig /usr/local/cuda/lib; we will talk about that later.

    Compilation under 5.0.35

    1. MPI install

     When compiling simpleMPI, a "*** mpi not found" error popped up, so you have to install an MPI package.

     Solution, ref link: http://cs.ucsb.edu/~hnielsen/cs140/openmpi-install.html

    Debian and Ubuntu

    These instructions will almost definitely work on Debian lenny, squeeze, and sid, as well as Ubuntu hardy, intrepid, jaunty, karmic, or lucid.
    Make sure your package repository is up to date. apt-get update will do this. You must run this command as root - you may have to su, or more likely run it with sudo (it'll look like sudo apt-get update).
    Be sure you've installed GCC! apt-get install gcc g++ will install the compilers if you don't have them already.
    Then, run apt-get install openmpi-bin openmpi-doc libopenmpi-dev, wrapping the command in sudo if necessary. This will install OpenMPI, all necessary libraries, and the documentation for the MPI calls.
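
    As a quick check before and after installing the packages above, something like the following reports whether the mpicc compiler wrapper is on the PATH. This is only a sketch: MPI_STATUS is an illustrative variable name, and it assumes that mpicc is the wrapper the sample build looks for (on Debian and Ubuntu it comes with libopenmpi-dev).

```shell
#!/bin/sh
# Sketch: report whether the MPI compiler wrapper is available.
if command -v mpicc >/dev/null 2>&1; then
    MPI_STATUS="found"
else
    MPI_STATUS="missing"
    # The install commands need root, so they are only printed here.
    echo "Install with:"
    echo "  sudo apt-get update"
    echo "  sudo apt-get install gcc g++ openmpi-bin openmpi-doc libopenmpi-dev"
fi
echo "mpicc: $MPI_STATUS"
```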

     2. libcublas.so: error: undefined reference to 'dlsym'

       ref link: http://forum.luahub.com/index.php?topic=2390.0

       This error appears when compiling /samples/6_Advanced/cdpLUDecomposition. I traced it in the Makefile and added the -ldl link option, as described in the ref post. That got around the error.
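
       A rough sketch of the kind of Makefile edit meant here. LIBRARIES is a hypothetical variable name, and the stand-in file Makefile.demo is created only for illustration; in the real sample, look for whichever line carries -lcublas and append -ldl there, so that the dl library comes after the libraries that need dlsym.

```shell
#!/bin/sh
# Sketch: append -ldl to the library list of a Makefile.
# LIBRARIES is a hypothetical variable name; check the actual sample
# Makefile for the variable that holds -lcublas and the other link flags.
MAKEFILE="${MAKEFILE:-./Makefile.demo}"

# Create a stand-in Makefile for demonstration if none exists yet.
[ -f "$MAKEFILE" ] || printf 'LIBRARIES := -lcublas -lcudadevrt\n' > "$MAKEFILE"

# Append -ldl to the LIBRARIES line unless it is already there.
if ! grep -q '^LIBRARIES.*-ldl' "$MAKEFILE"; then
    sed 's/^\(LIBRARIES[[:space:]]*:*=.*\)$/\1 -ldl/' "$MAKEFILE" > "$MAKEFILE.new" &&
        mv "$MAKEFILE.new" "$MAKEFILE"
fi

# Show the resulting line.
grep '^LIBRARIES' "$MAKEFILE"
```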

    3. ./deviceQuery: error while loading shared libraries: libcudart.so.5.0: cannot open shared object file: No such file or directory

      This error popped up when running the deviceQuery executable.

      Solution, ref link: http://stackoverflow.com/questions/10808958/libcudart-so-4-cannot-find-ubuntu-10-04
      The post is a pretty good reference. It says that setting LD_LIBRARY_PATH can interfere with other programs. The full explanation follows:

    LD_LIBRARY_PATH is strongly deprecated. It may mess up other programs, and others may reset it. It should only be used to temporarily override the permanent paths for testing purposes (don't take my word, google it).

    Instead, add a line with your cuda lib directory on it to /etc/ld.so.conf, after any existing lines.

    For example, if you installed on /usr/local/cuda, you will need to add

    32-bit : /usr/local/cuda/lib

    64-bit : /usr/local/cuda/lib64

    Save, and run ldconfig. This should permanently fix the problem.

    The symbolic links are probably already set up by the installation. If not, then add them as Alex advised.

    Note - I received errors referencing /lib, but I needed to add lib64 to fix them. 
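
    The steps quoted above can be sketched as follows. This is a hedged outline rather than the exact commands from the post: CUDA_HOME and CUDA_LIB are illustrative variable names, getconf LONG_BIT is used here as one way to tell a 32-bit system from a 64-bit one, and the root-only steps are printed instead of executed.

```shell
#!/bin/sh
# Sketch: work out which CUDA library directory to register with the
# dynamic linker, then print the root-only commands that register it.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"   # install prefix used in the post

# lib64 on 64-bit systems, lib on 32-bit systems.
if [ "$(getconf LONG_BIT)" = "64" ]; then
    CUDA_LIB="$CUDA_HOME/lib64"
else
    CUDA_LIB="$CUDA_HOME/lib"
fi
echo "CUDA library directory: $CUDA_LIB"

# These steps modify system files, so they are only printed here.
echo "Run as root:"
echo "  echo $CUDA_LIB | sudo tee -a /etc/ld.so.conf"
echo "  sudo ldconfig"
echo "Then check that the runtime library is visible:"
echo "  ldconfig -p | grep libcudart"
```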

    Final result: 

    root@rui:/usr/local/cuda-5.0/samples/bin/linux/release# ./deviceQuery

    ./deviceQuery Starting...
     CUDA Device Query (Runtime API) version (CUDART static linking)
    Detected 1 CUDA Capable device(s)
    Device 0: "NVS 5400M"
      CUDA Driver Version / Runtime Version          5.0 / 5.0
      CUDA Capability Major/Minor version number:    2.1
      Total amount of global memory:                 1024 MBytes (1073414144 bytes)
      ( 2) Multiprocessors x ( 48) CUDA Cores/MP:    96 CUDA Cores
      GPU Clock rate:                                950 MHz (0.95 GHz)
      Memory Clock rate:                             900 Mhz
      Memory Bus Width:                              128-bit
      L2 Cache Size:                                 131072 bytes
      Max Texture Dimension Size (x,y,z)             1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
      Max Layered Texture Size (dim) x layers        1D=(16384) x 2048, 2D=(16384,16384) x 2048
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 32768
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  1536
      Maximum number of threads per block:           1024
      Maximum sizes of each dimension of a block:    1024 x 1024 x 64
      Maximum sizes of each dimension of a grid:     65535 x 65535 x 65535
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
      Run time limit on kernels:                     No
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Disabled
      Device supports Unified Addressing (UVA):      No
      Device PCI Bus ID / PCI location ID:           1 / 0
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.0, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = NVS 5400M
  • Original post: https://www.cnblogs.com/chenrui/p/2725339.html