HPC CUDA install

This walkthrough uses the latest production release from the NVIDIA CUDA Toolkit website (5.0.35 at the time of writing).
Part 1: Install the CUDA driver
Ref link: http://superuser.com/questions/484991/nvidia-graphics-driver-in-ubuntu-12-04
1. Blacklist nouveau
In particular, blacklisting the nouveau driver got rid of the annoying i2c warnings for me.
2. Stop lightdm
Ubuntu has switched from gdm to lightdm, so you have to stop the lightdm X session before installing the driver.
Two ways (both as root): service lightdm stop, or stop lightdm
3. Normal setup
I chose all the default installation directories.
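Step 1 above can be sketched as a small shell helper. This is a minimal sketch under assumptions, not the exact procedure from the referenced post: the function name nouveau_blacklist and the file name /etc/modprobe.d/blacklist-nouveau.conf are chosen here for illustration, and the modeset=0 line is a commonly used extra that the original does not mention.

```shell
# Print the modprobe configuration that keeps the nouveau driver from loading.
# (Function name is hypothetical; "options nouveau modeset=0" is an assumed,
# commonly used addition, not something the original post specifies.)
nouveau_blacklist() {
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0'
}

# Usage, as root (the target file name is an assumption):
#   nouveau_blacklist | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
#   sudo update-initramfs -u     # rebuild the initramfs so the blacklist applies at boot
#   sudo service lightdm stop    # step 2: stop the X session before running the installer
```

The helper only prints the configuration, so it can be inspected before anything is written to the system.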
Part 2: Make all samples
Before using 5.0.35, the latest release, I first installed 5.0.24, the release candidate. All the samples compiled fine, but ./deviceQuery produced no output. Also, with 5.0.25 you do not need to run sudo ldconfig /usr/local/cuda/lib; more on that later.
Compilation under 5.0.35
1. MPI install
When compiling simpleMPI, a "*** mpi not found" error popped up, so you have to install an MPI package.
Solution, ref link: http://cs.ucsb.edu/~hnielsen/cs140/openmpi-install.html

Debian and Ubuntu
These instructions will almost definitely work on Debian lenny, squeeze, and sid, as well as Ubuntu hardy, intrepid, jaunty, karmic, or lucid.
Make sure your package repository is up to date. apt-get update will do this. You must run this command as root - you may have to su, or more likely run it with sudo (it'll look like sudo apt-get update).
Be sure you've installed GCC! apt-get install gcc g++ will install the compilers if you don't have them already.
Then, run apt-get install openmpi-bin openmpi-doc libopenmpi-dev, wrapping the command in sudo if necessary. This will install OpenMPI, all necessary libraries, and the documentation for the MPI calls.
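Before rebuilding the samples, it can help to confirm that the MPI compiler wrapper is actually on the PATH. A minimal sketch, assuming the Open MPI packages above provide the usual mpicc wrapper; the function name check_mpi is made up for this example.

```shell
# Report whether the mpicc compiler wrapper is available on PATH.
# (check_mpi is a hypothetical helper name.)
check_mpi() {
    if command -v mpicc >/dev/null 2>&1; then
        echo "mpicc found at $(command -v mpicc)"
    else
        echo "mpicc not found; try: sudo apt-get install openmpi-bin openmpi-doc libopenmpi-dev"
    fi
}

check_mpi
```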
2. libcublas.so: error: undefined reference to 'dlsym'
ref link: http://forum.luahub.com/index.php?topic=2390.0
This error shows up when compiling /samples/6_Advanced/cdpLUDecomposition. I traced it into the makefile and added the -ldl link option, as described in the ref post. That got around the error.
3. ./deviceQuery: error while loading shared libraries: libcudart.so.5.0: cannot open shared object file: No such file or directory
This error popped up when running the deviceQuery executable.
Solution, ref link: http://stackoverflow.com/questions/10808958/libcudart-so-4-cannot-find-ubuntu-10-04
The post is a pretty good reference. It says that setting LD_LIBRARY_PATH can interfere with other programs. The full explanation follows:
LD_LIBRARY_PATH is strongly deprecated. It may mess up other programs, and others may reset it. It should only be used to temporarily override the permanent paths for testing purposes (don't take my word, google it).
Instead, add a line with your cuda lib directory on it to /etc/ld.so.conf, after any existing lines.
For example, if you installed on /usr/local/cuda, you will need to add
32-bit : /usr/local/cuda/lib
64-bit : /usr/local/cuda/lib64
Save, and run ldconfig. This should permanently fix the problem.
The symbolic links are probably already set up by the installation. If not, then add them as Alex advised.
Note - I received errors referencing /lib, but I needed to add lib64 to fix them.
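The ld.so.conf fix above can be scripted roughly as follows. This is a sketch under assumptions: the helper names cuda_lib_dir and add_lib_dir are invented for illustration, the install prefix defaults to /usr/local/cuda as in the quoted answer, and the word size is taken from getconf LONG_BIT.

```shell
# Print the CUDA library directory matching this machine's word size.
# $1: install prefix (defaults to /usr/local/cuda, as in the answer above).
cuda_lib_dir() {
    prefix="${1:-/usr/local/cuda}"
    if [ "$(getconf LONG_BIT)" = "64" ]; then
        echo "$prefix/lib64"
    else
        echo "$prefix/lib"
    fi
}

# Append directory $1 to linker config file $2 unless it is already listed.
add_lib_dir() {
    grep -qxF -- "$1" "$2" 2>/dev/null || echo "$1" >> "$2"
}

# Usage, as root:
#   add_lib_dir "$(cuda_lib_dir)" /etc/ld.so.conf
#   ldconfig                        # rebuild the linker cache
#   ldconfig -p | grep libcudart    # confirm libcudart is now visible
```

Checking for an existing entry first keeps the config file from collecting duplicate lines if the fix is run more than once.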
Final result:
root@rui:/usr/local/cuda-5.0/samples/bin/linux/release# ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVS 5400M"
CUDA Driver Version / Runtime Version 5.0 / 5.0
CUDA Capability Major/Minor version number: 2.1
Total amount of global memory: 1024 MBytes (1073414144 bytes)
( 2) Multiprocessors x ( 48) CUDA Cores/MP: 96 CUDA Cores
GPU Clock rate: 950 MHz (0.95 GHz)
Memory Clock rate: 900 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 131072 bytes
Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): No
Device PCI Bus ID / PCI location ID: 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.0, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = NVS 5400M

Original post: https://www.cnblogs.com/chenrui/p/2725339.html