  • Configuring a Caffe parallel computing environment on Ubuntu

    1. Experimental setup:

    Model: Sugon I450-G10 dual-socket tower server

    CPU: Intel Xeon E5-2620 v2 @2.1GHz x24

    RAM: 128GB

    DISK: 2TB

    GPU0: NVIDIA Tesla K20C - used for parallel computation

    GPU1: NVIDIA Quadro K620 - used for display

    OS: Ubuntu 14.04 LTS 64bit Desktop


    2. Install the development packages

    $ sudo apt-get update && sudo apt-get upgrade

    $ sudo apt-get install build-essential


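    3. Install the NVIDIA driver

    1.) Stop lightdm

    Boot into Ubuntu, press Ctrl+Alt+F1 to switch to a text console (tty), log in there, and run:

    $ sudo service lightdm stop

    This stops lightdm, the display manager, so that the graphical session is not running while the driver is installed.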

    2.) Install the driver

    Add the driver PPA:

    $ sudo add-apt-repository ppa:xorg-edgers/ppa

    $ sudo apt-get update

    Install the 340 driver series:

    $ sudo apt-get install nvidia-340

    When that finishes, also install:

    $ sudo apt-get install nvidia-340-uvm

    Then reboot the system.
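    After the reboot, it is worth confirming that the 340 driver is the one actually loaded before you move on to CUDA. The CUDA 6.5 package below bundles driver 340.29. The sketch below is a minimal example and not an NVIDIA tool: `nvidia_driver_version` and `nvidia_driver_at_least` are hypothetical helper names. It assumes the loaded driver reports itself in /proc/driver/nvidia/version on a line of the form shown in the comment.

```shell
# Hypothetical helpers (not part of any NVIDIA tool).
# Assumption: the version file's first line looks like
#   NVRM version: NVIDIA UNIX x86_64 Kernel Module  340.29  <build date>

# Print the loaded driver version, e.g. 340.29.
# The file is a parameter so the function can be tried on a saved copy.
nvidia_driver_version() {
    version_file=${1:-/proc/driver/nvidia/version}
    if [ ! -r "$version_file" ]; then
        echo "NVIDIA kernel module not loaded ($version_file missing)" >&2
        return 1
    fi
    # Keep only the first dotted number after "Kernel Module".
    sed -n 's/^NVRM version:.*Kernel Module[[:space:]]*\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' \
        "$version_file" | head -n 1
}

# Succeed if the loaded driver's major version is at least $1 (default 340).
# An optional second argument overrides the version file path.
nvidia_driver_at_least() {
    min_major=${1:-340}
    version=$(nvidia_driver_version "${2:-}") || return 1
    [ -n "$version" ] || return 1
    [ "${version%%.*}" -ge "$min_major" ]
}
```

    For example, `nvidia_driver_at_least 340 && echo "driver OK"`. Running `nvidia-smi`, which is installed together with the driver, is another quick check.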


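    4. Install CUDA

    1.) Download CUDA

    Download the CUDA 6.5 runfile installer from NVIDIA, then extract it with the following command. If the file is not yet executable, run chmod +x cuda6.5.run first.

    $ ./cuda6.5.run --extract=/home/username/Documents/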

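    This extracts three files:

    CUDA toolkit installer: cuda-linux64-rel-6.5.14-18749181.run

    NVIDIA driver: NVIDIA-Linux-x86_64-340.29.run (this can also be used to install the graphics driver)

    Samples installer: cuda-samples-linux-6.5.14-18745345.run

    Make all of them executable:

    $ sudo chmod +x *.run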

    2.) Install CUDA

    Run the toolkit installer and follow its English prompts step by step until it finishes:

    $ sudo ./cuda-linux64-rel-6.5.14-18749181.run

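    3.) Add environment variables

    After installation, add the CUDA binaries to PATH in /etc/profile. The # prompt means the commands are run as root.

    # vim /etc/profile

    Append at the end of the file:

    PATH=/usr/local/cuda-6.5/bin:$PATH

    export PATH

    Save with :wq!, then run the following so that the change takes effect immediately in the current shell:

    # source /etc/profile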
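    The two lines above prepend the CUDA directory unconditionally, so each extra `source /etc/profile` adds another copy to PATH. A guarded variant is sketched below. `prepend_path_once` is a hypothetical helper name, and the snippet assumes a POSIX shell such as the one that reads /etc/profile.

```shell
# Hypothetical helper: prepend a directory to PATH only if it is not already
# listed, so re-sourcing the profile does not keep growing PATH.
prepend_path_once() {
    case ":$PATH:" in
        *":$1:"*) ;;                          # already present: leave PATH as is
        *) PATH="$1${PATH:+:$PATH}" ;;        # prepend (no stray ':' if PATH was empty)
    esac
    export PATH
}

# Example, using the CUDA 6.5 install prefix from this guide:
# prepend_path_once /usr/local/cuda-6.5/bin
```

    Placed in /etc/profile, or in a file under /etc/profile.d/ (Ubuntu's /etc/profile sources those), the call `prepend_path_once /usr/local/cuda-6.5/bin` has the same effect as the original lines. Afterwards, `nvcc --version` should report the CUDA 6.5 toolkit.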

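    4.) Add the library path

    Create a cuda.conf file in /etc/ld.so.conf.d/:

    # cd /etc/ld.so.conf.d/

    # vim cuda.conf

    Its content is:

    /usr/local/cuda-6.5/lib64

    Save with :wq!, then run the following to update the dynamic linker cache immediately:

    # ldconfig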
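    You can do the same step without an editor. This is a minimal sketch that assumes it runs as root, like the commands above. `write_ld_conf` is a hypothetical helper that writes each given directory on its own line. It skips directories that do not exist, which catches a mistyped path before ldconfig runs.

```shell
# Hypothetical helper: write_ld_conf FILE DIR...
# Writes every existing DIR on its own line to FILE (overwriting it) and
# warns about, then skips, directories that do not exist.
write_ld_conf() {
    conf_file=$1; shift
    : > "$conf_file" || return 1              # start from an empty file
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir" >> "$conf_file"
        else
            echo "warning: $dir does not exist, skipped" >&2
        fi
    done
}
```

    For example: `write_ld_conf /etc/ld.so.conf.d/cuda.conf /usr/local/cuda-6.5/lib64 && ldconfig`. After that, `ldconfig -p | grep libcudart` should list the CUDA runtime library.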


    5. Install the CUDA samples

    1.) Install dependencies

    $ sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libglu1-mesa-dev

    2.) Install the samples

    $ sudo ./cuda-samples-linux-6.5.14-18745345.run

    3.) Build the samples

    $ cd /usr/local/cuda-6.5/samples

    $ sudo make

    4.) Verify the installation

    Once everything has been built, run deviceQuery:

    $ cd bin/x86_64/linux/release

    $ sudo ./deviceQuery

    If it prints GPU information like the following, the driver and CUDA are installed correctly.

    ./deviceQuery Starting...
    
     CUDA Device Query (Runtime API) version (CUDART static linking)
    
    Detected 2 CUDA Capable device(s)
    
    Device 0: "Tesla K20c"
      CUDA Driver Version / Runtime Version          6.5 / 6.5
      CUDA Capability Major/Minor version number:    3.5
      Total amount of global memory:                 4800 MBytes (5032706048 bytes)
      (13) Multiprocessors, (192) CUDA Cores/MP:     2496 CUDA Cores
      GPU Clock rate:                                706 MHz (0.71 GHz)
      Memory Clock rate:                             2600 Mhz
      Memory Bus Width:                              320-bit
      L2 Cache Size:                                 1310720 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
      Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
      Run time limit on kernels:                     No
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Enabled
      Device supports Unified Addressing (UVA):      Yes
      Device PCI Bus ID / PCI location ID:           3 / 0
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    
    Device 1: "Quadro K620"
      CUDA Driver Version / Runtime Version          6.5 / 6.5
      CUDA Capability Major/Minor version number:    5.0
      Total amount of global memory:                 2047 MBytes (2146762752 bytes)
      ( 3) Multiprocessors, (128) CUDA Cores/MP:     384 CUDA Cores
      GPU Clock rate:                                1124 MHz (1.12 GHz)
      Memory Clock rate:                             900 Mhz
      Memory Bus Width:                              128-bit
      L2 Cache Size:                                 2097152 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
      Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
      Run time limit on kernels:                     Yes
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Disabled
      Device supports Unified Addressing (UVA):      Yes
      Device PCI Bus ID / PCI location ID:           130 / 0
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    > Peer access from Tesla K20c (GPU0) -> Quadro K620 (GPU1) : No
    > Peer access from Quadro K620 (GPU1) -> Tesla K20c (GPU0) : No
    
    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.5, CUDA Runtime Version = 6.5, NumDevs = 2, Device0 = Tesla K20c, Device1 = Quadro K620
    Result = PASS
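    On this machine the Quadro K620 drives the display, and deviceQuery reports "Run time limit on kernels: Yes" for it (a display watchdog). The Tesla K20c reports "No". The sketch below is a hypothetical helper, not part of the CUDA samples. It reads deviceQuery output, fails unless the output contains "Result = PASS", and prints the devices that have no kernel time limit. It relies only on awk and the field labels shown above.

```shell
# Hypothetical helper, not part of the CUDA samples.
# Reads deviceQuery output on stdin, prints "<index> <name>" for every device
# whose "Run time limit on kernels" is "No", and exits non-zero unless the
# output contains "Result = PASS".
compute_gpus_from_device_query() {
    awk '
        /^[ \t]*Device [0-9]+: / {
            idx = $2; sub(/:/, "", idx)                       # "0:" becomes "0"
            name = $0
            sub(/^[^"]*"/, "", name); sub(/".*$/, "", name)   # keep text between the quotes
        }
        /Run time limit on kernels:[ \t]*No/ { print idx, name }
        /Result = PASS/ { pass = 1 }
        END { if (!pass) exit 1 }
    '
}
```

    Run it as `./deviceQuery | compute_gpus_from_device_query`. With the output above it would print `0 Tesla K20c`, which matches the TEST_GPUID := 0 setting in Makefile.config later on.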


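    6. Install Intel Parallel Studio XE

    1.) Download the software

    Go to https://software.intel.com/en-us/intel-parallel-studio-xe

    and register for Intel® Parallel Studio XE Cluster Edition for Linux*.

    Intel then sends you an email containing the download link and a product serial number.

    I used Intel Parallel Studio 2016, a download of about 3664MB.

    2.) Install the software

    Extract parallel_studio_xe_2016.tgz,

    then enter the extracted folder and run the installer. Run it as root so that it installs into /opt/intel, the path used below.

    $ tar -xzf parallel_studio_xe_2016.tgz

    $ cd parallel_studio_xe_2016

    $ sudo ./install_GUI.sh

    A graphical installer appears. Click Next through each step until it finishes.

    3.) Add the library path

    $ sudo vim /etc/ld.so.conf.d/intel_mkl.conf

    Its content is:

    /opt/intel/lib

    /opt/intel/mkl/lib/intel64

    Save with :wq!, then run the following to apply it immediately:

    $ sudo ldconfig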
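    The directory layout under /opt/intel can differ between Parallel Studio releases. It is therefore worth checking that each directory listed in intel_mkl.conf exists and actually contains shared libraries. This is a minimal sketch. `check_ld_conf` is a hypothetical helper that ignores blank lines and # comment lines and prints every entry that looks wrong.

```shell
# Hypothetical helper: check_ld_conf FILE
# For each directory listed in an ld.so.conf.d file, report it if it is
# missing or holds no shared libraries (*.so*) directly. Returns non-zero
# if any entry was reported.
check_ld_conf() {
    status=0
    while IFS= read -r dir || [ -n "$dir" ]; do
        case $dir in ''|'#'*) continue ;; esac       # skip blanks and comments
        if [ ! -d "$dir" ]; then
            echo "missing: $dir"; status=1
        elif ! ls "$dir"/*.so* >/dev/null 2>&1; then
            echo "no shared libraries directly in: $dir"; status=1
        fi
    done < "$1"
    return $status
}
```

    For example, `check_ld_conf /etc/ld.so.conf.d/intel_mkl.conf`. If an entry is reported as holding no shared libraries, the 64-bit libraries may sit in an intel64 subdirectory beneath it. After changing the file, run `sudo ldconfig` again.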


    7. Install OpenCV

    1.) Install dependencies

    $ sudo apt-get install gcc cmake git build-essential libgtk2.0-dev pkg-config

    $ sudo apt-get install libavcodec-dev libavformat-dev libjpeg62-dev libtiff4-dev libswscale-dev

    $ sudo apt-get install python-dev python-numpy libtbb2 libtbb-dev

    $ sudo apt-get install libjpeg-dev libpng-dev libtiff-dev libjasper-dev libdc1394-22-dev

    2.) Build and install OpenCV

    [Follow steps 4-6 of this article exactly: http://blog.csdn.net/ws_20100/article/details/46493293 ]

    The Fedora setup described there works the same way on Ubuntu.


    8. Install the remaining dependencies

    $ sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libboost-all-dev

    $ sudo apt-get install libhdf5-serial-dev libgflags-dev libgoogle-glog-dev liblmdb-dev protobuf-compiler

    $ sudo apt-get install python-dev python-pip
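    Before compiling, a quick check that the development headers from the packages above are in place can save you a failed build. The header list in the example comment is an assumption based on those packages. hdf5 is left out because its header location differs between releases. `check_headers` is a hypothetical helper.

```shell
# Hypothetical helper: check_headers HEADER...
# Reports every header (relative to INCLUDE_ROOT, default /usr/include)
# that does not exist, and returns non-zero if any is missing.
check_headers() {
    include_root=${INCLUDE_ROOT:-/usr/include}
    missing=0
    for h in "$@"; do
        if [ ! -f "$include_root/$h" ]; then
            echo "missing header: $include_root/$h"
            missing=1
        fi
    done
    return $missing
}

# Example (assumed header names for the packages installed above):
# check_headers google/protobuf/message.h glog/logging.h gflags/gflags.h \
#     leveldb/db.h snappy.h lmdb.h boost/shared_ptr.hpp
```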


    9. Install MATLAB

    [Follow this article exactly: http://blog.csdn.net/ws_20100/article/details/48859951 ]


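    10. Build Caffe

    1.) Extract Caffe

    $ unzip caffe-master.zip -d /home/username/

    2.) Build Caffe

    Go to the Caffe root directory and copy the example configuration to Makefile.config:

    $ cd /home/username/caffe-master

    $ cp Makefile.config.example Makefile.config

    Edit it so that it reads as follows: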

    ## Refer to http://caffe.berkeleyvision.org/installation.html
    # Contributions simplifying and improving our build system are welcome!
    
    # cuDNN acceleration switch (uncomment to build with cuDNN).
    # USE_CUDNN := 1
    
    # CPU-only switch (uncomment to build without GPU support).
    # CPU_ONLY := 1
    
    # uncomment to disable IO dependencies and corresponding data layers
    # USE_LEVELDB := 0
    # USE_LMDB := 0
    # USE_OPENCV := 0
    
    # To customize your choice of compiler, uncomment and set the following.
    # N.B. the default for Linux is g++ and the default for OSX is clang++
    # CUSTOM_CXX := g++
    
    # CUDA directory contains bin/ and lib/ directories that we need.
    CUDA_DIR := /usr/local/cuda
    # On Ubuntu 14.04, if cuda tools are installed via
    # "sudo apt-get install nvidia-cuda-toolkit" then use this instead:
    # CUDA_DIR := /usr
    
    # CUDA architecture setting: going with all of them.
    # For CUDA < 6.0, comment the *_50 lines for compatibility.
    CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
    		-gencode arch=compute_20,code=sm_21 \
    		-gencode arch=compute_30,code=sm_30 \
    		-gencode arch=compute_35,code=sm_35 \
    		-gencode arch=compute_50,code=sm_50 \
    		-gencode arch=compute_50,code=compute_50
    
    # BLAS choice:
    # atlas for ATLAS (default)
    # mkl for MKL
    # open for OpenBlas
    BLAS := mkl
    # Custom (MKL/ATLAS/OpenBLAS) include and lib directories.
    # Leave commented to accept the defaults for your choice of BLAS
    # (which should work)!
    # BLAS_INCLUDE := /path/to/your/blas
    # BLAS_LIB := /path/to/your/blas
    
    # Homebrew puts openblas in a directory that is not on the standard search path
    # BLAS_INCLUDE := $(shell brew --prefix openblas)/include
    # BLAS_LIB := $(shell brew --prefix openblas)/lib
    
    # This is required only if you will compile the matlab interface.
    # MATLAB directory should contain the mex binary in /bin.
    MATLAB_DIR := /usr/local/MATLAB/R2014a
    # MATLAB_DIR := /Applications/MATLAB_R2012b.app
    
    # NOTE: this is required only if you will compile the python interface.
    # We need to be able to find Python.h and numpy/arrayobject.h.
    PYTHON_INCLUDE := /usr/include/python2.7 \
    		/usr/lib/python2.7/dist-packages/numpy/core/include
    # Anaconda Python distribution is quite popular. Include path:
    # Verify anaconda location, sometimes it's in root.
    # ANACONDA_HOME := $(HOME)/anaconda
    # PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
    		# $(ANACONDA_HOME)/include/python2.7 \
    		# $(ANACONDA_HOME)/lib/python2.7/site-packages/numpy/core/include \
    
    # We need to be able to find libpythonX.X.so or .dylib.
    PYTHON_LIB := /usr/lib
    # PYTHON_LIB := $(ANACONDA_HOME)/lib
    
    # Homebrew installs numpy in a non standard path (keg only)
    # PYTHON_INCLUDE += $(dir $(shell python -c 'import numpy.core; print(numpy.core.__file__)'))/include
    # PYTHON_LIB += $(shell brew --prefix numpy)/lib
    
    # Uncomment to support layers written in Python (will link against Python libs)
    # WITH_PYTHON_LAYER := 1
    
    # Whatever else you find you need goes here.
    INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
    LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib
    
    # If Homebrew is installed at a non standard location (for example your home directory) and you use it for general dependencies
    # INCLUDE_DIRS += $(shell brew --prefix)/include
    # LIBRARY_DIRS += $(shell brew --prefix)/lib
    
    # Uncomment to use `pkg-config` to specify OpenCV library paths.
    # (Usually not necessary -- OpenCV libraries are normally installed in one of the above $LIBRARY_DIRS.)
    USE_PKG_CONFIG := 1
    
    BUILD_DIR := build
    DISTRIBUTE_DIR := distribute
    
    # Uncomment for debugging. Does not work on OSX due to https://github.com/BVLC/caffe/issues/171
    DEBUG := 1
    
    # The ID of the GPU that 'make runtest' will use to run unit tests.
    TEST_GPUID := 0
    
    # enable pretty build (comment to see full commands)
    Q ?= @
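    Instead of editing Makefile.config by hand, you can change single-line settings from the shell. This is a minimal sketch that assumes GNU sed, the default on Ubuntu. `set_config_var` is a hypothetical helper that rewrites the first `VAR := ...` or `# VAR := ...` line, so a commented-out default is switched on as well. Multi-line values such as CUDA_ARCH are not handled.

```shell
# Hypothetical helper: set_config_var FILE VAR VALUE
# Rewrites the first line of the form "VAR := ..." or "# VAR := ..." in FILE
# to "VAR := VALUE". Relies on GNU sed (-i and the 0,/regex/ address).
# VALUE must not contain '|', '&' or '\', which are special to sed here.
set_config_var() {
    file=$1 var=$2 value=$3
    sed -i "0,/^#*[[:space:]]*$var[[:space:]]*:=/s|^#*[[:space:]]*$var[[:space:]]*:=.*|$var := $value|" "$file"
}
```

    For this guide, run `set_config_var Makefile.config BLAS mkl`, `set_config_var Makefile.config MATLAB_DIR /usr/local/MATLAB/R2014a` and `set_config_var Makefile.config USE_PKG_CONFIG 1`. If the CUDA installer did not create the /usr/local/cuda symlink, `set_config_var Makefile.config CUDA_DIR /usr/local/cuda-6.5` points the build at the directory used earlier. According to the deviceQuery output above, the two GPUs have compute capability 3.5 and 5.0, so the sm_35 and sm_50 entries of CUDA_ARCH are the ones this machine actually uses.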

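    Start the build (-j24 matches the 24 hardware threads of this machine):

    $ make all -j24

    Once that succeeds, you can also build and run the tests:

    $ make test

    $ make runtest

    3.) Build the MATLAB wrapper

    $ make matcaffe

    4.) Build the Python wrapper

    $ make pycaffe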
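    After `make pycaffe`, Python can only find the caffe module if <caffe root>/python is on PYTHONPATH. This is a minimal sketch that assumes the build produced python/caffe/_caffe.so, which is where Caffe's Makefile puts it. `use_pycaffe` is a hypothetical helper, and its default root is the /home/username/caffe-master path used in this guide.

```shell
# Hypothetical helper: use_pycaffe [CAFFE_ROOT]
# Checks that pycaffe was built, then prepends CAFFE_ROOT/python to
# PYTHONPATH once (repeated calls do not add duplicates).
use_pycaffe() {
    caffe_root=${1:-/home/username/caffe-master}
    if [ ! -f "$caffe_root/python/caffe/_caffe.so" ]; then
        echo "pycaffe not built under $caffe_root (run make pycaffe first)" >&2
        return 1
    fi
    case ":${PYTHONPATH:-}:" in
        *":$caffe_root/python:"*) ;;          # already on the path
        *) PYTHONPATH="$caffe_root/python${PYTHONPATH:+:$PYTHONPATH}" ;;
    esac
    export PYTHONPATH
}
```

    Then `use_pycaffe /home/username/caffe-master && python -c "import caffe"` should finish without an ImportError.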


    Enjoy~ Written By Timely~

    If you have any questions, feel free to get in touch with me~

  • Original article: https://www.cnblogs.com/lixuebin/p/10814877.html