  • Using deviceQuery and nvidia-smi on GPUs

    1. deviceQuery is very important: its output guides programming decisions such as choosing block and grid dimensions and using the memory hierarchy.

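    deviceQuery is actually a sample program and must be compiled before it can be used. It can be found under /opt/cuda/cuda70/NVIDIA_CUDA-7.0_Samples, or possibly in the local CUDA installation folder (I am not sure about this).

    Because that directory is read-only, copy it to your home directory. deviceQuery also uses files in NVIDIA_CUDA-7.0_Samples/common, so copy the whole NVIDIA_CUDA-7.0_Samples directory rather than deviceQuery alone.

    Running make then produces the deviceQuery executable.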

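    For any GPU programming project, I suggest making "compile deviceQuery" the very first step.

    One field in the output was unclear to me: Compute Mode. I found its explanation in the nvidia-smi documentation:

    Compute mode determines whether multiple programs may use the GPU at the same time.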

    Compute Mode: The compute mode flag indicates whether individual or multiple compute applications may run on the GPU.

      "Default" means multiple contexts are allowed per device.

      "Exclusive Thread" means only one context is allowed per device, usable from one thread at a time.

      "Exclusive Process" means only one context is allowed per device, usable from multiple threads at a time.

      "Prohibited" means no contexts are allowed per device (no compute apps).

      "EXCLUSIVE_PROCESS" was added in CUDA 4.0. Prior CUDA releases supported only one exclusive mode, which is equivalent to "EXCLUSIVE_THREAD" in CUDA 4.0 and beyond.

    This field is reported for all CUDA-capable products.
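    Inside a program, the same setting is exposed as the computeMode field of cudaDeviceProp, whose values come from the CUDA runtime's enum cudaComputeMode. The sketch below is plain C so that it builds without the CUDA toolkit: it re-declares those numeric values under hypothetical names (compute_mode, MODE_*) and adds hypothetical helpers that translate between the value and the mode names used in the documentation quoted above.

```c
#include <string.h>

/* Same numeric values as enum cudaComputeMode in the CUDA runtime headers,
   re-declared here under our own (hypothetical) names so no CUDA is needed. */
enum compute_mode {
    MODE_DEFAULT = 0,           /* cudaComputeModeDefault          */
    MODE_EXCLUSIVE_THREAD = 1,  /* cudaComputeModeExclusive        */
    MODE_PROHIBITED = 2,        /* cudaComputeModeProhibited       */
    MODE_EXCLUSIVE_PROCESS = 3  /* cudaComputeModeExclusiveProcess */
};

/* Mode names as written in the nvidia-smi documentation, indexed by value. */
static const char *const MODE_NAMES[] = {
    "Default", "Exclusive Thread", "Prohibited", "Exclusive Process"
};

/* Numeric value (e.g. devProp.computeMode) -> name, or "Unknown". */
const char *compute_mode_name(int mode)
{
    if (mode < MODE_DEFAULT || mode > MODE_EXCLUSIVE_PROCESS)
        return "Unknown";
    return MODE_NAMES[mode];
}

/* Name -> numeric value, or -1 if the name is not recognised. */
int parse_compute_mode(const char *name)
{
    for (int i = 0; i < 4; ++i)
        if (strcmp(name, MODE_NAMES[i]) == 0)
            return i;
    return -1;
}

/* Only "Default" allows several contexts on one device at the same time. */
int allows_multiple_contexts(int mode)
{
    return mode == MODE_DEFAULT;
}
```

    In the PrintDeviceProperties function shown later, for instance, one extra fprintf of compute_mode_name(devProp.computeMode) would add this field to the report.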

    2. nvidia-smi. Its manual is available at http://developer.download.nvidia.com/compute/cuda/6_0/rel/gdk/nvidia-smi.331.38.pdf

    nvidia-smi (NVIDIA System Management Interface) is a command-line utility, based on top of the NVIDIA Management Library (NVML), intended to aid in the management and monitoring of NVIDIA GPU devices.

    It can also enable and disable GPU configuration options (such as ECC memory capability).

    nvidia-smi is installed together with the NVIDIA driver, so it is already available once the driver is installed.

    nvidia-smi -i 0 -q displays all available information for one GPU (-i selects the GPU by index, -q queries its information).

    nvidia-smi -h prints the help text.
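    The text printed by nvidia-smi -q is laid out as one "key : value" pair per line, so a single field such as Compute Mode can be picked out of it. The following is a minimal sketch that assumes this layout (padding spaces, then a colon); smi_field is a hypothetical helper, not part of nvidia-smi or NVML. Because that output is meant for people to read, calling NVML directly may be the sturdier choice for real programs.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper. Assumes a line of `nvidia-smi -q` output looks like
   "<key><padding>: <value>". If the trimmed key equals `key`, copies the
   value (truncated to fit, always NUL-terminated) into `out` and returns 1;
   otherwise returns 0. */
int smi_field(const char *line, const char *key, char *out, size_t out_size)
{
    const char *colon = strchr(line, ':');
    if (colon == NULL || out_size == 0)
        return 0;

    /* Trim the blanks around the key part (before the first colon). */
    const char *k_begin = line;
    while (k_begin < colon && isspace((unsigned char)*k_begin))
        ++k_begin;
    const char *k_end = colon;
    while (k_end > k_begin && isspace((unsigned char)k_end[-1]))
        --k_end;

    size_t k_len = (size_t)(k_end - k_begin);
    if (k_len != strlen(key) || strncmp(k_begin, key, k_len) != 0)
        return 0;

    /* Skip blanks after the colon, then copy up to the end of the line. */
    const char *v = colon + 1;
    while (*v != '\0' && isspace((unsigned char)*v))
        ++v;
    size_t v_len = strcspn(v, "\r\n");
    if (v_len >= out_size)
        v_len = out_size - 1;
    memcpy(out, v, v_len);
    out[v_len] = '\0';
    return 1;
}
```

    On a POSIX system the lines could come, for example, from popen("nvidia-smi -i 0 -q", "r") read with fgets and passed to smi_field one at a time.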

    3. Using cudaGetDeviceProperties()

    deviceQuery actually calls cudaGetDeviceProperties() and prints the returned fields one by one.

    For example, here is a program and its output:

    #include <stdio.h>
    #include <cuda_runtime.h>

    // Append the main fields of a cudaDeviceProp to DeviceProperties.txt.
    void PrintDeviceProperties(cudaDeviceProp devProp)
    {
        FILE *deviceProperties = fopen("DeviceProperties.txt", "a+");
        if (deviceProperties == NULL)   // could not open the output file
            return;
        fprintf(deviceProperties, "Major revision number: %d\n", devProp.major);
        fprintf(deviceProperties, "Minor revision number: %d\n", devProp.minor);
        fprintf(deviceProperties, "Name: %s\n", devProp.name);
        // size_t fields need %zu; %u truncates them on 64-bit systems
        fprintf(deviceProperties, "Total global memory: %zu\n", devProp.totalGlobalMem);
        fprintf(deviceProperties, "Total shared memory per block: %zu\n", devProp.sharedMemPerBlock);
        fprintf(deviceProperties, "Total registers per block: %d\n", devProp.regsPerBlock);
        fprintf(deviceProperties, "Warp size: %d\n", devProp.warpSize);
        fprintf(deviceProperties, "Maximum memory pitch: %zu\n", devProp.memPitch);
        fprintf(deviceProperties, "Maximum threads per block: %d\n", devProp.maxThreadsPerBlock);
        for (int i = 0; i < 3; ++i)
            fprintf(deviceProperties, "Maximum dimension %d of block: %d\n", i, devProp.maxThreadsDim[i]);
        for (int i = 0; i < 3; ++i)
            fprintf(deviceProperties, "Maximum dimension %d of grid: %d\n", i, devProp.maxGridSize[i]);
        fprintf(deviceProperties, "Clock rate: %d\n", devProp.clockRate);   // in kHz
        fprintf(deviceProperties, "Total constant memory: %zu\n", devProp.totalConstMem);
        fprintf(deviceProperties, "Texture alignment: %zu\n", devProp.textureAlignment);
        fprintf(deviceProperties, "Concurrent copy and execution: %s\n",
                devProp.deviceOverlap ? "Yes" : "No");
        fprintf(deviceProperties, "Number of multiprocessors: %d\n", devProp.multiProcessorCount);
        fprintf(deviceProperties, "Kernel execution timeout: %s\n",
                devProp.kernelExecTimeoutEnabled ? "Yes" : "No");
        fclose(deviceProperties);
    }

    int main(void)
    {
        cudaDeviceProp devProp;
        // Query device 0 (deviceQuery does this for every device)
        if (cudaGetDeviceProperties(&devProp, 0) != cudaSuccess)
            return 1;
        PrintDeviceProperties(devProp);
        return 0;
    }
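    And the result is as follows:

    Major revision number: 2
    Minor revision number: 0
    Name: Tesla C2075
    Total global memory: 1341849600
    Total shared memory per block: 49152
    Total registers per block: 32768
    Warp size: 32
    Maximum memory pitch: 2147483647
    Maximum threads per block: 1024
    Maximum dimension 0 of block: 1024
    Maximum dimension 1 of block: 1024
    Maximum dimension 2 of block: 64
    Maximum dimension 0 of grid: 65535
    Maximum dimension 1 of grid: 65535
    Maximum dimension 2 of grid: 65535
    Clock rate: 1147000
    Total constant memory: 65536
    Texture alignment: 512
    Concurrent copy and execution: Yes
    Number of multiprocessors: 14
    Kernel execution timeout: No

    Note that the total global memory shown (about 1.3 GB) is far below the Tesla C2075's 6 GB. This output was produced with %u, which most likely kept only the low 32 bits of the size_t value: 1341849600 + 2^32 = 5636816896 bytes (about 5.25 GiB), a plausible figure for a 6 GB card with ECC enabled. The clock rate is given in kHz, i.e. 1.147 GHz.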
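    These limits are what make deviceQuery useful when choosing block and grid sizes. As a minimal sketch, the code below copies a few of them into a hypothetical struct device_limits (field names follow cudaDeviceProp, re-declared so the code compiles without CUDA) and checks a candidate launch against them; block_fits and blocks_for are illustrative helpers, not CUDA API calls.

```c
#include <stddef.h>

/* Hypothetical subset of cudaDeviceProp, re-declared so no CUDA is needed. */
struct device_limits {
    int maxThreadsPerBlock;
    int maxThreadsDim[3];
    int maxGridSize[3];
    size_t sharedMemPerBlock;
};

/* 1 if a bx*by*bz block using shared_bytes of shared memory respects the
   per-dimension limit, the per-block thread limit and the shared memory
   limit; 0 otherwise. */
int block_fits(const struct device_limits *d, int bx, int by, int bz,
               size_t shared_bytes)
{
    int dims[3] = {bx, by, bz};
    for (int i = 0; i < 3; ++i)
        if (dims[i] < 1 || dims[i] > d->maxThreadsDim[i])
            return 0;
    /* long long keeps the product from overflowing int */
    if ((long long)bx * by * bz > d->maxThreadsPerBlock)
        return 0;
    return shared_bytes <= d->sharedMemPerBlock;
}

/* Blocks of `threads` threads (threads > 0) needed to cover n elements in a
   1-D grid, by ceiling division; -1 if that exceeds maxGridSize[0]. */
long long blocks_for(const struct device_limits *d, long long n, int threads)
{
    long long blocks = (n + threads - 1) / threads;
    return blocks <= d->maxGridSize[0] ? blocks : -1;
}
```

    With the Tesla C2075 values printed above, a 32x32 block using 4 KB of shared memory fits, a 1024x2 block does not (2048 threads > 1024), and a 1-D grid of 256-thread blocks covering 1,000,000 elements needs 3907 blocks, well under the 65535 limit.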

      

     
    
  • Original article: https://www.cnblogs.com/xingzifei/p/5183644.html