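  • CUDA ---- Device Management

    Device management

    NVIDIA provides several ways to query and manage GPU devices. Knowing how to query GPU information matters, because the reported limits help you choose the execution configuration of a kernel.

    This post covers two topics:

    • CUDA runtime API functions
    • The NVIDIA system management command-line tool (nvidia-smi)

    Querying GPU information with the runtime API

    The following function returns all the properties of a GPU device: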

    cudaError_t cudaGetDeviceProperties(cudaDeviceProp *prop, int device);

    The GPU's properties are returned in a cudaDeviceProp struct.

    Code

    #include <cuda_runtime.h>
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        printf("%s Starting...\n", argv[0]);

        // Count the CUDA-capable devices and stop on any runtime error.
        int deviceCount = 0;
        cudaError_t error_id = cudaGetDeviceCount(&deviceCount);
        if (error_id != cudaSuccess) {
            printf("cudaGetDeviceCount returned %d -> %s\n", (int)error_id, cudaGetErrorString(error_id));
            printf("Result = FAIL\n");
            exit(EXIT_FAILURE);
        }
        if (deviceCount == 0) {
            printf("There are no available device(s) that support CUDA\n");
            exit(EXIT_SUCCESS);  // nothing to query
        } else {
            printf("Detected %d CUDA Capable device(s)\n", deviceCount);
        }

        // Query device 0 only.
        int dev = 0, driverVersion = 0, runtimeVersion = 0;
        cudaSetDevice(dev);
        cudaDeviceProp deviceProp;
        cudaGetDeviceProperties(&deviceProp, dev);
        printf("Device %d: \"%s\"\n", dev, deviceProp.name);

        // Versions are encoded as 1000 * major + 10 * minor (e.g. 5050 is 5.5).
        cudaDriverGetVersion(&driverVersion);
        cudaRuntimeGetVersion(&runtimeVersion);
        printf("  CUDA Driver Version / Runtime Version %d.%d / %d.%d\n",
               driverVersion / 1000, (driverVersion % 100) / 10,
               runtimeVersion / 1000, (runtimeVersion % 100) / 10);
        printf("  CUDA Capability Major/Minor version number: %d.%d\n",
               deviceProp.major, deviceProp.minor);
        // Dividing by 1024^3 converts bytes to GBytes, not MBytes.
        printf("  Total amount of global memory: %.2f GBytes (%llu bytes)\n",
               (float)deviceProp.totalGlobalMem / pow(1024.0, 3),
               (unsigned long long)deviceProp.totalGlobalMem);
        // clockRate and memoryClockRate are reported in kHz.
        printf("  GPU Clock rate: %.0f MHz (%0.2f GHz)\n",
               deviceProp.clockRate * 1e-3f, deviceProp.clockRate * 1e-6f);
        printf("  Memory Clock rate: %.0f Mhz\n", deviceProp.memoryClockRate * 1e-3f);
        printf("  Memory Bus Width: %d-bit\n", deviceProp.memoryBusWidth);
        if (deviceProp.l2CacheSize) {
            printf("  L2 Cache Size: %d bytes\n", deviceProp.l2CacheSize);
        }
        printf("  Max Texture Dimension Size (x,y,z) 1D=(%d), 2D=(%d,%d), 3D=(%d,%d,%d)\n",
               deviceProp.maxTexture1D,
               deviceProp.maxTexture2D[0], deviceProp.maxTexture2D[1],
               deviceProp.maxTexture3D[0], deviceProp.maxTexture3D[1], deviceProp.maxTexture3D[2]);
        printf("  Max Layered Texture Size (dim) x layers 1D=(%d) x %d, 2D=(%d,%d) x %d\n",
               deviceProp.maxTexture1DLayered[0], deviceProp.maxTexture1DLayered[1],
               deviceProp.maxTexture2DLayered[0], deviceProp.maxTexture2DLayered[1],
               deviceProp.maxTexture2DLayered[2]);
        // The size_t fields are printed with %zu.
        printf("  Total amount of constant memory: %zu bytes\n", deviceProp.totalConstMem);
        printf("  Total amount of shared memory per block: %zu bytes\n", deviceProp.sharedMemPerBlock);
        printf("  Total number of registers available per block: %d\n", deviceProp.regsPerBlock);
        printf("  Warp size: %d\n", deviceProp.warpSize);
        printf("  Maximum number of threads per multiprocessor: %d\n", deviceProp.maxThreadsPerMultiProcessor);
        printf("  Maximum number of threads per block: %d\n", deviceProp.maxThreadsPerBlock);
        printf("  Maximum sizes of each dimension of a block: %d x %d x %d\n",
               deviceProp.maxThreadsDim[0], deviceProp.maxThreadsDim[1], deviceProp.maxThreadsDim[2]);
        printf("  Maximum sizes of each dimension of a grid: %d x %d x %d\n",
               deviceProp.maxGridSize[0], deviceProp.maxGridSize[1], deviceProp.maxGridSize[2]);
        printf("  Maximum memory pitch: %zu bytes\n", deviceProp.memPitch);

        exit(EXIT_SUCCESS);
    }

    Compile and run:

    $ nvcc checkDeviceInfor.cu -o checkDeviceInfor
    $ ./checkDeviceInfor

    Output:

    ./checkDeviceInfor Starting...
    Detected 2 CUDA Capable device(s)
    Device 0: "Tesla M2070"
    CUDA Driver Version / Runtime Version 5.5 / 5.5
    CUDA Capability Major/Minor version number: 2.0
    Total amount of global memory: 5.25 GBytes (5636554752 bytes)
    GPU Clock rate: 1147 MHz (1.15 GHz)
    Memory Clock rate: 1566 Mhz
    Memory Bus Width: 384-bit
    L2 Cache Size: 786432 bytes
    Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
    Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
    Total amount of constant memory: 65536 bytes
    Total amount of shared memory per block: 49152 bytes
    Total number of registers available per block: 32768
    Warp size: 32
    Maximum number of threads per multiprocessor: 1536
    Maximum number of threads per block: 1024
    Maximum sizes of each dimension of a block: 1024 x 1024 x 64
    Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
    Maximum memory pitch: 2147483647 bytes

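    Choosing the best GPU

    On a system with multiple GPUs, you need to pick one of them as your device. One way to pick the GPU with the most compute capacity is to compare multiprocessor counts. The code below selects the GPU with the most multiprocessors.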

    int numDevices = 0;
    cudaGetDeviceCount(&numDevices);
    // With a single GPU, the default device (0) is already the only choice.
    if (numDevices > 1) {
        int maxMultiprocessors = 0, maxDevice = 0;
        for (int device = 0; device < numDevices; device++) {
            cudaDeviceProp props;
            cudaGetDeviceProperties(&props, device);
            // Keep the device with the most multiprocessors seen so far.
            if (maxMultiprocessors < props.multiProcessorCount) {
                maxMultiprocessors = props.multiProcessorCount;
                maxDevice = device;
            }
        }
        cudaSetDevice(maxDevice);
    }

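    Querying GPU information with nvidia-smi

    nvidia-smi is a command-line tool for managing and monitoring GPU devices. It lets you query and modify device state.

    nvidia-smi has many uses. For example, the following command lists the GPUs installed in the system: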

    $ nvidia-smi -L
    GPU 0: Tesla M2070 (UUID: GPU-68df8aec-e85c-9934-2b81-0c9e689a43a7)
    GPU 1: Tesla M2070 (UUID: GPU-382f23c1-5160-01e2-3291-ff9628930b70)

    You can then query detailed information about GPU 0 with:

    $ nvidia-smi -q -i 0

    The -d option accepts the following display types, which narrow down what nvidia-smi reports:

    • MEMORY
    • UTILIZATION
    • ECC
    • TEMPERATURE
    • POWER
    • CLOCK
    • COMPUTE
    • PIDS
    • PERFORMANCE
    • SUPPORTED_CLOCKS
    • PAGE_RETIREMENT
    • ACCOUNTING

    For example, to show only device memory information:

    $ nvidia-smi -q -i 0 -d MEMORY | tail -n 5
    Memory Usage
    Total : 5375 MB
    Used : 9 MB
    Free : 5366 MB

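    Setting the device

    On a multi-GPU system, nvidia-smi shows the properties of each GPU, and the GPUs are numbered starting from 0. The environment variable CUDA_VISIBLE_DEVICES lets you choose which GPUs an application can see without modifying the application.

    Setting CUDA_VISIBLE_DEVICES=2 hides all other GPUs, so only GPU 2 can be used, and the application sees it as device 0. You can also list several GPUs, for example CUDA_VISIBLE_DEVICES=2,3. The application then sees them as device 0 and device 1 respectively.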

    Code download: CodeSamples.zip

  • Original post: https://www.cnblogs.com/1024incn/p/4539697.html