  • CUDA ---- Device Management

    Device Management

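    NVIDIA provides several ways to query and manage GPU devices. Knowing how to query GPU information is important, because it helps you choose the execution configuration for your kernels.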
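
    As a rough illustration of how queried device limits feed into an execution configuration, the sketch below picks a 1D block and grid size for n elements. DeviceLimits, LaunchConfig, and chooseLaunchConfig are hypothetical names introduced here, not part of the CUDA runtime; in a real program the two limits would be copied from cudaDeviceProp::maxThreadsPerBlock and cudaDeviceProp::maxGridSize[0]. The code is host-only and uses no CUDA headers.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical container for two limits that a real program would read
// from cudaDeviceProp (maxThreadsPerBlock and maxGridSize[0]).
struct DeviceLimits {
    int maxThreadsPerBlock;
    int maxGridSizeX;
};

// Hypothetical result type: a 1D launch configuration.
struct LaunchConfig {
    int blockSize;
    int gridSize;
};

// Choose a 1D launch configuration for n elements, clamped to the limits.
LaunchConfig chooseLaunchConfig(long long n, int requestedBlock,
                                const DeviceLimits &limits) {
    assert(n > 0 && requestedBlock > 0);
    LaunchConfig cfg;
    // Never ask for more threads per block than the device allows.
    cfg.blockSize = std::min(requestedBlock, limits.maxThreadsPerBlock);
    // Ceiling division: enough blocks to cover all n elements.
    long long blocks = (n + cfg.blockSize - 1) / cfg.blockSize;
    // If the grid would exceed the device limit, clamp it. The kernel
    // would then need a grid-stride loop to cover the remaining elements.
    cfg.gridSize = static_cast<int>(
        std::min<long long>(blocks, limits.maxGridSizeX));
    return cfg;
}
```

    With limits of 1024 threads per block and 65535 blocks in x, chooseLaunchConfig(1000000, 256, limits) gives 256 threads per block and 3907 blocks.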

    This post covers two topics:

    • CUDA runtime API functions
    • The NVIDIA system management command-line utility (nvidia-smi)

    Querying GPU information with the runtime API

    You can use the following function to query all the information about a GPU device:

    cudaError_t cudaGetDeviceProperties(cudaDeviceProp *prop, int device);

    The GPU's properties are returned in the cudaDeviceProp structure.

    Code

    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        printf("%s Starting...\n", argv[0]);

        int deviceCount = 0;
        cudaError_t error_id = cudaGetDeviceCount(&deviceCount);
        if (error_id != cudaSuccess) {
            printf("cudaGetDeviceCount returned %d -> %s\n", (int)error_id, cudaGetErrorString(error_id));
            printf("Result = FAIL\n");
            exit(EXIT_FAILURE);
        }
        if (deviceCount == 0) {
            // Nothing to query, so stop before touching device 0.
            printf("There are no available device(s) that support CUDA\n");
            exit(EXIT_SUCCESS);
        }
        printf("Detected %d CUDA Capable device(s)\n", deviceCount);

        // Only device 0 is queried here.
        int dev = 0, driverVersion = 0, runtimeVersion = 0;
        cudaSetDevice(dev);
        cudaDeviceProp deviceProp;
        cudaGetDeviceProperties(&deviceProp, dev);
        printf("Device %d: \"%s\"\n", dev, deviceProp.name);

        cudaDriverGetVersion(&driverVersion);
        cudaRuntimeGetVersion(&runtimeVersion);
        printf(" CUDA Driver Version / Runtime Version %d.%d / %d.%d\n",
               driverVersion / 1000, (driverVersion % 100) / 10,
               runtimeVersion / 1000, (runtimeVersion % 100) / 10);
        printf(" CUDA Capability Major/Minor version number: %d.%d\n", deviceProp.major, deviceProp.minor);
        // Dividing by 1024^3 yields gigabytes, so the label is GBytes.
        printf(" Total amount of global memory: %.2f GBytes (%llu bytes)\n",
               (double)deviceProp.totalGlobalMem / (1024.0 * 1024.0 * 1024.0),
               (unsigned long long)deviceProp.totalGlobalMem);
        // clockRate and memoryClockRate are reported in kHz.
        printf(" GPU Clock rate: %.0f MHz (%0.2f GHz)\n", deviceProp.clockRate * 1e-3f, deviceProp.clockRate * 1e-6f);
        printf(" Memory Clock rate: %.0f Mhz\n", deviceProp.memoryClockRate * 1e-3f);
        printf(" Memory Bus Width: %d-bit\n", deviceProp.memoryBusWidth);
        if (deviceProp.l2CacheSize) {
            printf(" L2 Cache Size: %d bytes\n", deviceProp.l2CacheSize);
        }
        printf(" Max Texture Dimension Size (x,y,z) 1D=(%d), 2D=(%d,%d), 3D=(%d,%d,%d)\n",
               deviceProp.maxTexture1D, deviceProp.maxTexture2D[0], deviceProp.maxTexture2D[1],
               deviceProp.maxTexture3D[0], deviceProp.maxTexture3D[1], deviceProp.maxTexture3D[2]);
        printf(" Max Layered Texture Size (dim) x layers 1D=(%d) x %d, 2D=(%d,%d) x %d\n",
               deviceProp.maxTexture1DLayered[0], deviceProp.maxTexture1DLayered[1],
               deviceProp.maxTexture2DLayered[0], deviceProp.maxTexture2DLayered[1],
               deviceProp.maxTexture2DLayered[2]);
        // size_t fields are printed with %zu.
        printf(" Total amount of constant memory: %zu bytes\n", deviceProp.totalConstMem);
        printf(" Total amount of shared memory per block: %zu bytes\n", deviceProp.sharedMemPerBlock);
        printf(" Total number of registers available per block: %d\n", deviceProp.regsPerBlock);
        printf(" Warp size: %d\n", deviceProp.warpSize);
        printf(" Maximum number of threads per multiprocessor: %d\n", deviceProp.maxThreadsPerMultiProcessor);
        printf(" Maximum number of threads per block: %d\n", deviceProp.maxThreadsPerBlock);
        printf(" Maximum sizes of each dimension of a block: %d x %d x %d\n",
               deviceProp.maxThreadsDim[0], deviceProp.maxThreadsDim[1], deviceProp.maxThreadsDim[2]);
        printf(" Maximum sizes of each dimension of a grid: %d x %d x %d\n",
               deviceProp.maxGridSize[0], deviceProp.maxGridSize[1], deviceProp.maxGridSize[2]);
        printf(" Maximum memory pitch: %zu bytes\n", deviceProp.memPitch);
        exit(EXIT_SUCCESS);
    }

    Compile and run:

    $ nvcc checkDeviceInfor.cu -o checkDeviceInfor
    $ ./checkDeviceInfor

    Output:

    ./checkDeviceInfor Starting...
    Detected 2 CUDA Capable device(s)
    Device 0: "Tesla M2070"
    CUDA Driver Version / Runtime Version 5.5 / 5.5
    CUDA Capability Major/Minor version number: 2.0
    Total amount of global memory: 5.25 GBytes (5636554752 bytes)
    GPU Clock rate: 1147 MHz (1.15 GHz)
    Memory Clock rate: 1566 Mhz
    Memory Bus Width: 384-bit
    L2 Cache Size: 786432 bytes
    Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
    Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
    Total amount of constant memory: 65536 bytes
    Total amount of shared memory per block: 49152 bytes
    Total number of registers available per block: 32768
    Warp size: 32
    Maximum number of threads per multiprocessor: 1536
    Maximum number of threads per block: 1024
    Maximum sizes of each dimension of a block: 1024 x 1024 x 64
    Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
    Maximum memory pitch: 2147483647 bytes
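
    The "5.5 / 5.5" line in the output comes from decoding the integers returned by cudaDriverGetVersion and cudaRuntimeGetVersion, which encode a version as 1000 * major + 10 * minor. The sketch below isolates that arithmetic so it can be checked without a GPU. CudaVersion and decodeCudaVersion are hypothetical names introduced for illustration, not CUDA runtime functions.

```cpp
#include <cassert>

// Hypothetical helper type: a decoded CUDA version number.
struct CudaVersion {
    int majorVer;
    int minorVer;
};

// Split an encoded version (1000 * major + 10 * minor) into its parts,
// using the same arithmetic as the printf call in the program above.
CudaVersion decodeCudaVersion(int encoded) {
    assert(encoded >= 0);
    CudaVersion v;
    v.majorVer = encoded / 1000;
    v.minorVer = (encoded % 100) / 10;
    return v;
}
```

    For example, an encoded value of 5050 decodes to major 5 and minor 5, which matches the version printed above.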

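    Choosing the best GPU

    On a system with multiple GPUs, you need to pick one of them as the device to run on. One way to choose the GPU with the best compute performance is to compare the number of multiprocessors each one has. The following code selects the GPU with the most multiprocessors.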

    int numDevices = 0;
    cudaGetDeviceCount(&numDevices);
    if (numDevices > 1) {
        int maxMultiprocessors = 0, maxDevice = 0;
        for (int device=0; device<numDevices; device++) {
            cudaDeviceProp props;
            cudaGetDeviceProperties(&props, device);
            if (maxMultiprocessors < props.multiProcessorCount) {
                maxMultiprocessors = props.multiProcessorCount;
                maxDevice = device;
            }
        }
        cudaSetDevice(maxDevice);
    }    

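    Querying GPU information with nvidia-smi

    nvidia-smi is a command-line tool that helps you manage GPU devices and lets you query and modify device state.

    nvidia-smi has many uses. For example, the following command lists the GPUs installed in the system: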

    $ nvidia-smi -L
    GPU 0: Tesla M2070 (UUID: GPU-68df8aec-e85c-9934-2b81-0c9e689a43a7)
    GPU 1: Tesla M2070 (UUID: GPU-382f23c1-5160-01e2-3291-ff9628930b70)

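    You can then use the following command to query detailed information about GPU 0:

    $ nvidia-smi -q -i 0

    The following arguments to the -d display option reduce the amount of information nvidia-smi shows: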

    MEMORY

    UTILIZATION

    ECC

    TEMPERATURE

    POWER

    CLOCK

    COMPUTE

    PIDS

    PERFORMANCE

    SUPPORTED_CLOCKS

    PAGE_RETIREMENT

    ACCOUNTING

    For example, to display only the device memory information:

    $ nvidia-smi -q -i 0 -d MEMORY | tail -n 5
    Memory Usage
    Total : 5375 MB
    Used : 9 MB
    Free : 5366 MB

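    Setting the device

    On a multi-GPU system, nvidia-smi shows the properties of each GPU, and the GPUs are numbered sequentially starting from 0. The environment variable CUDA_VISIBLE_DEVICES lets you specify which GPUs to use without modifying the application.

    Setting CUDA_VISIBLE_DEVICES=2 hides all other GPUs, so only GPU 2 is available to the application. You can also expose several GPUs, for example with CUDA_VISIBLE_DEVICES=2,3; the application then sees them as device IDs 0 and 1, respectively.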
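
    The renumbering described above can be sketched as a small host-side parser: the position of an entry in the CUDA_VISIBLE_DEVICES list is the device ID the application sees, and the entry itself is the physical GPU index. parseVisibleDevices and physicalIndex are hypothetical helpers introduced for illustration. This sketch assumes a well-formed, comma-separated list of integer indices and does not model anything else the CUDA runtime may accept or reject.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: turn a value such as "2,3" into the list of
// physical GPU indices, in the order the application will see them.
std::vector<int> parseVisibleDevices(const std::string &value) {
    std::vector<int> physical;
    std::stringstream ss(value);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (item.empty()) continue;            // skip empty entries
        physical.push_back(std::stoi(item));   // integer indices only
    }
    return physical;
}

// Hypothetical helper: map the device ID seen by the application
// back to the physical GPU index.
int physicalIndex(const std::vector<int> &visible, int logicalId) {
    assert(logicalId >= 0 && logicalId < static_cast<int>(visible.size()));
    return visible[logicalId];
}
```

    With the value "2,3", device ID 0 maps to physical GPU 2 and device ID 1 maps to physical GPU 3.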

    Code download: CodeSamples.zip

  • Original article: https://www.cnblogs.com/1024incn/p/4539697.html