  • Using NVIDIA GPUs with Docker

    Accessing NVIDIA GPUs from Docker 19.03

    Author: 张首富
    Date: 2019-09-06
    WeChat: y18163201
    

    Preface

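    Docker 19.03 was officially released in July 2019. For me, this release has two highlights:
    1. Docker can be started and run without root privileges (rootless mode, still experimental in 19.03).
    2. Improved GPU support: to use an NVIDIA GPU inside a container you no longer need to install nvidia-docker separately. The new --gpus option of docker run is enough, although the NVIDIA Container Runtime (installed below) is still required on the host.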

    Install the NVIDIA driver

    Confirm that the NVIDIA card is detected:

    $ lspci -vv | grep -i nvidia
    00:04.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] (rev a1)
            Subsystem: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB]
            Kernel modules: nvidiafb
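    The lspci output above lists only "Kernel modules: nvidiafb" and no "Kernel driver in use:" line, which suggests the card is detected but the proprietary nvidia driver is not bound to it yet. A minimal sketch for checking this from lspci -k output; the helper name nvidia_driver_status and its status strings are hypothetical, and it only looks at NVIDIA "VGA compatible controller" or "3D controller" entries:

```shell
#!/bin/sh
# nvidia_driver_status (hypothetical helper): reads `lspci -k` output on stdin
# and reports whether an NVIDIA display/3D controller is present and which
# kernel driver is bound to it. Exit status: 0 driver bound, 1 no NVIDIA
# device, 2 device found but no driver bound.
nvidia_driver_status() {
  awk '
    # Device header lines start at column 0 ("00:04.0 3D controller: ...").
    /^[^ \t]/ {
      in_nv = ($0 ~ /NVIDIA/ && ($0 ~ /VGA compatible controller/ || $0 ~ /3D controller/))
      if (in_nv) found = 1
      next
    }
    # Indented detail lines belong to the most recent device header.
    in_nv && /Kernel driver in use:/ && drv == "" {
      sub(/.*Kernel driver in use:[ \t]*/, "")
      drv = $0
    }
    END {
      if (!found) { print "no-nvidia-device"; exit 1 }
      if (drv == "") { print "nvidia-device-no-driver"; exit 2 }
      print "driver=" drv
    }'
}

# Usage on a real host (10de is the NVIDIA PCI vendor ID):
# lspci -k -d 10de: | nvidia_driver_status
```

    Once the driver is installed, the same check is expected to print driver=nvidia.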
    

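    Driver installation is not covered in detail here; if you need instructions, see the post "Offline installation of a TTS service on Ubuntu" (ubuntu离线安装TTS服务).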

    Install the NVIDIA Container Runtime

    $ cat nvidia-container-runtime-script.sh
     
    curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | 
      sudo apt-key add -
    distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
    curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | 
      sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
    sudo apt-get update
    

    Run the script:

    $ sh nvidia-container-runtime-script.sh
    
    OK
    deb https://nvidia.github.io/libnvidia-container/ubuntu18.04/$(ARCH) /
    deb https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/$(ARCH) /
    Hit:1 http://archive.canonical.com/ubuntu bionic InRelease
    Get:2 https://nvidia.github.io/libnvidia-container/ubuntu18.04/amd64  InRelease [1139 B]                
    Get:3 https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/amd64  InRelease [1136 B]           
    Hit:4 http://security.ubuntu.com/ubuntu bionic-security InRelease                                       
    Get:5 https://nvidia.github.io/libnvidia-container/ubuntu18.04/amd64  Packages [4076 B]                 
    Get:6 https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/amd64  Packages [3084 B]            
    Hit:7 http://us-east4-c.gce.clouds.archive.ubuntu.com/ubuntu bionic InRelease
    Hit:8 http://us-east4-c.gce.clouds.archive.ubuntu.com/ubuntu bionic-updates InRelease
    Hit:9 http://us-east4-c.gce.clouds.archive.ubuntu.com/ubuntu bionic-backports InRelease
    Fetched 9435 B in 1s (17.8 kB/s)                   
    Reading package lists... Done
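    The script derives the repository path from /etc/os-release: $ID$VERSION_ID expands to ubuntu18.04 on Ubuntu 18.04, which matches the deb lines printed above. The sketch below isolates that step in a hypothetical function nvidia_repo_list_url that also accepts an alternative os-release path, so the URL can be checked before touching apt:

```shell
#!/bin/sh
# nvidia_repo_list_url (hypothetical helper): print the repository list URL
# that nvidia-container-runtime-script.sh downloads for this distribution.
# $1: path to an os-release file (defaults to /etc/os-release).
nvidia_repo_list_url() {
  osrel=${1:-/etc/os-release}
  # Source the file in a subshell so ID/VERSION_ID do not leak into the caller;
  # this is the same expression the script uses: $ID$VERSION_ID.
  dist=$(. "$osrel"; echo "$ID$VERSION_ID")
  echo "https://nvidia.github.io/nvidia-container-runtime/$dist/nvidia-container-runtime.list"
}

# Usage:
# nvidia_repo_list_url
```

    If the printed URL does not exist for your distribution, apt-get update will fail on that repository, so this is a cheap first check.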
    
    $ sudo apt-get install nvidia-container-runtime
    Reading package lists... Done
    Building dependency tree       
    Reading state information... Done
    The following packages were automatically installed and are no longer required:
      grub-pc-bin libnuma1
    Use 'sudo apt autoremove' to remove them.
    The following additional packages will be installed:
    Get:1 https://nvidia.github.io/libnvidia-container/ubuntu18.04/amd64  libnvidia-container1 1.0.2-1 [59.1 kB]
    Get:2 https://nvidia.github.io/libnvidia-container/ubuntu18.04/amd64  libnvidia-container-tools 1.0.2-1 [15.4 kB]
    Get:3 https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/amd64  nvidia-container-runtime-hook 1.4.0-1 [575 kB]
    
    ...
    Unpacking nvidia-container-runtime (2.0.0+docker18.09.6-3) ...
    Setting up libnvidia-container1:amd64 (1.0.2-1) ...
    Setting up libnvidia-container-tools (1.0.2-1) ...
    Processing triggers for libc-bin (2.27-3ubuntu1) ...
    Setting up nvidia-container-runtime-hook (1.4.0-1) ...
    Setting up nvidia-container-runtime (2.0.0+docker18.09.6-3) ...
    
    $ which nvidia-container-runtime-hook
    /usr/bin/nvidia-container-runtime-hook
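    Beyond which nvidia-container-runtime-hook, it can help to check every binary the packages above should install in one pass. A minimal sketch; the helper name check_bins is hypothetical, and the binary names assume that nvidia-container-cli comes from libnvidia-container-tools and nvidia-container-runtime from the package of the same name:

```shell
#!/bin/sh
# check_bins (hypothetical helper): report each command as ok/missing and
# return non-zero if any of them cannot be found in PATH.
check_bins() {
  status=0
  for bin in "$@"; do
    if command -v "$bin" >/dev/null 2>&1; then
      echo "ok: $bin -> $(command -v "$bin")"
    else
      echo "missing: $bin"
      status=1
    fi
  done
  return "$status"
}

# Usage:
# check_bins nvidia-container-runtime-hook nvidia-container-cli nvidia-container-runtime
```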
    

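    Install Docker 19.03

    (The commands below are for CentOS/RHEL using yum; on Ubuntu, as used for the runtime steps above, install docker-ce 19.03 with apt instead.)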

    # Step 1: install the required system tools
    yum install -y yum-utils device-mapper-persistent-data lvm2
    # Step 2: add the Docker CE repository (Aliyun mirror)
    yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
    # Step 3: refresh the metadata cache and install Docker CE 19.03.2
    yum makecache fast
    yum -y install docker-ce-19.03.2
    # Step 4: start the Docker service and enable it at boot
    systemctl start docker && systemctl enable docker
    

    Verify that Docker is installed correctly and check its version:

    $ docker version
    Client: Docker Engine - Community
     Version:           19.03.2
     API version:       1.40
     Go version:        go1.12.8
     Git commit:        6a30dfc
     Built:             Thu Aug 29 05:28:55 2019
     OS/Arch:           linux/amd64
     Experimental:      false
    
    Server: Docker Engine - Community
     Engine:
      Version:          19.03.2
      API version:      1.40 (minimum version 1.12)
      Go version:       go1.12.8
      Git commit:       6a30dfc
      Built:            Thu Aug 29 05:27:34 2019
      OS/Arch:          linux/amd64
      Experimental:     false
     containerd:
      Version:          1.2.6
      GitCommit:        894b81a4b802e4eb2a91d1ce216b8817763c29fb
     runc:
      Version:          1.0.0-rc8
      GitCommit:        425e105d5a03fabd737a126ad93d62a9eeede87f
     docker-init:
      Version:          0.18.0
      GitCommit:        fec3683
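    The --gpus option first appeared in Docker 19.03, so the server version is the one worth checking in scripts. The sketch below reads it with docker version --format '{{.Server.Version}}' and compares it against 19.03; the helper name version_ge is hypothetical, and it relies on sort -V, a GNU coreutils extension (assumption: a typical Linux host):

```shell
#!/bin/sh
# version_ge (hypothetical helper): succeed if version $1 >= version $2.
# sort -V orders version strings numerically, field by field (GNU coreutils).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Usage on a host with Docker installed:
# server=$(docker version --format '{{.Server.Version}}')
# if version_ge "$server" 19.03; then echo "--gpus supported ($server)"; fi
```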
    

    Check that the --gpus option is available:

    $ docker run --help | grep -i gpus
          --gpus gpu-request               GPU devices to add to the container ('all' to pass all GPUs)
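    Besides all, --gpus accepts a GPU count (for example --gpus 2) or a device= list of GPU indices or UUIDs. Because the value is parsed as comma-separated fields, a list of several devices has to be wrapped in an extra pair of double quotes, as in --gpus '"device=0,2"'. A small sketch that builds that value; the function name gpus_device_arg is hypothetical:

```shell
#!/bin/sh
# gpus_device_arg (hypothetical helper): join GPU indices/UUIDs with commas and
# wrap them as "device=..." including the inner double quotes, so --gpus
# treats the whole list as a single field.
gpus_device_arg() {
  ids=$(printf '%s,' "$@")
  ids=${ids%,}              # drop the trailing comma
  printf '"device=%s"\n' "$ids"
}

# Usage: expose only GPUs 0 and 2 to the container
# docker run -it --rm --gpus "$(gpus_device_arg 0 2)" ubuntu nvidia-smi -L
```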
    

    Run an Ubuntu container that uses the GPU:

    $ docker run -it --rm --gpus all ubuntu nvidia-smi
    Unable to find image 'ubuntu:latest' locally
    latest: Pulling from library/ubuntu
    f476d66f5408: Pull complete 
    8882c27f669e: Pull complete 
    d9af21273955: Pull complete 
    f5029279ec12: Pull complete 
    Digest: sha256:d26d529daa4d8567167181d9d569f2a85da3c5ecaf539cace2c6223355d69981
    Status: Downloaded newer image for ubuntu:latest
    Tue May  7 15:52:15 2019       
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 390.116                Driver Version: 390.116                   |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla P4            Off  | 00000000:00:04.0 Off |                    0 |
    | N/A   39C    P0    22W /  75W |      0MiB /  7611MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
    

    Troubleshooting

    Do you see the following error message?

    $ docker run -it --rm --gpus all debian
    docker: Error response from daemon: linux runtime spec devices: could not select device driver "" with capabilities: [[gpu]].
    

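    The error above means that the NVIDIA runtime is not correctly registered with Docker. Most often the driver is not installed correctly on the host. It can also mean the NVIDIA container tools were installed without restarting the Docker daemon afterwards, in which case you need to restart the daemon.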

    I suggest going back to verify that nvidia-container-runtime is installed, or restarting the Docker daemon (for example, sudo systemctl restart docker).
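    These checks can be scripted. A minimal sketch, assuming systemctl manages the Docker daemon as in the installation step; the function name gpu_preflight and its messages are hypothetical, and it only prints the restart command rather than running it:

```shell
#!/bin/sh
# gpu_preflight (hypothetical helper): check the host-side prerequisites behind
# the 'could not select device driver "" with capabilities: [[gpu]]' error.
gpu_preflight() {
  # 1. The host driver must work: nvidia-smi -L has to succeed.
  if ! command -v nvidia-smi >/dev/null 2>&1 || ! nvidia-smi -L >/dev/null 2>&1; then
    echo "host driver not working: fix the NVIDIA driver first"
    return 1
  fi
  # 2. Docker's built-in GPU support relies on this hook being in PATH.
  if ! command -v nvidia-container-runtime-hook >/dev/null 2>&1; then
    echo "nvidia-container-runtime-hook missing: install nvidia-container-runtime"
    return 2
  fi
  # 3. Both present: the daemon was probably started before the install.
  echo "prerequisites ok: run 'sudo systemctl restart docker' and retry"
}

# Usage:
# gpu_preflight
```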

    List GPU devices:

    $ docker run -it --rm --gpus all ubuntu nvidia-smi -L
    GPU 0: Tesla P4 (UUID: GPU-fa974b1d-3c17-ed92-28d0-805c6d089601)
    
    $ docker run -it --rm --gpus all ubuntu nvidia-smi --query-gpu=index,name,uuid,serial --format=csv
    index, name, uuid, serial
    0, Tesla P4, GPU-fa974b1d-3c17-ed92-28d0-805c6d089601, 0325017070224
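    The CSV form is convenient for scripting: for example, the uuid column can be passed back as --gpus device=UUID to pin a container to one specific card. A sketch that selects a column by its header name; the helper name csv_column is hypothetical, and it assumes the ", " separator shown above and field values that contain no commas:

```shell
#!/bin/sh
# csv_column (hypothetical helper): read `nvidia-smi --query-gpu=... --format=csv`
# output on stdin and print the values of the column whose header is $1.
csv_column() {
  awk -F', ' -v want="$1" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == want) col = i; next }
    col { print $col }'
}

# Usage on the host: start a container that sees only the first GPU, by UUID
# uuid=$(nvidia-smi --query-gpu=index,uuid --format=csv | csv_column uuid | head -n 1)
# docker run -it --rm --gpus "device=$uuid" ubuntu nvidia-smi -L
```

    Selecting by header name rather than by position keeps the helper working if the --query-gpu field list is reordered.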
    

    Reposted from: https://collabnix.com/introducing-new-docker-cli-api-support-for-nvidia-gpus-under-docker-engine-19-03-0-beta-release/

  • Original post: https://www.cnblogs.com/shoufu/p/12904832.html