  • Using NVIDIA GPUs with Docker

    Accessing NVIDIA GPUs from Docker 19.03

    Author: Zhang Shoufu (张首富)
    Date: 2019-09-06
    WeChat: y18163201
    

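    Preface

    Docker 19.03 was officially released in July 2019. For me, this release has two highlights:
    1. Docker no longer requires root privileges to start and run (rootless mode, which is experimental in 19.03).
    2. GPU support is built in: reading an NVIDIA GPU from inside a container no longer requires installing nvidia-docker separately, although the NVIDIA container runtime packages are still needed on the host.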

    Install the NVIDIA driver

    Confirm that the NVIDIA card is detected:

    $ lspci -vv | grep -i nvidia
    00:04.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] (rev a1)
            Subsystem: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB]
            Kernel modules: nvidiafb
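
    In a provisioning script, the same check can be turned into a pass/fail test. This is a minimal sketch; the function name `has_nvidia_gpu` is made up for illustration and is not part of any NVIDIA or Docker tooling.

    ```shell
    # has_nvidia_gpu: read `lspci` output on stdin and succeed (exit 0)
    # if any line mentions an NVIDIA device, matching case-insensitively.
    # The function name is hypothetical.
    has_nvidia_gpu() {
        grep -qi 'nvidia'
    }
    ```

    It could be used like this:

    lspci | has_nvidia_gpu && echo "NVIDIA GPU detected" || echo "no NVIDIA GPU found"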
    

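    Driver installation is not covered in detail here. If you are unsure how to do it, see the post "Installing the TTS service offline on Ubuntu".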

    Install NVIDIA Container Runtime

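    $ cat nvidia-container-runtime-script.sh

    # Add NVIDIA's package signing key
    curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
      sudo apt-key add -
    # Build the distribution string from /etc/os-release, e.g. ubuntu18.04
    distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
    # Register the nvidia-container-runtime apt repository for this distribution
    curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
      sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
    sudo apt-get update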
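
    The `distribution` line is the least obvious part of the script: it sources /etc/os-release and joins ID and VERSION_ID, which selects the repository path. The sketch below isolates that step so it can be tried on its own. The function name `distribution_id` and the optional file argument are additions for illustration, not part of the original script.

    ```shell
    # distribution_id: print ID and VERSION_ID from an os-release style file,
    # concatenated the same way the script does (for example ubuntu18.04).
    # The file argument is a hypothetical addition; it defaults to /etc/os-release.
    distribution_id() {
        os_release_file="${1:-/etc/os-release}"
        # Source the file in a subshell so its variables do not leak into the caller.
        ( . "$os_release_file" && printf '%s%s\n' "$ID" "$VERSION_ID" )
    }
    ```

    Running `distribution_id` with no argument on the target host shows which repository path the script will request.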
    

    Run the script

    sh nvidia-container-runtime-script.sh
    
    OK
    deb https://nvidia.github.io/libnvidia-container/ubuntu18.04/$(ARCH) /
    deb https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/$(ARCH) /
    Hit:1 http://archive.canonical.com/ubuntu bionic InRelease
    Get:2 https://nvidia.github.io/libnvidia-container/ubuntu18.04/amd64  InRelease [1139 B]                
    Get:3 https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/amd64  InRelease [1136 B]           
    Hit:4 http://security.ubuntu.com/ubuntu bionic-security InRelease                                       
    Get:5 https://nvidia.github.io/libnvidia-container/ubuntu18.04/amd64  Packages [4076 B]                 
    Get:6 https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/amd64  Packages [3084 B]            
    Hit:7 http://us-east4-c.gce.clouds.archive.ubuntu.com/ubuntu bionic InRelease
    Hit:8 http://us-east4-c.gce.clouds.archive.ubuntu.com/ubuntu bionic-updates InRelease
    Hit:9 http://us-east4-c.gce.clouds.archive.ubuntu.com/ubuntu bionic-backports InRelease
    Fetched 9435 B in 1s (17.8 kB/s)                   
    Reading package lists... Done
    
    $ apt-get install nvidia-container-runtime
    Reading package lists... Done
    Building dependency tree       
    Reading state information... Done
    The following packages were automatically installed and are no longer required:
      grub-pc-bin libnuma1
    Use 'sudo apt autoremove' to remove them.
    The following additional packages will be installed:
    Get:1 https://nvidia.github.io/libnvidia-container/ubuntu18.04/amd64  libnvidia-container1 1.0.2-1 [59.1 kB]
    Get:2 https://nvidia.github.io/libnvidia-container/ubuntu18.04/amd64  libnvidia-container-tools 1.0.2-1 [15.4 kB]
    Get:3 https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/amd64  nvidia-container-runtime-hook 1.4.0-1 [575 kB]
    
    ...
    Unpacking nvidia-container-runtime (2.0.0+docker18.09.6-3) ...
    Setting up libnvidia-container1:amd64 (1.0.2-1) ...
    Setting up libnvidia-container-tools (1.0.2-1) ...
    Processing triggers for libc-bin (2.27-3ubuntu1) ...
    Setting up nvidia-container-runtime-hook (1.4.0-1) ...
    Setting up nvidia-container-runtime (2.0.0+docker18.09.6-3) ...
    
    $ which nvidia-container-runtime-hook
    /usr/bin/nvidia-container-runtime-hook
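
    For unattended installs it can help to fail fast when the hook binary is missing instead of reading the `which` output by eye. A minimal sketch, with a made-up helper name `require_tool`:

    ```shell
    # require_tool: print the resolved path if the named command is on PATH,
    # otherwise print a message to stderr and return a non-zero status.
    # The helper name is hypothetical.
    require_tool() {
        if path=$(command -v "$1"); then
            printf '%s\n' "$path"
        else
            printf 'missing: %s\n' "$1" >&2
            return 1
        fi
    }
    ```

    For example:

    require_tool nvidia-container-runtime-hook || exit 1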
    

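    Install Docker 19.03

    # Step 1: install the required system utilities
    # (these are yum commands for CentOS; the repository setup above used apt on Ubuntu)
    yum install -y yum-utils device-mapper-persistent-data lvm2
    # Step 2: add the Docker CE repository
    yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
    # Step 3: refresh the package cache and install Docker CE
    yum makecache fast
    yum -y install docker-ce-19.03.2
    # Step 4: start the Docker service and enable it at boot
    systemctl start docker && systemctl enable docker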
    

    Verify that the correct Docker version is installed

    $ docker version
    Client: Docker Engine - Community
     Version:           19.03.2
     API version:       1.40
     Go version:        go1.12.8
     Git commit:        6a30dfc
     Built:             Thu Aug 29 05:28:55 2019
     OS/Arch:           linux/amd64
     Experimental:      false
    
    Server: Docker Engine - Community
     Engine:
      Version:          19.03.2
      API version:      1.40 (minimum version 1.12)
      Go version:       go1.12.8
      Git commit:       6a30dfc
      Built:            Thu Aug 29 05:27:34 2019
      OS/Arch:          linux/amd64
      Experimental:     false
     containerd:
      Version:          1.2.6
      GitCommit:        894b81a4b802e4eb2a91d1ce216b8817763c29fb
     runc:
      Version:          1.0.0-rc8
      GitCommit:        425e105d5a03fabd737a126ad93d62a9eeede87f
     docker-init:
      Version:          0.18.0
      GitCommit:        fec3683
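
    Since the --gpus flag first appeared in 19.03, a script can compare the reported version against that threshold. The following is a sketch only: the function name `supports_gpus_flag` is invented here, and it assumes a plain dotted version string such as 19.03.2.

    ```shell
    # supports_gpus_flag: succeed if a Docker version string such as "19.03.2"
    # is 19.03 or newer. The function name is hypothetical.
    supports_gpus_flag() {
        major=${1%%.*}       # text before the first dot, e.g. 19
        rest=${1#*.}         # text after the first dot, e.g. 03.2
        minor=${rest%%.*}    # e.g. 03
        minor=${minor#0}     # drop a leading zero (03 becomes 3)
        [ "$major" -gt 19 ] || { [ "$major" -eq 19 ] && [ "${minor:-0}" -ge 3 ]; }
    }
    ```

    One way to feed it the server version:

    supports_gpus_flag "$(docker version --format '{{.Server.Version}}')" && echo "--gpus is available"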
    

    Verify that the --gpus option is available

    $ docker run --help | grep -i gpus
          --gpus gpu-request               GPU devices to add to the container ('all' to pass all GPUs)
    

    Run an Ubuntu container that uses the GPU

    $ docker run -it --rm --gpus all ubuntu nvidia-smi
    Unable to find image 'ubuntu:latest' locally
    latest: Pulling from library/ubuntu
    f476d66f5408: Pull complete 
    8882c27f669e: Pull complete 
    d9af21273955: Pull complete 
    f5029279ec12: Pull complete 
    Digest: sha256:d26d529daa4d8567167181d9d569f2a85da3c5ecaf539cace2c6223355d69981
    Status: Downloaded newer image for ubuntu:latest
    Tue May  7 15:52:15 2019       
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 390.116                Driver Version: 390.116                   |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla P4            Off  | 00000000:00:04.0 Off |                    0 |
    | N/A   39C    P0    22W /  75W |      0MiB /  7611MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
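
    Besides `all`, the --gpus option also accepts a GPU count or a list of specific devices. A device list has to reach Docker with literal double quotes around it, because the value is parsed as comma-separated fields. The helper below is a sketch of how such a value could be built; the name `gpus_value` is made up for illustration.

    ```shell
    # gpus_value: build the value passed to `docker run --gpus`.
    #   all  -> all
    #   2    -> 2             (a count of GPUs)
    #   0,1  -> "device=0,1"  (the double quotes are part of the value)
    # The helper name is hypothetical.
    gpus_value() {
        case "$1" in
            all) printf '%s\n' all ;;
            *,*) printf '"device=%s"\n' "$1" ;;
            *)   printf '%s\n' "$1" ;;
        esac
    }
    ```

    For example, to expose only GPUs 0 and 1 on a host that has them:

    docker run -it --rm --gpus "$(gpus_value 0,1)" ubuntu nvidia-smi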
    

    Troubleshooting

    Did you run into the following error message?

    $ docker run -it --rm --gpus all debian
    docker: Error response from daemon: linux runtime spec devices: could not select device driver "" with capabilities: [[gpu]].
    

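    This error means that the NVIDIA runtime could not register with Docker. In practice it usually means the driver is not installed correctly on the host. It can also mean that the NVIDIA container tools were installed without restarting the Docker daemon afterwards, in which case the daemon needs to be restarted.

    I suggest going back and verifying that nvidia-container-runtime is installed, or restarting the Docker daemon.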
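
    The check can be scripted so that this specific failure is recognised automatically. A minimal sketch, assuming the error text is piped in from `docker run`; the function name `diagnose_gpu_error` and the hint wording are made up for illustration.

    ```shell
    # diagnose_gpu_error: read Docker's output on stdin and print a hint
    # when it contains the device-driver selection error shown above.
    # The function name and messages are hypothetical.
    diagnose_gpu_error() {
        if grep -q 'could not select device driver'; then
            echo 'GPU runtime not registered: check nvidia-container-runtime, then restart the Docker daemon'
        else
            echo 'no device-driver selection error found'
        fi
    }
    ```

    A possible sequence on a systemd host:

    docker run --rm --gpus all debian true 2>&1 | diagnose_gpu_error
    sudo systemctl restart docker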

    List the GPU devices

    $ docker run -it --rm --gpus all ubuntu nvidia-smi -L
    GPU 0: Tesla P4 (UUID: GPU-fa974b1d-3c17-ed92-28d0-805c6d089601)
    
    $ docker run -it --rm --gpus all ubuntu nvidia-smi --query-gpu=index,name,uuid,serial --format=csv
    index, name, uuid, serial
    0, Tesla P4, GPU-fa974b1d-3c17-ed92-28d0-805c6d089601, 0325017070224
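
    The CSV form is convenient for scripts. As a sketch, the filter below pulls the index and UUID out of that output, which is useful when a container should be pinned to one particular card. The function name `gpu_uuids` is invented for illustration, and it assumes the four-column layout queried above.

    ```shell
    # gpu_uuids: read the CSV produced by
    #   nvidia-smi --query-gpu=index,name,uuid,serial --format=csv
    # on stdin and print "index uuid" for each GPU, skipping the header row.
    # The function name is hypothetical.
    gpu_uuids() {
        awk -F', ' 'NR > 1 { print $1, $3 }'
    }
    ```

    For example:

    docker run --rm --gpus all ubuntu nvidia-smi --query-gpu=index,name,uuid,serial --format=csv | gpu_uuids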
    

    Reposted from: https://collabnix.com/introducing-new-docker-cli-api-support-for-nvidia-gpus-under-docker-engine-19-03-0-beta-release/

  • Original post: https://www.cnblogs.com/shoufu/p/12904832.html