  Implementing GPU video-stream decoding in FFmpeg (Part 1): basic concepts

    I have been working on GPU decoding of video streams recently and ran into a lot of problems along the way.

    I received guidance from Cai Ding, a video-processing expert at Alibaba, and from Ji Guang, a developer at NVIDIA; my thanks to both of them!

    Basic commands (on Linux)

    1. List the physical graphics cards

    lspci  | grep -i vga
    root@g1060server:/home/user# lspci  | grep -i vga
    09:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)
    81:00.0 VGA compatible controller: NVIDIA Corporation Device 1c03 (rev a1)
    82:00.0 VGA compatible controller: NVIDIA Corporation Device 1c03 (rev a1)


    2. Check the NVIDIA cards directly
    Sometimes, because the server model and the GPU model are incompatible, the motherboard fails to recognize an installed card.
    Use the following command to check whether the motherboard has detected the cards:

    root@g1060server:/home/user# lspci | grep -i nvidia
    81:00.0 VGA compatible controller: NVIDIA Corporation Device 1c03 (rev a1)
    81:00.1 Audio device: NVIDIA Corporation Device 10f1 (rev a1)
    82:00.0 VGA compatible controller: NVIDIA Corporation Device 1c03 (rev a1)
    82:00.1 Audio device: NVIDIA Corporation Device 10f1 (rev a1)

    If output like the above appears, the motherboard has recognized the cards.


    CUDA toolkit and compiler version information

    root@g1060server:/home/user# nvcc -V
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2013 NVIDIA Corporation
    Built on Wed_Jul_17_18:36:13_PDT_2013
    Cuda compilation tools, release 5.5, V5.5.0


    NVIDIA GPU runtime status

    root@g1060server:/home/user# nvidia-smi
    modprobe: ERROR: could not insert 'nvidia_340': No such device
    NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

    The query failed; this usually means the NVIDIA driver is not installed (or not loaded) correctly.

    user@g1060server:~$ nvidia-smi
    Fri Jan  5 21:50:34 2018       
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 106...  Off  | 00000000:81:00.0  On |                  N/A |
    | 32%   35C    P8    10W / 120W |   3083MiB /  6071MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   1  GeForce GTX 106...  Off  | 00000000:82:00.0 Off |                  N/A |
    | 32%   37C    P8    10W / 120W |   2542MiB /  6072MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+

    The query succeeded.


    Check whether the CUDA driver was installed successfully

    root@g1060server:/home/user# cd /usr/local/cuda-8.0/samples/1_Utilities/deviceQuery
    root@g1060server:/usr/local/cuda-8.0/samples/1_Utilities/deviceQuery# ls
    deviceQuery  deviceQuery.cpp  deviceQuery.o  Makefile  NsightEclipse.xml  readme.txt
    root@g1060server:/usr/local/cuda-8.0/samples/1_Utilities/deviceQuery# make
    make: Nothing to be done for `all'
    root@g1060server:/usr/local/cuda-8.0/samples/1_Utilities/deviceQuery# ./deviceQuery
    ./deviceQuery Starting...
    
     CUDA Device Query (Runtime API) version (CUDART static linking)
    
    cudaGetDeviceCount returned 35
    -> CUDA driver version is insufficient for CUDA runtime version
    Result = FAIL

    This confirms again that the CUDA driver installation failed (the driver version is insufficient for the CUDA runtime version).

    Check whether CUDA itself was installed successfully
    /usr/local/cuda/extras/demo_suite/deviceQuery
    
    root@g1060server:/home/user/mjl/test# /usr/local/cuda/extras/demo_suite/deviceQuery
    /usr/local/cuda/extras/demo_suite/deviceQuery Starting...
    
     CUDA Device Query (Runtime API) version (CUDART static linking)
    
    Detected 2 CUDA Capable device(s)
    
    Device 0: "GeForce GTX 1060 6GB"
      CUDA Driver Version / Runtime Version          9.0 / 8.0
      CUDA Capability Major/Minor version number:    6.1
      Total amount of global memory:                 6071 MBytes (6366363648 bytes)
      (10) Multiprocessors, (128) CUDA Cores/MP:     1280 CUDA Cores
      GPU Max Clock rate:                            1709 MHz (1.71 GHz)
      Memory Clock rate:                             4004 Mhz
      Memory Bus Width:                              192-bit
      L2 Cache Size:                                 1572864 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
      Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
      Run time limit on kernels:                     No
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Disabled
      Device supports Unified Addressing (UVA):      Yes
      Device PCI Domain ID / Bus ID / location ID:   0 / 129 / 0
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    
    Device 1: "GeForce GTX 1060 6GB"
      CUDA Driver Version / Runtime Version          9.0 / 8.0
      CUDA Capability Major/Minor version number:    6.1
      Total amount of global memory:                 6073 MBytes (6367739904 bytes)
      (10) Multiprocessors, (128) CUDA Cores/MP:     1280 CUDA Cores
      GPU Max Clock rate:                            1709 MHz (1.71 GHz)
      Memory Clock rate:                             4004 Mhz
      Memory Bus Width:                              192-bit
      L2 Cache Size:                                 1572864 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
      Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
      Run time limit on kernels:                     No
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Disabled
      Device supports Unified Addressing (UVA):      Yes
      Device PCI Domain ID / Bus ID / location ID:   0 / 130 / 0
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    > Peer access from GeForce GTX 1060 6GB (GPU0) -> GeForce GTX 1060 6GB (GPU1) : Yes
    > Peer access from GeForce GTX 1060 6GB (GPU1) -> GeForce GTX 1060 6GB (GPU0) : Yes
    
    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 8.0, NumDevs = 2, Device0 = GeForce GTX 1060 6GB, Device1 = GeForce GTX 1060 6GB
    Result = PASS

    The query succeeded.
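
    Besides running the deviceQuery sample, the same check can be done programmatically. The sketch below is not from the original post; it only uses CUDA runtime calls that deviceQuery itself relies on (cudaDriverGetVersion, cudaRuntimeGetVersion, cudaGetDeviceCount), and the file name check_cuda.cu is a placeholder. Build it with: nvcc check_cuda.cu -o check_cuda

        // check_cuda.cu - minimal driver/runtime sanity check
        #include <cstdio>
        #include <cuda_runtime.h>

        int main() {
            int driver_version = 0, runtime_version = 0, device_count = 0;

            // Versions reported by the installed driver and by the runtime this binary links against;
            // a runtime newer than the driver is exactly the "driver version is insufficient" failure above.
            cudaDriverGetVersion(&driver_version);
            cudaRuntimeGetVersion(&runtime_version);
            std::printf("CUDA driver version:  %d.%d\n", driver_version / 1000, (driver_version % 100) / 10);
            std::printf("CUDA runtime version: %d.%d\n", runtime_version / 1000, (runtime_version % 100) / 10);

            // Same call deviceQuery makes first; an error here usually points at the driver installation.
            cudaError_t err = cudaGetDeviceCount(&device_count);
            if (err != cudaSuccess) {
                std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
                return 1;
            }
            std::printf("Detected %d CUDA capable device(s)\n", device_count);
            return 0;
        }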

    Main flow

    To move FFmpeg decoding onto the GPU, you need a basic understanding of FFmpeg's normal decoding flow before you can adapt it, because the GPU path follows the same flow; only part of the work in the middle is done on the GPU.

    I previously posted the CPU decoding flow at http://www.cnblogs.com/baldermurphy/p/7828337.html

    The main flow is as follows:

        avformat_network_init();
        av_register_all();  // 1. register all the muxers/demuxers and codecs; in FFmpeg 3.3 and later this includes the GPU decoding modules

        std::string tempfile = "xxxx";  // video stream address
        avformat_open_input(&format_context_, tempfile.c_str(), nullptr, nullptr);  // open the input (required before probing)

        avformat_find_stream_info(format_context_, nullptr);  // 2. probe a short chunk of the stream to work out its basic format
        if (AVMEDIA_TYPE_VIDEO == enc->codec_type && video_stream_index_ < 0)  // 3. pick out the video stream
        codec_ = avcodec_find_decoder(enc->codec_id);  // 4. find the matching decoder
        codec_context_ = avcodec_alloc_context3(codec_);  // 5. allocate the decoder context
        avcodec_open2(codec_context_, codec_, nullptr);  // open the decoder before feeding it packets

        av_read_frame(format_context_, &packet_);  // 6. read a packet

        avcodec_send_packet(codec_context_, &packet_);  // 7. submit the packet for decoding
        avcodec_receive_frame(codec_context_, yuv_frame_);  // 8. receive a decoded frame

        sws_scale(y2r_sws_context_, yuv_frame_->data, yuv_frame_->linesize, 0, codec_context_->height, rgb_data_, rgb_line_size_);  // 9. convert the pixel format
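
    For reference, the conversion context y2r_sws_context_ and the destination buffers used in step 9 are not shown above. Below is a minimal sketch of how they could be set up, assuming the decoder outputs YUV420P and BGR24 is wanted; the names mirror the outline, everything else is an assumption rather than the original code:

        extern "C" {
        #include <libswscale/swscale.h>
        #include <libavutil/imgutils.h>
        }

        // Scaling/conversion context: same geometry in and out, only the pixel format changes.
        SwsContext* y2r_sws_context_ = sws_getContext(
            codec_context_->width, codec_context_->height, codec_context_->pix_fmt,   // source
            codec_context_->width, codec_context_->height, AV_PIX_FMT_BGR24,          // destination
            SWS_BILINEAR, nullptr, nullptr, nullptr);

        // Destination planes; av_image_alloc fills rgb_data_ / rgb_line_size_ for the chosen format.
        uint8_t* rgb_data_[4]      = {nullptr};
        int      rgb_line_size_[4] = {0};
        av_image_alloc(rgb_data_, rgb_line_size_,
                       codec_context_->width, codec_context_->height, AV_PIX_FMT_BGR24, 1);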

    GPU decoding changes steps 4, 7, 8 and 9 above, namely (see the sketch after this list):

    find the GPU decoder,

    feed the packets to the GPU decoder,

    retrieve the decoded frames,

    convert the pixel format on the GPU if needed (for example NV12 to BGRA);

    the final format depends on the concrete requirement: with OpenGL interop, for instance, you convert to a specified format (BGRA) and share a block of memory, so the data refreshes immediately without even a copy;

    converting to still images is a different requirement again (BGR).
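
    As a rough sketch of what steps 4, 7 and 8 look like on the GPU path (this is not the exact code of this series; it assumes an H.264 stream, the FFmpeg 3.x API, and a build that includes NVIDIA's cuvid decoders):

        // Step 4: ask for the NVIDIA hardware decoder by name instead of the default CPU decoder.
        AVCodec* codec_ = avcodec_find_decoder_by_name("h264_cuvid");
        if (!codec_)
            codec_ = avcodec_find_decoder(enc->codec_id);   // fall back to the CPU decoder

        // Step 5 stays the same, plus copying the stream parameters and opening the decoder.
        AVCodecContext* codec_context_ = avcodec_alloc_context3(codec_);
        avcodec_parameters_to_context(codec_context_,
                                      format_context_->streams[video_stream_index_]->codecpar);
        avcodec_open2(codec_context_, codec_, nullptr);

        // Steps 7 and 8 are unchanged API-wise; the decoding now runs on the GPU,
        // and the frames usually come back as NV12 rather than YUV420P (hence step 9 changes too).
        avcodec_send_packet(codec_context_, &packet_);
        avcodec_receive_frame(codec_context_, yuv_frame_);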

    Matching the use case

    One thing has to be said: GPU computing is a great capability, but it has to match the requirements and the scenario; ignore that and it can cost more than it gains.

    An extreme example: OpenCV also has a GPU implementation of image decoding, but it is not what you reach for when chasing efficiency,

    because uploading a single image to the GPU is non-parallel and slow, the decode itself is very fast, and copying from GPU memory back to host memory is again non-parallel and slow.

    The upload and the copy back alone take hundreds of milliseconds, and image operations happen frequently, so CPU usage is not relieved much; sometimes it even rises instead of falling. The decode is fast, but the user experience is slow, and both the CPU and the GPU end up occupied.

    A few key websites

    NVIDIA's recommended SDK for GPU decoding with FFmpeg

    https://developer.nvidia.com/nvidia-video-codec-sdk

    Tool for checking GPU memory leaks

    http://docs.nvidia.com/cuda/cuda-memcheck/index.html#device-side-allocation-checking
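
    A typical invocation, assuming the decoder binary is called your_decoder (placeholder name), runs the program under cuda-memcheck with leak checking enabled:

        cuda-memcheck --leak-check full ./your_decoder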
