zoukankan      html  css  js  c++  java
  • OpenCL For Opencv

    一、Opencv-OCL编程基础

    1. Opencv OCL基本编程API

    a) Opencv4.2 OCL API

    b) Opencv3.1 OCL API

    2. 图像处理Kernel实现及CU单元配置

    3. Demo实验

            我目前编译使用的opencv版本是opencv4.2【如果版本不同请下载不同版本下的Demo程序】,使用如下官方提供的Opencv-OCL代码,如果只是单纯的测试运行此官方提供的代码不需要有特定的加速设备,直接使用多核心CPU-PC平台即可,因为OpenCL本身就支持了CPU加速,具体代码如下:

      1 // This file is part of OpenCV project.
      2 // It is subject to the license terms in the LICENSE file found in the top-level directory
      3 // of this distribution and at http://opencv.org/license.html
      4 
      5 #include "opencv2/core.hpp"
      6 #include "opencv2/core/ocl.hpp"
      7 #include "opencv2/highgui.hpp"
      8 #include "opencv2/imgcodecs.hpp"
      9 #include "opencv2/imgproc.hpp"
     10 
     11 #include <iostream>
     12 
     13 using namespace std;
     14 using namespace cv;
     15 
     16 static const char* opencl_kernel_src =
     17 "__kernel void magnutude_filter_8u(
    "
     18 "       __global const uchar* src, int src_step, int src_offset,
    "
     19 "       __global uchar* dst, int dst_step, int dst_offset, int dst_rows, int dst_cols,
    "
     20 "       float scale)
    "
     21 "{
    "
     22 "   int x = get_global_id(0);
    "
     23 "   int y = get_global_id(1);
    "
     24 "   if (x < dst_cols && y < dst_rows)
    "
     25 "   {
    "
     26 "       int dst_idx = y * dst_step + x + dst_offset;
    "
     27 "       if (x > 0 && x < dst_cols - 1 && y > 0 && y < dst_rows - 2)
    "
     28 "       {
    "
     29 "           int src_idx = y * src_step + x + src_offset;
    "
     30 "           int dx = (int)src[src_idx]*2 - src[src_idx - 1]          - src[src_idx + 1];
    "
     31 "           int dy = (int)src[src_idx]*2 - src[src_idx - 1*src_step] - src[src_idx + 1*src_step];
    "
     32 "           dst[dst_idx] = convert_uchar_sat(sqrt((float)(dx*dx + dy*dy)) * scale);
    "
     33 "       }
    "
     34 "       else
    "
     35 "       {
    "
     36 "           dst[dst_idx] = 0;
    "
     37 "       }
    "
     38 "   }
    "
     39 "}
    ";
     40 
     41 int main(int argc, char** argv)
     42 {
     43     const char* keys =
     44         "{ i input    | | specify input image }"
     45         "{ h help     | | print help message }";
     46 
     47     cv::CommandLineParser args(argc, argv, keys);
     48     if (args.has("help"))
     49     {
     50         cout << "Usage : " << argv[0] << " [options]" << endl;
     51         cout << "Available options:" << endl;
     52         args.printMessage();
     53         return EXIT_SUCCESS;
     54     }
     55 
     56     cv::ocl::Context ctx = cv::ocl::Context::getDefault();
     57     if (!ctx.ptr())
     58     {
     59         cerr << "OpenCL is not available" << endl;
     60         return 1;
     61     }
     62     cv::ocl::Device device = cv::ocl::Device::getDefault();
     63     if (!device.compilerAvailable())
     64     {
     65         cerr << "OpenCL compiler is not available" << endl;
     66         return 1;
     67     }
     68 
     69 
     70     UMat src;
     71     {
     72         string image_file = args.get<string>("i");
     73         if (!image_file.empty())
     74         {
     75             Mat image = imread(samples::findFile(image_file));
     76             if (image.empty())
     77             {
     78                 cout << "error read image: " << image_file << endl;
     79                 return 1;
     80             }
     81             cvtColor(image, src, COLOR_BGR2GRAY);
     82         }
     83         else
     84         {
     85             Mat frame(cv::Size(640, 480), CV_8U, Scalar::all(128));
     86             Point p(frame.cols / 2, frame.rows / 2);
     87             line(frame, Point(0, frame.rows / 2), Point(frame.cols, frame.rows / 2), 1);
     88             circle(frame, p, 200, Scalar(32, 32, 32), 8, LINE_AA);
     89             string str = "OpenCL";
     90             int baseLine = 0;
     91             Size box = getTextSize(str, FONT_HERSHEY_COMPLEX, 2, 5, &baseLine);
     92             putText(frame, str, Point((frame.cols - box.width) / 2, (frame.rows - box.height) / 2 + baseLine),
     93                     FONT_HERSHEY_COMPLEX, 2, Scalar(255, 255, 255), 5, LINE_AA);
     94             frame.copyTo(src);
     95         }
     96     }
     97 
     98 
     99     cv::String module_name; // empty to disable OpenCL cache
    100 
    101     {
    102         cout << "OpenCL program source: " << endl;
    103         cout << "======================================================================================================" << endl;
    104         cout << opencl_kernel_src << endl;
    105         cout << "======================================================================================================" << endl;
    106         //! [Define OpenCL program source]
    107         cv::ocl::ProgramSource source(module_name, "simple", opencl_kernel_src, "");
    108         //! [Define OpenCL program source]
    109 
    110         //! [Compile/build OpenCL for current OpenCL device]
    111         cv::String errmsg;
    112         cv::ocl::Program program(source, "", errmsg);
    113         if (program.ptr() == NULL)
    114         {
    115             cerr << "Can't compile OpenCL program:" << endl << errmsg << endl;
    116             return 1;
    117         }
    118         //! [Compile/build OpenCL for current OpenCL device]
    119 
    120         if (!errmsg.empty())
    121         {
    122             cout << "OpenCL program build log:" << endl << errmsg << endl;
    123         }
    124 
    125         //! [Get OpenCL kernel by name]
    126         cv::ocl::Kernel k("magnutude_filter_8u", program);
    127         if (k.empty())
    128         {
    129             cerr << "Can't get OpenCL kernel" << endl;
    130             return 1;
    131         }
    132         //! [Get OpenCL kernel by name]
    133 
    134         UMat result(src.size(), CV_8UC1);
    135 
    136         //! [Define kernel parameters and run]
    137         size_t globalSize[2] = {(size_t)src.cols, (size_t)src.rows};
    138         size_t localSize[2] = {8, 8};
    139         bool executionResult = k
    140             .args(
    141                 cv::ocl::KernelArg::ReadOnlyNoSize(src), // size is not used (similar to 'dst' size)
    142                 cv::ocl::KernelArg::WriteOnly(result),
    143                 (float)2.0
    144             )
    145             .run(2, globalSize, localSize, true);
    146         if (!executionResult)
    147         {
    148             cerr << "OpenCL kernel launch failed" << endl;
    149             return 1;
    150         }
    151         //! [Define kernel parameters and run]
    152 
    153         imshow("Source", src);
    154         imshow("Result", result);
    155 
    156         for (;;)
    157         {
    158             int key = waitKey();
    159             if (key == 27/*ESC*/ || key == 'q' || key == 'Q')
    160                 break;
    161         }
    162     }
    163     return 0;
    164 }

    使用mingw编译上述程序并运行结果如下(根据运行结果,说明对图像进行了边缘提取的功能):

    当然如果你想处理其他的图像,也可以使用在CMD窗口当中调用编译完成的.exe文件加上文件的绝对路径。

    终端Terminal当中显示了Kernel的基本内容,如下所示:

            从Kernel的结构可以分析出来,kernel中实现的是求解了图像自身的x方向以及y方向的梯度,并求出了每一点的梯度方向,实际上比较类似与Canny边缘算子的检测算法,kernel使用了二维的方式处理图片上每一个点,在kernel核当中,使用了if判断是否存在指针越界的情况,具体的基础实现相关内容请移步OpenCL基础入门

    二、Demo代码变形

    实现图像3x3均值滤波

    三、嵌入式平台移植与编译(TI AM57x 系列)

    Opencv3.1.0版本master下,opencv-ocl代码官方Demo

       1 /*
       2 // The example of interoperability between OpenCL and OpenCV.
       3 // This will loop through frames of video either from input media file
       4 // or camera device and do processing of these data in OpenCL and then
       5 // in OpenCV. In OpenCL it does inversion of pixels in left half of frame and
       6 // in OpenCV it does bluring in the right half of frame.
       7 */
       8 #include <cstdio>
       9 #include <cstdlib>
      10 #include <iostream>
      11 #include <fstream>
      12 #include <string>
      13 #include <sstream>
      14 #include <iomanip>
      15 #include <stdexcept>
      16 
      17 #define CL_USE_DEPRECATED_OPENCL_2_0_APIS // eliminate build warning
      18 
      19 #if __APPLE__
      20 #include <OpenCL/cl.h>
      21 #else
      22 #include <CL/cl.h>
      23 #endif
      24 
      25 #include <opencv2/core/ocl.hpp>
      26 #include <opencv2/core/utility.hpp>
      27 #include <opencv2/video.hpp>
      28 #include <opencv2/highgui.hpp>
      29 #include <opencv2/imgproc.hpp>
      30 
      31 
      32 using namespace std;
      33 using namespace cv;
      34 
      35 namespace opencl {
      36 
      37 class PlatformInfo
      38 {
      39 public:
      40     PlatformInfo()
      41     {}
      42 
      43     ~PlatformInfo()
      44     {}
      45 
      46     cl_int QueryInfo(cl_platform_id id)
      47     {
      48         query_param(id, CL_PLATFORM_PROFILE, m_profile);
      49         query_param(id, CL_PLATFORM_VERSION, m_version);
      50         query_param(id, CL_PLATFORM_NAME, m_name);
      51         query_param(id, CL_PLATFORM_VENDOR, m_vendor);
      52         query_param(id, CL_PLATFORM_EXTENSIONS, m_extensions);
      53         return CL_SUCCESS;
      54     }
      55 
      56     std::string Profile()    { return m_profile; }
      57     std::string Version()    { return m_version; }
      58     std::string Name()       { return m_name; }
      59     std::string Vendor()     { return m_vendor; }
      60     std::string Extensions() { return m_extensions; }
      61 
      62 private:
      63     cl_int query_param(cl_platform_id id, cl_platform_info param, std::string& paramStr)
      64     {
      65         cl_int res;
      66 
      67         size_t psize;
      68         cv::AutoBuffer<char> buf;
      69 
      70         res = clGetPlatformInfo(id, param, 0, 0, &psize);
      71         if (CL_SUCCESS != res)
      72             throw std::runtime_error(std::string("clGetPlatformInfo failed"));
      73 
      74         buf.resize(psize);
      75         res = clGetPlatformInfo(id, param, psize, buf, 0);
      76         if (CL_SUCCESS != res)
      77             throw std::runtime_error(std::string("clGetPlatformInfo failed"));
      78 
      79         // just in case, ensure trailing zero for ASCIIZ string
      80         buf[psize] = 0;
      81 
      82         paramStr = buf;
      83 
      84         return CL_SUCCESS;
      85     }
      86 
      87 private:
      88     std::string m_profile;
      89     std::string m_version;
      90     std::string m_name;
      91     std::string m_vendor;
      92     std::string m_extensions;
      93 };
      94 
      95 
      96 class DeviceInfo
      97 {
      98 public:
      99     DeviceInfo()
     100     {}
     101 
     102     ~DeviceInfo()
     103     {}
     104 
     105     cl_int QueryInfo(cl_device_id id)
     106     {
     107         query_param(id, CL_DEVICE_TYPE, m_type);
     108         query_param(id, CL_DEVICE_VENDOR_ID, m_vendor_id);
     109         query_param(id, CL_DEVICE_MAX_COMPUTE_UNITS, m_max_compute_units);
     110         query_param(id, CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS, m_max_work_item_dimensions);
     111         query_param(id, CL_DEVICE_MAX_WORK_ITEM_SIZES, m_max_work_item_sizes);
     112         query_param(id, CL_DEVICE_MAX_WORK_GROUP_SIZE, m_max_work_group_size);
     113         query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR, m_preferred_vector_width_char);
     114         query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT, m_preferred_vector_width_short);
     115         query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT, m_preferred_vector_width_int);
     116         query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG, m_preferred_vector_width_long);
     117         query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT, m_preferred_vector_width_float);
     118         query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE, m_preferred_vector_width_double);
     119 #if defined(CL_VERSION_1_1)
     120         query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF, m_preferred_vector_width_half);
     121         query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR, m_native_vector_width_char);
     122         query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT, m_native_vector_width_short);
     123         query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_INT, m_native_vector_width_int);
     124         query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG, m_native_vector_width_long);
     125         query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT, m_native_vector_width_float);
     126         query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE, m_native_vector_width_double);
     127         query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF, m_native_vector_width_half);
     128 #endif
     129         query_param(id, CL_DEVICE_MAX_CLOCK_FREQUENCY, m_max_clock_frequency);
     130         query_param(id, CL_DEVICE_ADDRESS_BITS, m_address_bits);
     131         query_param(id, CL_DEVICE_MAX_MEM_ALLOC_SIZE, m_max_mem_alloc_size);
     132         query_param(id, CL_DEVICE_IMAGE_SUPPORT, m_image_support);
     133         query_param(id, CL_DEVICE_MAX_READ_IMAGE_ARGS, m_max_read_image_args);
     134         query_param(id, CL_DEVICE_MAX_WRITE_IMAGE_ARGS, m_max_write_image_args);
     135 #if defined(CL_VERSION_2_0)
     136         query_param(id, CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS, m_max_read_write_image_args);
     137 #endif
     138         query_param(id, CL_DEVICE_IMAGE2D_MAX_WIDTH, m_image2d_max_width);
     139         query_param(id, CL_DEVICE_IMAGE2D_MAX_HEIGHT, m_image2d_max_height);
     140         query_param(id, CL_DEVICE_IMAGE3D_MAX_WIDTH, m_image3d_max_width);
     141         query_param(id, CL_DEVICE_IMAGE3D_MAX_HEIGHT, m_image3d_max_height);
     142         query_param(id, CL_DEVICE_IMAGE3D_MAX_DEPTH, m_image3d_max_depth);
     143 #if defined(CL_VERSION_1_2)
     144         query_param(id, CL_DEVICE_IMAGE_MAX_BUFFER_SIZE, m_image_max_buffer_size);
     145         query_param(id, CL_DEVICE_IMAGE_MAX_ARRAY_SIZE, m_image_max_array_size);
     146 #endif
     147         query_param(id, CL_DEVICE_MAX_SAMPLERS, m_max_samplers);
     148 #if defined(CL_VERSION_1_2)
     149         query_param(id, CL_DEVICE_IMAGE_PITCH_ALIGNMENT, m_image_pitch_alignment);
     150         query_param(id, CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT, m_image_base_address_alignment);
     151 #endif
     152 #if defined(CL_VERSION_2_0)
     153         query_param(id, CL_DEVICE_MAX_PIPE_ARGS, m_max_pipe_args);
     154         query_param(id, CL_DEVICE_PIPE_MAX_ACTIVE_RESERVATIONS, m_pipe_max_active_reservations);
     155         query_param(id, CL_DEVICE_PIPE_MAX_PACKET_SIZE, m_pipe_max_packet_size);
     156 #endif
     157         query_param(id, CL_DEVICE_MAX_PARAMETER_SIZE, m_max_parameter_size);
     158         query_param(id, CL_DEVICE_MEM_BASE_ADDR_ALIGN, m_mem_base_addr_align);
     159         query_param(id, CL_DEVICE_SINGLE_FP_CONFIG, m_single_fp_config);
     160 #if defined(CL_VERSION_1_2)
     161         query_param(id, CL_DEVICE_DOUBLE_FP_CONFIG, m_double_fp_config);
     162 #endif
     163         query_param(id, CL_DEVICE_GLOBAL_MEM_CACHE_TYPE, m_global_mem_cache_type);
     164         query_param(id, CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE, m_global_mem_cacheline_size);
     165         query_param(id, CL_DEVICE_GLOBAL_MEM_CACHE_SIZE, m_global_mem_cache_size);
     166         query_param(id, CL_DEVICE_GLOBAL_MEM_SIZE, m_global_mem_size);
     167         query_param(id, CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE, m_max_constant_buffer_size);
     168         query_param(id, CL_DEVICE_MAX_CONSTANT_ARGS, m_max_constant_args);
     169 #if defined(CL_VERSION_2_0)
     170         query_param(id, CL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE, m_max_global_variable_size);
     171         query_param(id, CL_DEVICE_GLOBAL_VARIABLE_PREFERRED_TOTAL_SIZE, m_global_variable_preferred_total_size);
     172 #endif
     173         query_param(id, CL_DEVICE_LOCAL_MEM_TYPE, m_local_mem_type);
     174         query_param(id, CL_DEVICE_LOCAL_MEM_SIZE, m_local_mem_size);
     175         query_param(id, CL_DEVICE_ERROR_CORRECTION_SUPPORT, m_error_correction_support);
     176 #if defined(CL_VERSION_1_1)
     177         query_param(id, CL_DEVICE_HOST_UNIFIED_MEMORY, m_host_unified_memory);
     178 #endif
     179         query_param(id, CL_DEVICE_PROFILING_TIMER_RESOLUTION, m_profiling_timer_resolution);
     180         query_param(id, CL_DEVICE_ENDIAN_LITTLE, m_endian_little);
     181         query_param(id, CL_DEVICE_AVAILABLE, m_available);
     182         query_param(id, CL_DEVICE_COMPILER_AVAILABLE, m_compiler_available);
     183 #if defined(CL_VERSION_1_2)
     184         query_param(id, CL_DEVICE_LINKER_AVAILABLE, m_linker_available);
     185 #endif
     186         query_param(id, CL_DEVICE_EXECUTION_CAPABILITIES, m_execution_capabilities);
     187         query_param(id, CL_DEVICE_QUEUE_PROPERTIES, m_queue_properties);
     188 #if defined(CL_VERSION_2_0)
     189         query_param(id, CL_DEVICE_QUEUE_ON_HOST_PROPERTIES, m_queue_on_host_properties);
     190         query_param(id, CL_DEVICE_QUEUE_ON_DEVICE_PROPERTIES, m_queue_on_device_properties);
     191         query_param(id, CL_DEVICE_QUEUE_ON_DEVICE_PREFERRED_SIZE, m_queue_on_device_preferred_size);
     192         query_param(id, CL_DEVICE_QUEUE_ON_DEVICE_MAX_SIZE, m_queue_on_device_max_size);
     193         query_param(id, CL_DEVICE_MAX_ON_DEVICE_QUEUES, m_max_on_device_queues);
     194         query_param(id, CL_DEVICE_MAX_ON_DEVICE_EVENTS, m_max_on_device_events);
     195 #endif
     196 #if defined(CL_VERSION_1_2)
     197         query_param(id, CL_DEVICE_BUILT_IN_KERNELS, m_built_in_kernels);
     198 #endif
     199         query_param(id, CL_DEVICE_PLATFORM, m_platform);
     200         query_param(id, CL_DEVICE_NAME, m_name);
     201         query_param(id, CL_DEVICE_VENDOR, m_vendor);
     202         query_param(id, CL_DRIVER_VERSION, m_driver_version);
     203         query_param(id, CL_DEVICE_PROFILE, m_profile);
     204         query_param(id, CL_DEVICE_VERSION, m_version);
     205 #if defined(CL_VERSION_1_1)
     206         query_param(id, CL_DEVICE_OPENCL_C_VERSION, m_opencl_c_version);
     207 #endif
     208         query_param(id, CL_DEVICE_EXTENSIONS, m_extensions);
     209 #if defined(CL_VERSION_1_2)
     210         query_param(id, CL_DEVICE_PRINTF_BUFFER_SIZE, m_printf_buffer_size);
     211         query_param(id, CL_DEVICE_PREFERRED_INTEROP_USER_SYNC, m_preferred_interop_user_sync);
     212         query_param(id, CL_DEVICE_PARENT_DEVICE, m_parent_device);
     213         query_param(id, CL_DEVICE_PARTITION_MAX_SUB_DEVICES, m_partition_max_sub_devices);
     214         query_param(id, CL_DEVICE_PARTITION_PROPERTIES, m_partition_properties);
     215         query_param(id, CL_DEVICE_PARTITION_AFFINITY_DOMAIN, m_partition_affinity_domain);
     216         query_param(id, CL_DEVICE_PARTITION_TYPE, m_partition_type);
     217         query_param(id, CL_DEVICE_REFERENCE_COUNT, m_reference_count);
     218 #endif
     219         return CL_SUCCESS;
     220     }
     221 
     222     std::string Name() { return m_name; }
     223 
     224 private:
     225     template<typename T>
     226     cl_int query_param(cl_device_id id, cl_device_info param, T& value)
     227     {
     228         cl_int res;
     229         size_t size = 0;
     230 
     231         res = clGetDeviceInfo(id, param, 0, 0, &size);
     232         if (CL_SUCCESS != res && size != 0)
     233             throw std::runtime_error(std::string("clGetDeviceInfo failed"));
     234 
     235         if (0 == size)
     236             return CL_SUCCESS;
     237 
     238         if (sizeof(T) != size)
     239             throw std::runtime_error(std::string("clGetDeviceInfo: param size mismatch"));
     240 
     241         res = clGetDeviceInfo(id, param, size, &value, 0);
     242         if (CL_SUCCESS != res)
     243             throw std::runtime_error(std::string("clGetDeviceInfo failed"));
     244 
     245         return CL_SUCCESS;
     246     }
     247 
     248     template<typename T>
     249     cl_int query_param(cl_device_id id, cl_device_info param, std::vector<T>& value)
     250     {
     251         cl_int res;
     252         size_t size;
     253 
     254         res = clGetDeviceInfo(id, param, 0, 0, &size);
     255         if (CL_SUCCESS != res)
     256             throw std::runtime_error(std::string("clGetDeviceInfo failed"));
     257 
     258         if (0 == size)
     259             return CL_SUCCESS;
     260 
     261         value.resize(size / sizeof(T));
     262 
     263         res = clGetDeviceInfo(id, param, size, &value[0], 0);
     264         if (CL_SUCCESS != res)
     265             throw std::runtime_error(std::string("clGetDeviceInfo failed"));
     266 
     267         return CL_SUCCESS;
     268     }
     269 
     270     cl_int query_param(cl_device_id id, cl_device_info param, std::string& value)
     271     {
     272         cl_int res;
     273         size_t size;
     274 
     275         res = clGetDeviceInfo(id, param, 0, 0, &size);
     276         if (CL_SUCCESS != res)
     277             throw std::runtime_error(std::string("clGetDeviceInfo failed"));
     278 
     279         value.resize(size + 1);
     280 
     281         res = clGetDeviceInfo(id, param, size, &value[0], 0);
     282         if (CL_SUCCESS != res)
     283             throw std::runtime_error(std::string("clGetDeviceInfo failed"));
     284 
     285         // just in case, ensure trailing zero for ASCIIZ string
     286         value[size] = 0;
     287 
     288         return CL_SUCCESS;
     289     }
     290 
     291 private:
     292     cl_device_type                            m_type;
     293     cl_uint                                   m_vendor_id;
     294     cl_uint                                   m_max_compute_units;
     295     cl_uint                                   m_max_work_item_dimensions;
     296     std::vector<size_t>                       m_max_work_item_sizes;
     297     size_t                                    m_max_work_group_size;
     298     cl_uint                                   m_preferred_vector_width_char;
     299     cl_uint                                   m_preferred_vector_width_short;
     300     cl_uint                                   m_preferred_vector_width_int;
     301     cl_uint                                   m_preferred_vector_width_long;
     302     cl_uint                                   m_preferred_vector_width_float;
     303     cl_uint                                   m_preferred_vector_width_double;
     304 #if defined(CL_VERSION_1_1)
     305     cl_uint                                   m_preferred_vector_width_half;
     306     cl_uint                                   m_native_vector_width_char;
     307     cl_uint                                   m_native_vector_width_short;
     308     cl_uint                                   m_native_vector_width_int;
     309     cl_uint                                   m_native_vector_width_long;
     310     cl_uint                                   m_native_vector_width_float;
     311     cl_uint                                   m_native_vector_width_double;
     312     cl_uint                                   m_native_vector_width_half;
     313 #endif
     314     cl_uint                                   m_max_clock_frequency;
     315     cl_uint                                   m_address_bits;
     316     cl_ulong                                  m_max_mem_alloc_size;
     317     cl_bool                                   m_image_support;
     318     cl_uint                                   m_max_read_image_args;
     319     cl_uint                                   m_max_write_image_args;
     320 #if defined(CL_VERSION_2_0)
     321     cl_uint                                   m_max_read_write_image_args;
     322 #endif
     323     size_t                                    m_image2d_max_width;
     324     size_t                                    m_image2d_max_height;
     325     size_t                                    m_image3d_max_width;
     326     size_t                                    m_image3d_max_height;
     327     size_t                                    m_image3d_max_depth;
     328 #if defined(CL_VERSION_1_2)
     329     size_t                                    m_image_max_buffer_size;
     330     size_t                                    m_image_max_array_size;
     331 #endif
     332     cl_uint                                   m_max_samplers;
     333 #if defined(CL_VERSION_1_2)
     334     cl_uint                                   m_image_pitch_alignment;
     335     cl_uint                                   m_image_base_address_alignment;
     336 #endif
     337 #if defined(CL_VERSION_2_0)
     338     cl_uint                                   m_max_pipe_args;
     339     cl_uint                                   m_pipe_max_active_reservations;
     340     cl_uint                                   m_pipe_max_packet_size;
     341 #endif
     342     size_t                                    m_max_parameter_size;
     343     cl_uint                                   m_mem_base_addr_align;
     344     cl_device_fp_config                       m_single_fp_config;
     345 #if defined(CL_VERSION_1_2)
     346     cl_device_fp_config                       m_double_fp_config;
     347 #endif
     348     cl_device_mem_cache_type                  m_global_mem_cache_type;
     349     cl_uint                                   m_global_mem_cacheline_size;
     350     cl_ulong                                  m_global_mem_cache_size;
     351     cl_ulong                                  m_global_mem_size;
     352     cl_ulong                                  m_max_constant_buffer_size;
     353     cl_uint                                   m_max_constant_args;
     354 #if defined(CL_VERSION_2_0)
     355     size_t                                    m_max_global_variable_size;
     356     size_t                                    m_global_variable_preferred_total_size;
     357 #endif
     358     cl_device_local_mem_type                  m_local_mem_type;
     359     cl_ulong                                  m_local_mem_size;
     360     cl_bool                                   m_error_correction_support;
     361 #if defined(CL_VERSION_1_1)
     362     cl_bool                                   m_host_unified_memory;
     363 #endif
     364     size_t                                    m_profiling_timer_resolution;
     365     cl_bool                                   m_endian_little;
     366     cl_bool                                   m_available;
     367     cl_bool                                   m_compiler_available;
     368 #if defined(CL_VERSION_1_2)
     369     cl_bool                                   m_linker_available;
     370 #endif
     371     cl_device_exec_capabilities               m_execution_capabilities;
     372     cl_command_queue_properties               m_queue_properties;
     373 #if defined(CL_VERSION_2_0)
     374     cl_command_queue_properties               m_queue_on_host_properties;
     375     cl_command_queue_properties               m_queue_on_device_properties;
     376     cl_uint                                   m_queue_on_device_preferred_size;
     377     cl_uint                                   m_queue_on_device_max_size;
     378     cl_uint                                   m_max_on_device_queues;
     379     cl_uint                                   m_max_on_device_events;
     380 #endif
     381 #if defined(CL_VERSION_1_2)
     382     std::string                               m_built_in_kernels;
     383 #endif
     384     cl_platform_id                            m_platform;
     385     std::string                               m_name;
     386     std::string                               m_vendor;
     387     std::string                               m_driver_version;
     388     std::string                               m_profile;
     389     std::string                               m_version;
     390 #if defined(CL_VERSION_1_1)
     391     std::string                               m_opencl_c_version;
     392 #endif
     393     std::string                               m_extensions;
     394 #if defined(CL_VERSION_1_2)
     395     size_t                                    m_printf_buffer_size;
     396     cl_bool                                   m_preferred_interop_user_sync;
     397     cl_device_id                              m_parent_device;
     398     cl_uint                                   m_partition_max_sub_devices;
     399     std::vector<cl_device_partition_property> m_partition_properties;
     400     cl_device_affinity_domain                 m_partition_affinity_domain;
     401     std::vector<cl_device_partition_property> m_partition_type;
     402     cl_uint                                   m_reference_count;
     403 #endif
     404 };
     405 
     406 } // namespace opencl
     407 
     408 
     409 class App
     410 {
     411 public:
     412     App(CommandLineParser& cmd);
     413     ~App();
     414 
     415     int initOpenCL();
     416     int initVideoSource();
     417 
     418     int process_frame_with_open_cl(cv::Mat& frame, bool use_buffer, cl_mem* cl_buffer);
     419     int process_cl_buffer_with_opencv(cl_mem buffer, size_t step, int rows, int cols, int type, cv::UMat& u);
     420     int process_cl_image_with_opencv(cl_mem image, cv::UMat& u);
     421 
     422     int run();
     423 
     424     bool isRunning() { return m_running; }
     425     bool doProcess() { return m_process; }
     426     bool useBuffer() { return m_use_buffer; }
     427 
     428     void setRunning(bool running)      { m_running = running; }
     429     void setDoProcess(bool process)    { m_process = process; }
     430     void setUseBuffer(bool use_buffer) { m_use_buffer = use_buffer; }
     431 
     432 protected:
     433     bool nextFrame(cv::Mat& frame) { return m_cap.read(frame); }
     434     void handleKey(char key);
     435     void timerStart();
     436     void timerEnd();
     437     std::string timeStr() const;
     438     std::string message() const;
     439 
     440 private:
     441     bool                        m_running;
     442     bool                        m_process;
     443     bool                        m_use_buffer;
     444 
     445     int64                       m_t0;
     446     int64                       m_t1;
     447     float                       m_time;
     448     float                       m_frequency;
     449 
     450     string                      m_file_name;
     451     int                         m_camera_id;
     452     cv::VideoCapture            m_cap;
     453     cv::Mat                     m_frame;
     454     cv::Mat                     m_frameGray;
     455 
     456     opencl::PlatformInfo        m_platformInfo;
     457     opencl::DeviceInfo          m_deviceInfo;
     458     std::vector<cl_platform_id> m_platform_ids;
     459     cl_context                  m_context;
     460     cl_device_id                m_device_id;
     461     cl_command_queue            m_queue;
     462     cl_program                  m_program;
     463     cl_kernel                   m_kernelBuf;
     464     cl_kernel                   m_kernelImg;
     465     cl_mem                      m_img_src; // used as src in case processing of cl image
     466     cl_mem                      m_mem_obj;
     467     cl_event                    m_event;
     468 };
     469 
     470 
     471 App::App(CommandLineParser& cmd)
     472 {
     473     cout << "
    Press ESC to exit
    " << endl;
     474     cout << "
          'p' to toggle ON/OFF processing
    " << endl;
     475     cout << "
           SPACE to switch between OpenCL buffer/image
    " << endl;
     476 
     477     m_camera_id  = cmd.get<int>("camera");
     478     m_file_name  = cmd.get<string>("video");
     479 
     480     m_running    = false;
     481     m_process    = false;
     482     m_use_buffer = false;
     483 
     484     m_t0         = 0;
     485     m_t1         = 0;
     486     m_time       = 0.0;
     487     m_frequency  = (float)cv::getTickFrequency();
     488 
     489     m_context    = 0;
     490     m_device_id  = 0;
     491     m_queue      = 0;
     492     m_program    = 0;
     493     m_kernelBuf  = 0;
     494     m_kernelImg  = 0;
     495     m_img_src    = 0;
     496     m_mem_obj    = 0;
     497     m_event      = 0;
     498 } // ctor
     499 
     500 
     501 App::~App()
     502 {
     503     if (m_queue)
     504     {
     505         clFinish(m_queue);
     506         clReleaseCommandQueue(m_queue);
     507         m_queue = 0;
     508     }
     509 
     510     if (m_program)
     511     {
     512         clReleaseProgram(m_program);
     513         m_program = 0;
     514     }
     515 
     516     if (m_img_src)
     517     {
     518         clReleaseMemObject(m_img_src);
     519         m_img_src = 0;
     520     }
     521 
     522     if (m_mem_obj)
     523     {
     524         clReleaseMemObject(m_mem_obj);
     525         m_mem_obj = 0;
     526     }
     527 
     528     if (m_event)
     529     {
     530         clReleaseEvent(m_event);
     531     }
     532 
     533     if (m_kernelBuf)
     534     {
     535         clReleaseKernel(m_kernelBuf);
     536         m_kernelBuf = 0;
     537     }
     538 
     539     if (m_kernelImg)
     540     {
     541         clReleaseKernel(m_kernelImg);
     542         m_kernelImg = 0;
     543     }
     544 
     545     if (m_device_id)
     546     {
     547         clReleaseDevice(m_device_id);
     548         m_device_id = 0;
     549     }
     550 
     551     if (m_context)
     552     {
     553         clReleaseContext(m_context);
     554         m_context = 0;
     555     }
     556 } // dtor
     557 
     558 
     559 int App::initOpenCL()
     560 {
     561     cl_int res = CL_SUCCESS;
     562     cl_uint num_entries = 0;
     563 
     564     res = clGetPlatformIDs(0, 0, &num_entries);
     565     if (CL_SUCCESS != res)
     566         return -1;
     567 
     568     m_platform_ids.resize(num_entries);
     569 
     570     res = clGetPlatformIDs(num_entries, &m_platform_ids[0], 0);
     571     if (CL_SUCCESS != res)
     572         return -1;
     573 
     574     unsigned int i;
     575 
     576     // create context from first platform with GPU device
     577     for (i = 0; i < m_platform_ids.size(); i++)
     578     {
     579         cl_context_properties props[] =
     580         {
     581             CL_CONTEXT_PLATFORM,
     582             (cl_context_properties)(m_platform_ids[i]),
     583             0
     584         };
     585 
     586         m_context = clCreateContextFromType(props, CL_DEVICE_TYPE_GPU, 0, 0, &res);
     587         if (0 == m_context || CL_SUCCESS != res)
     588             continue;
     589 
     590         res = clGetContextInfo(m_context, CL_CONTEXT_DEVICES, sizeof(cl_device_id), &m_device_id, 0);
     591         if (CL_SUCCESS != res)
     592             return -1;
     593 
     594         m_queue = clCreateCommandQueue(m_context, m_device_id, 0, &res);
     595         if (0 == m_queue || CL_SUCCESS != res)
     596             return -1;
     597 
     598         const char* kernelSrc =
     599             "__kernel "
     600             "void bitwise_inv_buf_8uC1("
     601             "    __global unsigned char* pSrcDst,"
     602             "             int            srcDstStep,"
     603             "             int            rows,"
     604             "             int            cols)"
     605             "{"
     606             "    int x = get_global_id(0);"
     607             "    int y = get_global_id(1);"
     608             "    int idx = mad24(y, srcDstStep, x);"
     609             "    pSrcDst[idx] = ~pSrcDst[idx];"
     610             "}"
     611             "__kernel "
     612             "void bitwise_inv_img_8uC1("
     613             "    read_only  image2d_t srcImg,"
     614             "    write_only image2d_t dstImg)"
     615             "{"
     616             "    int x = get_global_id(0);"
     617             "    int y = get_global_id(1);"
     618             "    int2 coord = (int2)(x, y);"
     619             "    uint4 val = read_imageui(srcImg, coord);"
     620             "    val.x = (~val.x) & 0x000000FF;"
     621             "    write_imageui(dstImg, coord, val);"
     622             "}";
     623         size_t len = strlen(kernelSrc);
     624         m_program = clCreateProgramWithSource(m_context, 1, &kernelSrc, &len, &res);
     625         if (0 == m_program || CL_SUCCESS != res)
     626             return -1;
     627 
     628         res = clBuildProgram(m_program, 1, &m_device_id, 0, 0, 0);
     629         if (CL_SUCCESS != res)
     630             return -1;
     631 
     632         m_kernelBuf = clCreateKernel(m_program, "bitwise_inv_buf_8uC1", &res);
     633         if (0 == m_kernelBuf || CL_SUCCESS != res)
     634             return -1;
     635 
     636         m_kernelImg = clCreateKernel(m_program, "bitwise_inv_img_8uC1", &res);
     637         if (0 == m_kernelImg || CL_SUCCESS != res)
     638             return -1;
     639 
     640         m_platformInfo.QueryInfo(m_platform_ids[i]);
     641         m_deviceInfo.QueryInfo(m_device_id);
     642 
     643         // attach OpenCL context to OpenCV
     644         cv::ocl::attachContext(m_platformInfo.Name(), m_platform_ids[i], m_context, m_device_id);
     645 
     646         break;
     647     }
     648 
     649     return m_context != 0 ? CL_SUCCESS : -1;
     650 } // initOpenCL()
     651 
     652 
     653 int App::initVideoSource()
     654 {
     655     try
     656     {
     657         if (!m_file_name.empty() && m_camera_id == -1)
     658         {
     659             m_cap.open(m_file_name.c_str());
     660             if (!m_cap.isOpened())
     661                 throw std::runtime_error(std::string("can't open video file: " + m_file_name));
     662         }
     663         else if (m_camera_id != -1)
     664         {
     665             m_cap.open(m_camera_id);
     666             if (!m_cap.isOpened())
     667             {
     668                 std::stringstream msg;
     669                 msg << "can't open camera: " << m_camera_id;
     670                 throw std::runtime_error(msg.str());
     671             }
     672         }
     673         else
     674             throw std::runtime_error(std::string("specify video source"));
     675     }
     676 
     677     catch (std::exception e)
     678     {
     679         cerr << "ERROR: " << e.what() << std::endl;
     680         return -1;
     681     }
     682 
     683     return 0;
     684 } // initVideoSource()
     685 
     686 
     687 // this function is an example of "typical" OpenCL processing pipeline
     688 // It creates OpenCL buffer or image, depending on use_buffer flag,
     689 // from input media frame and process these data
     690 // (inverts each pixel value in half of frame) with OpenCL kernel
     691 int App::process_frame_with_open_cl(cv::Mat& frame, bool use_buffer, cl_mem* mem_obj)
     692 {
     693     cl_int res = CL_SUCCESS;
     694 
     695     CV_Assert(mem_obj);
     696 
     697     cl_kernel kernel = 0;
     698     cl_mem mem = mem_obj[0];
     699 
     700     if (0 == mem || 0 == m_img_src)
     701     {
     702         // allocate/delete cl memory objects every frame for the simplicity.
     703         // in real applicaton more efficient pipeline can be built.
     704 
     705         if (use_buffer)
     706         {
     707             cl_mem_flags flags = CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR;
     708 
     709             mem = clCreateBuffer(m_context, flags, frame.total(), frame.ptr(), &res);
     710             if (0 == mem || CL_SUCCESS != res)
     711                 return -1;
     712 
     713             res = clSetKernelArg(m_kernelBuf, 0, sizeof(cl_mem), &mem);
     714             if (CL_SUCCESS != res)
     715                 return -1;
     716 
     717             res = clSetKernelArg(m_kernelBuf, 1, sizeof(int), &frame.step[0]);
     718             if (CL_SUCCESS != res)
     719                 return -1;
     720 
     721             res = clSetKernelArg(m_kernelBuf, 2, sizeof(int), &frame.rows);
     722             if (CL_SUCCESS != res)
     723                 return -1;
     724 
     725             int cols2 = frame.cols / 2;
     726             res = clSetKernelArg(m_kernelBuf, 3, sizeof(int), &cols2);
     727             if (CL_SUCCESS != res)
     728                 return -1;
     729 
     730             kernel = m_kernelBuf;
     731         }
     732         else
     733         {
     734             cl_mem_flags flags_src = CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR;
     735 
     736             cl_image_format fmt;
     737             fmt.image_channel_order     = CL_R;
     738             fmt.image_channel_data_type = CL_UNSIGNED_INT8;
     739 
     740             cl_image_desc desc_src;
     741             desc_src.image_type        = CL_MEM_OBJECT_IMAGE2D;
     742             desc_src.image_width       = frame.cols;
     743             desc_src.image_height      = frame.rows;
     744             desc_src.image_depth       = 0;
     745             desc_src.image_array_size  = 0;
     746             desc_src.image_row_pitch   = frame.step[0];
     747             desc_src.image_slice_pitch = 0;
     748             desc_src.num_mip_levels    = 0;
     749             desc_src.num_samples       = 0;
     750             desc_src.buffer            = 0;
     751             m_img_src = clCreateImage(m_context, flags_src, &fmt, &desc_src, frame.ptr(), &res);
     752             if (0 == m_img_src || CL_SUCCESS != res)
     753                 return -1;
     754 
     755             cl_mem_flags flags_dst = CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR;
     756 
     757             cl_image_desc desc_dst;
     758             desc_dst.image_type        = CL_MEM_OBJECT_IMAGE2D;
     759             desc_dst.image_width       = frame.cols;
     760             desc_dst.image_height      = frame.rows;
     761             desc_dst.image_depth       = 0;
     762             desc_dst.image_array_size  = 0;
     763             desc_dst.image_row_pitch   = 0;
     764             desc_dst.image_slice_pitch = 0;
     765             desc_dst.num_mip_levels    = 0;
     766             desc_dst.num_samples       = 0;
     767             desc_dst.buffer            = 0;
     768             mem = clCreateImage(m_context, flags_dst, &fmt, &desc_dst, 0, &res);
     769             if (0 == mem || CL_SUCCESS != res)
     770                 return -1;
     771 
     772             size_t origin[] = { 0, 0, 0 };
     773             size_t region[] = { (size_t)frame.cols, (size_t)frame.rows, 1 };
     774             res = clEnqueueCopyImage(m_queue, m_img_src, mem, origin, origin, region, 0, 0, &m_event);
     775             if (CL_SUCCESS != res)
     776                 return -1;
     777 
     778             res = clWaitForEvents(1, &m_event);
     779             if (CL_SUCCESS != res)
     780                 return -1;
     781 
     782             res = clSetKernelArg(m_kernelImg, 0, sizeof(cl_mem), &m_img_src);
     783             if (CL_SUCCESS != res)
     784                 return -1;
     785 
     786             res = clSetKernelArg(m_kernelImg, 1, sizeof(cl_mem), &mem);
     787             if (CL_SUCCESS != res)
     788                 return -1;
     789 
     790             kernel = m_kernelImg;
     791         }
     792     }
     793 
     794     m_event = clCreateUserEvent(m_context, &res);
     795     if (0 == m_event || CL_SUCCESS != res)
     796         return -1;
     797 
     798     // process left half of frame in OpenCL
     799     size_t size[] = { (size_t)frame.cols / 2, (size_t)frame.rows };
     800     res = clEnqueueNDRangeKernel(m_queue, kernel, 2, 0, size, 0, 0, 0, &m_event);
     801     if (CL_SUCCESS != res)
     802         return -1;
     803 
     804     res = clWaitForEvents(1, &m_event);
     805     if (CL_SUCCESS != res)
     806         return - 1;
     807 
     808     mem_obj[0] = mem;
     809 
     810     return  0;
     811 }
     812 
     813 
     814 // this function is an example of interoperability between OpenCL buffer
     815 // and OpenCV UMat objects. It converts (without copying data) OpenCL buffer
     816 // to OpenCV UMat and then do blur on these data
     817 int App::process_cl_buffer_with_opencv(cl_mem buffer, size_t step, int rows, int cols, int type, cv::UMat& u)
     818 {
     819     cv::ocl::convertFromBuffer(buffer, step, rows, cols, type, u);
     820 
     821     // process right half of frame in OpenCV
     822     cv::Point pt(u.cols / 2, 0);
     823     cv::Size  sz(u.cols / 2, u.rows);
     824     cv::Rect roi(pt, sz);
     825     cv::UMat uroi(u, roi);
     826     cv::blur(uroi, uroi, cv::Size(7, 7), cv::Point(-3, -3));
     827 
     828     if (buffer)
     829         clReleaseMemObject(buffer);
     830     m_mem_obj = 0;
     831 
     832     return 0;
     833 }
     834 
     835 
     836 // this function is an example of interoperability between OpenCL image
     837 // and OpenCV UMat objects. It converts OpenCL image
     838 // to OpenCV UMat and then do blur on these data
     839 int App::process_cl_image_with_opencv(cl_mem image, cv::UMat& u)
     840 {
     841     cv::ocl::convertFromImage(image, u);
     842 
     843     // process right half of frame in OpenCV
     844     cv::Point pt(u.cols / 2, 0);
     845     cv::Size  sz(u.cols / 2, u.rows);
     846     cv::Rect roi(pt, sz);
     847     cv::UMat uroi(u, roi);
     848     cv::blur(uroi, uroi, cv::Size(7, 7), cv::Point(-3, -3));
     849 
     850     if (image)
     851         clReleaseMemObject(image);
     852     m_mem_obj = 0;
     853 
     854     if (m_img_src)
     855         clReleaseMemObject(m_img_src);
     856     m_img_src = 0;
     857 
     858     return 0;
     859 }
     860 
     861 
     862 int App::run()
     863 {
     864     if (0 != initOpenCL())
     865         return -1;
     866 
     867     if (0 != initVideoSource())
     868         return -1;
     869 
     870     Mat img_to_show;
     871 
     872     // set running state until ESC pressed
     873     setRunning(true);
     874     // set process flag to show some data processing
     875     // can be toggled on/off by 'p' button
     876     setDoProcess(true);
     877     // set use buffer flag,
     878     // when it is set to true, will demo interop opencl buffer and cv::Umat,
     879     // otherwise demo interop opencl image and cv::UMat
     880     // can be switched on/of by SPACE button
     881     setUseBuffer(true);
     882 
     883     // Iterate over all frames
     884     while (isRunning() && nextFrame(m_frame))
     885     {
     886         cv::cvtColor(m_frame, m_frameGray, COLOR_BGR2GRAY);
     887 
     888         UMat uframe;
     889 
     890         // work
     891         timerStart();
     892 
     893         if (doProcess())
     894         {
     895             process_frame_with_open_cl(m_frameGray, useBuffer(), &m_mem_obj);
     896 
     897             if (useBuffer())
     898                 process_cl_buffer_with_opencv(
     899                     m_mem_obj, m_frameGray.step[0], m_frameGray.rows, m_frameGray.cols, m_frameGray.type(), uframe);
     900             else
     901                 process_cl_image_with_opencv(m_mem_obj, uframe);
     902         }
     903         else
     904         {
     905             m_frameGray.copyTo(uframe);
     906         }
     907 
     908         timerEnd();
     909 
     910         uframe.copyTo(img_to_show);
     911 
     912         putText(img_to_show, "Version : " + m_platformInfo.Version(), Point(5, 30), FONT_HERSHEY_SIMPLEX, 1., Scalar(255, 100, 0), 2);
     913         putText(img_to_show, "Name : " + m_platformInfo.Name(), Point(5, 60), FONT_HERSHEY_SIMPLEX, 1., Scalar(255, 100, 0), 2);
     914         putText(img_to_show, "Device : " + m_deviceInfo.Name(), Point(5, 90), FONT_HERSHEY_SIMPLEX, 1., Scalar(255, 100, 0), 2);
     915         cv::String memtype = useBuffer() ? "buffer" : "image";
     916         putText(img_to_show, "interop with OpenCL " + memtype, Point(5, 120), FONT_HERSHEY_SIMPLEX, 1., Scalar(255, 100, 0), 2);
     917         putText(img_to_show, "Time : " + timeStr() + " msec", Point(5, 150), FONT_HERSHEY_SIMPLEX, 1., Scalar(255, 100, 0), 2);
     918 
     919         imshow("opencl_interop", img_to_show);
     920 
     921         handleKey((char)waitKey(3));
     922     }
     923 
     924     return 0;
     925 }
     926 
     927 
     928 void App::handleKey(char key)
     929 {
     930     switch (key)
     931     {
     932     case 27:
     933         setRunning(false);
     934         break;
     935 
     936     case ' ':
     937         setUseBuffer(!useBuffer());
     938         break;
     939 
     940     case 'p':
     941     case 'P':
     942         setDoProcess( !doProcess() );
     943         break;
     944 
     945     default:
     946         break;
     947     }
     948 }
     949 
     950 
     951 inline void App::timerStart()
     952 {
     953     m_t0 = getTickCount();
     954 }
     955 
     956 
     957 inline void App::timerEnd()
     958 {
     959     m_t1 = getTickCount();
     960     int64 delta = m_t1 - m_t0;
     961     m_time = (delta / m_frequency) * 1000; // units msec
     962 }
     963 
     964 
     965 inline string App::timeStr() const
     966 {
     967     stringstream ss;
     968     ss << std::fixed << std::setprecision(1) << m_time;
     969     return ss.str();
     970 }
     971 
     972 
     973 int main(int argc, char** argv)
     974 {
     975     const char* keys =
     976         "{ help h ?    |          | print help message }"
     977         "{ camera c    | -1       | use camera as input }"
     978         "{ video  v    |          | use video as input }";
     979 
     980     CommandLineParser cmd(argc, argv, keys);
     981     if (cmd.has("help"))
     982     {
     983         cmd.printMessage();
     984         return EXIT_SUCCESS;
     985     }
     986 
     987     App app(cmd);
     988 
     989     try
     990     {
     991         app.run();
     992     }
     993 
     994     catch (const cv::Exception& e)
     995     {
     996         cout << "error: " << e.what() << endl;
     997         return 1;
     998     }
     999 
    1000     catch (const std::exception& e)
    1001     {
    1002         cout << "error: " << e.what() << endl;
    1003         return 1;
    1004     }
    1005 
    1006     catch (...)
    1007     {
    1008         cout << "unknown exception" << endl;
    1009         return 1;
    1010     }
    1011 
    1012     return EXIT_SUCCESS;
    1013 } // main()
    View Code

         官方给出的代码,有很多和OpenCL平台相关的处理部分,看起的比较冗杂,因此我们就不直接移植这个代码了(参考部分思想和API接口的基本调用),重新写一个新的基于Opencv3.1.0的OCL程序的Demo框架,基于TI的AM57x系列的板卡,其他的设备类似:

    1. Opencv3.1版本OpenCL支持情况:

            首先对于Opencv3.1.0版本,从正式发布Opencv3开始,其对OpenCL的支持已经发生了很大的变化,在之前需要使用cv::ocl相关函数API来实现kernel的编译调用等等,在其中还包括了很多的数据搬移,而Opencv3正式改变了这样的情况,重新封装了一个新的数据类型cv::UMat,这个数据类型能够无缝对接Opencv的普通接口,从而最少的改动代码而最大的完成OpenCL平台的加速功能[OpenCV3.x-OpenCL.pptx]

             如上图所示,只需要将原来的Mat格式换为UMat格式就可以实现Opencv函数在OpenCL设备上加速运行,而这其中具体实施的基本原理是什么呢?接下来看一下其底层实现的基本原理,具体参看Opencv中OpenCL部分实现的源代码:

            上图中表明了,当你使用的数据类型是UMat {data.isUmat()},并且开启了OpenCL使能{useOpenCL()},那么Opencv的接口将会跳转到OpenCL支持的设备中进行加速运行,当然你需要注意的是,在第一次使用OpenCL加速程序时,OpenCL需要编译生成对应平台的Kernel代码,而编译是需要花费大量的时间的,因此初次运行需要比较长的时间。

    2. 接下来开始AM57x系列的Opencv-OCL编程

    主要代码如下所示(Line28非常重要,使能OpenCL平台)

     1 #include <iostream>
     2 #include "opencv2/opencv.hpp"
     3 #include "opencv2/core/ocl.hpp"
     4 #include "opencv2/imgcodecs.hpp"
     5 #include "opencv2/videoio/videoio.hpp"
     6 #include "opencv2/highgui/highgui.hpp"
     7 #include "opencv2/imgproc/imgproc.hpp"
     8 
     9 using namespace std;
    10 using namespace cv;
    11 using namespace cv::ocl;
    12 
    13 #define DSP 1
    14 
    15 int main()
    16 {
    17     double t = 0.0;
    18 #if DSP
    19     std::vector<cv::ocl::PlatformInfo> plats;
    20     cv::ocl::getPlatfomsInfo(plats);
    21     const cv::ocl::PlatformInfo *platform = &plats[0];
    22     cout << "Platform Name:" << platform->name().c_str() << endl;
    23 
    24     cv::ocl::Device c_dev;
    25     platform->getDevice(c_dev,0);
    26     cout << "Device name:" << c_dev.name().c_str() << endl;
    27     c_dev.set(0);
    28     cv::ocl::setUseOpenCL(true);
    29     cout << "Use the OpenCL Deivice?" << cv::ocl::useOpenCL() << endl;
    30 
    31     cv::UMat Ori = cv::imread("/home/root/test.jpg").getUMat(cv::ACCESS_RW),Res,Canny;
    32 
    33     t = (double)cv::getTickCount();
    34     cv::cvtColor(Ori,Res,cv::COLOR_RGB2GRAY);
    35     cv::Canny(Res,Res,0,30);
    36     t = ((double)cv::getTickCount() - t) / cv::getTickFrequency();
    37     std::cout << "TI AM57x Accelerate Time Cost:" << t << "s" << std::endl;
    38     cv::imshow("Test",Ori);
    39     cv::imshow("Gray",Res);
    40 #else
    41     Mat I = imread("/home/root/test.jpg"),gray;
    42 
    43     t = (double)cv::getTickCount();
    44     cv::cvtColor(I,gray,cv::COLOR_RGB2GRAY);
    45     cv::Canny(gray,gray,0,30);
    46     t = ((double)cv::getTickCount() - t) / cv::getTickFrequency();
    47     std::cout << "CPU Time Cost:" << t << "s" << std::endl;
    48     cv::imshow("Ori",I);
    49     cv::imshow("Res",gray);
    50 #endif
    51 
    52     for (;;)
    53     {
    54         int key = waitKey();
    55         if (key == 27/*ESC*/ || key == 'q' || key == 'Q')
    56             break;
    57     }
    58     return 0;
    59 }

    编译Makfile:

     1 TARGET3 = ocl_demo
     2 CXX = arm-linux-gnueabihf-g++
     3 CFLAGS += -Wl,-rpath-link,$(COMPILE_TOOL_PATH)/usr/lib 
     4           -Wl,-rpath-link,$(COMPILE_TOOL_PATH)/lib 
     5           -I$(COMPILE_TOOL_PATH)/usr/include 
     6           -L$(COMPILE_TOOL_PATH)/usr/lib 
     7           -L$(COMPILE_TOOL_PATH)/lib -Wall -W 
     8           -std=c++98
     9 CFLAGS +=  -lopencv_core -lopencv_objdetect -lopencv_highgui -lopencv_videoio -lopencv_imgcodecs -lopencv_imgproc -lOpenCL -lpthread -lrt
    10 
    11 all:
    12     @$(CXX)  $(TARGET3).cpp -o $(TARGET3) $(CFLAGS)
    13 clean:
    14     rm -rf  $(TARGET3)
    View Code

    运行脚本opencv-ocl-runtime.sh

    1 export TI_OCL_LOAD_KERNELS_ONCHIP=Y
    2 export TI_OCL_CACHE_KERNELS=Y
    3 export OPENCV_OPENCL_DEVICE='TI AM57:ACCELERATOR:TI Multicore C66 DSP'
    4 echo "OpenCL on, canny"
    5 ./ocl_demo
    6 export OPENCV_OPENCL_DEVICE='disabled'
    7 echo "OpenCL off, canny"
    8 ./ocl_demo

    使用make指令编译后在AM5718平台上运行结果如下所示:

     根据运行的结构可以看出,CPU运行时间是经过OpenCL-DSP平台加速后的十倍左右,因此能够明显体现出加速的效果。

    注1:我们需要注意的是在第一次运行此代码时,一定要注意,通过脚本运行ocl_demo,在脚本当中,第一行表示如果需要加载Kernel,是直接通过本地芯片上加载Kernel的,第二行表示如果编译了Kernel,将会将Kernel保存在Cache当中,下次调用(直到关机之前)就不需要编译了,如果没有这两个选项,代码就会编译Kernel而消耗大量的时间,因此这是必须的。

    注2:脚本中第三行是非常重要的,必须要在这里使能OpenCL设备,否则Opencv将永远不会调用DSP加速算法,而在CPU上运行,具体参考这里

    猜想与验证:当我在调用Opencv3.1中erode或者其他形态学相关的接口时,出现了经过DSP加速之后的效果居然比CPU端要差很多!!!这是为什么呢?猜想如下:

    猜想:首先我们使用的是TI的DSP Accelerator,型号是C66系列, 而这个系列的DSP处理器实际上只有不超过10个核心的Processor,而DSP最主要的优势是其包括的乘法器资源以及加法器资源,因此对于包括乘法加法等运算的这些情况,其加速效果比较明显,对于Opencv中的形态学相关API主要用到的是基本的if逻辑分支判断的情况,因此,不能够充分使用DSP当中的加法器或者乘法器,因此加速效果不明显,甚至更慢,如下所示(erode algorithm):

    验证:根据上述猜想,我们可以选择有多个核心的GPU来对erode算法加速,查看erode算法在GPU加速下的效率,加速效果如下所示:

    Reference:

    官方代码说明:https://www.khronos.org/opencl/

    Opencv(编译OCL模块)+VSC+MinGW环境搭建:https://www.cnblogs.com/uestc-mm/p/12758110.html

    TI官方OpenCV3.1-Release:https://git.ti.com/cgit/opencv/tiopencv/?h=tiopencvrelease_3.1

    TI官方AM57x Processor SDK Linux(OpenCV支持):http://software-dl.ti.com/processor-sdk-linux/esd/docs/latest/linux/Foundational_Components_OpenCV.html

  • 相关阅读:
    使用Property+使用构造+构建自定义类+使用List<自定义类>(项目)
    几乎全面的食品英文总结
    网络资源深入剖析Binding(学习)
    WPF入门知识(学习)
    Get children gameobjects array and assign the script inside
    Unity3D + C#: Cloning the Content of a Serializable Class
    Use Web Fonts and the @fontface CSS Method
    Good XAML Patterns
    Coroutines & Yield
    How to make a mini map for your scene in Unity3d
  • 原文地址:https://www.cnblogs.com/uestc-mm/p/12963792.html
Copyright © 2011-2022 走看看