OpenCL Bilateral Filtering for a Beauty (Skin-Smoothing) Effect

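        OpenCL is a framework for parallel heterogeneous computing. It is supported by many vendors, including Intel, AMD and NVIDIA. At the time of writing, however, NVIDIA only supported up to version 1.2 and was concentrating on its own CUDA. I have not used CUDA myself, but my impression is that CUDA is somewhat better than OpenCL. OpenCL has much broader support, though: CPUs, GPUs, DSPs, FPGAs and many other kinds of chips can run it. An OpenCL image kernel is very similar to a pixel shader in Direct3D, in that each invocation computes one output pixel.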

    1. How the bilateral filter works

        For the theory behind the bilateral filter, see Rachel-Zhang's blog post "Principle and Implementation of the Bilateral Filter". Quoting from her post:

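    The bilateral filter is an edge-preserving denoising filter. It can denoise without blurring edges because its coefficients come from two functions. One function sets the filter coefficients from the geometric (spatial) distance. The other sets them from the difference in pixel values. Two filters it can be compared with are:
    - the Gaussian low-pass filter (http://en.wikipedia.org/wiki/Gaussian_filter);
    - the α-trimmed mean filter, which discards the fraction α of the smallest and largest values in the window and outputs the mean of the remaining pixels.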
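The α-trimmed mean filter used for comparison is easy to state in code. Below is a minimal sketch for a single window. It assumes α is the total fraction of samples discarded, split evenly between the smallest and the largest values. The function name alphaTrimmedMean is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Alpha-trimmed mean of one filter window (sketch).
// Assumption: alpha is the TOTAL fraction trimmed, half from each end.
double alphaTrimmedMean(std::vector<double> window, double alpha)
{
    assert(!window.empty() && alpha >= 0.0 && alpha <= 1.0);
    std::sort(window.begin(), window.end());
    const std::size_t n = window.size();
    // Number of samples dropped from EACH end of the sorted window.
    std::size_t trim = static_cast<std::size_t>(alpha * n / 2.0);
    if (2 * trim >= n)
        trim = (n - 1) / 2;  // always keep at least one sample
    double sum = 0.0;
    for (std::size_t i = trim; i < n - trim; ++i)
        sum += window[i];
    return sum / static_cast<double>(n - 2 * trim);
}
```

With α = 0 this is the plain mean, and as α approaches 1 it approaches the median. Unlike the bilateral filter, it ignores where in the window each value came from and only looks at the values themselves.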

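    In a bilateral filter, the output pixel value is a weighted combination of the neighboring pixel values:

    $$g(i,j) = \frac{\sum_{k,l} f(k,l)\,w(i,j,k,l)}{\sum_{k,l} w(i,j,k,l)}$$

    The weight w(i,j,k,l) is the product of the domain kernel d and the range kernel r:

    $$d(i,j,k,l) = \exp\left(-\frac{(i-k)^2 + (j-l)^2}{2\sigma_d^2}\right), \qquad r(i,j,k,l) = \exp\left(-\frac{\lVert f(i,j) - f(k,l)\rVert^2}{2\sigma_r^2}\right)$$

    $$w(i,j,k,l) = d(i,j,k,l)\,r(i,j,k,l)$$

    The filter therefore accounts for differences in both the spatial domain and the value range. By contrast, the Gaussian filter considers only the spatial domain and the α-trimmed mean filter only the value range. In the OpenCL kernel, r is evaluated separately for each color channel, so the squared norm becomes a per-channel squared difference.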
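The weight formula can be checked numerically before it goes into a kernel. Below is a sketch in C++ that assumes intensities in [0, 1]. The helper names domainKernel, rangeKernel and bilateralWeight are hypothetical. The σ values used in the checks, σ_d = 3 and σ_r = 0.1, are the ones the kernel uses.

```cpp
#include <cassert>
#include <cmath>

// Domain (spatial) kernel d: depends only on the offset (dk, dl) = (k - i, l - j).
double domainKernel(int dk, int dl, double sigmaD)
{
    return std::exp(-(dk * dk + dl * dl) / (2.0 * sigmaD * sigmaD));
}

// Range kernel r: depends only on the intensity difference f(i,j) - f(k,l).
double rangeKernel(double fij, double fkl, double sigmaR)
{
    const double diff = fij - fkl;
    return std::exp(-(diff * diff) / (2.0 * sigmaR * sigmaR));
}

// Bilateral weight w = d * r, i.e. exp(domain exponent + range exponent).
double bilateralWeight(int dk, int dl, double fij, double fkl,
                       double sigmaD, double sigmaR)
{
    assert(sigmaD > 0.0 && sigmaR > 0.0);
    return domainKernel(dk, dl, sigmaD) * rangeKernel(fij, fkl, sigmaR);
}
```

A neighbor across a strong edge (intensity difference 0.5 with σ_r = 0.1) gets a range factor of exp(-12.5), roughly 4 × 10⁻⁶, which is why edges survive the filter. With a very large σ_r the range factor tends to 1, and w reduces to the plain Gaussian kernel.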

    This article implements the bilateral filter in OpenCL based on these formulas and uses it for skin smoothing.

    2. The kernel function

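        The skin-smoothing algorithm is based on http://www.zealfilter.com/portal.php?mod=view&aid=138. I removed its skin-color detection step because it did not work well. I had also planned a sharpening pass, but the result looks quite good without it. I will add sharpening later, when I study the next part of OpenCL.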

    // Integer coordinates; reads outside the image clamp to the edge pixel; no interpolation
    const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;
    
    // One work-item per output pixel: bilateral filter, then blend with the original pixel
    kernel void bilateralBlur(read_only image2d_t src, write_only image2d_t dst)
    {
        int x = (int)get_global_id(0);
        int y = (int)get_global_id(1);
        // The host may round the global size up, so skip work-items outside the image
        if (x >= get_image_width(src) || y >= get_image_height(src))
            return;
    
        const int ksize = 11;        // window size (radius ksize / 2 = 5)
        const float sigma_d = 3.0f;  // spatial (domain) sigma, in pixels
        const float sigma_r = 0.1f;  // range sigma, for intensities in [0, 1]
        const float alpha = 0.2f;    // weight of the original pixel in the final blend
    
        float4 fij = read_imagef(src, sampler, (int2)(x, y));
    
        float4 fkl;
        float dkl;
        float4 rkl;
        float4 wkl;
    
        float4 numerator = (float4)(0.0f, 0.0f, 0.0f, 0.0f);
        // Weights are summed from zero; the centre pixel alone contributes exp(0) = 1
        float4 denominator = (float4)(0.0f, 0.0f, 0.0f, 0.0f);
        for (int K = -ksize / 2; K <= ksize / 2; K++)
        {
            for (int L = -ksize / 2; L <= ksize / 2; L++)
            {
                fkl = read_imagef(src, sampler, (int2)(x + K, y + L));
    
                // Exponent of the domain kernel d
                dkl = -(float)(K*K + L*L) / (2.0f * sigma_d*sigma_d);
                // Exponent of the range kernel r, computed per color channel
                rkl.x = -(fij.x - fkl.x)*(fij.x - fkl.x) / (2.0f * sigma_r*sigma_r);
                rkl.y = -(fij.y - fkl.y)*(fij.y - fkl.y) / (2.0f * sigma_r*sigma_r);
                rkl.z = -(fij.z - fkl.z)*(fij.z - fkl.z) / (2.0f * sigma_r*sigma_r);
    
                // w = d * r = exp(dkl + rkl)
                wkl.x = exp(dkl + rkl.x);
                wkl.y = exp(dkl + rkl.y);
                wkl.z = exp(dkl + rkl.z);
    
                numerator.x += fkl.x * wkl.x;
                numerator.y += fkl.y * wkl.y;
                numerator.z += fkl.z * wkl.z;
    
                denominator.x += wkl.x;
                denominator.y += wkl.y;
                denominator.z += wkl.z;
            }
        }
        
        float4 gij = (float4)(0.0f, 0.0f, 0.0f, 1.0f);
        if (denominator.x > 0.0f && denominator.y > 0.0f && denominator.z > 0.0f)
        {
            gij.x = numerator.x / denominator.x;
            gij.y = numerator.y / denominator.y;
            gij.z = numerator.z / denominator.z;
    
            // Blend the bilateral result with the original pixel
            gij.x = fij.x*alpha + gij.x*(1.0f - alpha);
            gij.y = fij.y*alpha + gij.y*(1.0f - alpha);
            gij.z = fij.z*alpha + gij.z*(1.0f - alpha);
        }
    
        write_imagef(dst, (int2)(x, y), gij);
    }

    The kernel is essentially a direct transcription of the bilateral filter formulas, so it is quite simple.
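When debugging a kernel like this, it helps to have a CPU reference to compare the GPU output against. The following is a sketch for a single channel of a row-major float image with values in [0, 1]. The function name, the single-channel layout and the default arguments (copied from the kernel's constants) are assumptions. Clamp-to-edge addressing is emulated by clamping coordinates.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// CPU reference of the bilateralBlur kernel for ONE channel (sketch).
// Hypothetical helper; parameters default to the kernel's constants.
std::vector<float> bilateralBlurReference(const std::vector<float> &src,
                                          int width, int height,
                                          int ksize = 11, float sigmaD = 3.0f,
                                          float sigmaR = 0.1f, float alpha = 0.2f)
{
    assert(static_cast<int>(src.size()) == width * height);
    // Emulates CLK_ADDRESS_CLAMP_TO_EDGE: out-of-range reads return the edge pixel.
    auto at = [&](int x, int y) {
        x = std::min(std::max(x, 0), width - 1);
        y = std::min(std::max(y, 0), height - 1);
        return src[y * width + x];
    };
    std::vector<float> dst(src.size());
    for (int y = 0; y < height; ++y)
    {
        for (int x = 0; x < width; ++x)
        {
            const float fij = at(x, y);
            float numerator = 0.0f, denominator = 0.0f;
            for (int K = -ksize / 2; K <= ksize / 2; ++K)
            {
                for (int L = -ksize / 2; L <= ksize / 2; ++L)
                {
                    const float fkl = at(x + K, y + L);
                    const float dkl = -static_cast<float>(K * K + L * L) / (2.0f * sigmaD * sigmaD);
                    const float rkl = -(fij - fkl) * (fij - fkl) / (2.0f * sigmaR * sigmaR);
                    const float wkl = std::exp(dkl + rkl);
                    numerator += fkl * wkl;
                    denominator += wkl;
                }
            }
            // denominator >= 1 because the centre pixel has weight exp(0) = 1.
            const float gij = numerator / denominator;
            // Same final blend as the kernel: keep a fraction alpha of the original.
            dst[y * width + x] = fij * alpha + gij * (1.0f - alpha);
        }
    }
    return dst;
}
```

On a flat region the output equals the input. On a hard step edge (0.1 next to 0.9), the pixels on either side keep their values, whereas a Gaussian blur of the same size would mix them.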

    3. Host code

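        OpenCL code is divided into host code and device code. Kernels run on the parallel device. The host is generally suited to serial code with heavy control logic, while the device is better suited to number crunching such as convolution. On a PC the CPU usually acts as the host and the GPU as the device. A CPU can also serve as a device, though, since Intel also supports OpenCL. In this article the CPU is the host and the GPU is the device.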
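In this split, the host decides the launch geometry and the device runs one work-item per pixel. The sketch below shows that bookkeeping. It assumes a 2D NDRange whose global size is rounded up to a multiple of the work-group size, the same idea as the RoundUp helper in the host code. The names roundUpToMultiple, LaunchGeometry and planLaunch are hypothetical, and the 16 × 16 work-group in the example is an arbitrary choice.

```cpp
#include <cassert>
#include <cstddef>

// Round globalSize up to the next multiple of groupSize.
std::size_t roundUpToMultiple(std::size_t groupSize, std::size_t globalSize)
{
    assert(groupSize > 0);
    const std::size_t r = globalSize % groupSize;
    return r == 0 ? globalSize : globalSize + groupSize - r;
}

// Grid launched for a width x height image, and how many work-items
// exit early in the kernel's "x >= width || y >= height" check.
struct LaunchGeometry
{
    std::size_t globalX, globalY, launched, idle;
};

LaunchGeometry planLaunch(std::size_t width, std::size_t height,
                          std::size_t localX, std::size_t localY)
{
    LaunchGeometry g;
    g.globalX = roundUpToMultiple(localX, width);
    g.globalY = roundUpToMultiple(localY, height);
    g.launched = g.globalX * g.globalY;
    g.idle = g.launched - width * height;
    return g;
}
```

For a 1000 × 750 image and 16 × 16 work-groups, the grid is 1008 × 752. That means 8016 of the 758016 work-items do nothing but return early, which is why the kernel needs its bounds check.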

    #include "stdafx.h"   // Visual Studio precompiled header; remove when building with other toolchains
    
    #include <iostream>  
    #include <fstream>  
    #include <sstream>  
    #include <malloc.h> 
    #include <string.h>  
    #include <opencv2/opencv.hpp>  
    
    #include <CL/cl.h>  
     
     
    //---------- Query and display OpenCL platform/device information ---------
    
    void DisplayPlatformInfo(
        cl_platform_id id,
        cl_platform_info name,
        std::string str)
    {
        cl_int errNum;
        std::size_t paramValueSize;
    
        errNum = clGetPlatformInfo(
            id,
            name,
            0,
            NULL,
            &paramValueSize);
        if (errNum != CL_SUCCESS)
        {
            std::cerr << "Failed to find OpenCL platform " << str << "." << std::endl;
            return;
        }
    
        char * info = (char *)alloca(sizeof(char) * paramValueSize);
        errNum = clGetPlatformInfo(
            id,
            name,
            paramValueSize,
            info,
            NULL);
        if (errNum != CL_SUCCESS)
        {
            std::cerr << "Failed to find OpenCL platform " << str << "." << std::endl;
            return;
        }
    
        std::cout << "	" << str << ":	" << info << std::endl;
    }
    
    template<typename T>
    void appendBitfield(T info, T value, std::string name, std::string & str)
    {
        if (info & value)
        {
            if (str.length() > 0)
            {
                str.append(" | ");
            }
            str.append(name);
        }
    }
    
    ///
    // Display information for a particular device.
    // As different calls to clGetDeviceInfo may return
    // values of different types a template is used. 
    // As some values returned are arrays of values, a templated class is
    // used so it can be specialized for this case, see below.
    //
    template <typename T>
    class InfoDevice
    {
    public:
        static void display(
            cl_device_id id,
            cl_device_info name,
            std::string str)
        {
            cl_int errNum;
            std::size_t paramValueSize;
    
            errNum = clGetDeviceInfo(
                id,
                name,
                0,
                NULL,
                &paramValueSize);
            if (errNum != CL_SUCCESS)
            {
                std::cerr << "Failed to find OpenCL device info " << str << "." << std::endl;
                return;
            }
    
            T * info = (T *)alloca(sizeof(T) * paramValueSize);
            errNum = clGetDeviceInfo(
                id,
                name,
                paramValueSize,
                info,
                NULL);
            if (errNum != CL_SUCCESS)
            {
                std::cerr << "Failed to find OpenCL device info " << str << "." << std::endl;
                return;
            }
    
            // Handle a few special cases
            switch (name)
            {
            case CL_DEVICE_TYPE:
            {
                std::string deviceType;
    
                appendBitfield<cl_device_type>(
                    *(reinterpret_cast<cl_device_type*>(info)),
                    CL_DEVICE_TYPE_CPU,
                    "CL_DEVICE_TYPE_CPU",
                    deviceType);
    
                appendBitfield<cl_device_type>(
                    *(reinterpret_cast<cl_device_type*>(info)),
                    CL_DEVICE_TYPE_GPU,
                    "CL_DEVICE_TYPE_GPU",
                    deviceType);
    
                appendBitfield<cl_device_type>(
                    *(reinterpret_cast<cl_device_type*>(info)),
                    CL_DEVICE_TYPE_ACCELERATOR,
                    "CL_DEVICE_TYPE_ACCELERATOR",
                    deviceType);
    
                appendBitfield<cl_device_type>(
                    *(reinterpret_cast<cl_device_type*>(info)),
                    CL_DEVICE_TYPE_DEFAULT,
                    "CL_DEVICE_TYPE_DEFAULT",
                    deviceType);
    
                std::cout << "		" << str << ":	" << deviceType << std::endl;
            }
                break;
            case CL_DEVICE_SINGLE_FP_CONFIG:
            {
                std::string fpType;
    
                appendBitfield<cl_device_fp_config>(
                    *(reinterpret_cast<cl_device_fp_config*>(info)),
                    CL_FP_DENORM,
                    "CL_FP_DENORM",
                    fpType);
    
                appendBitfield<cl_device_fp_config>(
                    *(reinterpret_cast<cl_device_fp_config*>(info)),
                    CL_FP_INF_NAN,
                    "CL_FP_INF_NAN",
                    fpType);
    
                appendBitfield<cl_device_fp_config>(
                    *(reinterpret_cast<cl_device_fp_config*>(info)),
                    CL_FP_ROUND_TO_NEAREST,
                    "CL_FP_ROUND_TO_NEAREST",
                    fpType);
    
                appendBitfield<cl_device_fp_config>(
                    *(reinterpret_cast<cl_device_fp_config*>(info)),
                    CL_FP_ROUND_TO_ZERO,
                    "CL_FP_ROUND_TO_ZERO",
                    fpType);
    
                appendBitfield<cl_device_fp_config>(
                    *(reinterpret_cast<cl_device_fp_config*>(info)),
                    CL_FP_ROUND_TO_INF,
                    "CL_FP_ROUND_TO_INF",
                    fpType);
    
                appendBitfield<cl_device_fp_config>(
                    *(reinterpret_cast<cl_device_fp_config*>(info)),
                    CL_FP_FMA,
                    "CL_FP_FMA",
                    fpType);
    
    #ifdef CL_FP_SOFT_FLOAT
                appendBitfield<cl_device_fp_config>(
                    *(reinterpret_cast<cl_device_fp_config*>(info)),
                    CL_FP_SOFT_FLOAT,
                    "CL_FP_SOFT_FLOAT",
                    fpType);
    #endif
    
                std::cout << "		" << str << ":	" << fpType << std::endl;
            }
                break;
            case CL_DEVICE_GLOBAL_MEM_CACHE_TYPE:
            {
                std::string memType;
    
                appendBitfield<cl_device_mem_cache_type>(
                    *(reinterpret_cast<cl_device_mem_cache_type*>(info)),
                    CL_NONE,
                    "CL_NONE",
                    memType);
                appendBitfield<cl_device_mem_cache_type>(
                    *(reinterpret_cast<cl_device_mem_cache_type*>(info)),
                    CL_READ_ONLY_CACHE,
                    "CL_READ_ONLY_CACHE",
                    memType);
    
                appendBitfield<cl_device_mem_cache_type>(
                    *(reinterpret_cast<cl_device_mem_cache_type*>(info)),
                    CL_READ_WRITE_CACHE,
                    "CL_READ_WRITE_CACHE",
                    memType);
    
                std::cout << "		" << str << ":	" << memType << std::endl;
            }
                break;
            case CL_DEVICE_LOCAL_MEM_TYPE:
            {
                std::string memType;
    
                appendBitfield<cl_device_local_mem_type>(
                    *(reinterpret_cast<cl_device_local_mem_type*>(info)),
                    CL_LOCAL,
                    "CL_LOCAL",
                    memType);
    
                appendBitfield<cl_device_local_mem_type>(
                    *(reinterpret_cast<cl_device_local_mem_type*>(info)),
                    CL_GLOBAL,
                    "CL_GLOBAL",
                    memType);
    
                std::cout << "		" << str << ":	" << memType << std::endl;
            }
                break;
            case CL_DEVICE_EXECUTION_CAPABILITIES:
            {
                std::string memType;
    
                appendBitfield<cl_device_exec_capabilities>(
                    *(reinterpret_cast<cl_device_exec_capabilities*>(info)),
                    CL_EXEC_KERNEL,
                    "CL_EXEC_KERNEL",
                    memType);
    
                appendBitfield<cl_device_exec_capabilities>(
                    *(reinterpret_cast<cl_device_exec_capabilities*>(info)),
                    CL_EXEC_NATIVE_KERNEL,
                    "CL_EXEC_NATIVE_KERNEL",
                    memType);
    
                std::cout << "		" << str << ":	" << memType << std::endl;
            }
                break;
            case CL_DEVICE_QUEUE_PROPERTIES:
            {
                std::string memType;
    
                appendBitfield<cl_command_queue_properties>(
                    *(reinterpret_cast<cl_command_queue_properties*>(info)),
                    CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE,
                    "CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE",
                    memType);
    
                appendBitfield<cl_command_queue_properties>(
                    *(reinterpret_cast<cl_command_queue_properties*>(info)),
                    CL_QUEUE_PROFILING_ENABLE,
                    "CL_QUEUE_PROFILING_ENABLE",
                    memType);
    
                std::cout << "		" << str << ":	" << memType << std::endl;
            }
                break;
            default:
                std::cout << "		" << str << ":	" << *info << std::endl;
                break;
            }
        }
    };
    
    ///
    // Simple trait class used to wrap base types.
    //
    template <typename T>
    class ArrayType
    {
    public:
        static bool isChar() { return false; }
    };
    
    ///
    // Specialized for the char (i.e. null terminated string case).
    //
    template<>
    class ArrayType<char>
    {
    public:
        static bool isChar() { return true; }
    };
    
    ///
    // Specialized instance of class InfoDevice for array types.
    //
    template <typename T>
    class InfoDevice<ArrayType<T> >
    {
    public:
        static void display(
            cl_device_id id,
            cl_device_info name,
            std::string str)
        {
            cl_int errNum;
            std::size_t paramValueSize;
    
            errNum = clGetDeviceInfo(
                id,
                name,
                0,
                NULL,
                &paramValueSize);
            if (errNum != CL_SUCCESS)
            {
                std::cerr
                    << "Failed to find OpenCL device info "
                    << str
                    << "."
                    << std::endl;
                return;
            }
    
            T * info = (T *)alloca(sizeof(T) * paramValueSize);
            errNum = clGetDeviceInfo(
                id,
                name,
                paramValueSize,
                info,
                NULL);
            if (errNum != CL_SUCCESS)
            {
                std::cerr
                    << "Failed to find OpenCL device info "
                    << str
                    << "."
                    << std::endl;
                return;
            }
    
            if (ArrayType<T>::isChar())
            {
                std::cout << "	" << str << ":	" << info << std::endl;
            }
            else if (name == CL_DEVICE_MAX_WORK_ITEM_SIZES)
            {
                cl_uint maxWorkItemDimensions;
    
                errNum = clGetDeviceInfo(
                    id,
                    CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS,
                    sizeof(cl_uint),
                    &maxWorkItemDimensions,
                    NULL);
                if (errNum != CL_SUCCESS)
                {
                    std::cerr
                        << "Failed to find OpenCL device info "
                        << "CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS."
                        << std::endl;
                    return;
                }
    
                std::cout << "	" << str << ":	";
                for (cl_uint i = 0; i < maxWorkItemDimensions; i++)
                {
                    std::cout << info[i] << " ";
                }
                std::cout << std::endl;
            }
        }
    };
    
    ///
    //  Enumerate platforms and display information about them 
    //  and their associated devices.
    //
    void displayInfo(void)
    {
        cl_int errNum;
        cl_uint numPlatforms;
        cl_platform_id * platformIds;
        cl_context context = NULL;
    
        // First, query the total number of platforms
        errNum = clGetPlatformIDs(0, NULL, &numPlatforms);
        if (errNum != CL_SUCCESS || numPlatforms <= 0)
        {
            std::cerr << "Failed to find any OpenCL platform." << std::endl;
            return;
        }
    
        // Next, allocate memory for the installed platforms, and query
        // to get the list.
        platformIds = (cl_platform_id *)alloca(sizeof(cl_platform_id) * numPlatforms);
        // Then fetch the platform IDs themselves
        errNum = clGetPlatformIDs(numPlatforms, platformIds, NULL);
        if (errNum != CL_SUCCESS)
        {
            std::cerr << "Failed to find any OpenCL platforms." << std::endl;
            return;
        }
    
        std::cout << "Number of platforms: 	" << numPlatforms << std::endl;
        // Iterate through the list of platforms displaying associated information
        for (cl_uint i = 0; i < numPlatforms; i++) {
            // First we display information associated with the platform
            DisplayPlatformInfo(
                platformIds[i],
                CL_PLATFORM_PROFILE,
                "CL_PLATFORM_PROFILE");
            DisplayPlatformInfo(
                platformIds[i],
                CL_PLATFORM_VERSION,
                "CL_PLATFORM_VERSION");
            DisplayPlatformInfo(
                platformIds[i],
                CL_PLATFORM_VENDOR,
                "CL_PLATFORM_VENDOR");
            DisplayPlatformInfo(
                platformIds[i],
                CL_PLATFORM_EXTENSIONS,
                "CL_PLATFORM_EXTENSIONS");
    
            // Now query the set of devices associated with the platform
            cl_uint numDevices;
            errNum = clGetDeviceIDs(
                platformIds[i],
                CL_DEVICE_TYPE_ALL,
                0,
                NULL,
                &numDevices);
            if (errNum != CL_SUCCESS)
            {
                std::cerr << "Failed to find OpenCL devices." << std::endl;
                return;
            }
    
            cl_device_id * devices = (cl_device_id *)alloca(sizeof(cl_device_id) * numDevices);
            errNum = clGetDeviceIDs(
                platformIds[i],
                CL_DEVICE_TYPE_ALL,
                numDevices,
                devices,
                NULL);
            if (errNum != CL_SUCCESS)
            {
                std::cerr << "Failed to find OpenCL devices." << std::endl;
                return;
            }
    
            std::cout << "	Number of devices: 	" << numDevices << std::endl;
            // Iterate through each device, displaying associated information
            for (cl_uint j = 0; j < numDevices; j++)
            {
                InfoDevice<cl_device_type>::display(
                    devices[j],
                    CL_DEVICE_TYPE,
                    "CL_DEVICE_TYPE");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_VENDOR_ID,
                    "CL_DEVICE_VENDOR_ID");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_MAX_COMPUTE_UNITS,
                    "CL_DEVICE_MAX_COMPUTE_UNITS");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS,
                    "CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS");
    
                InfoDevice<ArrayType<size_t> >::display(
                    devices[j],
                    CL_DEVICE_MAX_WORK_ITEM_SIZES,
                    "CL_DEVICE_MAX_WORK_ITEM_SIZES");
    
                InfoDevice<std::size_t>::display(
                    devices[j],
                    CL_DEVICE_MAX_WORK_GROUP_SIZE,
                    "CL_DEVICE_MAX_WORK_GROUP_SIZE");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR,
                    "CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT,
                    "CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT,
                    "CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG,
                    "CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT,
                    "CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE,
                    "CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE");
    
    #ifdef CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF,
                    "CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR,
                    "CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT,
                    "CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_INT,
                    "CL_DEVICE_NATIVE_VECTOR_WIDTH_INT");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG,
                    "CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT,
                    "CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE,
                    "CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF,
                    "CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF");
    #endif
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_MAX_CLOCK_FREQUENCY,
                    "CL_DEVICE_MAX_CLOCK_FREQUENCY");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_ADDRESS_BITS,
                    "CL_DEVICE_ADDRESS_BITS");
    
                InfoDevice<cl_ulong>::display(
                    devices[j],
                    CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                    "CL_DEVICE_MAX_MEM_ALLOC_SIZE");
    
                InfoDevice<cl_bool>::display(
                    devices[j],
                    CL_DEVICE_IMAGE_SUPPORT,
                    "CL_DEVICE_IMAGE_SUPPORT");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_MAX_READ_IMAGE_ARGS,
                    "CL_DEVICE_MAX_READ_IMAGE_ARGS");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_MAX_WRITE_IMAGE_ARGS,
                    "CL_DEVICE_MAX_WRITE_IMAGE_ARGS");
    
                InfoDevice<std::size_t>::display(
                    devices[j],
                    CL_DEVICE_IMAGE2D_MAX_WIDTH,
                    "CL_DEVICE_IMAGE2D_MAX_WIDTH");
    
                InfoDevice<std::size_t>::display(
                    devices[j],
                    CL_DEVICE_IMAGE2D_MAX_HEIGHT,
                    "CL_DEVICE_IMAGE2D_MAX_HEIGHT");
    
                InfoDevice<std::size_t>::display(
                    devices[j],
                    CL_DEVICE_IMAGE3D_MAX_WIDTH,
                    "CL_DEVICE_IMAGE3D_MAX_WIDTH");
    
                InfoDevice<std::size_t>::display(
                    devices[j],
                    CL_DEVICE_IMAGE3D_MAX_HEIGHT,
                    "CL_DEVICE_IMAGE3D_MAX_HEIGHT");
    
                InfoDevice<std::size_t>::display(
                    devices[j],
                    CL_DEVICE_IMAGE3D_MAX_DEPTH,
                    "CL_DEVICE_IMAGE3D_MAX_DEPTH");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_MAX_SAMPLERS,
                    "CL_DEVICE_MAX_SAMPLERS");
    
                InfoDevice<std::size_t>::display(
                    devices[j],
                    CL_DEVICE_MAX_PARAMETER_SIZE,
                    "CL_DEVICE_MAX_PARAMETER_SIZE");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_MEM_BASE_ADDR_ALIGN,
                    "CL_DEVICE_MEM_BASE_ADDR_ALIGN");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE,
                    "CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE");
    
                InfoDevice<cl_device_fp_config>::display(
                    devices[j],
                    CL_DEVICE_SINGLE_FP_CONFIG,
                    "CL_DEVICE_SINGLE_FP_CONFIG");
    
                InfoDevice<cl_device_mem_cache_type>::display(
                    devices[j],
                    CL_DEVICE_GLOBAL_MEM_CACHE_TYPE,
                    "CL_DEVICE_GLOBAL_MEM_CACHE_TYPE");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE,
                    "CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE");
    
                InfoDevice<cl_ulong>::display(
                    devices[j],
                    CL_DEVICE_GLOBAL_MEM_CACHE_SIZE,
                    "CL_DEVICE_GLOBAL_MEM_CACHE_SIZE");
    
                InfoDevice<cl_ulong>::display(
                    devices[j],
                    CL_DEVICE_GLOBAL_MEM_SIZE,
                    "CL_DEVICE_GLOBAL_MEM_SIZE");
    
                InfoDevice<cl_ulong>::display(
                    devices[j],
                    CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE,
                    "CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE");
    
                InfoDevice<cl_uint>::display(
                    devices[j],
                    CL_DEVICE_MAX_CONSTANT_ARGS,
                    "CL_DEVICE_MAX_CONSTANT_ARGS");
    
                InfoDevice<cl_device_local_mem_type>::display(
                    devices[j],
                    CL_DEVICE_LOCAL_MEM_TYPE,
                    "CL_DEVICE_LOCAL_MEM_TYPE");
    
                InfoDevice<cl_ulong>::display(
                    devices[j],
                    CL_DEVICE_LOCAL_MEM_SIZE,
                    "CL_DEVICE_LOCAL_MEM_SIZE");
    
                InfoDevice<cl_bool>::display(
                    devices[j],
                    CL_DEVICE_ERROR_CORRECTION_SUPPORT,
                    "CL_DEVICE_ERROR_CORRECTION_SUPPORT");
    
    #ifdef CL_DEVICE_HOST_UNIFIED_MEMORY
                InfoDevice<cl_bool>::display(
                    devices[j],
                    CL_DEVICE_HOST_UNIFIED_MEMORY,
                    "CL_DEVICE_HOST_UNIFIED_MEMORY");
    #endif
    
                InfoDevice<std::size_t>::display(
                    devices[j],
                    CL_DEVICE_PROFILING_TIMER_RESOLUTION,
                    "CL_DEVICE_PROFILING_TIMER_RESOLUTION");
    
                InfoDevice<cl_bool>::display(
                    devices[j],
                    CL_DEVICE_ENDIAN_LITTLE,
                    "CL_DEVICE_ENDIAN_LITTLE");
    
                InfoDevice<cl_bool>::display(
                    devices[j],
                    CL_DEVICE_AVAILABLE,
                    "CL_DEVICE_AVAILABLE");
    
                InfoDevice<cl_bool>::display(
                    devices[j],
                    CL_DEVICE_COMPILER_AVAILABLE,
                    "CL_DEVICE_COMPILER_AVAILABLE");
    
                InfoDevice<cl_device_exec_capabilities>::display(
                    devices[j],
                    CL_DEVICE_EXECUTION_CAPABILITIES,
                    "CL_DEVICE_EXECUTION_CAPABILITIES");
    
                InfoDevice<cl_command_queue_properties>::display(
                    devices[j],
                    CL_DEVICE_QUEUE_PROPERTIES,
                    "CL_DEVICE_QUEUE_PROPERTIES");
    
                InfoDevice<cl_platform_id>::display(
                    devices[j],
                    CL_DEVICE_PLATFORM,
                    "CL_DEVICE_PLATFORM");
    
                InfoDevice<ArrayType<char> >::display(
                    devices[j],
                    CL_DEVICE_NAME,
                    "CL_DEVICE_NAME");
    
                InfoDevice<ArrayType<char> >::display(
                    devices[j],
                    CL_DEVICE_VENDOR,
                    "CL_DEVICE_VENDOR");
    
                InfoDevice<ArrayType<char> >::display(
                    devices[j],
                    CL_DRIVER_VERSION,
                    "CL_DRIVER_VERSION");
    
                InfoDevice<ArrayType<char> >::display(
                    devices[j],
                    CL_DEVICE_PROFILE,
                    "CL_DEVICE_PROFILE");
    
                InfoDevice<ArrayType<char> >::display(
                    devices[j],
                    CL_DEVICE_VERSION,
                    "CL_DEVICE_VERSION");
    
    #ifdef CL_DEVICE_OPENCL_C_VERSION
                InfoDevice<ArrayType<char> >::display(
                    devices[j],
                    CL_DEVICE_OPENCL_C_VERSION,
                    "CL_DEVICE_OPENCL_C_VERSION");
    #endif
    
                InfoDevice<ArrayType<char> >::display(
                    devices[j],
                    CL_DEVICE_EXTENSIONS,
                    "CL_DEVICE_EXTENSIONS");
    
    
                std::cout << std::endl << std::endl;
            }
        }
    }
    
    //----------- End of the code that queries and displays OpenCL device information ------------------
    
    cl_program CreateProgram(cl_context context, cl_device_id device, const char* fileName)  
    {  
        cl_int errNum;  
        cl_program program;  
    
        std::ifstream kernelFile(fileName, std::ios::in);  
        if (!kernelFile.is_open())  
        {  
            std::cerr << "Failed to open file for reading: " << fileName << std::endl;  
            return NULL;  
        }  
    
        std::ostringstream oss;  
        oss << kernelFile.rdbuf();  
    
        std::string srcStdStr = oss.str();  
        const char *srcStr = srcStdStr.c_str();  
        program = clCreateProgramWithSource(context, 1,  
            (const char**)&srcStr,  
            NULL, NULL);  
        if (program == NULL)  
        {  
            std::cerr << "Failed to create CL program from source." << std::endl;  
            return NULL;  
        }  
    
        errNum = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);  
        if (errNum != CL_SUCCESS)  
        {  
            // Determine the reason for the error  
            char buildLog[16384];  
            clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,  
                sizeof(buildLog), buildLog, NULL);  
    
            std::cerr << "Error in kernel: " << std::endl;  
            std::cerr << buildLog;  
            clReleaseProgram(program);  
            return NULL;  
        }  
    
        return program;  
    }  
    
    
    void Cleanup(cl_context context, cl_command_queue commandQueue,  
                 cl_program program, cl_kernel kernel, cl_mem imageObjects[2])  
    {  
        for (int i = 0; i < 2; i++)  
        {  
            if (imageObjects[i] != 0)  
                clReleaseMemObject(imageObjects[i]);  
        }  
        if (commandQueue != 0)  
            clReleaseCommandQueue(commandQueue);  
    
        if (kernel != 0)  
            clReleaseKernel(kernel);  
    
        if (program != 0)  
            clReleaseProgram(program);  
    
        if (context != 0)  
            clReleaseContext(context);  
    
    }  
      
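    cl_mem LoadImage(cl_context context, const char *fileName, int &width, int &height)  
    {  
        cv::Mat image1 = cv::imread(fileName);  // 8-bit, 3-channel BGR
        if (image1.empty())
        {
            std::cerr << "Failed to read image: " << fileName << std::endl;
            return 0;
        }
        width = image1.cols;  
        height = image1.rows;  
    
        // CL_RGBA/CL_UNORM_INT8 needs 4 bytes per pixel: copy B,G,R into the first
        // three bytes and set the 4th (alpha) to 255 instead of leaving it uninitialized.
        // Rows are stored bottom-up; main() undoes the flip when it reads the result back.
        unsigned char *buffer = new unsigned char[width * height * 4];  
        int w = 0;  
        for (int v = height - 1; v >= 0; v--)  
        {  
            for (int u = 0; u < width; u++)  
            {  
                const cv::Vec3b &px = image1.at<cv::Vec3b>(v, u);  
                buffer[w++] = px[0];  
                buffer[w++] = px[1];  
                buffer[w++] = px[2];  
                buffer[w++] = 255;  
            }  
        }  
    
        // Create OpenCL image; CL_MEM_COPY_HOST_PTR copies the pixels,
        // so the host buffer can be freed right after the call
        cl_image_format clImageFormat;  
        clImageFormat.image_channel_order = CL_RGBA;  
        clImageFormat.image_channel_data_type = CL_UNORM_INT8;  
    
        cl_int errNum;  
        cl_mem clImage;  
        clImage = clCreateImage2D(context,  
            CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,  
            &clImageFormat,  
            width,  
            height,  
            0,  
            buffer,  
            &errNum);  
        delete[] buffer;  
    
        if (errNum != CL_SUCCESS)  
        {  
            std::cerr << "Error creating CL image object" << std::endl;  
            return 0;  
        }  
    
        return clImage;  
    }  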
    
    size_t RoundUp(int groupSize, int globalSize)  
    {  
        int r = globalSize % groupSize;  
        if (r == 0)  
        {  
            return globalSize;  
        }  
        else  
        {  
            return globalSize + groupSize - r;  
        }  
    }  
    
    int main(int argc, char** argv)  
    {  
        cl_context context = 0;  
        cl_command_queue commandQueue = 0;  
        cl_program program = 0;  
        cl_device_id device = 0;  
        cl_kernel kernel = 0;  
        cl_mem imageObjects[2] = { 0, 0 };  
        cl_int errNum;  
    
        // Print information about every OpenCL platform and device
        displayInfo();
    
        cl_uint numplatforms;
        errNum = clGetPlatformIDs(0, NULL, &numplatforms);
        if (errNum != CL_SUCCESS || numplatforms <= 0){
            printf("No OpenCL platform found\n");
            return 1;
        }
    
        cl_platform_id * platformIds;
        platformIds = (cl_platform_id*)alloca(sizeof(cl_platform_id)*numplatforms);
        errNum = clGetPlatformIDs(numplatforms, platformIds, NULL);
        if (errNum != CL_SUCCESS){
            printf("No OpenCL platform found\n");
            return 1;
        }
        printf("Number of platforms: %u\n", numplatforms);
    
        // Pick the GPU with the largest CL_DEVICE_MAX_WORK_GROUP_SIZE
        cl_uint numDevices;
        cl_platform_id bestPlatform = NULL;
        std::size_t paramValueSize = 0;   // largest work-group size seen so far
        for (cl_uint i = 0; i < numplatforms; i++){
            errNum = clGetDeviceIDs(platformIds[i], CL_DEVICE_TYPE_GPU, 0, NULL, &numDevices);
            if (errNum != CL_SUCCESS || numDevices <= 0){
                printf("Platform %u: no device found\n", i);
                continue;
            }
            cl_device_id *devicesIds = (cl_device_id*)alloca(sizeof(cl_device_id)*numDevices);
            errNum = clGetDeviceIDs(platformIds[i], CL_DEVICE_TYPE_GPU, numDevices, devicesIds, NULL);
            if (errNum != CL_SUCCESS ){
                printf("Platform %u: failed to get device IDs\n", i);
                continue;
            }
    
            for (cl_uint j = 0; j < numDevices; j++){
                std::size_t tmpSize = 0;
                errNum = clGetDeviceInfo(devicesIds[j], CL_DEVICE_MAX_WORK_GROUP_SIZE, sizeof(size_t), &tmpSize, NULL);
                if (errNum != CL_SUCCESS){
                    std::cerr << "Failed to find OpenCL device info " << std::endl;
                    continue;
                }
    
                // Update the running maximum and keep the device handle itself,
                // because devicesIds is refilled for the next platform
                if (tmpSize > paramValueSize){
                    paramValueSize = tmpSize;
                    bestPlatform = platformIds[i];
                    device = devicesIds[j];
                }
            }
        }
        if (device == 0){
            std::cerr << "No OpenCL GPU device found" << std::endl;
            return 1;
        }
    
        cl_context_properties contextProperties[] ={
            CL_CONTEXT_PLATFORM,
            (cl_context_properties)bestPlatform,
            0
        };
        // Create the context on the selected device only
        context = clCreateContext(contextProperties, 1, &device, NULL, NULL, &errNum);
        if (errNum != CL_SUCCESS){
            std::cerr << "Failed to Create Context " << std::endl;
            return 1;
        }
    
        // Create a command queue with profiling enabled on the selected device
        commandQueue = clCreateCommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, &errNum);
        if (commandQueue == NULL)  {  
            Cleanup(context, commandQueue, program, kernel, imageObjects);  
             system("pause") ; return 1; 
        }  
    
        // Make sure the device supports images, otherwise exit  
        cl_bool imageSupport = CL_FALSE;  
        clGetDeviceInfo(device, CL_DEVICE_IMAGE_SUPPORT, sizeof(cl_bool), &imageSupport, NULL);  
        if (imageSupport != CL_TRUE)  {  
            std::cerr << "OpenCL device does not support images." << std::endl;  
            Cleanup(context, commandQueue, program, kernel, imageObjects);  
             system("pause") ; return 1; 
        }  
    
        // Load input image from file and load it into  
        // an OpenCL image object  
        int width, height;  
        char src0[] = "test.png";  // input image path
        imageObjects[0] = LoadImage(context, src0, width, height);  
        if (imageObjects[0] == 0)  {  
            std::cerr << "Error loading: " << std::string(src0) << std::endl;  
            Cleanup(context, commandQueue, program, kernel, imageObjects);  
             system("pause") ; return 1; 
        }  
    
        // Create output image object  
        cl_image_format clImageFormat;  
        clImageFormat.image_channel_order = CL_RGBA;  
        clImageFormat.image_channel_data_type = CL_UNORM_INT8;  
        imageObjects[1] = clCreateImage2D(context,  
            CL_MEM_WRITE_ONLY,  
            &clImageFormat,  
            width,  
            height,  
            0,  
            NULL,  
            &errNum);  
    
        if (errNum != CL_SUCCESS){  
            std::cerr << "Error creating CL output image object." << std::endl;  
            Cleanup(context, commandQueue, program, kernel, imageObjects);  
             system("pause") ; return 1; 
        }  
    
        // Create OpenCL program  
        program = CreateProgram(context, device, "bilateralBlur.cl");  
        if (program == NULL)  {  
            Cleanup(context, commandQueue, program, kernel, imageObjects);  
             system("pause") ; return 1; 
        }  
        // Create OpenCL kernel  
        kernel = clCreateKernel(program, "bilateralBlur", NULL);  
        if (kernel == NULL)  {  
            std::cerr << "Failed to create kernel" << std::endl;  
            Cleanup(context, commandQueue, program, kernel, imageObjects);  
             system("pause") ; return 1; 
        }  
    
        // Set the kernel arguments  
        errNum = clSetKernelArg(kernel, 0, sizeof(cl_mem), &imageObjects[0]);  
        errNum |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &imageObjects[1]);  
        if (errNum != CL_SUCCESS)  {  
            std::cerr << "Error setting kernel arguments." << std::endl;  
            Cleanup(context, commandQueue, program, kernel, imageObjects);  
            system("pause") ; return 1; 
        }  
    
        size_t localWorkSize[2] = { 32, 32 };  
        size_t globalWorkSize[2] = { RoundUp(localWorkSize[0], width),  
            RoundUp(localWorkSize[1], height) };  
    
        cl_event prof_event;
    
        // Queue the kernel up for execution  
        errNum = clEnqueueNDRangeKernel(commandQueue, kernel, 2, NULL,  
            globalWorkSize, localWorkSize,  
            0, NULL, &prof_event);
        if (errNum != CL_SUCCESS)  
        {  
            std::cerr << "Error queuing kernel for execution." << std::endl;  
            Cleanup(context, commandQueue, program, kernel, imageObjects);  
             system("pause") ; return 1; 
        }
    
        clFinish(commandQueue);
        errNum = clWaitForEvents(1, &prof_event);
        if (errNum)
        {
            printf("clWaitForEvents() failed for bilateralBlur kernel. (%d)\n", errNum);
            return EXIT_FAILURE;
        }
    
        // Profiling timestamps are in nanoseconds. QUEUED..END covers queueing plus
        // execution; CL_PROFILING_COMMAND_START..END would time the kernel alone.
        cl_ulong ev_start_time = (cl_ulong)0;
        cl_ulong ev_end_time = (cl_ulong)0;
        size_t return_bytes;
    
        errNum = clGetEventProfilingInfo(prof_event, CL_PROFILING_COMMAND_QUEUED,sizeof(cl_ulong), &ev_start_time, &return_bytes);
        errNum |= clGetEventProfilingInfo(prof_event, CL_PROFILING_COMMAND_END,sizeof(cl_ulong), &ev_end_time, &return_bytes);
        if (errNum)
        {
            printf("clGetEventProfilingInfo() failed for kernel. (%d)\n", errNum);
            return EXIT_FAILURE;
        }
    
        double run_time = (double)(ev_end_time - ev_start_time);
    
        printf("Image dimensions: %d x %d pixels, Image type = CL_RGBA, CL_UNORM_INT8\n", width, height);
        printf("Work Timer:%lfms\n", run_time / 1000000);
    
        clReleaseEvent(prof_event);
    
        // Read the output image back to the host  
        char *buffer = new char[width * height * 4];  
        size_t origin[3] = { 0, 0, 0 };  
        size_t region[3] = { (size_t)width, (size_t)height, 1 };  
        errNum = clEnqueueReadImage(commandQueue, imageObjects[1], CL_TRUE,  
            origin, region, 0, 0, buffer,  
            0, NULL, NULL);  
        if (errNum != CL_SUCCESS)  {  
            std::cerr << "Error reading result buffer." << std::endl;  
            Cleanup(context, commandQueue, program, kernel, imageObjects);  
             system("pause") ; return 1; 
        }  
    
        std::cout << std::endl;  
        std::cout << "Executed program successfully." << std::endl;  
    
        // Save the image out to disk  
        const char *saveImage = "output.jpg";
        cv::Mat imageColor = cv::imread(src0);  
        cv::Mat imageColor2;  
        imageColor2.create(imageColor.rows, imageColor.cols, imageColor.type());  
        int w = 0;  
        for (int v = imageColor2.rows-1; v >=0; v--)  {  
            for (int u =0 ; u <imageColor2.cols; u++)  {  
                imageColor2.at<cv::Vec3b>(v, u)[0] = buffer[w++];  
                imageColor2.at<cv::Vec3b>(v, u)[1] = buffer[w++];  
                imageColor2.at<cv::Vec3b>(v, u)[2] = buffer[w++];  
                w++;  
            }  
        }
    
        cv::imshow("Original", imageColor);
        cv::imshow("Skin-smoothed", imageColor2);  
        cv::imwrite(saveImage, imageColor2);  
        cv::waitKey(0);  
    
        delete[] buffer;  
    
        Cleanup(context, commandQueue, program, kernel, imageObjects);  
    
        return 0;  
    }
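In OpenCL 1.x, when a local work size is given, every global dimension must be a multiple of it, which is why main() passes the image size through RoundUp(); the surplus work-items are discarded by the bounds check at the top of the kernel. The sketch below works through that arithmetic for the 273×415 test image; roundUpToMultiple and idleWorkItems are hypothetical helpers that mirror RoundUp(), not part of the original program.

```cpp
#include <cassert>
#include <cstddef>

// Same rule as RoundUp(): pad globalSize up to the next multiple of groupSize.
std::size_t roundUpToMultiple(std::size_t groupSize, std::size_t globalSize)
{
    assert(groupSize > 0);
    std::size_t r = globalSize % groupSize;
    return r == 0 ? globalSize : globalSize + groupSize - r;
}

// Work-items launched outside the image; the kernel's
// "x >= width || y >= height" check makes them return immediately.
std::size_t idleWorkItems(std::size_t width, std::size_t height,
                          std::size_t localX, std::size_t localY)
{
    std::size_t gx = roundUpToMultiple(localX, width);
    std::size_t gy = roundUpToMultiple(localY, height);
    return gx * gy - width * height;
}
```

With 16×16 groups the launch becomes 288×416 = 119808 work-items, of which 6513 return immediately; 32×32 groups happen to give the same padded size for this image.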

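        This host program uses a little OpenCV, mainly to read the image; any other way of loading the image works just as well. In fact, OpenCV itself has an ocl module, apparently an OpenCL extension that AMD contributed to OpenCV, which provides OpenCL implementations of many common OpenCV functions, including bilateral filtering and adaptive bilateral filtering.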
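OpenCV is only the decoder here. The OpenCL side needs 4 bytes per pixel because CL_RGB is only allowed with packed data types such as CL_UNORM_SHORT_565 or CL_UNORM_INT_101010, not with CL_UNORM_INT8. Any loader that yields 8-bit BGR pixels can therefore replace cv::imread, as long as its output is packed the same way. A minimal sketch, with packBgrToRgbaBottomUp as a hypothetical name, that reproduces LoadImage()'s layout (bottom-up rows, B in the first byte) and sets alpha to 255:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: convert tightly packed 8-bit BGR pixels into the
// 4-byte-per-pixel layout expected by a CL_RGBA / CL_UNORM_INT8 image.
// Rows are written bottom-up to match LoadImage(); the 4th byte is opaque alpha.
std::vector<unsigned char> packBgrToRgbaBottomUp(const unsigned char *bgr,
                                                 int width, int height)
{
    assert(bgr != nullptr && width > 0 && height > 0);
    std::vector<unsigned char> out(static_cast<std::size_t>(width) * height * 4);
    std::size_t w = 0;
    for (int v = height - 1; v >= 0; v--) {
        const unsigned char *row = bgr + static_cast<std::size_t>(v) * width * 3;
        for (int u = 0; u < width; u++) {
            out[w++] = row[u * 3 + 0];  // B goes into the R slot, as in LoadImage()
            out[w++] = row[u * 3 + 1];  // G
            out[w++] = row[u * 3 + 2];  // R goes into the B slot
            out[w++] = 255;             // alpha
        }
    }
    return out;
}
```

The swapped channel order is harmless as long as the readback uses the same order, since the kernel filters each channel the same way.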

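        This program picks the GPU with the largest CL_DEVICE_MAX_WORK_GROUP_SIZE, but choosing the best OpenCL device really requires weighing several factors. On my machine, judged by CL_DEVICE_MAX_WORK_GROUP_SIZE the CPU would seem to be the best OpenCL device, and many of the parameters it reports are stronger than the GPU's, yet it actually runs several times slower than the GPU. So I am not sure which parameters should decide which OpenCL device is best; advice from anyone who knows would be welcome.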
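Because CL_DEVICE_MAX_WORK_GROUP_SIZE is a weak proxy for speed (the CPU in the device listing reports 8192 yet runs slower), timing the actual kernel on each candidate device is usually the most dependable test. If a simple max-selection is still wanted, the running maximum has to be updated inside the loop, and the winning device handle (not only its indices) has to be kept, because the device array is refilled for every platform. A minimal sketch over a hypothetical DeviceCandidate list, independent of the OpenCL calls:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record of one enumerated device and its
// CL_DEVICE_MAX_WORK_GROUP_SIZE value.
struct DeviceCandidate {
    unsigned platformIndex;
    unsigned deviceIndex;
    std::size_t maxWorkGroupSize;
};

// Index (into candidates) of the device with the largest maxWorkGroupSize,
// or -1 for an empty list. Without updating bestSize inside the loop,
// every device would "win" and the last one enumerated would be picked.
int pickLargestWorkGroup(const std::vector<DeviceCandidate> &candidates)
{
    int best = -1;
    std::size_t bestSize = 0;
    for (std::size_t k = 0; k < candidates.size(); k++) {
        if (best < 0 || candidates[k].maxWorkGroupSize > bestSize) {
            best = static_cast<int>(k);
            bestSize = candidates[k].maxWorkGroupSize;
        }
    }
    return best;
}
```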

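        The program itself is quite simple: the working code is only a little over 300 lines. The device-query code is there only to show which OpenCL devices your machine has and their properties, so the displayInfo(); call in main can be commented out.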

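        The OpenCL header and library files can be obtained from Intel, NVIDIA, AMD and others. The headers and .lib files I use were copied from NVIDIA's CUDA toolkit; you can also use mine directly.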

    4. Results

    (1) Hardware information

    [Figure: hardware information screenshots]

    (2) OpenCL device information printed to the console

    Number of platforms:    2
            CL_PLATFORM_PROFILE:    FULL_PROFILE
            CL_PLATFORM_VERSION:    OpenCL 2.0
            CL_PLATFORM_VENDOR:     Intel(R) Corporation
            CL_PLATFORM_EXTENSIONS: cl_intel_dx9_media_sharing cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_d3d11_sharing cl_khr_depth_images cl_khr_dx9_media_sharing cl_khr_gl_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir
            Number of devices:      2
                    CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
                    CL_DEVICE_VENDOR_ID:    32902
                    CL_DEVICE_MAX_COMPUTE_UNITS:    24
                    CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:     3
            CL_DEVICE_MAX_WORK_ITEM_SIZES:  256 256 256
                    CL_DEVICE_MAX_WORK_GROUP_SIZE:  256
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR:  1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: 1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT:   1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG:  1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: 1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE:        0
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF:  1
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR:     1
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT:    1
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_INT:      1
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG:     1
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT:    1
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE:   0
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF:     1
                    CL_DEVICE_MAX_CLOCK_FREQUENCY:  1050
                    CL_DEVICE_ADDRESS_BITS: 32
                    CL_DEVICE_MAX_MEM_ALLOC_SIZE:   390280806
                    CL_DEVICE_IMAGE_SUPPORT:        1
                    CL_DEVICE_MAX_READ_IMAGE_ARGS:  128
                    CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 128
                    CL_DEVICE_IMAGE2D_MAX_WIDTH:    16384
                    CL_DEVICE_IMAGE2D_MAX_WIDTH:    16384
                    CL_DEVICE_IMAGE2D_MAX_HEIGHT:   16384
                    CL_DEVICE_IMAGE3D_MAX_WIDTH:    16384
                    CL_DEVICE_IMAGE3D_MAX_HEIGHT:   16384
                    CL_DEVICE_IMAGE3D_MAX_DEPTH:    2048
                    CL_DEVICE_MAX_SAMPLERS: 16
                    CL_DEVICE_MAX_PARAMETER_SIZE:   1024
                    CL_DEVICE_MEM_BASE_ADDR_ALIGN:  1024
                    CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE:     128
                    CL_DEVICE_SINGLE_FP_CONFIG:     CL_FP_DENORM | CL_FP_INF_NAN | CL_FP_ROUND_TO_NEAREST | CL_FP_ROUND_TO_ZERO | CL_FP_ROUND_TO_INF
                    CL_DEVICE_SINGLE_FP_CONFIG:     CL_READ_ONLY_CACHE | CL_READ_WRITE_CACHE
                    CL_DEVICE_GLOBAL_MEM_CACHE_TYPE:        CL_READ_WRITE_CACHE
                    CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE:    64
                    CL_DEVICE_GLOBAL_MEM_CACHE_SIZE:        524288
                    CL_DEVICE_GLOBAL_MEM_SIZE:      1561123226
                    CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:     65536
                    CL_DEVICE_MAX_CONSTANT_ARGS:    8
                    CL_DEVICE_LOCAL_MEM_TYPE:
                    CL_DEVICE_LOCAL_MEM_SIZE:       65536
                    CL_DEVICE_ERROR_CORRECTION_SUPPORT:     0
                    CL_DEVICE_HOST_UNIFIED_MEMORY:  1
                    CL_DEVICE_PROFILING_TIMER_RESOLUTION:   83
                    CL_DEVICE_ENDIAN_LITTLE:        1
                    CL_DEVICE_AVAILABLE:    1
                    CL_DEVICE_COMPILER_AVAILABLE:   1
                    CL_DEVICE_EXECUTION_CAPABILITIES:       CL_EXEC_KERNEL
                    CL_DEVICE_QUEUE_PROPERTIES:     CL_QUEUE_PROFILING_ENABLE
                    CL_DEVICE_PLATFORM:     00DEC488
            CL_DEVICE_NAME: Intel(R) HD Graphics 520
            CL_DEVICE_VENDOR:       Intel(R) Corporation
            CL_DRIVER_VERSION:      20.19.15.4364
            CL_DEVICE_PROFILE:      FULL_PROFILE
            CL_DEVICE_VERSION:      OpenCL 2.0
            CL_DEVICE_OPENCL_C_VERSION:     OpenCL C 2.0
            CL_DEVICE_EXTENSIONS:   cl_intel_accelerator cl_intel_advanced_motion_estimation cl_intel_ctz cl_intel_d3d11_nv12_media_sharing cl_intel_dx9_media_sharing cl_intel_motion_estimation cl_intel_simultaneous_sharing cl_intel_subgroups cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_depth_images cl_khr_dx9_media_sharing cl_khr_fp16 cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_gl_sharing cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_khr_spir


                    CL_DEVICE_TYPE: CL_DEVICE_TYPE_CPU
                    CL_DEVICE_VENDOR_ID:    32902
                    CL_DEVICE_MAX_COMPUTE_UNITS:    4
                    CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:     3
            CL_DEVICE_MAX_WORK_ITEM_SIZES:  8192 8192 8192
                    CL_DEVICE_MAX_WORK_GROUP_SIZE:  8192
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR:  1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: 1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT:   1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG:  1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: 1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE:        1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF:  0
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR:     32
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT:    16
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_INT:      8
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG:     4
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT:    8
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE:   4
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF:     0
                    CL_DEVICE_MAX_CLOCK_FREQUENCY:  2500
                    CL_DEVICE_ADDRESS_BITS: 32
                    CL_DEVICE_MAX_MEM_ALLOC_SIZE:   536838144
                    CL_DEVICE_IMAGE_SUPPORT:        1
                    CL_DEVICE_MAX_READ_IMAGE_ARGS:  480
                    CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 480
                    CL_DEVICE_IMAGE2D_MAX_WIDTH:    16384
                    CL_DEVICE_IMAGE2D_MAX_WIDTH:    16384
                    CL_DEVICE_IMAGE2D_MAX_HEIGHT:   16384
                    CL_DEVICE_IMAGE3D_MAX_WIDTH:    2048
                    CL_DEVICE_IMAGE3D_MAX_HEIGHT:   2048
                    CL_DEVICE_IMAGE3D_MAX_DEPTH:    2048
                    CL_DEVICE_MAX_SAMPLERS: 480
                    CL_DEVICE_MAX_PARAMETER_SIZE:   3840
                    CL_DEVICE_MEM_BASE_ADDR_ALIGN:  1024
                    CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE:     128
                    CL_DEVICE_SINGLE_FP_CONFIG:     CL_FP_DENORM | CL_FP_INF_NAN | CL_FP_ROUND_TO_NEAREST
                    CL_DEVICE_SINGLE_FP_CONFIG:     CL_READ_ONLY_CACHE | CL_READ_WRITE_CACHE
                    CL_DEVICE_GLOBAL_MEM_CACHE_TYPE:        CL_READ_WRITE_CACHE
                    CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE:    64
                    CL_DEVICE_GLOBAL_MEM_CACHE_SIZE:        262144
                    CL_DEVICE_GLOBAL_MEM_SIZE:      2147352576
                    CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:     131072
                    CL_DEVICE_MAX_CONSTANT_ARGS:    480
                    CL_DEVICE_LOCAL_MEM_TYPE:       CL_LOCAL | CL_GLOBAL
                    CL_DEVICE_LOCAL_MEM_SIZE:       32768
                    CL_DEVICE_ERROR_CORRECTION_SUPPORT:     0
                    CL_DEVICE_HOST_UNIFIED_MEMORY:  1
                    CL_DEVICE_PROFILING_TIMER_RESOLUTION:   395
                    CL_DEVICE_ENDIAN_LITTLE:        1
                    CL_DEVICE_AVAILABLE:    1
                    CL_DEVICE_COMPILER_AVAILABLE:   1
                    CL_DEVICE_EXECUTION_CAPABILITIES:       CL_EXEC_KERNEL | CL_EXEC_NATIVE_KERNEL
                    CL_DEVICE_QUEUE_PROPERTIES:     CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_PROFILING_ENABLE
                    CL_DEVICE_PLATFORM:     00DEC488
            CL_DEVICE_NAME: Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz
            CL_DEVICE_VENDOR:       Intel(R) Corporation
            CL_DRIVER_VERSION:      5.2.0.10094
            CL_DEVICE_PROFILE:      FULL_PROFILE
            CL_DEVICE_VERSION:      OpenCL 2.0 (Build 10094)
            CL_DEVICE_OPENCL_C_VERSION:     OpenCL C 2.0
            CL_DEVICE_EXTENSIONS:   cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_dx9_media_sharing cl_intel_dx9_media_sharing cl_khr_d3d11_sharing cl_khr_gl_sharing cl_khr_fp64 cl_khr_image2d_from_buffer


            CL_PLATFORM_PROFILE:    FULL_PROFILE
            CL_PLATFORM_VERSION:    OpenCL 1.2 CUDA 8.0.44
            CL_PLATFORM_VENDOR:     NVIDIA Corporation
            CL_PLATFORM_EXTENSIONS: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts
            Number of devices:      1
                    CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
                    CL_DEVICE_VENDOR_ID:    4318
                    CL_DEVICE_MAX_COMPUTE_UNITS:    3
                    CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:     3
            CL_DEVICE_MAX_WORK_ITEM_SIZES:  1024 1024 64
                    CL_DEVICE_MAX_WORK_GROUP_SIZE:  1024
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR:  1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: 1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT:   1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG:  1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: 1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE:        1
                    CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF:  0
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR:     1
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT:    1
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_INT:      1
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG:     1
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT:    1
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE:   1
                    CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF:     0
                    CL_DEVICE_MAX_CLOCK_FREQUENCY:  1241
                    CL_DEVICE_ADDRESS_BITS: 32
                    CL_DEVICE_MAX_MEM_ALLOC_SIZE:   536870912
                    CL_DEVICE_IMAGE_SUPPORT:        1
                    CL_DEVICE_MAX_READ_IMAGE_ARGS:  256
                    CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 16
                    CL_DEVICE_IMAGE2D_MAX_WIDTH:    16384
                    CL_DEVICE_IMAGE2D_MAX_WIDTH:    16384
                    CL_DEVICE_IMAGE2D_MAX_HEIGHT:   16384
                    CL_DEVICE_IMAGE3D_MAX_WIDTH:    4096
                    CL_DEVICE_IMAGE3D_MAX_HEIGHT:   4096
                    CL_DEVICE_IMAGE3D_MAX_DEPTH:    4096
                    CL_DEVICE_MAX_SAMPLERS: 32
                    CL_DEVICE_MAX_PARAMETER_SIZE:   4352
                    CL_DEVICE_MEM_BASE_ADDR_ALIGN:  4096
                    CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE:     128
                    CL_DEVICE_SINGLE_FP_CONFIG:     CL_FP_DENORM | CL_FP_INF_NAN | CL_FP_ROUND_TO_NEAREST | CL_FP_ROUND_TO_ZERO | CL_FP_ROUND_TO_INF | CL_FP_FMA
                    CL_DEVICE_SINGLE_FP_CONFIG:     CL_READ_ONLY_CACHE | CL_READ_WRITE_CACHE
                    CL_DEVICE_GLOBAL_MEM_CACHE_TYPE:        CL_READ_WRITE_CACHE
                    CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE:    128
                    CL_DEVICE_GLOBAL_MEM_CACHE_SIZE:        49152
                    CL_DEVICE_GLOBAL_MEM_SIZE:      2147483648
                    CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:     65536
                    CL_DEVICE_MAX_CONSTANT_ARGS:    9
                    CL_DEVICE_LOCAL_MEM_TYPE:
                    CL_DEVICE_LOCAL_MEM_SIZE:       49152
                    CL_DEVICE_ERROR_CORRECTION_SUPPORT:     0
                    CL_DEVICE_HOST_UNIFIED_MEMORY:  0
                    CL_DEVICE_PROFILING_TIMER_RESOLUTION:   1000
                    CL_DEVICE_ENDIAN_LITTLE:        1
                    CL_DEVICE_AVAILABLE:    1
                    CL_DEVICE_COMPILER_AVAILABLE:   1
                    CL_DEVICE_EXECUTION_CAPABILITIES:       CL_EXEC_KERNEL
                    CL_DEVICE_QUEUE_PROPERTIES:     CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_PROFILING_ENABLE
                    CL_DEVICE_PLATFORM:     00E30580
            CL_DEVICE_NAME: GeForce 940MX
            CL_DEVICE_VENDOR:       NVIDIA Corporation
            CL_DRIVER_VERSION:      369.30
            CL_DEVICE_PROFILE:      FULL_PROFILE
            CL_DEVICE_VERSION:      OpenCL 1.2 CUDA
            CL_DEVICE_OPENCL_C_VERSION:     OpenCL C 1.2
            CL_DEVICE_EXTENSIONS:   cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts


    Number of platforms: 2
    Image dimensions: 273 x 415 pixels, Image type = CL_RGBA, CL_UNORM_INT8
    Work Timer:3.422816ms

    Executed program succesfully.

    The 273x415 image took under 4 ms.
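For scale: with the kernel's 11×11 window, a 273×415 image needs 273 × 415 × 121 ≈ 13.7 million neighbourhood reads. The host code measures from CL_PROFILING_COMMAND_QUEUED to CL_PROFILING_COMMAND_END, so the figure includes queueing as well as execution. A small sketch of the conversion, assuming one read per tap per pixel (elapsedMs and tapsPerSecond are hypothetical helper names; profiling timestamps are nanoseconds):

```cpp
#include <cassert>
#include <cstdint>

// OpenCL profiling timestamps (cl_ulong) are in nanoseconds.
double elapsedMs(std::uint64_t startNs, std::uint64_t endNs)
{
    assert(endNs >= startNs);
    return static_cast<double>(endNs - startNs) / 1e6;
}

// Neighbourhood reads per second for a ksize x ksize window over a
// width x height image, assuming one read per tap per pixel.
double tapsPerSecond(int width, int height, int ksize, double ms)
{
    assert(ms > 0.0);
    double taps = static_cast<double>(width) * height * ksize * ksize;
    return taps / (ms / 1000.0);
}
```

For the logged 3.422816 ms this works out to roughly 4.0 × 10^9 reads per second.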

    (3) Bilateral filtering result

    [Figure: bilateral filtering result]

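        The effect is clearly visible. However, because there is no skin detection, no final sharpening step, and the parameters are not well tuned, even a friend of mine said the smoothing makes the skin look too baby-smooth and fake. So the algorithm still needs work.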
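To experiment with sigma_d and sigma_r, or to check the GPU output, a plain CPU version of the filter can help. The sketch below is the textbook bilateral filter for one float channel, with the same spatial and range exponents as the kernel's dkl and rkl terms and clamp-to-edge borders. It normalizes by the sum of the weights and does not reproduce the kernel's alpha = 0.2 or its denominator starting at 1; bilateralReference is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Textbook bilateral filter for one channel in [0, 1], row-major.
// Out-of-range neighbours are clamped to the nearest edge pixel,
// mirroring CLK_ADDRESS_CLAMP_TO_EDGE.
std::vector<float> bilateralReference(const std::vector<float> &src,
                                      int width, int height, int ksize,
                                      float sigma_d, float sigma_r)
{
    assert(static_cast<int>(src.size()) == width * height);
    assert(ksize % 2 == 1 && sigma_d > 0.0f && sigma_r > 0.0f);
    std::vector<float> dst(src.size());
    const int r = ksize / 2;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            const float fij = src[y * width + x];
            float num = 0.0f, den = 0.0f;
            for (int K = -r; K <= r; K++) {
                for (int L = -r; L <= r; L++) {
                    const int xx = std::min(std::max(x + K, 0), width - 1);
                    const int yy = std::min(std::max(y + L, 0), height - 1);
                    const float fkl = src[yy * width + xx];
                    const float d = -(K * K + L * L) / (2.0f * sigma_d * sigma_d);
                    const float diff = fij - fkl;
                    const float rr = -(diff * diff) / (2.0f * sigma_r * sigma_r);
                    const float w = std::exp(d + rr);  // spatial weight * range weight
                    num += w * fkl;
                    den += w;
                }
            }
            dst[y * width + x] = num / den;  // den >= 1: the centre tap has weight 1
        }
    }
    return dst;
}
```

With sigma_r = 0.1 a hard 0-to-1 step survives almost untouched, because exp(-1/(2·0.1²)) = exp(-50) is negligible; with a very large sigma_r the range term is about 1 and the filter degenerates into a plain Gaussian blur.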

        In terms of speed, the algorithm probably still has room for optimization.
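One possible speed-up, not tested here: the spatial term depends only on the offset (K, L), yet the kernel recomputes it for all 121 taps at every pixel. It could be computed once on the host and handed to the kernel as a __constant float* argument (created with clCreateBuffer and bound with clSetKernelArg); the matching kernel change is not shown. makeSpatialWeights is a hypothetical helper:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Spatial (domain) part of the bilateral weight,
// exp(-(K*K + L*L) / (2 * sigma_d^2)), which depends only on the offset.
// Entry (K + r) * ksize + (L + r) holds the weight for offset (K, L).
std::vector<float> makeSpatialWeights(int ksize, float sigma_d)
{
    assert(ksize > 0 && ksize % 2 == 1 && sigma_d > 0.0f);
    const int r = ksize / 2;
    std::vector<float> table(static_cast<std::size_t>(ksize) * ksize);
    for (int K = -r; K <= r; K++)
        for (int L = -r; L <= r; L++)
            table[(K + r) * ksize + (L + r)] =
                std::exp(-(K * K + L * L) / (2.0f * sigma_d * sigma_d));
    return table;
}
```

For ksize = 11 the table is 121 floats (484 bytes), well within the 64 KB constant buffer both GPUs report.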

    Source code: http://download.csdn.net/download/qq_33892166/9761287

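        If the source code reports "Error queuing kernel for execution.", try changing size_t localWorkSize[2] = { 32, 32 }; to size_t localWorkSize[2] = { 16, 16 };. A 32×32 work-group has 1024 work-items, more than some devices allow: the Intel HD Graphics 520 above reports a CL_DEVICE_MAX_WORK_GROUP_SIZE of 256.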
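Rather than hard-coding the work-group size, the limit can be read at run time with clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_GROUP_SIZE, sizeof(size_t), &maxWg, NULL), the same query the device-selection loop uses. clGetKernelWorkGroupInfo with CL_KERNEL_WORK_GROUP_SIZE gives a per-kernel limit that may be lower, and passing NULL as local_work_size lets the implementation choose. A minimal sketch that turns such a limit into a square group size (chooseLocalSide is a hypothetical helper):

```cpp
#include <cassert>
#include <cstddef>

// Largest power-of-two side s (capped at maxSide) with s * s <= maxWorkGroupSize,
// for a square 2-D work-group.
std::size_t chooseLocalSide(std::size_t maxWorkGroupSize, std::size_t maxSide = 32)
{
    assert(maxWorkGroupSize >= 1 && maxSide >= 1);
    std::size_t s = 1;
    while (s * 2 <= maxSide && (s * 2) * (s * 2) <= maxWorkGroupSize)
        s *= 2;
    return s;
}
```

For example, size_t side = chooseLocalSide(maxWg); size_t localWorkSize[2] = { side, side }; keeps the existing RoundUp() calls working and gives 16×16 on the HD Graphics 520 and 32×32 on the GeForce 940MX.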

      

  • Original article: https://www.cnblogs.com/betterwgo/p/6431522.html