  • CUDA Getting Started (Part 2)

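    1. NVIDIA provides Thrust, a C++ template library that simplifies GPU programming. It is already included when you install the CUDA Toolkit.
    The library consists entirely of header files, so no additional library files need to be linked.
    Test program: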

    #include <thrust/host_vector.h>
    #include <thrust/device_vector.h>
    #include <thrust/generate.h>
    #include <thrust/sort.h>
    #include <thrust/copy.h>
    #include <algorithm>
    #include <cstdio>
    #include <cstdlib>
    
    // CPU side: GetTickCount (Windows API) is used for timing
    #include <windows.h>
    
    template <class T>
    void cpu_sort(T begin, T end)
    {
        std::sort(begin, end);
    }
    
    // Sort on the GPU with Thrust (copies h_vec to the device and back)
    void gpu_sort(thrust::host_vector<int> &h_vec)
    {
      // transfer data to the device
      thrust::device_vector<int> d_vec = h_vec;
    
      // sort data on the device (846M keys per second on GeForce GTX 480)
      thrust::sort(d_vec.begin(), d_vec.end());
    
      // transfer data back to host
      thrust::copy(d_vec.begin(), d_vec.end(), h_vec.begin());
    }
    
    // Run statement x and print its elapsed time in milliseconds
    #define CHK_TIME(x)    {DWORD t1=GetTickCount();x;DWORD t2=GetTickCount();printf(#x ": %lu\n", (unsigned long)(t2-t1));}
    
    int main(void)
    {
      // generate 32M random numbers serially
      thrust::host_vector<int> h_vec(32 << 20);
      std::generate(h_vec.begin(), h_vec.end(), rand);
    
      thrust::host_vector<int> h_vec_1(h_vec);
      CHK_TIME(cpu_sort(h_vec_1.begin(), h_vec_1.end()));
    
      thrust::host_vector<int> h_vec_2(h_vec);
      CHK_TIME(gpu_sort(h_vec_2));
    
      return 0;
    }
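    GetTickCount is Windows-only and its resolution is typically only about 10 to 16 ms. A portable alternative, sketched below assuming a C++11 or later compiler, uses std::chrono::steady_clock. `time_ms` and `time_cpu_sort` are hypothetical helper names, not part of Thrust or the original program. Note also that gpu_sort contains the program's first CUDA call, so its measured time likely includes CUDA context initialization; running one untimed warm-up sort first would probably give a fairer GPU number.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <vector>

// Portable replacement for the GetTickCount-based CHK_TIME macro (hypothetical helper):
// runs f once and returns the elapsed wall-clock time in milliseconds.
template <class F>
double time_ms(F&& f)
{
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Times a single-threaded std::sort of a copy of `data` and prints the result.
double time_cpu_sort(std::vector<int> data)
{
    const double ms = time_ms([&] { std::sort(data.begin(), data.end()); });
    std::printf("cpu_sort: %.1f ms\n", ms);
    return ms;
}
```

    For example, `time_ms([&] { cpu_sort(h_vec_1.begin(), h_vec_1.end()); })` could stand in for the first CHK_TIME line in main.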

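    Notes
    a) Save the file with a .cu extension so that it is compiled by nvcc.
    b) If you don't know how to configure the .vcproj project, the simplest approach is to paste the code into an existing CUDA example and build it with that example's ready-made project.
    c) Compilation takes a very long time.
    d) The generated binary is very large (15 MB).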

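    Here are my test results (note: the CPU sort here is single-threaded; using multiple cores would make the CPU considerably faster). Times are in milliseconds, as measured by GetTickCount.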

    (debug version)
    cpu_sort(h_vec_1.begin(), h_vec_1.end()): 94609
    gpu_sort(h_vec_2): 3312
    (release version)
    cpu_sort(h_vec_1.begin(), h_vec_1.end()): 2828
    gpu_sort(h_vec_2): 594
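    Since the CPU numbers above are single-threaded, here is a minimal sketch of a multi-core CPU sort using only standard C++11 threads: sort contiguous chunks concurrently, then merge adjacent sorted runs with std::inplace_merge. `parallel_sort` is a hypothetical helper, not a standard function. The merge passes run on the calling thread, so the speedup is limited by them; C++17's parallel std::sort overload is an alternative where the standard library supports it. Some toolchains need -pthread to link std::thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Multi-threaded sort sketch (hypothetical helper): sort num_threads contiguous
// chunks concurrently, then merge neighbouring sorted runs pairwise.
template <class T>
void parallel_sort(std::vector<T>& v, unsigned num_threads)
{
    if (num_threads == 0) num_threads = 1;
    const std::size_t n = v.size();
    const std::size_t chunk = (n + num_threads - 1) / num_threads;
    if (chunk == 0) return;  // empty input: nothing to do

    // Iterator to position k of v
    auto at = [&v](std::size_t k) { return v.begin() + static_cast<std::ptrdiff_t>(k); };

    // Run boundaries: run r covers [bounds[r], bounds[r + 1])
    std::vector<std::size_t> bounds;
    for (std::size_t b = 0; b < n; b += chunk) bounds.push_back(b);
    bounds.push_back(n);

    // Phase 1: each thread sorts one chunk; chunks do not overlap
    std::vector<std::thread> workers;
    for (std::size_t r = 0; r + 1 < bounds.size(); ++r)
        workers.emplace_back([&at, &bounds, r] { std::sort(at(bounds[r]), at(bounds[r + 1])); });
    for (std::thread& t : workers) t.join();

    // Phase 2: merge runs pairwise; each pass roughly halves the number of runs
    while (bounds.size() > 2) {
        std::vector<std::size_t> next;
        std::size_t r = 0;
        for (; r + 2 < bounds.size(); r += 2) {
            std::inplace_merge(at(bounds[r]), at(bounds[r + 1]), at(bounds[r + 2]));
            next.push_back(bounds[r]);
        }
        if (r + 1 < bounds.size()) next.push_back(bounds[r]);  // odd run carried over
        next.push_back(n);
        bounds.swap(next);
    }
}
```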

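    2. The sort algorithm used here is a radix sort: for primitive key types such as int with the default ordering, thrust::sort dispatches to a radix sort (with a user-defined comparator it falls back to a merge sort).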

    http://stackoverflow.com/questions/6502151/parallel-sorting-on-cuda
    Many GPU sorting implementations are variants of the bitonic sort, which is pretty well known and described in most reasonable texts on algorithms published in the last 25 or 30 years.
    
    The "reference" sorting implementation for CUDA done by Nadathur Satish from Berkeley and Mark Harris and Michael Garland from NVIDIA (paper here) is a radix sort, and forms the basis of what is in NPP and Thrust.



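    3. NPP (NVIDIA Performance Primitives) is NVIDIA's library of signal and image processing functions, similar to Intel IPP, and includes many basic processing algorithms.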
    https://developer.nvidia.com/npp

        Eliminates unnecessary copying of data to/from CPU memory
            Process data that is already in GPU memory
            Leave results in GPU memory so they are ready for subsequent processing
        Data Exchange and Initialization
            Set, Convert, Copy, CopyConstBorder, Transpose, SwapChannels
        Arithmetic and Logical Operations
            Add, Sub, Mul, Div, AbsDiff, Threshold, Compare
        Color Conversion
            RGBToYCbCr, YCbCrToRGB, YCbCrToYCbCr, ColorTwist, LUT_Linear
        Filter Functions
            FilterBox, Filter, FilterRow, FilterColumn, FilterMax, FilterMin, Dilate, Erode, SumWindowColumn, SumWindowRow
        JPEG
            DCTQuantInv, DCTQuantFwd, QuantizationTableJPEG
        Geometry Transforms
            Mirror, WarpAffine, WarpAffineBack, WarpAffineQuad, WarpPerspective, WarpPerspectiveBack, WarpPerspectiveQuad, Resize
        Statistics Functions
            Mean_StdDev, NormDiff, Sum, MinMax, HistogramEven, RectStdDev
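    To make the semantics of a filter such as FilterBox concrete, here is a plain CPU reference sketch of a box (mean) filter, mainly useful for checking GPU results. This is not the NPP API: the name `box_filter`, the row-major 8-bit single-channel layout, the centered (2r+1) x (2r+1) window, clamp-to-edge borders, and round-to-nearest are all assumptions of this sketch; NPP's own anchor, ROI, and border conventions should be taken from its documentation.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference box (mean) filter (hypothetical helper) for an 8-bit single-channel
// image stored row-major (index = y * width + x). Each output pixel is the rounded
// mean of the (2r+1) x (2r+1) window centred on it; out-of-range coordinates are
// clamped to the nearest edge pixel.
std::vector<std::uint8_t> box_filter(const std::vector<std::uint8_t>& src,
                                     int width, int height, int r)
{
    std::vector<std::uint8_t> dst(src.size());
    const int area = (2 * r + 1) * (2 * r + 1);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int sum = 0;
            for (int dy = -r; dy <= r; ++dy) {
                const int yy = std::min(std::max(y + dy, 0), height - 1);
                for (int dx = -r; dx <= r; ++dx) {
                    const int xx = std::min(std::max(x + dx, 0), width - 1);
                    sum += src[static_cast<std::size_t>(yy * width + xx)];
                }
            }
            // Adding area / 2 before the division rounds to the nearest integer
            dst[static_cast<std::size_t>(y * width + x)] =
                static_cast<std::uint8_t>((sum + area / 2) / area);
        }
    }
    return dst;
}
```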

    4. In addition, there are further libraries such as NVIDIA cuFFT, NVIDIA cuBLAS (NVIDIA claims 6x to 17x faster performance than the latest MKL BLAS), EM Photonics CULA Tools (a linear algebra library), NVIDIA cuSPARSE, and the NVIDIA CUDA Math Library.
    https://developer.nvidia.com/gpu-accelerated-libraries
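    When trying out a GPU BLAS such as cuBLAS, a naive CPU reference is handy for checking results. The sketch below computes the single-precision GEMM C = alpha*A*B + beta*C without transposes, assuming column-major storage with leading dimensions, which is the convention used by BLAS and cuBLAS. `sgemm_ref` is a hypothetical name, not a cuBLAS function.

```cpp
#include <cstddef>
#include <vector>

// Naive reference SGEMM (hypothetical helper): C = alpha * A * B + beta * C.
// Column-major storage: element (i, j) of a matrix with leading dimension ld is at i + j * ld.
// A is m x k (lda >= m), B is k x n (ldb >= k), C is m x n (ldc >= m).
void sgemm_ref(int m, int n, int k, float alpha,
               const std::vector<float>& A, int lda,
               const std::vector<float>& B, int ldb,
               float beta, std::vector<float>& C, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += A[static_cast<std::size_t>(i + p * lda)] * B[static_cast<std::size_t>(p + j * ldb)];
            float& c = C[static_cast<std::size_t>(i + j * ldc)];
            // As in BLAS, C is not read when beta == 0, so it need not be initialized
            c = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * c;
        }
    }
}
```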

  • Original article: https://www.cnblogs.com/cutepig/p/3099172.html