  • OpenACC: Computing Pi (Simple Version)

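    ▶ A simple program from the book that computes pi. Its main point is the use of a user-defined function, marked with acc routine seq so that it can be called from inside the parallel loop.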
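For reference, the calculation rests on a standard identity, approximated here with the midpoint rule. In this sketch, $h = 1/N$ matches the `h` in the program and $f$ corresponds to `ff`:

$$
\pi = \int_0^1 \frac{4}{1+x^2}\,dx \;\approx\; h \sum_{i=0}^{N-1} f\!\left(h\left(i+\tfrac12\right)\right),
\qquad f(x) = \frac{4}{1+x^2}, \quad h = \frac{1}{N}.
$$

With the offset $i - \tfrac12$ that the listing uses, the same midpoints are obtained only when $i$ runs from $1$ to $N$.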

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    #include <openacc.h>

    #define N   100
    // Mark ff as a sequential device routine so the parallel loop can call it
    #pragma acc routine seq
    float ff(const float x)
    {
        return 4.0f / (1.0f + x * x);   // integrand: its integral over [0,1] is pi
    }

    int main()
    {
        const float h = 1.0f / N;       // step width
        float sumf = 0, result;
        // Note: samples are taken at h*(i - 0.5); midpoints of [0,1] for i = 0..N-1 would be h*(i + 0.5)
    #pragma acc parallel loop reduction(+:sumf)
        for (int i = 0; i < N; i++)
            sumf += ff(h * (i - 0.5f));

        result = h * sumf;
        printf("\nN = %d, myPi = %f, diff = %e\n", N, result, result / 3.141592653589793238 - 1);
        //getchar();
        return 0;
    }
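The listing evaluates `ff` at `h * (i - 0.5f)` for `i = 0..N-1`, which places every sample one step to the left of the subinterval midpoints on [0, 1]. That shift appears to be why the original run reports a relative difference of about 6e-3. A minimal sketch, assuming the intent was the standard midpoint rule, puts the two choices side by side. The function names `pi_offset_samples` and `pi_midpoint` are hypothetical and not part of the original program; the OpenACC pragmas are simply ignored when the file is compiled without OpenACC support.

```c
/* Integrand: the integral of 4 / (1 + x^2) over [0, 1] equals pi. */
#pragma acc routine seq
float ff(const float x)
{
    return 4.0f / (1.0f + x * x);
}

/* Same sampling as the original listing: x_i = h * (i - 0.5). */
float pi_offset_samples(const int n)
{
    const float h = 1.0f / n;
    float sum = 0.0f;
#pragma acc parallel loop reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += ff(h * (i - 0.5f));
    return h * sum;
}

/* Assumed intent (hypothetical fix): midpoint rule, x_i = h * (i + 0.5). */
float pi_midpoint(const int n)
{
    const float h = 1.0f / n;
    float sum = 0.0f;
#pragma acc parallel loop reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += ff(h * (i + 0.5f));
    return h * sum;
}
```

With `n = 100`, `pi_offset_samples` reproduces the value of roughly 3.1615 that the original run reports, while `pi_midpoint` lands within about 1e-4 of pi in single precision.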

    ● Output

    D:\Code\OpenACC\OpenACCProject\OpenACCProject>pgcc main.c -acc -Minfo -o main_acc.exe
    ff:
         10, Generating acc routine seq
             Generating Tesla code
         11, FMA (fused multiply-add) instruction(s) generated
    main:
         19, Accelerator kernel generated
             Generating Tesla code
             20, #pragma acc loop gang, vector(100) /* blockIdx.x threadIdx.x */
                 Generating reduction(+:sumf)
         19, Generating implicit copy(sumf)
    
    D:\Code\OpenACC\OpenACCProject\OpenACCProject>main_acc.exe
    launch CUDA kernel  file=D:\Code\OpenACC\OpenACCProject\OpenACCProject\main.c function=main line=19 device=0 threadid=1 num_gangs=1 num_workers=1 vector_length=100 grid=1 block=100 shared memory=1024
    launch CUDA kernel  file=D:\Code\OpenACC\OpenACCProject\OpenACCProject\main.c function=main line=19 device=0 threadid=1 num_gangs=1 num_workers=1 vector_length=256 grid=1 block=256 shared memory=1024
    
    N = 100, myPi = 3.161500, diff = 6.336546e-03
    PGI: "acc_shutdown" not detected, performance results might be incomplete.
     Please add the call "acc_shutdown(acc_device_nvidia)" to the end of your application to ensure that the performance results are complete.
    
    Accelerator Kernel Timing data
    D:\Code\OpenACC\OpenACCProject\OpenACCProject\main.c
      main  NVIDIA  devicenum=0
        time(us): 11
        19: compute region reached 1 time
            19: kernel launched 1 time
                grid: [1]  block: [100]
                elapsed time(us): total=1000 max=1000 min=1000 avg=1000
            19: reduction kernel launched 1 time
                grid: [1]  block: [256]
                 device time(us): total=0 max=0 min=0 avg=0
        19: data region reached 2 times
            19: data copyin transfers: 1
                 device time(us): total=4 max=4 min=4 avg=4
            23: data copyout transfers: 1
                 device time(us): total=7 max=7 min=7 avg=7
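The PGI message in the output asks for a call to `acc_shutdown(acc_device_nvidia)` before the program exits, so that the timing data is complete. A minimal sketch of one way to add it follows. The wrapper name `finalize_accelerator` is hypothetical, and the `_OPENACC` guard is an added assumption so that the same file still builds with a compiler that has OpenACC disabled.

```c
#ifdef _OPENACC
#include <openacc.h>   /* declares acc_shutdown and acc_device_nvidia */
#endif

/* Hypothetical helper: shut the device down so profiling data is flushed.
   Returns 1 if a shutdown call was made, 0 if OpenACC is not enabled. */
int finalize_accelerator(void)
{
#ifdef _OPENACC
    acc_shutdown(acc_device_nvidia);
    return 1;
#else
    return 0;
#endif
}
```

In the original program this would be called just before `return 0;` in `main`.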
  • Original article: https://www.cnblogs.com/cuancuancuanhao/p/9419429.html