
    Getting Started with OpenMP*

    Abstract

    As you probably know by now, to get the maximum performance benefit from a processor with Hyper-Threading Technology, an application needs to be executed in parallel. Parallel execution requires threads, and threading an application is not trivial. What you may not know is that tools like OpenMP* can make the process a lot easier.

    This is the first in a series of three white papers that teach you, an experienced C/C++ programmer, how to use OpenMP to get the most out of Hyper-Threading Technology. This first paper shows you how to parallelize loops, a technique called work sharing. The second paper teaches you how to exploit non-loop parallelism and some additional OpenMP features. The final paper discusses the OpenMP runtime library functions, the Intel® C++ Compiler, and how to debug your application if things go wrong.

    A Quick Introduction to OpenMP

    The designers of OpenMP wanted to provide an easy method to thread applications without requiring that the programmer know how to create, synchronize, and destroy threads, or even requiring him or her to determine how many threads to create. To achieve these ends, the OpenMP designers developed a platform-independent set of compiler pragmas, directives, function calls, and environment variables that explicitly instruct the compiler how and where to insert threads into the application. Most loops can be threaded by inserting only one pragma right before the loop. Further, by leaving the nitty-gritty details to the compiler and OpenMP, you can spend more time determining which loops should be threaded and how best to restructure the algorithms for maximum performance. OpenMP delivers the greatest benefit when it is used to thread "hotspots," the most time-consuming loops in your application.
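As a minimal sketch of how those three ingredients fit together, the function below uses a pragma (#pragma omp parallel), two runtime library calls (omp_get_thread_num and omp_get_num_threads), and, indirectly, the OMP_NUM_THREADS environment variable, which controls how many threads the team gets. The function name count_team_threads is hypothetical. The #ifdef _OPENMP guard is a common idiom so the same file also builds without OpenMP support; in that case the pragma is ignored and the function reports one thread.

```c
#ifdef _OPENMP
#include <omp.h>   /* OpenMP runtime library declarations */
#endif

/* Hypothetical helper: returns the number of threads in the team that
   executes the parallel region. Without OpenMP the pragma is ignored,
   the block runs once, and the function reports a single thread. */
int count_team_threads(void)
{
    int team_size = 1;   /* declared outside the region, so it is shared */

#pragma omp parallel
    {
#ifdef _OPENMP
        /* Only thread 0 writes team_size, so there is no data race;
           the implicit barrier at the end of the region makes the
           value visible before the function returns. */
        if (omp_get_thread_num() == 0)
            team_size = omp_get_num_threads();
#endif
    }
    return team_size;
}
```

Built with an OpenMP switch (for example, -fopenmp with GCC) and run with OMP_NUM_THREADS=2 in the environment, a program calling this function would typically see 2; with the variable unset, the default team size is implementation-defined.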

    The power and simplicity of OpenMP are best demonstrated by looking at an example. The following loop converts an array of 32-bit RGB (red, green, blue) pixels to 8-bit grayscale pixels. The one pragma, inserted immediately before the loop, is all that is needed for parallel execution.

    #pragma omp parallel for
    for (i = 0; i < numPixels; i++)
    {
       // Each iteration writes only its own output pixel, so iterations are independent.
       pGrayScaleBitmap[i] = (unsigned char)
                (pRGBBitmap[i].red   * 0.299 +
                 pRGBBitmap[i].green * 0.587 +
                 pRGBBitmap[i].blue  * 0.114);
    }
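To compile the loop above on its own, the declarations it leaves out have to be supplied. The sketch below assumes a hypothetical RGBPixel struct (the original does not show the pixel type) with one byte per channel plus one unused byte, so each pixel is 32 bits, and wraps the loop in a hypothetical function rgb_to_gray. Without an OpenMP compiler switch the pragma is ignored and the same loop simply runs serially.

```c
/* Hypothetical 32-bit pixel layout: three 8-bit channels plus padding. */
typedef struct {
    unsigned char blue, green, red, reserved;
} RGBPixel;

/* Converts numPixels RGB pixels to 8-bit grayscale with the weights
   used in the loop above. The cast truncates the weighted sum. */
void rgb_to_gray(const RGBPixel *pRGBBitmap,
                 unsigned char *pGrayScaleBitmap, int numPixels)
{
    int i;   /* signed loop variable; OpenMP makes it private per thread */

#pragma omp parallel for
    for (i = 0; i < numPixels; i++)
    {
        pGrayScaleBitmap[i] = (unsigned char)
            (pRGBBitmap[i].red   * 0.299 +
             pRGBBitmap[i].green * 0.587 +
             pRGBBitmap[i].blue  * 0.114);
    }
}
```

Because the cast truncates, a pure red pixel (red = 255, green = 0, blue = 0) maps to 76.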

    Let's take a closer look at the loop. First, the example uses work-sharing, the general term used in OpenMP to describe the distribution of work across threads. When work-sharing is used with the for construct, as shown in the example, the iterations of the loop are divided among multiple threads so that each iteration is executed exactly once, by one thread, while different iterations run in parallel on different threads. OpenMP determines how many threads to create and how best to create, synchronize, and destroy them. All the programmer needs to do is tell OpenMP which loop should be threaded.
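The "exactly once, by one thread" guarantee can be checked directly. In the sketch below, the hypothetical function check_work_sharing counts how many times each iteration runs and records which thread ran it. Because every index belongs to exactly one thread, the writes to runs[i] and owner[i] never collide, and every count should come out as 1.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical check: runs n iterations under "omp parallel for",
   counting executions per iteration in runs[] and recording the
   executing thread in owner[]. Returns 1 if every iteration ran
   exactly once, 0 otherwise. */
int check_work_sharing(int n, int *runs, int *owner)
{
    int i;

    for (i = 0; i < n; i++)
        runs[i] = 0;

#pragma omp parallel for
    for (i = 0; i < n; i++)
    {
        /* Iteration i is assigned to a single thread, so no other
           thread writes runs[i] or owner[i]. */
        runs[i] += 1;
#ifdef _OPENMP
        owner[i] = omp_get_thread_num();
#else
        owner[i] = 0;   /* serial build: everything runs on one thread */
#endif
    }

    for (i = 0; i < n; i++)
        if (runs[i] != 1)
            return 0;
    return 1;
}
```

How the iterations are split among threads is implementation-defined unless a schedule is requested explicitly, so the contents of owner may differ between compilers and runs; the per-iteration counts should not.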

    OpenMP places the following five restrictions on which loops can be threaded:

    1. The loop variable must be of type signed integer. Unsigned integers, such as DWORDs, will not work. (OpenMP 3.0 and later relax this rule and also accept unsigned integer loop variables.)
    2. The comparison operation must be in the form loop_variable <, <=, >, or >= loop_invariant_integer.
    3. The third expression, the increment portion of the for loop, must be integer addition or integer subtraction of a loop-invariant value.
    4. If the comparison operation is < or <=, the loop variable must increment on every iteration; conversely, if the comparison operation is > or >=, the loop variable must decrement on every iteration.
    5. The loop must be a single-entry, single-exit block, meaning no jumps from inside the loop to the outside are permitted, except for a call to exit(), which terminates the whole application. If goto or break statements are used, they must jump within the loop, not out of it. The same goes for exception handling: exceptions must be caught within the loop.
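The sketch below shows one loop that satisfies all five restrictions at once. It deliberately counts down by two so that the decrement and >= rules are exercised as well; the function name scale_every_other_from_end is hypothetical, and the comments map each part of the for statement to the restriction it meets.

```c
/* Hypothetical example: multiplies data[n-1], data[n-3], ... by factor. */
void scale_every_other_from_end(double *data, int n, double factor)
{
    int i;                 /* signed integer loop variable */
    const int step = 2;    /* loop-invariant increment */

#pragma omp parallel for
    for (i = n - 1;        /* start at the last element */
         i >= 0;           /* >= against a loop-invariant bound ... */
         i -= step)        /* ... paired with a decrement by a fixed amount */
    {
        /* No break, goto, or return leaves the body. */
        data[i] *= factor;
    }
}
```

By contrast, changing the test to i != -1, stepping with i /= 2, or adding a break that exits early would each violate one of the restrictions.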

    Although these restrictions may sound limiting, most non-conforming loops can be rewritten to satisfy them.
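As an illustration of such a rewrite, the sketch below takes a pointer-walking while loop, which is not a for loop over a signed integer at all, and recasts it as a conforming indexed loop. Both function names are hypothetical, and the rewrite assumes the element count fits in an int; the sketch checks that and reports failure otherwise.

```c
#include <stddef.h>   /* size_t */
#include <limits.h>   /* INT_MAX */

/* Non-conforming original: walks a pointer with a while loop. */
void invert_serial(unsigned char *buf, size_t count)
{
    unsigned char *p = buf;
    while (p != buf + count) {
        *p = (unsigned char)(255 - *p);
        p++;
    }
}

/* Conforming rewrite: signed int index, '<' against a loop-invariant
   bound, unit increment, no jumps out of the body.
   Returns 0 on success, -1 if count does not fit in an int. */
int invert_parallel(unsigned char *buf, size_t count)
{
    int i;
    int n;

    if (count > (size_t)INT_MAX)
        return -1;          /* too large for a signed int index */
    n = (int)count;         /* bound computed once, before the loop */

#pragma omp parallel for
    for (i = 0; i < n; i++)
        buf[i] = (unsigned char)(255 - buf[i]);

    return 0;
}
```

The body is unchanged in substance; only the loop header is reshaped, which is typical of these rewrites.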

    Original article: https://www.cnblogs.com/scotth/p/7892873.html