By Joel Yliluoma, September 2007; last update in June 2016 for OpenMP 4.5
Abstract
In this document, we concentrate on the C++ language in particular, and use GCC to compile the examples.
- Abstract
- Preface: Importance of multithreading
- Introduction to OpenMP in C++
- The syntax
- Offloading support
- Teams
- Thread-safety (i.e. mutual exclusion)
- Controlling which data to share between threads
- Thread affinity (proc_bind)
- Execution synchronization
- Thread cancellation (OpenMP 4.0+)
- Loop nesting
- Performance
- Shortcomings
- Missing in this article
- Some specific gotchas
- Further reading
Preface: Importance of multithreading
To harness the power of multi-core systems, it is becoming important for programmers to be knowledgeable in parallel programming, that is, making a program execute multiple things simultaneously.
This document attempts to give a quick introduction to OpenMP, a simple C/C++/Fortran compiler extension that lets you add parallelism to existing source code without having to rewrite it significantly.
Support in different compilers
- GCC (GNU Compiler Collection) supports OpenMP 4.5 since version 6.1, OpenMP 4.0 since version 4.9, OpenMP 3.1 since version 4.7, OpenMP 3.0 since version 4.4, and OpenMP 2.5 since version 4.2. Add the commandline option -fopenmp to enable it. OpenMP offloading is supported for Intel MIC targets (Intel Xeon Phi KNL + emulation) since version 5.1, and for NVidia (NVPTX) targets since version 7 or so.
- Clang++ supports OpenMP 4.5 since version 3.9 (without offloading), OpenMP 4.0 since version 3.8 (for some parts), and OpenMP 3.1 since version 3.7. Add the commandline option -fopenmp to enable it.
- Solaris Studio supports OpenMP 4.0 since version 12.4, and OpenMP 3.1 since version 12.3. Add the commandline option -xopenmp to enable it.
- Intel C Compiler (icc) supports OpenMP 4.5 since version 17.0, OpenMP 4.0 since version 15.0, OpenMP 3.1 since version 12.1, OpenMP 3.0 since version 11.0, and OpenMP 2.5 since version 10.1. Add the commandline option -openmp to enable it. Add the -openmp-stubs option instead to enable the library without actual parallel execution.
- Microsoft Visual C++ (cl) supports OpenMP 2.0 since version 2005. Add the commandline option /openmp to enable it.
Note: If your GCC complains that "-fopenmp" is valid for D but not for C++ when you try to use it, or does not recognize the option at all, your GCC version is too old. If your linker complains about missing GOMP functions, you forgot to specify "-fopenmp" when linking.
More information: http://openmp.org/wp/openmp-compilers/
Introduction to OpenMP in C++
Here are some simple example programs demonstrating OpenMP.
You can compile them like this:
g++ tmp.cpp -fopenmp
Example: Initializing a table in parallel (multiple threads)
#include <cmath>
int main()
{
    const int size = 256;
    double sinTable[size];
    #pragma omp parallel for
    for(int n=0; n<size; ++n)
        sinTable[n] = std::sin(2 * M_PI * n / size);
    // the table is now initialized
}
Example: Initializing a table in parallel (single thread, SIMD)
#include <cmath>
int main()
{
    const int size = 256;
    double sinTable[size];
    #pragma omp simd
    for(int n=0; n<size; ++n)
        sinTable[n] = std::sin(2 * M_PI * n / size);
    // the table is now initialized
}
Example: Initializing a table in parallel (multiple threads on another device)
#include <cmath>
int main()
{
    const int size = 256;
    double sinTable[size];
    #pragma omp target teams distribute parallel for map(from:sinTable[0:256])
    for(int n=0; n<size; ++n)
        sinTable[n] = std::sin(2 * M_PI * n / size);
    // the table is now initialized
}
Example: Calculating the Mandelbrot fractal in parallel (host computer)
#include <complex>
#include <cstdio>
typedef std::complex<double> complex;
int MandelbrotCalculate(complex c, int maxiter)
{
    // iterates z = z*z + c until |z| >= 2 or maxiter is reached,
    // returns the number of iterations.
    complex z = c;
    int n=0;
    for(; n<maxiter; ++n)
    {
        if( std::abs(z) >= 2.0) break;
        z = z*z + c;
    }
    return n;
}
int main()
{
const int width = 78, height = 44, num_pixels = width*height;
const complex center(-.7, 0), span(2.7, -(4/3.0)*2.7*height/width);
const complex begin = center-span/2.0;//, end = center+span/2.0;
const int maxiter = 100000;
#pragma omp parallel for ordered schedule(dynamic)
for(int pix=0; pix<num_pixels; ++pix)
{
const int x = pix%width, y = pix/width;
complex c = begin + complex(x * span.real() / (width +1.0),
y * span.imag() / (height+1.0));
int n = MandelbrotCalculate(c, maxiter);