The techniques for debugging multithreaded programs discussed in this section apply to both the OpenMP Fortran API and the Intel Fortran parallel compiler directives. When a program uses parallel decomposition directives, keep in mind that a bug can be caused either by an incorrect program statement or by an incorrect parallel decomposition directive. In either case, the program being debugged can be executed by multiple threads simultaneously.
To debug multithreaded programs, you can use:
Intel Debugger for IA-32 and Intel Debugger for Itanium-based applications (idb)
Intel Fortran Compiler debugging options and methods; in particular, Compiling Source Lines with Debugging Statements.
Intel parallelization extension routines for low-level debugging.
VTune(TM) Performance Analyzer to identify problem areas.
Other well-known debugging methods and tips include:
Correct the program in a single-threaded, uniprocessor environment
Statically analyze locks
Use trace statements (such as PRINT statements)
Think in parallel; make very few assumptions
Step through your code
Make sense of thread and call stack information
Identify the primary thread
Know what thread you are debugging
Single stepping in one thread does not mean single stepping in others
Watch out for context switches
Debuggers such as the Intel Debugger for IA-32 and the Intel Debugger for Itanium-based applications support debugging programs that are executed by multiple threads. However, the currently available versions of these debuggers do not directly support debugging parallel decomposition directives, so some debugging features are limited.
Some newer OpenMP features are not yet fully supported by the debuggers, so it is important to understand how these features work in order to debug them. The two problem areas are:
Multiple entry points
Shared variables
When setting breakpoints, you can use routine names (for example, padd) and entry names (for example, _PADD, ___PADD_6__par_loop0). By default, the Fortran compiler first converts lowercase and mixed-case routine names to uppercase. For example, pAdD() becomes PADD(), and adding one underscore turns this into the entry name _PADD. The secondary entry-name mangling happens after that, which is why the "__par_loop" part of the entry name stays lowercase. Note that the debugger did not accept the uppercase routine name "PADD" for setting the breakpoint; it accepted the lowercase routine name "padd" instead.