Debugging Parallel Regions

The compiler implements a parallel region by placing the code of the region in a separate, compiler-created entry point. This differs from outlining, the technique used by other compilers, in which the region is moved into a separate subroutine, but the same debugging technique applies.
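
As a rough analogy only, the following Python sketch mimics the general arrangement: the body of the region lives in its own callable, and a fork routine runs that callable once per thread. The names region_body and fork_call are hypothetical. The real runtime call visible in Example 1 is __kmpc_fork_call, and its behavior is not reproduced here.

```python
import threading


def region_body(thread_id, results):
    # Stands in for the code between !$OMP PARALLEL and !$OMP END PARALLEL.
    # Each thread records its own id, like id = OMP_GET_THREAD_NUM().
    results[thread_id] = thread_id


def fork_call(entry_point, num_threads):
    # Hypothetical stand-in for the runtime fork: run entry_point on every thread.
    results = [None] * num_threads
    threads = [
        threading.Thread(target=entry_point, args=(i, results))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for all threads, like the end of the region
    return results


if __name__ == "__main__":
    print(fork_call(region_body, 4))
```

Because the region body is an ordinary callable with its own name, a debugger can stop in it by name, which is the property the rest of this section relies on.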

Constructing an Entry-point Name

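The compiler constructs the name of a parallel region entry point by concatenating several strings. The string that identifies the kind of construct is one of the following:

__par_region for OpenMP parallel regions (!$OMP PARALLEL),

__par_loop for OpenMP parallel loops (!$OMP PARALLEL DO),

__par_section for OpenMP parallel sections (!$OMP PARALLEL SECTIONS)

As the name _parallel__3__par_region0 in Example 1 shows, the remaining parts are the name of the original routine, the source line number of the region, and a trailing number (0 and 1 in Example 1).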
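
The naming scheme can be sketched in a few lines of Python. This is only an illustration that reproduces the two region names visible in Example 1. The function name entry_point_name, its parameters, and the exact placement of underscores are assumptions inferred from those names, not a documented compiler interface.

```python
# Construct suffixes listed in the text, plus "__par_region" as seen in Example 1.
CONSTRUCT_SUFFIX = {
    "region": "__par_region",    # !$OMP PARALLEL
    "loop": "__par_loop",        # !$OMP PARALLEL DO
    "section": "__par_section",  # !$OMP PARALLEL SECTIONS
}


def entry_point_name(routine, line, construct, sequence):
    """Build a region entry-point name in the shape seen in Example 1.

    Assumed layout (inferred, not documented):
    "_" + lowercase routine name + "__" + line number + suffix + sequence
    """
    suffix = CONSTRUCT_SUFFIX[construct]
    return f"_{routine.lower()}__{line}{suffix}{sequence}"


if __name__ == "__main__":
    print(entry_point_name("PARALLEL", 3, "region", 0))
    print(entry_point_name("PARALLEL", 6, "region", 1))
```

Whether names for parallel loops and sections follow exactly the same layout is also an assumption. The names in the generated assembly listing remain the authoritative source.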

Debugging Code with Parallel Region

Example 1 illustrates debugging code that contains a parallel region. The listing in Example 1 is produced by this command:

ifort -openmp -g -O0 -S file.f90

Let us consider the code of subroutine parallel in Example 1.

Subroutine PARALLEL() source listing

1    subroutine parallel

2    integer id,OMP_GET_THREAD_NUM

3 !$OMP PARALLEL PRIVATE(id)

4    id = OMP_GET_THREAD_NUM()

5 !$OMP END PARALLEL

6    end

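The parallel region is at line 3. For it, the compiler created two entry points: parallel_ and _parallel__3__par_region0. The first entry point corresponds to the subroutine parallel(), while the second corresponds to the OpenMP parallel region at line 3. The machine code listing also contains a third entry point, _parallel__6__par_region1, which belongs to a second parallel region at line 6 that does not appear in the source listing above.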

Example 1 Debugging Code with Parallel Region

Machine Code Listing of the Subroutine parallel()

        .globl parallel_
parallel_:
..B1.1:                    # Preds ..B1.0
..LN1:
pushl     %ebp                                    #1.0
movl      %esp, %ebp                              #1.0
subl      $44, %esp                               #1.0
pushl     %edi                                    #1.0

... ... ... ... ... ... ... ... ... ... ... ... ...

..B1.13:                    # Preds ..B1.9
addl      $-12, %esp                             #6.0
movl      $.2.1_2_kmpc_loc_struct_pack.2, (%esp) #6.0
movl      $0, 4(%esp)                            #6.0
movl      $_parallel__6__par_region1, 8(%esp)    #6.0
call      __kmpc_fork_call                       #6.0
                  # LOE
..B1.31:                    # Preds ..B1.13
addl      $12, %esp                              #6.0
                  # LOE
..B1.14:                    # Preds ..B1.31 ..B1.30
..LN4:
leave                                            #9.0
ret                                              #9.0
                  # LOE
.type parallel_,@function
.size parallel_,.-parallel_
.globl _parallel__3__par_region0
_parallel__3__par_region0:
# parameter 1: 8 + %ebp
# parameter 2: 12 + %ebp
..B1.15:                    # Preds ..B1.0
pushl     %ebp                                   #9.0
movl      %esp, %ebp                             #9.0
subl      $44, %esp                              #9.0
..LN5:
call      omp_get_thread_num_                    #4.0
                  # LOE eax
..B1.32:                    # Preds ..B1.15
movl      %eax, -32(%ebp)                        #4.0
                  # LOE
..B1.16:                    # Preds ..B1.32
movl      -32(%ebp), %eax                        #4.0
movl      %eax, -20(%ebp)                        #4.0
..LN6:
leave                                            #9.0
ret                                              #9.0
                  # LOE
.type _parallel__3__par_region0,@function
.size _parallel__3__par_region0,.-_parallel__3__par_region0
.globl _parallel__6__par_region1
_parallel__6__par_region1:
# parameter 1: 8 + %ebp
# parameter 2: 12 + %ebp
..B1.17:                     # Preds ..B1.0
pushl     %ebp                                   #9.0
movl      %esp, %ebp                             #9.0
subl      $44, %esp                              #9.0
..LN7:
call      omp_get_thread_num_                    #7.0
                  # LOE eax
..B1.33:                    # Preds ..B1.17
movl      %eax, -28(%ebp)                        #7.0
                  # LOE
..B1.18:                    # Preds ..B1.33
movl      -28(%ebp), %eax                        #7.0
movl      %eax, -16(%ebp)                        #7.0
..LN8:
leave                                            #9.0
ret                                              #9.0
.align    4,0x90
# mark_end;

Debugging the program at this level is just like debugging a program that uses POSIX threads directly. Breakpoints can be set in the threaded code just as in any other routine. With the GNU debugger (gdb), breakpoints can be set on source-level routine names (such as parallel) and on entry point names (such as parallel_ and _parallel__3__par_region0). Note that the Intel Fortran Compiler for Linux converts the uppercase Fortran subroutine name to lowercase.
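
A breakpoint setup along these lines could be prepared as a list of gdb commands. The sketch below is an illustration under stated assumptions: the helper gdb_commands is hypothetical, and the symbol names are the ones shown in Example 1. Only the standard gdb commands break, run, and info threads are relied upon.

```python
def gdb_commands(entry_points):
    """Return gdb command lines that break on each entry point, then run."""
    lines = [f"break {name}" for name in entry_points]
    lines.append("run")
    lines.append("info threads")  # list the threads once a breakpoint is hit
    return lines


if __name__ == "__main__":
    # Entry-point names taken from Example 1.
    names = ["parallel_", "_parallel__3__par_region0"]
    for line in gdb_commands(names):
        print(line)
```

The printed lines could be saved to a file and passed to gdb with its -x option, for example gdb -x breakpoints.gdb ./a.out, where both the file name and the executable name are assumed.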