Auto-parallelization Threshold Control and Diagnostics
Threshold Control
The -par_threshold{n} option sets a threshold
for the auto-parallelization of loops based on the probability of profitable
execution of the loop in parallel. The value of n
can be from 0 to 100. The default value is 75. This option is used for
loops whose computation work volume cannot be determined at compile-time.
The threshold is usually relevant when the loop trip count is unknown
at compile-time.
The -par_threshold{n} option has the following
versions and functionality:
- -par_threshold0 - loops
get auto-parallelized regardless of computation work volume, that is,
parallelize always.
- -par_threshold100 - loops
get auto-parallelized only if profitable parallel execution is almost
certain.
- The intermediate 1 to 99 values represent the percentage
probability for profitable speed-up. For example, n=50
would mean: parallelize only if there is a 50% probability of the code
speeding up if executed in parallel.
- The default value of n is n=75
(or -par_threshold75). When -par_threshold
is used on the command line without a number, the default value passed
is 75.
The compiler applies a heuristic that tries to balance the overhead
of creating multiple threads versus the amount of work available to be
shared amongst the threads.
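To illustrate the kind of loop the threshold applies to, the sketch below contrasts a loop whose trip count is a compile-time constant with one whose trip count arrives as a run-time argument. The function names (scale_fixed, scale_runtime) and the array size are made up for this illustration. Whether either loop is actually parallelized depends on the compiler's heuristic and on the chosen value of n.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example; names and sizes are assumptions for illustration.
const std::size_t kFixedCount = 10000;

// Trip count is a compile-time constant, so the compiler can estimate
// the work volume of the loop directly.
void scale_fixed(float* dst, const float* src, float factor) {
    assert(dst && src);
    for (std::size_t i = 0; i < kFixedCount; ++i) {
        dst[i] = src[i] * factor;
    }
}

// Trip count n is known only at run time. This is the case in which the
// -par_threshold{n} setting is usually relevant.
void scale_runtime(float* dst, const float* src, float factor, std::size_t n) {
    assert(dst && src);
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i] * factor;
    }
}
```

Compiling a file like this with, for example, icpc -c -parallel -par_threshold50 -par_report3 should show in the report which of the two loops the compiler chose to parallelize at that threshold.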
Diagnostics
The -par_report{0|1|2|3} option controls the
auto-parallelizer's diagnostic levels 0, 1, 2, or 3 as follows:
- -par_report0 = no diagnostic
information is displayed.
- -par_report1 = indicates
loops successfully auto-parallelized (default). Issues a "LOOP
AUTO-PARALLELIZED" message for parallel loops.
- -par_report2 = indicates
successfully auto-parallelized loops as well as unsuccessful loops.
- -par_report3 = same as 2
plus additional information about any proven or assumed dependencies inhibiting
auto-parallelization (reasons for not parallelizing).
Example of Parallelization Diagnostics Report
The example below shows output generated by -par_report3:
prompt>icpc -c -parallel -par_report3 prog.cpp
Sample Output
program prog
procedure: prog
serial loop: line 5: not a parallel candidate due to statement at line 6
serial loop: line 9
flow data dependence from line 10 to line 10, due to "a"
12 Lines Compiled
where the program prog.cpp is as follows:
Sample prog.cpp
/* Assumed side effects */
for (i=1; i<10000; i++)
{
  a[i] = foo(i);
}
/* Actual dependence */
for (i=1; i<10000; i++)
{
  a[i] = a[i-1] + i;
}
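The flow dependence reported for the second loop arises because iteration i reads a[i-1], which iteration i-1 writes. As a sketch of what removing such a dependence can look like, the recurrence a[i] = a[i-1] + i unrolls to a[i] = a[0] + i(i+1)/2, so each element can be computed without reading its neighbor. The function names below are hypothetical, and the rewrite is a manual illustration rather than a transformation the compiler is documented to perform. Whether the rewritten loop is auto-parallelized still depends on the compiler.

```cpp
#include <cassert>

const int kCount = 10000;  // same bound as the sample loops

// Original form: iteration i reads a[i-1], which iteration i-1 wrote,
// so the iterations must run in order (flow dependence on "a").
void fill_recurrence(int* a) {
    assert(a);
    for (int i = 1; i < kCount; i++) {
        a[i] = a[i - 1] + i;
    }
}

// Rewritten form: a[i] = a[0] + (1 + 2 + ... + i) = a[0] + i*(i+1)/2.
// Each iteration uses only the saved value of a[0], which the loop never
// writes, so the iterations no longer depend on one another.
void fill_closed_form(int* a) {
    assert(a);
    const int base = a[0];
    for (int i = 1; i < kCount; i++) {
        a[i] = base + i * (i + 1) / 2;
    }
}
```

Both functions produce the same array contents for the same starting value of a[0]; only the second has independent iterations.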
Troubleshooting Tips
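- Use -par_threshold0 to see
whether the compiler assumed there was not enough computational work
to make parallel execution profitable.
- Use -par_report3 to view
diagnostics, including the dependencies that inhibited parallelization.
- Use -ipo to eliminate assumed
side effects of function calls.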
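The first loop in the sample is rejected because the compiler cannot see what foo does and must assume it has side effects. -ipo addresses this by analyzing across source files. The sketch below illustrates the same idea within a single file, under stated assumptions: foo is given a visible, side-effect-free definition (its body is invented here, since the original sample does not show it), so the compiler has the information it needs to reason about the loop. The names fill and n are likewise hypothetical.

```cpp
#include <cassert>

// Hypothetical body for foo; the original sample does not define it.
// It reads only its argument and writes no global state.
static inline int foo(int i) {
    return 2 * i + 1;
}

// Same shape as the first sample loop, with the bound passed in as n.
void fill(int* a, int n) {
    assert(a);
    for (int i = 1; i < n; i++) {
        a[i] = foo(i);  // each iteration writes a distinct element
    }
}
```

Recompiling with -par_report3 is the way to check whether the "not a parallel candidate" message for the loop goes away.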