Programming with Auto-parallelization
The auto-parallelization feature implements some concepts of OpenMP*,
such as the worksharing construct (the parallel for
directive). This section provides specifics of auto-parallelization.
Guidelines for Effective Auto-parallelization Usage
A loop is parallelizable if:
- The loop is countable at compile time. This means
that an expression representing how many times the loop will execute (also
called "the loop trip count") can be generated just before entering
the loop.
- There are no FLOW (READ after WRITE), OUTPUT
(WRITE after WRITE), or
ANTI (WRITE after READ) loop-carried data dependences. A loop-carried
data dependence occurs when the same memory location is referenced in
different iterations of the loop. At the compiler's discretion, a loop
may be parallelized if any assumed loop-carried dependences that would
inhibit parallelization can be resolved by run-time dependence testing.
The compiler may generate a run-time test for the profitability of executing
in parallel a loop whose loop parameters
are not compile-time constants.
Coding Guidelines
Enhance the power and effectiveness of the auto-parallelizer by following
these coding guidelines:
- Expose the trip count of loops whenever possible.
Specifically, use constants where the trip count is known, and save loop
parameters in local variables.
- Avoid placing constructs inside loop bodies that
the compiler may assume to carry dependent data, for example, function
calls, ambiguous indirect references, or global references.
Auto-parallelization Data Flow
For auto-parallelization processing, the compiler
performs the following steps:
- Data flow analysis
- Loop classification
- Dependence analysis
- High-level parallelization
- Data partitioning
- Multi-threaded code generation
These steps include:
- Data flow analysis: compute the flow of data through
the program
- Loop classification: determine loop candidates for
parallelization based on correctness and efficiency as shown by threshold
analysis
- Dependence analysis: compute the dependence analysis
for references in each loop nest
- High-level parallelization:
- analyze the dependence graph to determine loops that
can execute in parallel
- compute run-time dependence tests where needed
- Data partitioning: examine data references and partition
them based on the following types of access: shared,
private, and firstprivate
- Multi-threaded code generation:
- modify loop parameters
- generate entry/exit code per threaded task
- generate calls to parallel run-time routines for
thread creation and synchronization