The auto-parallelization feature of the Intel® Fortran Compiler, together with a high-level symmetric multiprocessing (SMP) programming model, gives you an easy way to exploit parallelism on SMP systems.
Automatic parallelization relieves you of the low-level details of iteration partitioning, data sharing, thread scheduling, and synchronization. It also lets you benefit from the performance available on multiprocessor systems.
To enable the auto-parallelizer, use the -parallel option. With -parallel, the compiler detects loops that can safely execute in parallel and automatically generates multithreaded code for them. An example of the command using auto-parallelization is as follows:
IA-32 compilations:
prompt>ifc -c -parallel -par_threshold0 myprog.f
Itanium-based compilations:
prompt>efc -c -parallel -par_threshold0 myprog.f
Enhance the power and effectiveness of the auto-parallelizer by following these coding guidelines:
Expose the trip count of loops whenever possible; specifically, use constants where the trip count is known, and save loop parameters in local variables.
Avoid placing constructs inside loop bodies that the compiler may assume to carry dependent data, for example, procedure calls or global references.
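The two guidelines above can be illustrated with a minimal sketch. It is written in free-form Fortran 90 (so it would be saved as, say, a .f90 file rather than myprog.f), and the names trip_count_sketch, scale_by_two, and last are hypothetical, chosen only for illustration. Whether a given compiler version actually parallelizes these loops is not guaranteed by this example.

```fortran
program trip_count_sketch
  implicit none
  ! Constant trip count: the loop bound is known at compile time.
  integer, parameter :: n = 1000
  real :: a(n), b(n)

  b = 1.0
  call scale_by_two(a, b, n)
  print *, a(1), a(n)

contains

  subroutine scale_by_two(x, y, m)
    integer, intent(in)  :: m
    real,    intent(out) :: x(m)
    real,    intent(in)  :: y(m)
    integer :: i, last

    ! Save the loop parameter in a local variable so the compiler
    ! can see that it does not change while the loop runs.
    last = m

    ! The body contains no procedure calls and no global references,
    ! only accesses to the dummy arrays indexed by the loop variable.
    do i = 1, last
      x(i) = 2.0 * y(i)
    end do
  end subroutine scale_by_two

end program trip_count_sketch
```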
Currently, the compiler analyzes only loop nests for parallelism, although independent regions of code (task parallelism) are also potential candidates. A loop is parallelizable if:
there is no loop-carried dependency or
any loop-carried dependencies can be resolved by some code transformation, for example: privatization of scalars or runtime dependency testing.
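As a rough illustration of these two conditions, the following free-form Fortran sketch shows three loops: one with no loop-carried dependency, one whose only apparent dependency is a scalar temporary that could be privatized, and one with a true loop-carried dependency. The program and variable names are hypothetical, and the comments describe the dependence pattern of each loop, not a guaranteed compiler decision.

```fortran
program dependence_sketch
  implicit none
  integer, parameter :: n = 8
  real :: a(n), b(n), t
  integer :: i

  b = (/ (real(i), i = 1, n) /)

  ! No loop-carried dependency: iteration i touches only a(i) and b(i).
  do i = 1, n
    a(i) = b(i) + 1.0
  end do

  ! The scalar t is written before it is read in every iteration.
  ! Giving each thread its own private copy of t removes the
  ! apparent dependency between iterations.
  do i = 1, n
    t = 2.0 * b(i)
    a(i) = a(i) + t
  end do

  ! True loop-carried dependency: iteration i reads a(i-1), which
  ! the previous iteration wrote, so iterations cannot run in any order.
  do i = 2, n
    a(i) = a(i-1) + b(i)
  end do

  print *, a
end program dependence_sketch
```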
To prepare a loop for auto-parallelization, the compiler performs the following transformations:
Partitions data accesses: shared, private, first-private, last-private, reduction
Modifies loop parameters and references
Generates new entry/exit per threaded task
Generates both parallel and serial versions with conditional execution based on:
- work/overhead threshold analysis
- runtime dependency testing
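To make the data-access classes and the conditional parallel/serial execution more concrete, here is a hand-written analogue using OpenMP directives. This is only a sketch of the idea, not the code the auto-parallelizer generates; the threshold min_work and the program name are hypothetical. When the source is compiled without OpenMP support, the directives are treated as comments and the loop runs serially.

```fortran
program classify_sketch
  implicit none
  integer, parameter :: n = 100000
  ! Hypothetical work threshold: below it, the loop runs serially.
  integer, parameter :: min_work = 1000
  real :: a(n), t, s
  integer :: i

  a = 0.5
  s = 0.0

  ! Classification of the data accessed in the loop:
  !   a    shared    (only read, same array for all threads)
  !   i, t private   (each thread needs its own copy)
  !   s    reduction (per-thread partial sums combined at the end)
  ! The if clause selects parallel or serial execution at run time.
  !$omp parallel do if(n > min_work) private(t) reduction(+:s)
  do i = 1, n
    t = a(i) * a(i)
    s = s + t
  end do
  !$omp end parallel do

  print *, s
end program classify_sketch
```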
Option | Description | Default
OMP_NUM_THREADS | Controls the number of threads used. | Number of processors currently installed in the system
OMP_SCHEDULE | Specifies the type of runtime scheduling. | static