Efficient Compilation

Efficient compilation contributes to performance improvement. Before you analyze and tune your program, consider how you compile it. Based on an analysis of your application, you can decide which Intel Fortran Compiler optimizations and command-line options can improve its run-time performance.

Efficient Compilation Techniques

Efficient compilation techniques can be used during both the earlier and the later stages of program development.

During the earlier stages of program development, you can use incremental compilation with minimal optimization. For example:

ifort -c -O1 sub2.f90 (generates object file of sub2)

ifort -c -O1 sub3.f90 (generates object file of sub3)

ifort -omain.exe -g -O0 main.f90 sub2.o sub3.o

The last command above uses -O0 to turn off all compiler default optimizations (for example, -O2). The -g option generates symbolic debugging information and line numbers in the object code for use by a source-level debugger, so the resulting main.exe file contains symbolic debugging information.

During the later stages of program development, you should specify multiple source files together and use an optimization level of at least -O2 (default) to allow more optimizations to occur. For instance, the following command compiles all three source files together using the default level of optimization, -O2:

ifort -omain.exe main.f90  sub2.f90  sub3.f90

Compiling multiple source files together lets the compiler examine more code for possible optimizations, including interprocedural optimizations that separate compilation prevents.

For very large programs, compiling all source files together may not be practical. In such instances, consider compiling source files containing related routines together using multiple ifort commands, rather than compiling source files individually.
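As a rough illustration of the two stages described above, the following shell sketch selects options by development stage. The function name stage_flags and the stage names early and late are hypothetical, chosen only for this example; the options themselves are the ones used in the commands above.

```shell
# Hypothetical helper: prints the ifort options for a development stage.
# "early" and "late" are illustrative names, not compiler keywords.
stage_flags() {
  case "$1" in
    early) echo "-g -O0" ;;   # debugging information, no optimization
    late)  echo "-O2" ;;      # default optimization level, stated explicitly
    *)     echo "usage: stage_flags early|late" >&2; return 1 ;;
  esac
}

# Example use (assumes ifort is on PATH):
#   ifort $(stage_flags late) -omain.exe main.f90 sub2.f90 sub3.f90
```

Keeping the stage-to-options mapping in one place makes it easier to switch the whole build from debugging to optimized compilation without editing each command.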

Options That Improve Run-Time Performance

The table below lists options that can directly improve run-time performance. Most of these options do not affect the accuracy of the results; others improve run-time performance but can change some numeric results. The Intel Fortran Compiler performs some optimizations by default unless you turn them off with the corresponding command-line options. Additional optimizations can be enabled or disabled using command-line options.

Option

Description

-align keyword

Analyzes and reorders memory layout for variables and arrays.

Controls whether padding bytes are added between data items within common blocks, derived-type data, and record structures to make the data items naturally aligned.

-ax{K|W|N|B|P} (IA-32 only)

Optimizes your application's performance for specific processors. Regardless of which -ax suboption you choose, your application is optimized to take advantage of that processor, and the resulting binary file can still run on any Intel IA-32 processor.

-O2 (-fast)

Sets the following performance-related options: -align dcommons, -align sequence.

-O1, -inline all

Inlines every call that can possibly be inlined while generating correct code. Certain recursive routines are not inlined to prevent infinite loops.

-parallel

Enables the auto-parallelizer, which generates multithreaded code for loops that can be safely executed in parallel. This can improve the performance of certain programs running on shared memory multiprocessor systems.

-On

Controls the types of optimization performed. The default optimization level is -O2, unless you specify -O0 (no optimizations). Use -O3 to activate loop transformation optimizations.

-openmp

Enables parallel processing using directed decomposition (directives inserted in source code). This can improve the performance of certain programs running on shared memory multiprocessor systems.

-qp

Requests profiling information, which you can use to identify those parts of your program where improving source code efficiency would most likely improve run-time performance. After you modify the appropriate source code, recompile the program and test the run-time performance.

-tpp{n}

Specifies the target processor generation architecture on which the program will be run, allowing the optimizer to make decisions about instruction tuning optimizations needed to create the most efficient code. Keywords allow specifying one particular processor generation type, multiple processor generation types, or the processor generation type currently in use during compilation. Regardless of the setting of -tpp{n}, the generated code will run correctly on all implementations of the Intel® IA-32 or Itanium® architectures.

-unroll[n]

Specifies the number of times a loop is unrolled (n) when specified with optimization level -O3. If you omit n in -unroll, the optimizer determines how many times loops can be unrolled.

Options That Slow Down the Run-time Performance

The table below lists options that can slow down the run-time performance. Some applications that require floating-point exception handling or rounding might need to use the -fpen dynamic option. Other applications might need to use the -assume dummy_aliases or -vms options for compatibility reasons. Other options that can slow down the run-time performance are primarily for troubleshooting or debugging purposes.


Option

Description

-assume dummy_aliases

Forces the compiler to assume that dummy (formal) arguments to procedures share memory locations with other dummy arguments or with variables shared through use association, host association, or common block use. These program semantics slow performance, so you should specify -assume dummy_aliases only for the called subprograms that depend on such aliases.

The use of dummy aliases violates the FORTRAN 77 and Fortran 95/90 standards but occurs in some older programs.

-c

If you use -c when compiling multiple source files, also specify -o outputfile to compile many source files together into one object file. Separate compilations prevent certain interprocedural optimizations, such as when using multiple ifort commands or using -c without the -o outputfile option.

-check bounds

Generates extra code for array bounds checking at run time.

-check overflow

Generates extra code to check integer calculations for arithmetic overflow at run time. Once the program is debugged, omit this option to reduce executable program size and slightly improve run-time performance.

-fpe 3

Using this option slows program execution. It enables certain types of floating-point exception handling, which can be expensive.

-g

Generates extra symbol table information in the object file. Specifying this option also reduces the default level of optimization to -O0 (no optimization).

Note

The -g option only slows your program down when no optimization level is specified, in which case -g turns on -O0, which slows the program down. If both -g and -O2 are specified, the code runs at very much the same speed as if -g were not specified.
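The rule in this note can be written out as a small shell sketch. The function effective_opt is hypothetical and only models the behavior described here (a default of -O2, and -g implying -O0 when no -On is given). It is not the compiler's implementation, and treating the last -On on the command line as the one in effect is an assumption.

```shell
# Illustrative model of the rule in the note; not the compiler's own logic.
effective_opt() {
  level=""
  debug=no
  for arg in "$@"; do
    case "$arg" in
      -O[0-3]) level=$arg ;;   # explicit level; assumption: the last one wins
      -g)      debug=yes ;;
    esac
  done
  if [ -n "$level" ]; then
    echo "$level"              # an explicit -On is used as given
  elif [ "$debug" = yes ]; then
    echo "-O0"                 # -g alone lowers the default to -O0
  else
    echo "-O2"                 # documented default
  fi
}
```

Under this model, effective_opt -g main.f90 reports -O0, while effective_opt -g -O2 main.f90 reports -O2, which matches the advice to name an optimization level explicitly when debugging information is wanted in an optimized build.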

-inline none, -inline manual

Prevents the inlining of all procedures except statement functions.

-save

Forces local variables to retain their values from the previous invocation of the procedure. This may change the output of your program for floating-point values, because it forces operations to be carried out in memory rather than in registers, which in turn causes more frequent rounding of your results.

-O0

Turns off optimizations. Can be used during the early stages of program development or when you use the debugger.

-vms

Controls certain VMS-related run-time defaults, including alignment. If you specify the -vms option, you may need to also specify the -align records option to obtain optimal run-time performance.
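When preparing a production build, it can help to check a proposed command line against the table above. The following shell sketch is one hypothetical way to do that; the function name slow_options is invented for this example. It leaves out -c, which separate compilation needs, and -g, which the note above says costs little when an optimization level is also given.

```shell
# Hypothetical helper: lists options from the table above that appear in a
# proposed ifort command line. It only inspects text; it does not run ifort.
slow_options() {
  line=" $* "     # pad with spaces so only whole options match
  found=""
  for opt in "-assume dummy_aliases" "-check bounds" "-check overflow" \
             "-fpe 3" "-inline none" "-inline manual" "-save" "-O0" "-vms"; do
    case "$line" in
      *" $opt "*) found="$found${found:+, }$opt" ;;
    esac
  done
  echo "$found"   # empty output means none of the listed options were found
}

# Example:
#   slow_options -O2 -check bounds -save main.f90
```

The options are reported in the order in which they appear in the table, not in the order given on the command line.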