Improving/Restricting FP Arithmetic Precision

The -mp and -mp1 options (-mp1 is available on IA-32 only) maintain and restrict floating-point precision, respectively, but both also affect application performance. The -mp1 option has less impact on performance than the -mp option. The -mp1 option ensures that operands of transcendental functions are checked for out-of-range values and improves the accuracy of floating-point comparisons. On IA-32 systems, the -mp option implies -mp1, and -mp1 implies -fp_port. Of these three options, -mp slows performance the most and -fp_port the least.

The -mp option restricts some optimizations to maintain declared precision and to ensure that floating-point arithmetic conforms more closely to the ANSI and IEEE* standards. This option causes more frequent stores to memory, or disallows some data from being register candidates altogether. The Intel architecture normally keeps floating-point results in registers. These registers are 80 bits long and hold greater precision than a double-precision number. When results have to be stored to memory, they are rounded to the declared precision. Storing more often therefore brings results closer to the "expected" value for the declared type, but at a cost in speed. The -pc{32|64|80} option (IA-32 only) can be used to control floating-point accuracy and rounding, along with setting various processor IEEE flags.

For most programs, specifying the -mp option adversely affects performance. If you are not sure whether your application needs this option, try compiling and running your program both with and without it to evaluate the effects on performance versus precision.

Specifying the -mp option has the following effects on program compilation:

On IA-32 systems, user variables declared as floating-point types are not assigned to registers.
On Itanium®-based systems, floating-point user variables may be assigned to registers. Expressions are evaluated using the precision of the source operands. The compiler does not use Floating-point Multiply and Add (FMA) instructions to contract multiply and add/subtract operations into a single operation. These contractions can be enabled by using the -IPF_fma option. The compiler does not speculate on floating-point operations that may affect the floating-point state of the machine. See Floating-point Arithmetic Precision for Itanium-based Systems.
Floating-point arithmetic comparisons conform to IEEE 754.
The exact operations specified in the code are performed. For example, division is never changed to multiplication by the reciprocal.
The compiler performs floating-point operations in the order specified without reassociation.
The compiler does not perform constant folding on floating-point values. Constant folding would otherwise also eliminate any multiplication by 1, division by 1, and addition or subtraction of 0. For example, code that adds 0.0 to a number is executed exactly as written. Compile-time floating-point arithmetic is not performed, so that floating-point exceptions are also maintained.
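
The reason these restrictions matter can be seen in a few lines of ordinary IEEE 754 double-precision arithmetic. The following is a minimal sketch in Python, not compiler output: it assumes Python's 64-bit float as a stand-in for DOUBLE PRECISION, and the helper fused_multiply_add is a hypothetical emulation of an FMA (exact product and sum, one final rounding) built on the standard fractions module. Each pair shows the result of the code as written next to the result of a transformation that -mp disallows.

```python
import math
from fractions import Fraction


def fused_multiply_add(a, b, c):
    """Hypothetical FMA emulation: exact a*b + c, rounded once to double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# 1. Contraction: a*b + c with two roundings versus a single rounding.
a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0
separate = a * b + c                  # the product rounds to 1.0 first
fused = fused_multiply_add(a, b, c)   # the exact product 1 - 2**-54 survives

# 2. Division versus multiplication by the reciprocal.
x = 3.0
quotient = x / 10.0
via_reciprocal = x * (1.0 / 10.0)

# 3. Reassociation of an addition chain.
as_written = (0.1 + 0.2) + 0.3
reassociated = 0.1 + (0.2 + 0.3)

# 4. Adding 0.0 is not an identity for negative zero.
plus_zero = -0.0 + 0.0

print("separate vs fused:      ", separate, fused)
print("divide vs reciprocal:   ", quotient, via_reciprocal)
print("as written vs reordered:", as_written, reassociated)
print("sign of -0.0 + 0.0:     ", math.copysign(1.0, plus_zero))
```

In each case both results are valid roundings of some evaluation order, which is why an optimizer may choose either unless an option such as -mp pins the evaluation to the source text.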

On IA-32 systems, whenever an expression is spilled, it is spilled as 80 bits (EXTENDED PRECISION), not 64 bits (DOUBLE PRECISION). Floating-point operations conform to IEEE 754. When assignments to type REAL and DOUBLE PRECISION are made, the value is rounded from 80 bits (EXTENDED) down to 32 bits (REAL) or 64 bits (DOUBLE PRECISION). When you do not specify -O0, the extra bits of precision are not always rounded away before the variable is reused.
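
The practical consequence of extra bits surviving until a variable is reused is that a value can compare unequal to the expression it was just assigned from. The sketch below only imitates the effect: it assumes Python's 64-bit float as a stand-in for the wider register format and uses the standard struct module to round to 32 bits as a stand-in for a REAL variable. The helper name to_real is hypothetical.

```python
import struct


def to_real(value):
    """Round a 64-bit float to 32-bit precision and widen it again."""
    return struct.unpack("f", struct.pack("f", value))[0]


a = to_real(1.0)
b = to_real(3.0)

in_register = a / b              # quotient held at the wider precision
stored = to_real(in_register)    # quotient after assignment to a REAL

# The stored value no longer matches the wider intermediate result.
print("stored == in_register:", stored == in_register)
print("difference:", abs(stored - in_register))
```

A comparison written against the stored variable therefore behaves differently from one the optimizer evaluates against the unrounded register value.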

Even if vectorization is enabled by the -x{K|W|B|P} options, the compiler does not vectorize reduction loops (for example, loops computing a dot product) or loops with mixed precision types. Similarly, the compiler does not enable certain loop transformations. For example, the compiler does not transform reduction loops to perform partial summation, and it does not perform loop interchange.
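
Partial summation changes the order in which the terms of a reduction are added, and floating-point addition is not associative, so the result can change. The following minimal sketch assumes Python's 64-bit float and two accumulators as a stand-in for a two-lane vectorized reduction; the function names and the input data are illustrative only.

```python
def sequential_sum(values):
    """Add the values strictly in source order, as -mp requires."""
    total = 0.0
    for v in values:
        total += v
    return total


def partial_sum(values, lanes=2):
    """Accumulate into one partial sum per lane, then combine the lanes."""
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    return sequential_sum(partial)


data = [1e16, 1.0, -1e16, 1.0]

print("source order:  ", sequential_sum(data))
print("partial sums:  ", partial_sum(data))
```

In source order the first 1.0 is absorbed by 1e16 and lost, whereas with two accumulators the large terms cancel in one lane and both small terms are kept in the other.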