This topic discusses how to efficiently access arrays and how to efficiently pass array arguments.
Many of the array access efficiency techniques described in this section are applied automatically by the Intel Fortran loop transformation optimizations. Several aspects of array use can improve run-time performance:
The fastest array access occurs when contiguous
access to the whole array or most of an array occurs. Perform one or a
few array operations that access all of the array or major parts of an
array instead of numerous operations on scattered array elements. Rather
than use explicit loops for array access, use elemental array operations,
such as the following line that increments all elements of array variable
a:
a = a + 1
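As a minimal sketch of the point above, the following program compares the whole-array statement with an equivalent explicit loop. The program name, the array size n, and the comparison array b are assumptions made for illustration only.

```fortran
program elemental_increment
  implicit none
  integer, parameter :: n = 1000      ! illustrative size (assumption)
  real :: a(n), b(n)
  integer :: i

  a = 1.0
  b = 1.0

  ! Whole-array (elemental) form: one statement touches every element.
  a = a + 1

  ! Equivalent explicit loop, shown only for comparison.
  do i = 1, n
    b(i) = b(i) + 1
  end do

  ! Both forms must produce the same values.
  if (any(a /= b)) error stop "whole-array and loop results differ"
  print *, "all elements equal 2.0:", all(a == 2.0)
end program elemental_increment
```

The whole-array form states the intent directly and leaves the traversal order to the compiler.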
When reading or writing an array, use the array name and not a DO loop or an implied DO-loop that specifies each element number. Fortran 95/90 array syntax allows you to reference a whole array by using its name in an expression. For example:
real :: a(100,100)
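A possible complete version of this idea is sketched below. It writes the whole array with a single unformatted WRITE and reads it back. The scratch file, the second array b, and the program name are assumptions added to keep the sketch self-contained.

```fortran
program whole_array_io
  implicit none
  real :: a(100,100), b(100,100)
  integer :: u

  call random_number(a)

  ! A scratch file keeps the sketch self-contained (assumption, not from the text).
  open(newunit=u, status='scratch', form='unformatted', action='readwrite')

  ! Whole-array write: the array name alone, elements go out in storage order.
  write(u) a

  ! Element-by-element alternative the text advises against (not executed):
  !   write(u) ((a(i,j), i = 1, 100), j = 1, 100)

  rewind(u)
  read(u) b          ! whole-array read of the same record
  close(u)

  if (any(a /= b)) error stop "round trip changed the data"
  print *, "round trip ok"
end program whole_array_io
```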
Similarly, you can use derived-type array structure components, such as:
type x
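The snippet above is incomplete here, so the following is only a sketch of what whole-array access to a derived-type component can look like. The component a(5), the variables z and w, and the scratch file are assumptions.

```fortran
program component_io
  implicit none
  type x
    integer :: a(5)        ! array component (assumed layout)
  end type x
  type(x) :: z, w
  integer :: u

  z%a = [1, 2, 3, 4, 5]

  open(newunit=u, status='scratch', form='unformatted', action='readwrite')
  write(u) z%a             ! whole component array, no per-element loop
  rewind(u)
  read(u) w%a
  close(u)

  if (any(w%a /= z%a)) error stop "component round trip failed"
  print *, w%a
end program component_io
```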
Make sure multidimensional arrays are referenced
using proper array syntax and are traversed in the natural
ascending storage order, which is column-major
order for Fortran. With column-major order, the leftmost subscript
varies most rapidly with a stride of one. Whole array access uses column-major
order.
Avoid row-major order, as is
done by C, where the rightmost subscript varies most rapidly.
For example, consider nested do loops that access a two-dimensional array with the j loop as the innermost loop:
integer x(3,5), y(3,5), i, j
Since j varies the fastest and is the second array subscript in the expression x(i,j), the array is accessed in row-major order.
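The loop nest described here might look like the following sketch. Only the declaration comes from the text; the initialization of y, the assignment, and the final check are assumptions.

```fortran
program row_major_loop
  implicit none
  integer :: x(3,5), y(3,5), i, j

  y = 0
  do i = 1, 3                 ! outer loop: i varies slowest
    do j = 1, 5               ! inner loop: j varies fastest
      x(i,j) = y(i,j) + 1     ! rightmost subscript varies fastest (row-major)
    end do
  end do

  if (any(x /= 1)) error stop "unexpected result"
  print *, "sum of x =", sum(x)
end program row_major_loop
```

The result is correct, but consecutive iterations of the inner loop touch elements that are three integers apart in memory rather than adjacent ones.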
To access the array in natural column-major order, examine the array algorithm and the data being modified. Using arrays x and y, the array can be accessed in natural column-major order by changing the nesting order of the do loops so that the innermost loop variable corresponds to the leftmost array dimension:
integer x(3,5), y(3,5), i, j
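A sketch of the reordered loop nest follows, under the same assumptions as before (the initialization of y and the assignment are illustrative, not taken from the text).

```fortran
program column_major_loop
  implicit none
  integer :: x(3,5), y(3,5), i, j

  y = 0
  do j = 1, 5                 ! outer loop: j varies slowest
    do i = 1, 3               ! inner loop: i varies fastest
      x(i,j) = y(i,j) + 1     ! leftmost subscript varies fastest (column-major)
    end do
  end do

  if (any(x /= 1)) error stop "unexpected result"
  print *, "sum of x =", sum(x)
end program column_major_loop
```

Here the inner loop walks memory with a stride of one, which matches the storage order of x and y.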
The Intel Fortran whole array access (x = y + 1) uses efficient column-major order. However, if the application requires that j vary the fastest, or if you cannot modify the loop order without changing the results, consider modifying the application program to use a rearranged order of array dimensions. Program modifications include rearranging the order of:
Dimensions in the declaration of the arrays x(5,3) and y(5,3)
The assignment of x(j,i) and y(j,i) within the do loops
All other references to arrays x and y
In this case, the original DO loop nesting is used where J is the innermost loop:
integer x(5,3), y(5,3), i, j
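The rearranged version can be sketched as follows, assuming the dimensions are swapped to x(5,3) and y(5,3) as the list above describes. The loop body is an illustrative assumption.

```fortran
program rearranged_dimensions
  implicit none
  integer :: x(5,3), y(5,3), i, j   ! dimensions swapped relative to x(3,5)

  y = 0
  do i = 1, 3                 ! original nesting kept: i is the outer loop
    do j = 1, 5               ! j is still the innermost loop
      x(j,i) = y(j,i) + 1     ! j is now the leftmost subscript: stride one
    end do
  end do

  if (any(x /= 1)) error stop "unexpected result"
  print *, "shape of x =", shape(x)
end program rearranged_dimensions
```

The loop order is unchanged, so any dependence on j varying fastest is preserved while the memory access becomes contiguous.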
Code written to access multidimensional arrays in row-major order (like C) or in random order often makes less efficient use of the CPU memory cache. For more information on using natural storage order during record I/O, see Improving I/O Performance.
Use the available Fortran 95/90 array intrinsic procedures rather than create your own.
Whenever possible, use Fortran 95/90 array intrinsic procedures instead of creating your own routines to accomplish the same task. Fortran 95/90 array intrinsic procedures are designed for efficient use with the various Intel Fortran run-time components.
Using the standard-conforming array intrinsics can also make your program more portable.
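As an illustration, the sketch below uses the standard intrinsics SUM, MAXVAL, and MATMUL and checks them against a hand-written loop. The array contents and names are assumptions chosen so that all values are small whole numbers and the comparisons are exact.

```fortran
program array_intrinsics
  implicit none
  integer, parameter :: n = 4
  real :: a(n,n), v(n), manual
  integer :: i, j

  ! Fill a with small whole numbers so floating-point sums are exact.
  do j = 1, n
    do i = 1, n
      a(i,j) = real(i + j)
    end do
  end do
  v = 1.0

  ! Hand-written reduction, shown only for comparison with SUM.
  manual = 0.0
  do j = 1, n
    do i = 1, n
      manual = manual + a(i,j)
    end do
  end do

  if (sum(a) /= manual) error stop "SUM disagrees with the loop"
  if (maxval(a) /= real(2*n)) error stop "MAXVAL is not a(n,n)"
  ! Multiplying by a vector of ones gives the row sums.
  if (any(matmul(a, v) /= sum(a, dim=2))) error stop "MATMUL check failed"

  print *, "sum =", sum(a), " max =", maxval(a)
end program array_intrinsics
```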
With multidimensional arrays where access to array elements will be noncontiguous, avoid leftmost array dimensions that are a power of two (such as 256, 512).
Since the cache sizes are a power of 2, array dimensions that are also a power of 2 may make less efficient use of cache when array access is noncontiguous. If the cache size is an exact multiple of the leftmost dimension, your program will probably make inefficient use of the cache. This does not apply to contiguous sequential access or whole array access.
One work-around is to increase the dimension to allow some unused elements, making the leftmost dimension larger than actually needed. For example, increasing the leftmost dimension of A from 512 to 520 would make better use of cache:
real a(512,100)
In this code, array a has a leftmost dimension of 512, a power of two. The innermost loop accesses the rightmost dimension (row-major order), causing inefficient access. Increasing the leftmost dimension of a to 520 (real a(520,100)) allows the loop to provide better performance, but at the expense of some unused elements.
Because loop index variables I and J are used in the calculation, changing the nesting order of the do loops changes the results.
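The padding work-around can be sketched as follows. The loop body is an assumption (a simple stencil whose result depends on the nesting order); only the dimensions 512, 520, and 100 come from the text. The sketch runs the same loop on an unpadded and a padded array and confirms that the results match in the elements actually used. It does not measure performance.

```fortran
program pad_leading_dimension
  implicit none
  integer, parameter :: n = 512, m = 100, pad = 8
  real :: a(n, m)             ! leftmost dimension is a power of two
  real :: p(n + pad, m)       ! leftmost dimension padded to 520
  integer :: i, j

  do j = 1, m
    do i = 1, n
      a(i,j) = real(i + j)
      p(i,j) = real(i + j)
    end do
  end do
  p(n+1:, :) = 0.0            ! the padding rows are never used in the loop

  ! Same noncontiguous loop on both arrays: j (rightmost) is innermost.
  do i = 2, n - 1
    do j = 2, m - 1
      a(i,j) = (a(i+1,j-1) + a(i-1,j+1)) * 0.5
    end do
  end do
  do i = 2, n - 1
    do j = 2, m - 1
      p(i,j) = (p(i+1,j-1) + p(i-1,j+1)) * 0.5
    end do
  end do

  if (any(a /= p(1:n, :))) error stop "padding changed the results"
  print *, "padded and unpadded results match"
end program pad_leading_dimension
```

With the padded array, successive inner-loop accesses are 520 elements apart instead of 512, so they are less likely to map to the same cache locations.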
For more information on arrays and their data declaration statements, see the Intel® Fortran Language Reference Manual.
In Fortran, there are two general types of array arguments:
Explicit-shape arrays used with FORTRAN 77.
These arrays have a fixed rank and extent that is known at compile time. Other dummy argument (receiving) arrays that are not deferred-shape (such as assumed-size arrays) can be grouped with explicit-shape array arguments.
Deferred-shape arrays introduced with Fortran 95/90.
Types of deferred-shape arrays include array pointers and allocatable arrays. Assumed-shape array arguments generally follow the rules about passing deferred-shape array arguments.
When passing arrays as arguments, either the starting (base) address of the array or the address of an array descriptor is passed:
When using explicit-shape (or assumed-size) arrays to receive an array, the starting address of the array is passed.
When using deferred-shape or assumed-shape arrays to receive an array, the address of the array descriptor is passed (the compiler creates the array descriptor).
Passing an assumed-shape array or array pointer to an explicit-shape array can slow run-time performance. This is because the compiler needs to create an array temporary for the entire array. The array temporary is created because the passed array may not be contiguous and the receiving (explicit-shape) array requires a contiguous array. When an array temporary is created, the size of the passed array determines whether the run-time cost is slight or severe.
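The situation described above can be sketched as follows. The module, the subroutine names, and the strided pointer are hypothetical, and whether a temporary is actually created is decided by the compiler. The sketch only shows the two calling patterns and checks that both give the expected values.

```fortran
module array_passing_sketch
  implicit none
contains
  ! Explicit-shape dummy: receives a base address and expects contiguous data.
  subroutine scale_explicit(n, v)
    integer, intent(in) :: n
    real, intent(inout) :: v(n)
    v = 2.0 * v
  end subroutine scale_explicit

  ! Assumed-shape dummy: receives an array descriptor, strides included.
  subroutine scale_assumed(v)
    real, intent(inout) :: v(:)
    v = 2.0 * v
  end subroutine scale_assumed
end module array_passing_sketch

program array_temporary_sketch
  use array_passing_sketch
  implicit none
  real, target :: a(10)
  real, pointer :: p(:)
  integer :: i

  a = [(real(i), i = 1, 10)]
  p => a(1:10:2)              ! noncontiguous view: every other element

  ! Pointer to explicit-shape: the compiler may copy p into a contiguous
  ! temporary before the call and copy it back afterwards.
  call scale_explicit(size(p), p)

  ! Pointer to assumed-shape: the descriptor is passed, no copy is required.
  call scale_assumed(p)

  ! Odd-indexed elements were doubled twice; even-indexed ones are untouched.
  if (any(a(1:10:2) /= 4.0 * [(real(i), i = 1, 10, 2)])) error stop "odd elements wrong"
  if (any(a(2:10:2) /= [(real(i), i = 2, 10, 2)])) error stop "even elements changed"
  print *, a
end program array_temporary_sketch
```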
The following table summarizes what happens with the various combinations of array types. The amount of run-time performance inefficiency depends on the size of the array.
Rows give the input argument array type; columns give the output argument array type.

| Input Argument Array Types | Output Argument: Explicit-Shape Arrays | Output Argument: Deferred-Shape and Assumed-Shape Arrays |
|---|---|---|
| Explicit-shape arrays | Very efficient. Does not use an array temporary. Does not pass an array descriptor. Interface block optional. | Efficient. Only allowed for assumed-shape arrays (not deferred-shape arrays). Does not use an array temporary. Passes an array descriptor. Requires an interface block. |
| Deferred-shape and assumed-shape arrays | When passing an allocatable array: very efficient. Does not use an array temporary. Does not pass an array descriptor. Interface block optional. When not passing an allocatable array: not efficient; use allocatable arrays instead whenever possible. Uses an array temporary. Does not pass an array descriptor. Interface block optional. | Efficient. Requires an assumed-shape or array pointer as dummy argument. Does not use an array temporary. Passes an array descriptor. Requires an interface block. |
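Two of the efficient combinations from the table are sketched below: an allocatable array passed to an explicit-shape dummy, and the same array passed to an assumed-shape dummy. All names are hypothetical. Placing the procedures in a module is one way to give the caller the explicit interface that assumed-shape dummies need; a hand-written interface block is another.

```fortran
module efficient_passing_sketch
  implicit none
contains
  ! Explicit-shape dummy: the base address is passed.
  function total_explicit(n, v) result(s)
    integer, intent(in) :: n
    real, intent(in) :: v(n)
    real :: s
    s = sum(v)
  end function total_explicit

  ! Assumed-shape dummy: an array descriptor is passed.
  function total_assumed(v) result(s)
    real, intent(in) :: v(:)
    real :: s
    s = sum(v)
  end function total_assumed
end module efficient_passing_sketch

program efficient_passing
  use efficient_passing_sketch   ! supplies explicit interfaces to the caller
  implicit none
  real, allocatable :: a(:)
  real :: s1, s2

  allocate(a(100))
  a = 1.0

  ! An allocatable array is contiguous, so no temporary should be needed.
  s1 = total_explicit(size(a), a)

  ! Descriptor passing: the dummy learns the extent from the descriptor.
  s2 = total_assumed(a)

  if (s1 /= 100.0 .or. s2 /= 100.0) error stop "unexpected totals"
  print *, s1, s2
end program efficient_passing
```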