Using Arrays Efficiently

This topic discusses how to efficiently access arrays and how to efficiently pass array arguments.

Accessing Arrays Efficiently

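Many of the array access efficiency techniques described in this section are applied automatically by the Intel Fortran loop transformation optimizations. Several aspects of array use can improve run-time performance. The first is whole-array use: referencing an array by name in an expression or an I/O statement, rather than element by element in a loop, gives contiguous access to all of the array. For example: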

real :: a(100,100)
a = 0.0
a = a + 1       ! Increment all elements of a by 1
.
.
.
write (8) a     ! Fast whole array use

Similarly, you can use derived-type array structure components, such as:

type x
  integer a(5)
end type x
.
.
.
type (x) z
write (8) z%a   ! Fast array structure component use
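
The whole-array fragments above can be assembled into a complete program. The following is a minimal sketch, not taken from the Intel documentation: the program name, the second array b, and the use of a scratch file opened with newunit (which assumes a compiler with Fortran 2008 support) are illustrative assumptions.

```fortran
program whole_array_io
  implicit none
  real :: a(100,100), b(100,100)
  integer :: u

  a = 0.0
  a = a + 1.0            ! elemental whole-array operation, no explicit loop

  ! Assumption: a scratch unformatted file stands in for unit 8 in the text.
  open (newunit=u, status='scratch', form='unformatted', action='readwrite')
  write (u) a            ! one record holding the whole array
  rewind (u)
  read (u) b             ! read the whole array back the same way
  close (u)

  print *, 'round trip ok:', all(a == b)
end program whole_array_io
```

Writing the array by name produces a single record, so no implied do loop or per-element I/O list is needed.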

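Multidimensional arrays should also be traversed in their natural storage order, which for Fortran is column-major order: the leftmost subscript varies fastest. For example, consider nested do loops that access a two-dimensional array with the j loop as the innermost loop:

integer x(3,5), y(3,5), i, j
y = 0
do i=1,3                     ! I outer loop varies slowest
  do j=1,5                   ! J inner loop varies fastest
    x(i,j) = y(i,j) + 1      ! Inefficient row-major storage order
  end do                     ! (rightmost subscript varies fastest)
end do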
.
.
.
end program

Because j varies the fastest and is the second array subscript in the expression x(i,j), the array is accessed in row-major order.
To access the array in natural column-major order, examine the array algorithm and the data being modified. For arrays x and y, you can get column-major access by changing the nesting order of the do loops so that the innermost loop variable corresponds to the leftmost array dimension:

integer x(3,5), y(3,5), i, j
y = 0
do j=1,5                     ! J outer loop varies slowest
  do i=1,3                   ! I inner loop varies fastest
    x(i,j) = y(i,j) + 1      ! Efficient column-major storage order
  end do                     ! (leftmost subscript varies fastest)
end do
.
.
.
end program

The Intel Fortran whole-array access (x = y + 1) uses efficient column-major order. However, if the application requires that j vary the fastest, or if you cannot modify the loop order without changing the results, consider modifying the program to use a rearranged order of array dimensions. Such modifications include rearranging the order of the dimensions in the declarations of arrays x and y, and the order of the subscripts in every reference to those arrays.

In this case, the original do loop nesting is used, where j is the innermost loop, and the arrays are declared with their dimensions swapped:

integer x(5,3), y(5,3), i, j
y = 0
do i=1,3                     ! I outer loop varies slowest
  do j=1,5                   ! J inner loop varies fastest
    x(j,i) = y(j,i) + 1      ! Efficient column-major storage order
  end do                     ! (leftmost subscript varies fastest)
end do
.
.
.
end program

Code written to access multidimensional arrays in row-major order (like C) or in random order often uses the CPU memory cache less efficiently. For more information on using natural storage order during record I/O, see Improving I/O Performance.

Whenever possible, use Fortran 95/90 array intrinsic procedures instead of creating your own routines to accomplish the same task. Fortran 95/90 array intrinsic procedures are designed for efficient use with the various Intel Fortran run-time components.

Using the standard-conforming array intrinsics can also make your program more portable.
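
As a minimal sketch of this advice, the following program computes an array sum and a matrix product first with hand-written loops and then with the standard intrinsics sum and matmul. The program name, array names, and sizes are illustrative assumptions; the values are small integers so that both approaches give the same result.

```fortran
program intrinsic_demo
  implicit none
  integer, parameter :: n = 4
  real :: a(n,n), b(n,n), c_loop(n,n), c_intr(n,n)
  real :: total_loop, total_intr
  integer :: i, j, k

  ! Fill the arrays with small whole-number values.
  do j = 1, n
    do i = 1, n
      a(i,j) = real(i + j)
      b(i,j) = real(i - j)
    end do
  end do

  ! Hand-written sum (column-major loop order).
  total_loop = 0.0
  do j = 1, n
    do i = 1, n
      total_loop = total_loop + a(i,j)
    end do
  end do

  ! Hand-written matrix product.
  c_loop = 0.0
  do j = 1, n
    do k = 1, n
      do i = 1, n
        c_loop(i,j) = c_loop(i,j) + a(i,k) * b(k,j)
      end do
    end do
  end do

  ! The same results from standard array intrinsics.
  total_intr = sum(a)
  c_intr = matmul(a, b)

  print *, 'sums agree:    ', abs(total_loop - total_intr) < 1.0e-5
  print *, 'products agree:', maxval(abs(c_loop - c_intr)) < 1.0e-5
end program intrinsic_demo
```

The intrinsic versions are shorter, and they leave the choice of access order to the compiler and run-time library.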

Since the cache sizes are a power of 2, array dimensions that are also a power of 2 may make less efficient use of cache when array access is noncontiguous. If the cache size is an exact multiple of the leftmost dimension, your program will probably make inefficient use of the cache. This does not apply to contiguous sequential access or whole array access.

One work-around is to increase the dimension to allow some unused elements, making the leftmost dimension larger than actually needed. For example, increasing the leftmost dimension of A from 512 to 520 would make better use of cache:

real a(512,100)
do i = 2,511
  do j = 2,99
    a(i,j) = (a(i+1,j-1) + a(i-1,j+1)) * 0.5
  end do
end do

In this code, array a has a leftmost dimension of 512, a power of two. The innermost loop steps through the rightmost dimension (row-major order), so consecutive accesses are noncontiguous and inefficient. Increasing the leftmost dimension of a to 520 (real a(520,100)) allows the loop to provide better performance, at the expense of some unused elements.

Because loop index variables I and J are used in the calculation, changing the nesting order of the do loops changes the results.
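
The effect of the padding can be seen by computing the distance in memory between a(i,j) and a(i,j+1), which is the leftmost dimension multiplied by the element size. The following is a minimal sketch under stated assumptions: the program and parameter names are hypothetical, the array is initialized to 1.0 only so that the program has defined data, and the storage_size intrinsic requires Fortran 2008 support. With a 4-byte default real, the stride changes from 2048 bytes (a power of two) to 2080 bytes.

```fortran
program padding_demo
  implicit none
  integer, parameter :: n_used = 512     ! rows the algorithm actually needs
  integer, parameter :: n_padded = 520   ! padded leftmost dimension from the text
  real :: a(n_padded,100)
  integer :: i, j, bytes_per_real

  ! storage_size returns the element size in bits.
  bytes_per_real = storage_size(a) / 8

  ! Distance in memory between a(i,j) and a(i,j+1).
  print *, 'stride with 512 rows (bytes):', n_used * bytes_per_real
  print *, 'stride with 520 rows (bytes):', n_padded * bytes_per_real

  a = 1.0
  ! Same loop as above; rows 513 to 520 are padding and are never referenced.
  do i = 2, n_used - 1
    do j = 2, 99
      a(i,j) = (a(i+1,j-1) + a(i-1,j+1)) * 0.5
    end do
  end do
  print *, 'a(2,2) =', a(2,2)
end program padding_demo
```

Only the declaration changes; the loop bounds still cover the 512 rows that the algorithm uses.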

For more information on arrays and their data declaration statements, see the Intel® Fortran Language Reference Manual.

Passing Array Arguments Efficiently

In Fortran, there are two general types of array arguments:

Explicit-shape arrays. These arrays have a fixed rank and extent that are known at compile time. Other dummy argument (receiving) arrays that are not deferred-shape (such as assumed-size arrays) can be grouped with explicit-shape array arguments.

Deferred-shape arrays. Types of deferred-shape arrays include array pointers and allocatable arrays. Assumed-shape array arguments generally follow the rules about passing deferred-shape array arguments.

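When passing arrays as arguments, either the starting (base) address of the array or the address of an array descriptor is passed. As the table later in this section indicates, an explicit-shape receiving array is passed the base address and no array descriptor, while an assumed-shape or deferred-shape receiving array is passed an array descriptor.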

Passing an assumed-shape array or array pointer to an explicit-shape array can slow run-time performance. This is because the compiler needs to create an array temporary for the entire array. The array temporary is created because the passed array may not be contiguous and the receiving (explicit-shape) array requires a contiguous array. When an array temporary is created, the size of the passed array determines whether the impact on slowing run-time performance is slight or severe.
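
The following is a minimal sketch of the situation described above. The module and procedure names are hypothetical. An assumed-shape dummy argument that holds a strided (noncontiguous) section is forwarded to an explicit-shape dummy argument, which generally forces the compiler to copy the data into a contiguous temporary and copy it back on return. Passing the same kind of section to an assumed-shape dummy argument avoids the copy because an array descriptor is passed instead.

```fortran
module scale_mod
  implicit none
contains
  ! Explicit-shape dummy: receives a base address, so the data it sees
  ! must be contiguous.
  subroutine scale_explicit(n, v)
    integer, intent(in) :: n
    real, intent(inout) :: v(n)
    v = 2.0 * v
  end subroutine scale_explicit

  ! Assumed-shape dummy: receives an array descriptor, so strided data
  ! can be used in place.
  subroutine scale_assumed(v)
    real, intent(inout) :: v(:)
    v = 2.0 * v
  end subroutine scale_assumed

  ! Assumed-shape array forwarded to an explicit-shape dummy. If v is not
  ! contiguous, the compiler generally copies it to a temporary and back.
  subroutine forward_to_explicit(v)
    real, intent(inout) :: v(:)
    call scale_explicit(size(v), v)
  end subroutine forward_to_explicit
end module scale_mod

program temp_demo
  use scale_mod   ! module procedures have explicit interfaces
  implicit none
  real :: a(10)

  a = 1.0
  call forward_to_explicit(a(1:9:2))   ! odd elements, likely via a temporary
  call scale_assumed(a(2:10:2))        ! even elements, no temporary needed
  print *, a
end program temp_demo
```

Both calls produce the same values; only the cost differs. Placing the procedures in a module supplies the explicit interface that assumed-shape dummy arguments require. If your compiler version supports it, the Intel run-time check option -check arg_temp_created can report when such a temporary is created.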

The following table summarizes what happens with the various combinations of array types. The amount of run-time performance inefficiency depends on the size of the array.

Input Argument Array Types: Explicit-shape arrays

   Output Argument Array Types: Explicit-Shape Arrays
      Very efficient. Does not use an array temporary. Does not pass an array descriptor. Interface block optional.

   Output Argument Array Types: Deferred-Shape and Assumed-Shape Arrays
      Efficient. Only allowed for assumed-shape arrays (not deferred-shape arrays). Does not use an array temporary. Passes an array descriptor. Requires an interface block.

Input Argument Array Types: Deferred-shape and assumed-shape arrays

   Output Argument Array Types: Explicit-Shape Arrays
      When passing an allocatable array, very efficient. Does not use an array temporary. Does not pass an array descriptor. Interface block optional.
      When not passing an allocatable array, not efficient. Instead use allocatable arrays whenever possible. Uses an array temporary. Does not pass an array descriptor. Interface block optional.

   Output Argument Array Types: Deferred-Shape and Assumed-Shape Arrays
      Efficient. Requires an assumed-shape or array pointer as dummy argument. Does not use an array temporary. Passes an array descriptor. Requires an interface block.