Loop Structure Coding Background

The goal of vectorizing compilers is to exploit single-instruction multiple data (SIMD) processing automatically. However, the realization of this goal has been difficult to achieve. The reason for the difficulty in achieving vectorization is due to two major factors:

  1. Style -- The style in which you write source code can inhibit optimization. For example, a common problem with global pointers is that they often prevent the compiler from being able to prove two memory references are distinct locations. Consequently, this prevents certain reordering transformations.

  2. Hardware Restrictions -- The compiler is limited by restrictions imposed by the underlying hardware. In the case of Streaming SIMD Extensions, the vector memory operations are limited to stride-1 accesses with a preference to 16-byte aligned memory references. This means that if the compiler abstractly recognizes a loop as vectorizable, it still might not vectorize it to a distinct target architecture.

Many stylistic issues that prevent the automatic parallelization by vectorization compilers are found in loop structures. The ambiguity arises from the complexity of the keywords, operators, data references, and memory operations within the loop bodies.

However, by understanding these limitations and by knowing how to interpret diagnostic messages, you can modify your program to overcome the known limitations and enable effective vectorizations -- improving your application's performance. The following sections summarize the capabilities and restrictions of the vectorizer with respect to loop structures.