The goal of vectorizing compilers is to exploit single-instruction, multiple-data (SIMD) processing automatically. Review these guidelines and restrictions, see the code examples in further topics, and check them against your code to eliminate ambiguities that prevent the compiler from achieving optimal vectorization.
You will often need to make some changes to your loops. However, you should make only the changes needed to enable vectorization and no others.
For loop bodies -
Use:
Straight-line code (a single basic block)
Vector data only; that is, arrays and invariant expressions on the right-hand side of assignments. Array references can appear on the left-hand side of assignments.
Only assignment statements
Avoid:
Function calls
Unvectorizable operations
Mixing vectorizable types in the same loop
Data-dependent loop exit conditions
Loop unrolling (the compiler does it)
Decomposing one loop with several statements in the body into several single-statement loops
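As a rough illustration of the guidelines above, the sketch below contrasts a loop body that follows them with one that has a data-dependent exit. The function names `scale_add` and `copy_until_negative` are hypothetical and chosen only for this example. Whether a given compiler actually vectorizes either loop depends on the compiler, its options, and the target.

```c
#include <stddef.h>

/* Hypothetical example: follows the "Use" guidelines.
   The body is a single basic block containing one assignment.
   The right-hand side uses only an array reference and the
   loop-invariant values `scale` and `offset`. */
void scale_add(float *dst, const float *src, float scale, float offset, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = scale * src[i] + offset;
    }
}

/* Hypothetical counter-example: the loop exits as soon as it
   reads a negative element, so the trip count depends on the
   data. This is the "data-dependent loop exit condition" that
   the guidelines recommend avoiding.
   Returns the number of elements copied. */
size_t copy_until_negative(float *dst, const float *src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (src[i] < 0.0f) {
            break;
        }
        dst[i] = src[i];
    }
    return i;
}
```

Both functions compute well-defined results; the difference is only in how much the compiler can know about the loop before it runs.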
Vectorization depends on two major factors:
Hardware. The compiler is limited by restrictions imposed by the underlying hardware. In the case of Streaming SIMD Extensions (SSE), the vector memory operations are limited to stride-1 accesses, with a preference for 16-byte-aligned memory references. This means that even if the compiler abstractly recognizes a loop as vectorizable, it still might not vectorize it for a particular target architecture.
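To make the stride-1 restriction concrete, here is a minimal sketch of the two access patterns. The function names and the `stride` parameter are assumptions introduced for illustration; alignment is not shown.

```c
#include <stddef.h>

/* Stride-1 access: consecutive iterations touch adjacent
   elements, which is the pattern the hardware restriction
   described above favors. */
void add_unit_stride(float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        a[i] = a[i] + b[i];
    }
}

/* Non-unit stride: consecutive iterations touch elements that
   are `stride` apart, so the data for one vector operation is
   not contiguous in memory. `stride` must be nonzero. */
void add_strided(float *a, const float *b, size_t n, size_t stride)
{
    for (size_t i = 0; i < n; i += stride) {
        a[i] = a[i] + b[i];
    }
}
```

The loop bodies are identical; only the index step differs, and that alone can decide whether the loop is a candidate for vectorization on a given target.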
Style. The style in which you write source code can inhibit optimization. For example, a common problem with global pointers is that they often prevent the compiler from proving that two memory references refer to distinct locations. Consequently, the compiler cannot apply certain reordering transformations.
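The aliasing problem with global pointers can be sketched as follows. The globals `g_dst` and `g_src` and both function names are hypothetical. The second version uses the C99 `restrict` qualifier, which is one standard way to promise the compiler that the two pointers do not overlap; it is a sketch of the idea, not a statement of what any particular compiler requires.

```c
#include <stddef.h>

/* Hypothetical global pointers. The compiler generally cannot
   prove that they refer to non-overlapping memory. */
float *g_dst;
const float *g_src;

/* A store through g_dst might change what a later load through
   g_src reads, so reordering loads and stores across
   iterations is hard to justify. */
void increment_globals(size_t n)
{
    for (size_t i = 0; i < n; i++) {
        g_dst[i] = g_src[i] + 1.0f;
    }
}

/* Same computation with restrict-qualified parameters. The
   caller promises that dst and src do not overlap; passing
   overlapping buffers is undefined behavior. */
void increment_restrict(float *restrict dst, const float *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i] + 1.0f;
    }
}
```

With non-overlapping buffers both functions produce the same result; the difference is in what the compiler is allowed to assume.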
Many stylistic issues that prevent automatic parallelization by vectorizing compilers are found in loop structures. The ambiguity arises from the complexity of the keywords, operators, data references, and memory operations within the loop bodies.
However, by understanding these limitations and by knowing how to interpret diagnostic messages, you can modify your program to overcome the known limitations and enable effective vectorization. The following sections summarize the capabilities and restrictions of the vectorizer with respect to loop structures.