Vectorization Examples

This section contains a few simple examples of some common issues in vector programming.

Argument Aliasing: A Vector Copy

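The loop in the example below, a vector copy operation, vectorizes even though the compiler cannot prove at compile time that dest[i] and src[i] are distinct. To vectorize safely, the compiler generates multi-version code: a run-time check selects a vectorized version when the arrays do not overlap and a scalar version when they might.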

Vectorizable Copy Due To Unproven Distinction

void vec_copy(float *dest, float *src, int len)
{
   int i;
   /* dest and src may overlap, so any vectorized version must be guarded */
   for(i=0; i<len; i++)
   {
      dest[i]=src[i];
   }
}
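The following is a minimal hand-written sketch of the multi-version idea, not the compiler's actual output. The names `copy_noalias` and `vec_copy_multiversion` are hypothetical, and comparing addresses through `uintptr_t` assumes a flat address space.

```c
#include <assert.h>
#include <stdint.h>

/* Fast version: restrict promises no overlap, so this loop may be vectorized. */
static void copy_noalias(float *restrict dest, const float *restrict src, int len)
{
   for (int i = 0; i < len; i++)
      dest[i] = src[i];
}

/* Hypothetical multi-version copy: a run-time overlap test picks a version. */
void vec_copy_multiversion(float *dest, const float *src, int len)
{
   if (len <= 0)
      return;

   uintptr_t d = (uintptr_t)dest;
   uintptr_t s = (uintptr_t)src;
   uintptr_t bytes = (uintptr_t)len * sizeof(float);

   if (d + bytes <= s || s + bytes <= d) {
      copy_noalias(dest, src, len);       /* ranges are disjoint */
   } else {
      for (int i = 0; i < len; i++)       /* ranges overlap: keep scalar order */
         dest[i] = src[i];
   }
}
```

When `dest` is `src + 1`, the scalar path copies in the original order, so the first value propagates through the buffer; a blind 4-wide load-then-store would produce a different result.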

The restrict keyword in the example below indicates that the pointers refer to distinct objects. Therefore, the compiler can vectorize the loop without generating multi-version code.

Using restrict to Prove Vectorizable Distinction

void vec_copy(float *restrict dest, float *restrict src, int len)
{
   int i;
   for(i=0; i<len; i++)
   {
      dest[i]=src[i];
   }
}

Data Alignment

A data structure or array of 16 bytes or more should be aligned so that its base address is a multiple of 16. For an array of 16-byte structures, this also places the beginning of each element on a 16-byte boundary.
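One way to obtain such alignment, assuming a C11 compiler, is `_Alignas` for static or automatic arrays and `aligned_alloc` for heap memory; alignment can then be checked by testing whether the base address is a multiple of 16. The helpers `is_aligned16` and `make_aligned_floats` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: true if p's address is a multiple of 16 bytes. */
static bool is_aligned16(const void *p)
{
   return ((uintptr_t)p % 16) == 0;
}

/* Static array whose base address is a multiple of 16 (C11 _Alignas). */
static _Alignas(16) float table[64];

/* Hypothetical helper: heap array with a 16-byte aligned base address.
   The size is rounded up to a multiple of the alignment, as C11 requires. */
float *make_aligned_floats(size_t count)
{
   size_t bytes = count * sizeof(float);
   bytes = (bytes + 15) / 16 * 16;
   return aligned_alloc(16, bytes);
}
```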

The "Misaligned Data Crossing 16-Byte Boundary" figure shows the effect of a data cache unit (DCU) split due to misaligned data. The code loads misaligned data across a 16-byte boundary, which requires an additional memory access and causes a six- to twelve-cycle stall. You can avoid these stalls if you know that the data is aligned and tell the compiler to assume alignment.

Misaligned Data Crossing 16-Byte Boundary
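The split condition can be stated arithmetically: an access of `size` bytes starting at address `addr` stays within one 16-byte block only if `addr % 16 + size <= 16`. The sketch below encodes that test; `crosses_16b_boundary` is a hypothetical name, and it models only the address arithmetic, not cache behavior or stall cost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: does an access of `size` bytes (size <= 16)
   starting at `addr` span two 16-byte blocks? */
static bool crosses_16b_boundary(uintptr_t addr, size_t size)
{
   return (addr % 16) + size > 16;
}
```

For instance, a 16-byte load from an address 8 bytes past a boundary touches two blocks, while the same load from an aligned address touches one.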

For example, if you know that elements a[0] and b[0] are aligned on a 16-byte boundary, the following loop can be vectorized with aligned moves by specifying #pragma vector aligned:

Alignment of Pointers is Known

float *a, *b;   /* assumed to point to 16-byte aligned storage */

#pragma vector aligned
for(int i=0; i<10; i++)
{
   a[i]=b[i];
}

After vectorization, the loop is executed as shown here:

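Vector and Scalar Clean-up Iterations

a[0:3] = b[0:3];   vector iteration
a[4:7] = b[4:7];   vector iteration
a[8] = b[8];       scalar clean-up iteration
a[9] = b[9];       scalar clean-up iteration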

Both vector iterations, a[0:3] = b[0:3] and a[4:7] = b[4:7], can be implemented with aligned moves if elements a[0] and b[0] (or, equivalently, a[4] and b[4]) are 16-byte aligned.
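The vectorized 10-iteration loop can be written out by hand with SSE intrinsics to make the split between vector and scalar iterations concrete. This is an illustrative sketch assuming an x86 target with SSE and 16-byte aligned `a` and `b`; `copy10_aligned` is a hypothetical name, and the compiler's generated code may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>   /* SSE: _mm_load_ps, _mm_store_ps */

/* Hypothetical sketch: two 4-wide aligned vector iterations, then
   two scalar clean-up iterations. Requires 16-byte aligned a and b. */
void copy10_aligned(float *a, const float *b)
{
   assert((uintptr_t)a % 16 == 0 && (uintptr_t)b % 16 == 0);
   int i = 0;
   for (; i + 4 <= 10; i += 4) {          /* i = 0, 4: a[i:i+3] = b[i:i+3] */
      __m128 v = _mm_load_ps(&b[i]);      /* aligned 16-byte load */
      _mm_store_ps(&a[i], v);             /* aligned 16-byte store */
   }
   for (; i < 10; i++)                    /* i = 8, 9: scalar clean-up */
      a[i] = b[i];
}
```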

Caution

If you give the vectorizer incorrect alignment information, the generated code can behave unexpectedly. Specifically, using aligned moves on unaligned data results in a general-protection exception at run time.
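A defensive pattern, sketched below for an x86 target with SSE, is to take the aligned-move path only after checking alignment at run time and to fall back to unaligned moves otherwise. `copy_checked` is a hypothetical name; `_mm_load_ps` and `_mm_store_ps` require 16-byte aligned addresses, while `_mm_loadu_ps` and `_mm_storeu_ps` accept any address.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical copy: aligned moves only when both pointers are
   16-byte aligned, unaligned moves otherwise, then scalar clean-up. */
void copy_checked(float *a, const float *b, int len)
{
   int i = 0;
   int aligned = ((uintptr_t)a % 16 == 0) && ((uintptr_t)b % 16 == 0);

   if (aligned) {
      for (; i + 4 <= len; i += 4)
         _mm_store_ps(&a[i], _mm_load_ps(&b[i]));     /* faults if misaligned */
   } else {
      for (; i + 4 <= len; i += 4)
         _mm_storeu_ps(&a[i], _mm_loadu_ps(&b[i]));   /* any alignment */
   }
   for (; i < len; i++)                                /* scalar clean-up */
      a[i] = b[i];
}
```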

Data Alignment Examples

The example below contains a loop that vectorizes, but only with unaligned memory instructions. The compiler can align the local arrays, but because lb is not known at compile time, it cannot determine the alignment of the first accesses, a2[lb] and y2[lb].

Loop Unaligned Due to Unknown Variable Value at Compile Time

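void f(int lb)
{
   int i;
   float z2[N], a2[N], y2[N], x2;   /* N: compile-time constant defined elsewhere */

   /* lb is unknown at compile time, so the alignment of &a2[lb] is unknown */
   for(i=lb; i<N; i++)
   {
      a2[i]=a2[i]*x2+y2[i];
   }
}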
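The underlying arithmetic: with 4-byte float elements, &a2[lb] lies 4 * lb bytes past the base of a2, so if the base is 16-byte aligned, the first access is aligned exactly when lb % 4 == 0. A small check, assuming a 16-byte aligned array and using a hypothetical helper name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

_Static_assert(sizeof(float) == 4, "sketch assumes 4-byte float");

/* Hypothetical helper: is the first accessed element, &base[lb], 16-byte aligned? */
static bool start_is_aligned(const float *base, int lb)
{
   return ((uintptr_t)(base + lb) % 16) == 0;
}
```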

If you know that lb is a multiple of 4, you can tell the compiler to use aligned accesses with #pragma vector aligned, as shown in the example that follows:

Alignment Due to Assertion of Variable as Multiple of 4

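#include <assert.h>

void f(int lb)
{
   int i;
   float z2[N], a2[N], y2[N], x2;   /* N: compile-time constant defined elsewhere */

   assert(lb%4==0);   /* run-time check only; the pragma tells the compiler to assume alignment */

   #pragma vector aligned
   for(i=lb; i<N; i++)
   {
      a2[i]=a2[i]*x2+y2[i];
   }
}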
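For the lb % 4 == 0 case, the aligned vector loop can be sketched by hand with SSE intrinsics. This assumes an x86 target with SSE, 16-byte aligned `a2` and `y2`, and a hypothetical `N` of 16; the function name `update_aligned_from` is also hypothetical, and the code the compiler generates under #pragma vector aligned may differ.

```c
#include <assert.h>
#include <xmmintrin.h>

#define N 16   /* hypothetical array length */

/* Hypothetical hand-vectorized form of a2[i] = a2[i]*x2 + y2[i] for
   i = lb..N-1, assuming 16-byte aligned a2 and y2 and lb % 4 == 0. */
void update_aligned_from(float *a2, const float *y2, float x2, int lb)
{
   assert(lb % 4 == 0);
   __m128 vx = _mm_set1_ps(x2);                /* broadcast x2 to all 4 lanes */
   int i = lb;
   for (; i + 4 <= N; i += 4) {
      __m128 va = _mm_load_ps(&a2[i]);         /* aligned because lb % 4 == 0 */
      __m128 vy = _mm_load_ps(&y2[i]);
      _mm_store_ps(&a2[i], _mm_add_ps(_mm_mul_ps(va, vx), vy));
   }
   for (; i < N; i++)                           /* scalar clean-up if N % 4 != 0 */
      a2[i] = a2[i] * x2 + y2[i];
}
```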