Matrix multiplication is commonly written as shown in the example below:
for (i = 0; i < N; i++) {
  for (j = 0; j < N; j++) {
    for (k = 0; k < N; k++) {
      c[i][j] = c[i][j] + a[i][k] * b[k][j];
    }
  }
}
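The reference b[k][j] is not stride-1 in the innermost k loop: consecutive values of k touch elements that are a full row apart in memory, so the loop will not normally be vectorizable. If the j and k loops are interchanged, however, b[k][j] and c[i][j] become stride-1 in the innermost loop and a[i][k] is invariant there, as shown in the "Matrix Multiplication With Stride-1" example.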
Caution
Interchanging is not always possible. When the loop nest carries data dependencies, changing the iteration order can lead to different results.
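To illustrate the caution, the following is a minimal sketch of a loop nest that cannot be safely interchanged. The array size DIM and the function names sweep_ij and sweep_ji are hypothetical, chosen only for this illustration. Each element reads a[i-1][j+1], a value that the original order has already updated but the interchanged order has not yet reached.

```c
#define DIM 3  /* hypothetical size, for illustration only */

/* Original order: i outer, j inner.
   a[i-1][j+1] was written during the previous i iteration. */
void sweep_ij(int a[DIM][DIM])
{
    for (int i = 1; i < DIM; i++) {
        for (int j = 0; j < DIM - 1; j++) {
            a[i][j] = a[i - 1][j + 1] + 1;
        }
    }
}

/* Interchanged order: j outer, i inner.
   Column j+1 has not been processed yet, so a[i-1][j+1]
   still holds its old value and the result changes. */
void sweep_ji(int a[DIM][DIM])
{
    for (int j = 0; j < DIM - 1; j++) {
        for (int i = 1; i < DIM; i++) {
            a[i][j] = a[i - 1][j + 1] + 1;
        }
    }
}
```

Starting from an all-zero array, sweep_ij leaves a[2][0] equal to 2, while sweep_ji leaves it equal to 1, so the two orders are not equivalent.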
for (i = 0; i < N; i++) {
  for (k = 0; k < N; k++) {
    for (j = 0; j < N; j++) {
      c[i][j] = c[i][j] + a[i][k] * b[k][j];
    }
  }
}
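The two loop orders can be compared directly. The sketch below wraps each order in a function; the size N = 4, the element type double, and the names matmul_ijk and matmul_ikj are assumptions made for this illustration. Both versions expect c to be zero-initialized by the caller.

```c
#define N 4  /* hypothetical size, for illustration only */

/* Original order: the innermost k loop steps down a column of b,
   so b[k][j] is accessed with a stride of N elements. */
void matmul_ijk(double c[N][N], double a[N][N], double b[N][N])
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            for (int k = 0; k < N; k++) {
                c[i][j] = c[i][j] + a[i][k] * b[k][j];
            }
        }
    }
}

/* Interchanged order: the innermost j loop steps along a row of
   both c and b (stride-1), and a[i][k] is invariant in that loop. */
void matmul_ikj(double c[N][N], double a[N][N], double b[N][N])
{
    for (int i = 0; i < N; i++) {
        for (int k = 0; k < N; k++) {
            for (int j = 0; j < N; j++) {
                c[i][j] = c[i][j] + a[i][k] * b[k][j];
            }
        }
    }
}
```

In this particular nest the interchange does not change the result: each c[i][j] still accumulates its products in increasing k order, so the two functions produce identical output for the same inputs.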