Alignment Support

To improve intrinsics performance, you need to align data. For example, when you are using the Streaming SIMD Extensions, you should align data to 16 bytes in memory operations to improve performance. Specifically, you must align __m128 objects as addresses passed to the _mm_load and _mm_store intrinsics. If you want to declare arrays of floats and treat them as __m128 objects by casting, you need to ensure that the float arrays are properly aligned.

Use __declspec(align) to direct the compiler to align data more strictly than it otherwise does on both IA-32 and Itanium(TM)-based systems. For example, a data object of type int is allocated at a byte address which is a multiple of 4 by default (the size of an int). However, by using __declspec(align), you can direct the compiler to instead use an address which is a multiple of 8, 16, or 32 with the following restrictions on IA-32:

You can use this data alignment support as an advantage in optimizing cache line usage. By clustering small objects that are commonly used together into a struct, and forcing the struct to be allocated at the beginning of a cache line, you can effectively guarantee that each object is loaded into the cache as soon as any one is accessed, resulting in a significant performance benefit.

The syntax of this extended-attribute is as follows:

align(n)

where n is an integral power of 2, less than or equal to 32. The value specified is the requested alignment.

Note

If a value is specified that is less than the alignment of the affected data type, it has no effect. In other words, data is aligned to the maximum of its own alignment or the alignment specified with __declspec(align).

You can request alignments for individual variables, whether of static or automatic storage duration. (Global and static variables have static storage duration; local variables have automatic storage duration by default.) You cannot adjust the alignment of a parameter, nor a field of a struct or class. You can, however, increase the alignment of a struct (or union or class ), in which case every object of that type is affected.

As an example, suppose that a function uses local variables i and j as subscripts into a 2-dimensional array. They might be declared as follows:

int i, j;

These variables are commonly used together. But they can fall in different cache lines, which could be detrimental to performance. You can instead declare them as follows:

__declspec(align(8)) struct { int i, j; } sub;

The compiler now ensures that they are allocated in the same cache line. In C++, you can omit the struct variable name (written as sub in the above example). In C, however, it is required, and you must write references to i and j as sub.i and sub.j.

If you use many functions with such subscript pairs, it is more convenient to declare and use a struct type for them, as in the following example:

typedef struct __declspec(align(8)) { int i, j; } Sub;

By placing the __declspec(align) after the keyword struct, you are requesting the appropriate alignment for all objects of that type. However, that allocation of parameters is unaffected by __declspec(align). (If necessary, you can assign the value of a parameter to a local variable with the appropriate alignment.)

You can also force alignment of global variables, such as arrays:

__declspec(align(16)) float array[1000];