Use __declspec(cpu_specific) and __declspec(cpu_dispatch) in your code to provide processor-specific versions of a function. The application then executes the version written for the Intel processor on which it is running, and still executes correctly on other IA-32 processors.
Note
Manual CPU dispatch cannot be used to recognize Intel® Itanium® processors.

The syntax of these extended attributes is as follows:

    cpu_specific(cpuid)
    cpu_dispatch(cpuid-list)

The values for cpuid and cpuid-list are shown in the tables below:
Processor | Values for cpuid |
---|---|
x86 processors not provided by Intel Corporation | generic |
Intel® Pentium® processors | pentium |
Intel Pentium processors with MMX™ Technology | pentium_mmx |
Intel Pentium Pro processors | pentium_pro |
Intel Pentium II processors | pentium_ii |
Intel Pentium III processors | pentium_iii |
Intel Pentium III processors (excluding XMM registers) | pentium_iii_no_xmm_regs |
Intel Pentium 4 processors | pentium_4 |
Intel Pentium M processors | pentium_m |
Intel processors code-named "Prescott" | future_cpu_10 |

Values for cpuid-list |
---|
cpuid |
cpuid-list, cpuid |

The attributes are not case sensitive. The body of a function declared with __declspec(cpu_dispatch) must be empty; such a function is referred to as a stub.
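To make the role of the stub more concrete, the following is a minimal hand-written sketch of run-time dispatch in portable C. It is not the code the Intel compiler generates for a cpu_dispatch stub. It only illustrates the idea of choosing one of several implementations once, based on the processor, and then calling it. All names here (`cpu_class`, `detect_cpu`, `sum_generic`, `sum_tuned`, `sum_dispatch`) are hypothetical, and the processor check assumes the GCC/Clang `__builtin_cpu_supports` builtin on x86; on other compilers the sketch falls back to the generic version.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU classes, loosely mirroring two cpuid values from the table. */
typedef enum { CPU_GENERIC, CPU_PENTIUM_MMX } cpu_class;

/* Assumption: GCC/Clang builtins on x86; anything else is treated as generic. */
static cpu_class detect_cpu(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("mmx"))
        return CPU_PENTIUM_MMX;
#endif
    return CPU_GENERIC;
}

typedef long (*sum_fn)(int const *, size_t);

/* Plain version: plays the role of a cpu_specific(generic) function. */
static long sum_generic(int const *v, size_t n)
{
    long total = 0;
    size_t i;
    for (i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Stand-in for a tuned version: four elements per iteration, then the rest. */
static long sum_tuned(int const *v, size_t n)
{
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        total += (long)v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    for (; i < n; i++)
        total += v[i];
    return total;
}

/* Plays the role of the stub: picks an implementation on first call. */
long sum_dispatch(int const *v, size_t n)
{
    static sum_fn impl = NULL;
    assert(v != NULL || n == 0);
    if (impl == NULL)
        impl = (detect_cpu() == CPU_PENTIUM_MMX) ? sum_tuned : sum_generic;
    return impl(v, n);
}
```

Callers only ever see `sum_dispatch`, in the same way that callers of a dispatched function only see the stub's name. With the Intel attributes, the selection logic is supplied by the compiler rather than written by hand.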
Use the following guidelines to implement automatic processor dispatch support:
Here is an example of how these features can be used:
    #include <stddef.h>    /* size_t */
    #include <mmintrin.h>  /* MMX intrinsics */

    /* Pentium processor function does not use intrinsics to add two arrays. */
    __declspec(cpu_specific(pentium))
    void array_sum(unsigned short *result, unsigned short const *a,
                   unsigned short const *b, size_t length)
    {
        for (; length > 0; length--)
            *result++ = *a++ + *b++;
    }

    /* Implementation for a Pentium processor with MMX technology uses an MMX
       instruction intrinsic to add four 16-bit elements simultaneously. */
    __declspec(cpu_specific(pentium_MMX))
    void array_sum(unsigned short *result, unsigned short const *a,
                   unsigned short const *b, size_t length)
    {
        __m64 *mmx_result = (__m64 *)result;
        __m64 const *mmx_a = (__m64 const *)a;
        __m64 const *mmx_b = (__m64 const *)b;

        for (; length > 3; length -= 4)
            *mmx_result++ = _mm_add_pi16(*mmx_a++, *mmx_b++);

        /* The following code, which takes care of excess elements, is not
           needed if the array sizes passed are known to be multiples of four. */
        result = (unsigned short *)mmx_result;
        a = (unsigned short const *)mmx_a;
        b = (unsigned short const *)mmx_b;
        for (; length > 0; length--)
            *result++ = *a++ + *b++;

        _mm_empty();  /* clear MMX state before any later floating-point code */
    }

    /* All versions and the stub share one name and one signature. */
    __declspec(cpu_dispatch(pentium, pentium_MMX))
    void array_sum(unsigned short *result, unsigned short const *a,
                   unsigned short const *b, size_t length)
    {
        /* Empty function body informs the compiler to generate the CPU-dispatch
           function for the processors listed in the cpu_dispatch clause. */
    }
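Every processor-specific version is expected to produce the same results, so it can be useful to compare the dispatched function against a plain reference loop. The sketch below is one possible way to do that, assuming 16-bit elements as implied by `_mm_add_pi16` and the `unsigned short` casts in the MMX version. The names `array_sum_fn`, `array_sum_reference`, and `check_array_sum` are hypothetical. The checker takes a function pointer, so it compiles without the Intel attributes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Signature assumed for array_sum (16-bit elements). */
typedef void (*array_sum_fn)(unsigned short *, unsigned short const *,
                             unsigned short const *, size_t);

/* Scalar reference: element-wise sum that wraps modulo 65536. */
static void array_sum_reference(unsigned short *result, unsigned short const *a,
                                unsigned short const *b, size_t length)
{
    for (; length > 0; length--)
        *result++ = (unsigned short)(*a++ + *b++);
}

/* Returns 1 if fn matches the reference for every length from 0 to N,
   covering multiples of four as well as leftover elements; otherwise 0. */
int check_array_sum(array_sum_fn fn)
{
    enum { N = 11 };
    unsigned short a[N], b[N], expected[N], actual[N];
    size_t i, len;

    assert(fn != NULL);

    /* Values are chosen so that some sums overflow 16 bits. */
    for (i = 0; i < N; i++) {
        a[i] = (unsigned short)(i * 1000u);
        b[i] = (unsigned short)(65535u - i);
    }

    for (len = 0; len <= N; len++) {
        memset(expected, 0, sizeof expected);
        memset(actual, 0, sizeof actual);
        array_sum_reference(expected, a, b, len);
        fn(actual, a, b, len);
        /* Comparing the whole buffer also catches writes past len. */
        if (memcmp(expected, actual, sizeof expected) != 0)
            return 0;
    }
    return 1;
}
```

When building with the Intel compiler, passing `array_sum` to `check_array_sum` would exercise whichever version the dispatch stub selects on the machine running the test.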