This section describes the load, set, and store operations, which move data between memory and __m128 values. The load and set operations are similar in that both initialize __m128 data. However, the set operations take float arguments and are intended for initialization with constants, whereas the load operations take a pointer to float and are intended to mimic the instructions for loading data from memory. The store operations write the initialized data to the specified address.
The intrinsics are listed in the following table. Syntax and a brief description are contained in the following topics.
The prototypes for Streaming SIMD Extensions intrinsics are in the xmmintrin.h header file.
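As a rough illustration of how the three families fit together, the sketch below broadcasts a constant with a set operation, reads four floats with a load operation, and writes the result back with a store operation. The helper name scale4 is hypothetical, both pointers are assumed to be 16-byte aligned, and _mm_mul_ps (an arithmetic intrinsic from the same header) is used only to give the example something to compute.

```c
#include <xmmintrin.h>

/* Hypothetical helper: out[i] = in[i] * scale for i = 0..3.
   Assumes both pointers are 16-byte aligned. */
void scale4(float *out, const float *in, float scale)
{
    __m128 k = _mm_set_ps1(scale);        /* set: broadcast a scalar value */
    __m128 v = _mm_load_ps(in);           /* load: read four floats from memory */
    _mm_store_ps(out, _mm_mul_ps(v, k));  /* store: write four floats to memory */
}
```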
Intrinsic Name | Alternate Name | Operation | Corresponding Instruction |
---|---|---|---|
_mm_load_ss | | Load the low value and clear the three high values | MOVSS |
_mm_load_ps1 | _mm_load1_ps | Load one value into all four words | MOVSS + Shuffling |
_mm_load_ps | | Load four values, address aligned | MOVAPS |
_mm_loadu_ps | | Load four values, address unaligned | MOVUPS |
_mm_loadr_ps | | Load four values, in reverse order | MOVAPS + Shuffling |
_mm_set_ss | | Set the low value and clear the three high values | Composite |
_mm_set_ps1 | _mm_set1_ps | Set all four words with the same value | Composite |
_mm_set_ps | | Set four values | Composite |
_mm_setr_ps | | Set four values, in reverse order | Composite |
_mm_setzero_ps | | Clear all four values | Composite |
_mm_store_ss | | Store the low value | MOVSS |
_mm_store_ps1 | _mm_store1_ps | Store the low value across all four words. The address must be 16-byte aligned. | Shuffling + MOVSS |
_mm_store_ps | | Store four values, address aligned | MOVAPS |
_mm_storeu_ps | | Store four values, address unaligned | MOVUPS |
_mm_storer_ps | | Store four values, in reverse order | MOVAPS + Shuffling |
_mm_move_ss | | Set the low word, and pass in three high values | MOVSS |
_mm_getcsr | | Return control register contents | STMXCSR |
_mm_setcsr | | Set control register | LDMXCSR |
_mm_prefetch | | Load one cache line closer to the processor | PREFETCH |
_mm_stream_pi | | Store __m64 data without polluting the caches | MOVNTQ |
_mm_stream_ps | | Store four values without polluting the caches, address aligned | MOVNTPS |
_mm_sfence | | Make preceding stores globally visible before subsequent stores | SFENCE |
_mm_cvtss_f32 | | Extract the low value as a float | No specific instruction |
__m128 _mm_load_ss(float const*a)
Loads an SP FP value into the low word and clears the upper three words.
r0 := *a
r1 := 0.0 ; r2 := 0.0 ; r3 := 0.0
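A minimal sketch of the zeroing behavior, assuming a hypothetical helper load_low that writes the resulting register to an output array so that all four words can be inspected:

```c
#include <xmmintrin.h>

/* Hypothetical helper: loads *p into the low word and exposes all four words. */
void load_low(float out[4], const float *p)
{
    __m128 r = _mm_load_ss(p);  /* r0 = *p; r1, r2, r3 = 0.0 */
    _mm_storeu_ps(out, r);      /* unaligned store, so out needs no alignment */
}
```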
__m128 _mm_load_ps1(float const*a)
Loads a single SP FP value, copying it into all four words.
r0 := *a
r1 := *a
r2 := *a
r3 := *a
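The broadcast can be sketched as follows. The helper name broadcast_from_memory is hypothetical; the source pointer only has to be a valid float address, since a single scalar is read.

```c
#include <xmmintrin.h>

/* Hypothetical helper: copies the single float at p into all four words. */
void broadcast_from_memory(float out[4], const float *p)
{
    __m128 r = _mm_load_ps1(p);  /* r0 = r1 = r2 = r3 = *p */
    _mm_storeu_ps(out, r);
}
```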
__m128 _mm_load_ps(float const*a)
Loads four SP FP values. The address must be 16-byte-aligned.
r0 := a[0]
r1 := a[1]
r2 := a[2]
r3 := a[3]
__m128 _mm_loadu_ps(float const*a)
Loads four SP FP values. The address need not be 16-byte-aligned.
r0 := a[0]
r1 := a[1]
r2 := a[2]
r3 := a[3]
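One situation where the two load forms appear side by side is a sliding window over an aligned array: the window starting at element 0 is aligned, while the window starting at element 1 is not. The sketch below assumes a hypothetical helper adjacent_sums, an input array that is 16-byte aligned with at least five elements, and uses _mm_add_ps only to combine the two loads.

```c
#include <xmmintrin.h>

/* Hypothetical helper: out[i] = a[i] + a[i + 1] for i = 0..3.
   Assumes a is 16-byte aligned and has at least five elements. */
void adjacent_sums(float out[4], const float *a)
{
    __m128 lo = _mm_load_ps(a);       /* a[0..3]: aligned address */
    __m128 hi = _mm_loadu_ps(a + 1);  /* a[1..4]: offset by 4 bytes, so unaligned */
    _mm_storeu_ps(out, _mm_add_ps(lo, hi));
}
```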
__m128 _mm_loadr_ps(float const*a)
Loads four SP FP values in reverse order. The address must be 16-byte-aligned.
r0 := a[3]
r1 := a[2]
r2 := a[1]
r3 := a[0]
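A short sketch of the reversed load, assuming a hypothetical helper load_reversed whose input pointer is 16-byte aligned:

```c
#include <xmmintrin.h>

/* Hypothetical helper: out = {in[3], in[2], in[1], in[0]}.
   Assumes aligned_in is 16-byte aligned. */
void load_reversed(float out[4], const float *aligned_in)
{
    __m128 r = _mm_loadr_ps(aligned_in);  /* r0 = in[3], ..., r3 = in[0] */
    _mm_storeu_ps(out, r);                /* out[i] receives word r_i */
}
```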
__m128 _mm_set_ss(float a)
Sets the low word of an SP FP value to a and clears the upper three words.
r0 := a
r1 := r2 := r3 := 0.0
__m128 _mm_set_ps1(float a)
Sets the four SP FP values to a.
r0 := r1 := r2 := r3 := a
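__m128 _mm_set_ps(float a, float b, float c, float d)
Sets the four SP FP values to the four inputs. The first argument is placed in the highest word and the last argument in the lowest word.
r0 := d
r1 := c
r2 := b
r3 := a
__m128 _mm_setr_ps(float a, float b, float c, float d)
Sets the four SP FP values to the four inputs in reverse order, so the first argument is placed in the lowest word.
r0 := a
r1 := b
r2 := c
r3 := d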
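Because the argument order of these two intrinsics is easy to confuse, one quick check is to store both results to memory and compare the layouts. The helper name show_set_order is hypothetical. With the same argument list, _mm_set_ps places its last argument in the lowest word (the lowest memory address after a store), whereas _mm_setr_ps places its first argument there.

```c
#include <xmmintrin.h>

/* Hypothetical helper: stores the results of both intrinsics for comparison. */
void show_set_order(float via_set[4], float via_setr[4])
{
    /* Same argument list for both calls. */
    _mm_storeu_ps(via_set,  _mm_set_ps(1.0f, 2.0f, 3.0f, 4.0f));
    _mm_storeu_ps(via_setr, _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f));
}
```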
__m128 _mm_setzero_ps(void)
Clears the four SP FP values.
r0 := r1 := r2 := r3 := 0.0
void _mm_store_ss(float *v, __m128 a)
Stores the lower SP FP value.
*v := a0
void _mm_store_ps1(float *v, __m128 a)
Stores the lower SP FP value across four words. The address must be 16-byte-aligned.
v[0] := a0
v[1] := a0
v[2] := a0
v[3] := a0
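A minimal sketch of using this store to fill four consecutive floats with one value. The helper name fill4 is hypothetical, and the destination is assumed to be 16-byte aligned.

```c
#include <xmmintrin.h>

/* Hypothetical helper: writes value to aligned_out[0..3].
   Assumes aligned_out is 16-byte aligned. */
void fill4(float *aligned_out, float value)
{
    __m128 v = _mm_set_ss(value);     /* only the low word matters here */
    _mm_store_ps1(aligned_out, v);    /* low word replicated to all four slots */
}
```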
void _mm_store_ps(float *v, __m128 a)
Stores four SP FP values. The address must be 16-byte-aligned.
v[0] := a0
v[1] := a1
v[2] := a2
v[3] := a3
void _mm_storeu_ps(float *v, __m128 a)
Stores four SP FP values. The address need not be 16-byte-aligned.
v[0] := a0
v[1] := a1
v[2] := a2
v[3] := a3
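When the alignment of the destination is not known in advance, one possible approach is to test the address at run time and pick the matching store. This is only a sketch: the helper name copy4 is hypothetical, and always calling _mm_storeu_ps would also be correct.

```c
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical helper: copies four floats from src to dst.
   Neither pointer is assumed to be aligned. */
void copy4(float *dst, const float *src)
{
    __m128 v = _mm_loadu_ps(src);
    if (((uintptr_t)dst & 15u) == 0)
        _mm_store_ps(dst, v);    /* dst is 16-byte aligned */
    else
        _mm_storeu_ps(dst, v);   /* dst has arbitrary alignment */
}
```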
void _mm_storer_ps(float *v, __m128 a)
Stores four SP FP values in reverse order. The address must be 16-byte-aligned.
v[0] := a3
v[1] := a2
v[2] := a1
v[3] := a0
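A short sketch that reverses four floats in place by pairing an ordinary aligned load with the reversed store. The helper name reverse4 is hypothetical, and the buffer is assumed to be 16-byte aligned.

```c
#include <xmmintrin.h>

/* Hypothetical helper: reverses aligned_buf[0..3] in place.
   Assumes aligned_buf is 16-byte aligned. */
void reverse4(float *aligned_buf)
{
    __m128 v = _mm_load_ps(aligned_buf);  /* v_i = buf[i] */
    _mm_storer_ps(aligned_buf, v);        /* buf[0] = v3, ..., buf[3] = v0 */
}
```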
__m128 _mm_move_ss(__m128 a, __m128 b)
Sets the low word to the SP FP value of b. The upper 3 SP FP values are passed through from a.
r0 := b0
r1 := a1
r2 := a2
r3 := a3
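A minimal sketch of replacing only the low word of a vector while keeping the upper three. The helper name replace_low is hypothetical.

```c
#include <xmmintrin.h>

/* Hypothetical helper: out = {new_low, in[1], in[2], in[3]}. */
void replace_low(float out[4], const float in[4], float new_low)
{
    __m128 a = _mm_loadu_ps(in);            /* supplies the upper three words */
    __m128 b = _mm_set_ss(new_low);         /* supplies the low word */
    _mm_storeu_ps(out, _mm_move_ss(a, b));
}
```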
unsigned int _mm_getcsr(void)
Returns the contents of the control register.
void _mm_setcsr(unsigned int i)
Sets the control register to the value specified.
void _mm_prefetch(char const*a, int sel)
(uses PREFETCH) Loads one cache line of data from address a to a location "closer" to the processor. The value sel specifies the type of prefetch operation: the constants _MM_HINT_T0, _MM_HINT_T1, _MM_HINT_T2, and _MM_HINT_NTA should be used, corresponding to the type of prefetch instruction.
void _mm_stream_pi(__m64 *p, __m64 a)
(uses MOVNTQ) Stores the data in a to the address p without polluting the caches. This intrinsic requires you to empty the multimedia state for the MMX register. See the topic The EMMS Instruction: Why You Need It and When to Use It.
void _mm_stream_ps(float *p, __m128 a)
(see MOVNTPS) Stores the data in a to the address p without polluting the caches. The address must be 16-byte-aligned.
void _mm_sfence(void)
(uses SFENCE) Guarantees that every preceding store is globally visible before any subsequent store.
float _mm_cvtss_f32(__m128 a)
This intrinsic extracts a single precision floating point value from the first vector element of an __m128. It does so in the most efficient manner possible in the context used. This intrinsic doesn't map to any specific SSE instruction.
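The sketch below exercises several of the intrinsics above under stated assumptions. All three helper names are hypothetical. stream_fill assumes a 16-byte-aligned destination and a count that is a multiple of four; with_flush_to_zero uses the _MM_FLUSH_ZERO_ON constant from xmmintrin.h as one example of a control register bit, and restores the saved value before returning.

```c
#include <xmmintrin.h>

/* Hypothetical helper: fills n floats at dst with value using
   non-temporal stores. Assumes dst is 16-byte aligned and n % 4 == 0. */
void stream_fill(float *dst, unsigned n, float value)
{
    __m128 v = _mm_set_ps1(value);
    for (unsigned i = 0; i < n; i += 4)
        _mm_stream_ps(dst + i, v);  /* store without polluting the caches */
    _mm_sfence();                   /* order the streamed stores before later stores */
}

/* Hypothetical helper: returns the low word of v as a scalar float. */
float low_element(__m128 v)
{
    return _mm_cvtss_f32(v);
}

/* Hypothetical helper: temporarily sets the flush-to-zero bit and
   returns the control register value observed while it was set. */
unsigned int with_flush_to_zero(void)
{
    unsigned int saved = _mm_getcsr();        /* read the control register */
    _mm_setcsr(saved | _MM_FLUSH_ZERO_ON);    /* write it back with one bit added */
    unsigned int changed = _mm_getcsr();
    _mm_setcsr(saved);                        /* restore the original contents */
    return changed;
}
```

Saving and restoring the control register around a change, as with_flush_to_zero does, keeps the modified setting from leaking into unrelated code.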