The following load intrinsics and their corresponding instructions are available with Streaming SIMD Extensions 2 (SSE2).
The prototypes for the SSE2 intrinsics are declared in the emmintrin.h header file.
__m128d _mm_load_pd(double const *p)
(uses MOVAPD) Loads two double-precision floating-point (DP FP) values. The address p must be 16-byte aligned.
r0 := p[0]
r1 := p[1]
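
As an illustration of the aligned load, the following is a minimal sketch. The helper name double_pair_aligned is hypothetical, and the caller is assumed to pass a 16-byte-aligned input pointer; the result is written back with an unaligned store so that the output buffer carries no alignment requirement.

```c
#include <emmintrin.h>

/* Hypothetical helper: doubles both elements of a 16-byte-aligned pair.
   Assumption: p is 16-byte aligned; out may have any alignment. */
void double_pair_aligned(const double *p, double *out)
{
    __m128d v = _mm_load_pd(p);            /* r0 = p[0], r1 = p[1] */
    _mm_storeu_pd(out, _mm_add_pd(v, v));  /* out[0] = 2*p[0], out[1] = 2*p[1] */
}
```

Passing an address that is not 16-byte aligned to this helper would typically raise a general-protection fault, which is why the alignment is stated as a precondition rather than checked.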
__m128d _mm_load1_pd(double const *p)
(uses MOVSD + shuffling) Loads a single DP FP value, copying to both elements. The address p need not be 16-byte aligned.
r0 := *p
r1 := *p
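
A common use of the broadcast load is scaling a vector by a scalar held in memory. The sketch below assumes a hypothetical helper named scale_pair; neither pointer needs to be aligned because the vector operand is read with an unaligned load.

```c
#include <emmintrin.h>

/* Hypothetical helper: multiplies the two doubles at v by the scalar *s. */
void scale_pair(const double *v, const double *s, double *out)
{
    __m128d factor = _mm_load1_pd(s);           /* r0 = r1 = *s */
    __m128d x      = _mm_loadu_pd(v);           /* r0 = v[0], r1 = v[1] */
    _mm_storeu_pd(out, _mm_mul_pd(x, factor));  /* out[i] = v[i] * *s */
}
```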
__m128d _mm_loadr_pd(double const *p)
(uses MOVAPD + shuffling) Loads two DP FP values in reverse order. The address p must be 16-byte aligned.
r0 := p[1]
r1 := p[0]
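
The element swap can be observed by storing the result straight back to memory. This is only a sketch: load_reversed is a hypothetical name, and the input is assumed to be 16-byte aligned as the intrinsic requires.

```c
#include <emmintrin.h>

/* Hypothetical helper: copies a pair of doubles in swapped order.
   Assumption: p is 16-byte aligned; out may have any alignment. */
void load_reversed(const double *p, double *out)
{
    __m128d r = _mm_loadr_pd(p);  /* r0 = p[1], r1 = p[0] */
    _mm_storeu_pd(out, r);        /* out[0] = p[1], out[1] = p[0] */
}
```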
__m128d _mm_loadu_pd(double const *p)
(uses MOVUPD) Loads two DP FP values. The address p need not be 16-byte aligned.
r0 := p[0]
r1 := p[1]
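
The unaligned load is the usual choice when a routine cannot control where its input starts, for example when it is handed a pointer into the middle of an array. The following sketch sums n doubles two at a time; sum_doubles is a hypothetical helper, and the scalar tail handling is one possible design among several.

```c
#include <emmintrin.h>
#include <stddef.h>

/* Hypothetical helper: sums n doubles starting at p, with no alignment
   requirement on p. */
double sum_doubles(const double *p, size_t n)
{
    __m128d acc = _mm_setzero_pd();
    size_t i = 0;

    /* Process two elements per iteration with unaligned loads. */
    for (; i + 2 <= n; i += 2)
        acc = _mm_add_pd(acc, _mm_loadu_pd(p + i));

    /* Combine the two partial sums held in the vector lanes. */
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    double total = lanes[0] + lanes[1];

    /* Add the leftover element when n is odd. */
    if (i < n)
        total += p[i];
    return total;
}
```

Calling sum_doubles(a + 1, 5) on a six-element array a is valid here, whereas the same offset pointer could not safely be passed to a routine built on _mm_load_pd.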
__m128d _mm_load_sd(double const *p)
(uses MOVSD) Loads a DP FP value. The upper DP FP value is set to zero. The address p need not be 16-byte aligned.
r0 := *p
r1 := 0.0
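
The zeroing of the upper element can be checked by storing the full 128-bit result. In this sketch, load_scalar is a hypothetical helper name.

```c
#include <emmintrin.h>

/* Hypothetical helper: loads one double and writes out both result lanes. */
void load_scalar(const double *p, double *out)
{
    __m128d r = _mm_load_sd(p);  /* r0 = *p, r1 = 0.0 */
    _mm_storeu_pd(out, r);       /* out[0] = *p, out[1] = 0.0 */
}
```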
__m128d _mm_loadh_pd(__m128d a, double const *p)
(uses MOVHPD) Loads a DP FP value as the upper DP FP value of the result. The lower DP FP value is passed through from a. The address p need not be 16-byte aligned.
r0 := a0
r1 := *p
__m128d _mm_loadl_pd(__m128d a, double const *p)
(uses MOVLPD) Loads a DP FP value as the lower DP FP value of the result. The upper DP FP value is passed through from a. The address p need not be 16-byte aligned.
r0 := *p
r1 := a1
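
Used together, the two half-loads can assemble one vector from two doubles that are not adjacent in memory, such as strided data. The sketch below assumes a hypothetical helper named gather_pair; starting from a zeroed vector is an arbitrary choice, since both elements are overwritten.

```c
#include <emmintrin.h>

/* Hypothetical helper: builds {*lo, *hi} from two unrelated addresses. */
void gather_pair(const double *lo, const double *hi, double *out)
{
    __m128d v = _mm_setzero_pd();
    v = _mm_loadl_pd(v, lo);  /* r0 = *lo, r1 passed through */
    v = _mm_loadh_pd(v, hi);  /* r0 passed through, r1 = *hi */
    _mm_storeu_pd(out, v);    /* out[0] = *lo, out[1] = *hi */
}
```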