Load Operations for Streaming SIMD Extensions 2

The following load intrinsics, and the instructions they map to, are available in Streaming SIMD Extensions 2 (SSE2).

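The prototypes for Streaming SIMD Extensions 2 intrinsics are in the emmintrin.h header file. In the descriptions below, DP FP stands for double-precision floating-point; r0 and r1 denote the lower and upper elements of the result, and a0 and a1 denote the lower and upper elements of the argument a.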

__m128d _mm_load_pd(double const *p)

(uses MOVAPD) Loads two DP FP values. The address p must be 16-byte aligned.
r0 := p[0]
r1 := p[1]
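
As a minimal sketch of the alignment requirement, the helper below (copy_pair_aligned is a hypothetical name, not part of emmintrin.h) checks the address before calling _mm_load_pd. The check is only there for illustration: the intrinsic itself does not validate the pointer, and a misaligned address typically faults at run time.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Hypothetical helper: copies two doubles from a 16-byte-aligned source.
   Returns 0 on success, -1 if p is not 16-byte aligned. */
static int copy_pair_aligned(const double *p, double out[2])
{
    if (((uintptr_t)p & 15u) != 0)
        return -1;                /* refuse instead of risking a fault */
    __m128d v = _mm_load_pd(p);   /* r0 := p[0], r1 := p[1] */
    _mm_storeu_pd(out, v);        /* unaligned store, so out may live anywhere */
    return 0;
}
```

A source array declared with a 16-byte alignment specifier (for example _Alignas(16) in C11) satisfies the requirement at its first element; the element at index 1 of such an array does not.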

__m128d _mm_load1_pd(double const *p)

(uses MOVSD + shuffling) Loads a single DP FP value, copying to both elements. The address p need not be 16-byte aligned.
r0 := *p
r1 := *p
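
One plausible use of this broadcast is multiplying a pair of values by a single scalar held in memory. The sketch below assumes a hypothetical helper named scale_pair and uses the unaligned load and store forms so that no pointer needs special alignment.

```c
#include <emmintrin.h>

/* Hypothetical helper: out[i] = in[i] * (*factor) for i = 0, 1. */
static void scale_pair(const double *factor, const double *in, double out[2])
{
    __m128d f = _mm_load1_pd(factor);      /* r0 := *factor, r1 := *factor */
    __m128d v = _mm_loadu_pd(in);          /* the two values to scale */
    _mm_storeu_pd(out, _mm_mul_pd(v, f));  /* element-wise product */
}
```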

__m128d _mm_loadr_pd(double const *p)

(uses MOVAPD + shuffling) Loads two DP FP values in reverse order. The address p must be 16-byte aligned.
r0 := p[1]
r1 := p[0]
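
The reversed ordering can be observed by storing the result straight back to memory. This is a small sketch with a hypothetical helper name (load_reversed); the caller is assumed to pass a 16-byte-aligned source.

```c
#include <emmintrin.h>

/* Hypothetical helper: writes p[1], p[0] to out[0], out[1].
   Assumption: p is 16-byte aligned, as _mm_loadr_pd requires. */
static void load_reversed(const double *p, double out[2])
{
    __m128d v = _mm_loadr_pd(p);  /* r0 := p[1], r1 := p[0] */
    _mm_storeu_pd(out, v);        /* out[0] = r0, out[1] = r1 */
}
```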

__m128d _mm_loadu_pd(double const *p)

(uses MOVUPD) Loads two DP FP values. The address p need not be 16-byte aligned.
r0 := p[0]
r1 := p[1]
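
Because the address need not be aligned, this form suits loops over arrays whose starting address is not under the caller's control. The following is a sketch only, with a hypothetical helper name (sum_doubles); it processes two elements per iteration and handles an odd trailing element in scalar code.

```c
#include <emmintrin.h>
#include <stddef.h>

/* Hypothetical helper: sums n doubles starting at any address. */
static double sum_doubles(const double *p, size_t n)
{
    __m128d acc = _mm_setzero_pd();
    size_t i = 0;

    /* Two elements per step; p + i may be misaligned. */
    for (; i + 2 <= n; i += 2)
        acc = _mm_add_pd(acc, _mm_loadu_pd(p + i));

    /* Combine the two partial sums held in the vector. */
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    double total = lanes[0] + lanes[1];

    /* Odd element left over, if any. */
    if (i < n)
        total += p[i];
    return total;
}
```

Note that the vectorized version adds elements in a different order than a simple left-to-right loop, so results can differ in the last bits for inputs that are not exactly representable.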

__m128d _mm_load_sd(double const *p)

(uses MOVSD) Loads a DP FP value. The upper DP FP is set to zero. The address p need not be 16-byte aligned.
r0 := *p
r1 := 0.0
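
To make the zeroing of the upper element visible, the sketch below stores the full result to a two-element array. The helper name load_scalar is hypothetical.

```c
#include <emmintrin.h>

/* Hypothetical helper: out[0] receives *p, out[1] receives 0.0. */
static void load_scalar(const double *p, double out[2])
{
    __m128d v = _mm_load_sd(p);  /* r0 := *p, r1 := 0.0 */
    _mm_storeu_pd(out, v);       /* expose both elements */
}
```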

__m128d _mm_loadh_pd(__m128d a, double const *p)

(uses MOVHPD) Loads a DP FP value as the upper DP FP value of the result. The lower DP FP value is passed through from a. The address p need not be 16-byte aligned.
r0 := a0
r1 := *p
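
A common pattern, sketched here under the assumption that the two values live at unrelated addresses, is to build a vector by loading the lower element with _mm_load_sd and then filling the upper element with _mm_loadh_pd. The helper name gather_pair is hypothetical.

```c
#include <emmintrin.h>

/* Hypothetical helper: packs *lo and *hi into out[0] and out[1]. */
static void gather_pair(const double *lo, const double *hi, double out[2])
{
    __m128d a = _mm_load_sd(lo);      /* a0 = *lo, a1 = 0.0 */
    __m128d r = _mm_loadh_pd(a, hi);  /* r0 := a0, r1 := *hi */
    _mm_storeu_pd(out, r);
}
```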

__m128d _mm_loadl_pd(__m128d a, double const *p)

(uses MOVLPD) Loads a DP FP value as the lower DP FP value of the result. The upper DP FP value is passed through from a. The address p need not be 16-byte aligned.
r0 := *p
r1 := a1
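
As an illustration of the pass-through behavior, the sketch below replaces only the lower element of an existing pair and leaves the upper element untouched. The helper name replace_low is hypothetical.

```c
#include <emmintrin.h>

/* Hypothetical helper: out[0] = *p, out[1] = in[1]. */
static void replace_low(const double in[2], const double *p, double out[2])
{
    __m128d a = _mm_loadu_pd(in);    /* a0 = in[0], a1 = in[1] */
    __m128d r = _mm_loadl_pd(a, p);  /* r0 := *p, r1 := a1 */
    _mm_storeu_pd(out, r);
}
```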