Integer Intrinsics Using Streaming SIMD Extensions

The integer intrinsics are listed in the table below followed by a description of each intrinsic with the most recent mnemonic naming convention.

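Intrinsic Name       Operation                                Corresponding Instruction

_mm_extract_pi16     Extract one of four words                PEXTRW
_mm_insert_pi16      Insert a word                            PINSRW
_mm_max_pi16         Compute the maximum                      PMAXSW
_mm_max_pu8          Compute the maximum, unsigned            PMAXUB
_mm_min_pi16         Compute the minimum                      PMINSW
_mm_min_pu8          Compute the minimum, unsigned            PMINUB
_mm_movemask_pi8     Create an eight-bit mask                 PMOVMSKB
_mm_mulhi_pu16       Multiply, return high bits               PMULHUW
_mm_shuffle_pi16     Return a combination of four words       PSHUFW
_mm_maskmove_si64    Conditional store                        MASKMOVQ
_mm_avg_pu8          Compute rounded average (bytes)          PAVGB
_mm_avg_pu16         Compute rounded average (words)          PAVGW
_mm_sad_pu8          Compute sum of absolute differences      PSADBW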

The intrinsics in this topic operate on MMX registers, so you must empty the multimedia state after using them. See the topic The EMMS Instruction: Why You Need It and When to Use It for more details.
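As a minimal sketch of that requirement, the helper below uses two of the intrinsics from this topic and then calls _mm_empty() before returning. The helper name max_word0 is hypothetical, and the example assumes an x86 compiler that provides xmmintrin.h (for example GCC or Clang); _mm_set_pi16 and _mm_empty are MMX intrinsics that are not described in this topic.

```c
#include <xmmintrin.h>  /* SSE intrinsics; also provides the MMX __m64 type */

/* Hypothetical helper: returns the larger of two signed 16-bit values
   by way of the MMX registers. */
static int max_word0(short a, short b)
{
    __m64 va = _mm_set_pi16(0, 0, 0, a);   /* a goes into word 0 */
    __m64 vb = _mm_set_pi16(0, 0, 0, b);   /* b goes into word 0 */
    __m64 vr = _mm_max_pi16(va, vb);       /* signed word maximum */
    int r = _mm_extract_pi16(vr, 0);       /* selector is a constant */

    _mm_empty();                           /* empty the multimedia state */

    /* Narrow to short so that negative results keep their sign. */
    return (short)r;
}
```

Calling _mm_empty() once at the end of a run of MMX work, rather than after every intrinsic, is usually sufficient.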

int _mm_extract_pi16(__m64 a, int n )

Extracts one of the four words of a. The selector n must be an immediate.

r := (n==0) ? a0 : ( (n==1) ? a1 : ( (n==2) ? a2 : a3 ) )
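The selection above can be sketched as a portable scalar model. This is an illustration only, not the compiler's implementation: the function name model_extract_pi16 and the array representation of the four words (a[0] is the least significant word) are assumptions, and the result is assumed to be zero-extended, as the PEXTRW instruction does.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm_extract_pi16.
   a[0] is word 0 (least significant), a[3] is word 3.
   Only the low two bits of n are used as the selector. */
static int model_extract_pi16(const uint16_t a[4], int n)
{
    return (int)a[n & 0x3];   /* zero-extended to int (assumption) */
}
```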


__m64 _mm_insert_pi16(__m64 a, int d, int n )

Inserts word d into one of four words of a. The selector n must be an immediate.

r0 := (n==0) ? d : a0;

r1 := (n==1) ? d : a1;

r2 := (n==2) ? d : a2;

r3 := (n==3) ? d : a3;
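A scalar model of the four assignments above might look like the following. The name model_insert_pi16 and the array layout are hypothetical; the sketch assumes that only the low 16 bits of d are inserted and that only the low two bits of n select the word.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm_insert_pi16.
   Copies a into r, replacing the word selected by n with d. */
static void model_insert_pi16(uint16_t r[4], const uint16_t a[4],
                              int d, int n)
{
    for (int i = 0; i < 4; i++)
        r[i] = (i == (n & 0x3)) ? (uint16_t)d : a[i];
}
```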


__m64 _mm_max_pi16(__m64 a, __m64 b )

Computes the element-wise maximum of the words in a and b.

r0 := max(a0, b0)

r1 := max(a1, b1)

r2 := max(a2, b2)

r3 := max(a3, b3)
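The element-wise maximum described above treats each word as a signed 16-bit value. A scalar sketch, with the hypothetical name model_max_pi16 and an assumed array layout:

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm_max_pi16: signed word maximum. */
static void model_max_pi16(int16_t r[4], const int16_t a[4],
                           const int16_t b[4])
{
    for (int i = 0; i < 4; i++)
        r[i] = (a[i] > b[i]) ? a[i] : b[i];
}
```

Because the comparison is signed, a word holding -1 is smaller than a word holding 1, even though its bit pattern (0xFFFF) is larger when read as unsigned.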


__m64 _mm_max_pu8(__m64 a, __m64 b )

Computes the element-wise maximum of the unsigned bytes in a and b.

r0 := max(a0, b0)

r1 := max(a1, b1)

...

r7 := max(a7, b7)
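For the unsigned byte maximum, a scalar sketch could be written as follows. The name model_max_pu8 and the byte-array layout (a[0] is the least significant byte) are assumptions for illustration.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm_max_pu8: unsigned byte maximum. */
static void model_max_pu8(uint8_t r[8], const uint8_t a[8],
                          const uint8_t b[8])
{
    for (int i = 0; i < 8; i++)
        r[i] = (a[i] > b[i]) ? a[i] : b[i];
}
```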


__m64 _mm_min_pi16(__m64 a, __m64 b )

Computes the element-wise minimum of the words in a and b.

r0 := min(a0, b0)

r1 := min(a1, b1)

r2 := min(a2, b2)

r3 := min(a3, b3)
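The word minimum is likewise a signed comparison. A scalar sketch under the same assumptions (hypothetical name model_min_pi16, array layout with index 0 as word 0):

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm_min_pi16: signed word minimum. */
static void model_min_pi16(int16_t r[4], const int16_t a[4],
                           const int16_t b[4])
{
    for (int i = 0; i < 4; i++)
        r[i] = (a[i] < b[i]) ? a[i] : b[i];
}
```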


__m64 _mm_min_pu8(__m64 a, __m64 b )

Computes the element-wise minimum of the unsigned bytes in a and b.

r0 := min(a0, b0)

r1 := min(a1, b1)

...

r7 := min(a7, b7)


int _mm_movemask_pi8(__m64 a )

Creates an 8-bit mask from the most significant bits of the bytes in a.

r := sign(a7)<<7 | sign(a6)<<6 | ... | sign(a0)
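The mask construction can be sketched in scalar C as follows, where sign(x) is taken to mean the most significant bit of the byte. The name model_movemask_pi8 and the byte-array layout are hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm_movemask_pi8.
   Bit i of the result is the most significant bit of byte a[i]. */
static int model_movemask_pi8(const uint8_t a[8])
{
    int r = 0;
    for (int i = 0; i < 8; i++)
        r |= (a[i] >> 7) << i;
    return r;
}
```

A typical use is to test the outcome of a byte-wise comparison: a result of 0 means no byte had its high bit set, and 0xFF means every byte did.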


__m64 _mm_mulhi_pu16(__m64 a, __m64 b )

Multiplies the unsigned words in a and b, returning the upper 16 bits of the 32-bit intermediate results.

r0 := hiword(a0 * b0)

r1 := hiword(a1 * b1)

r2 := hiword(a2 * b2)

r3 := hiword(a3 * b3)
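A scalar sketch of hiword(a * b) widens each operand to 32 bits before multiplying, so that the upper half of the product is not lost. The name model_mulhi_pu16 and the array layout are assumptions for illustration.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm_mulhi_pu16.
   Each 16x16 product is formed in 32 bits; the upper 16 bits are kept. */
static void model_mulhi_pu16(uint16_t r[4], const uint16_t a[4],
                             const uint16_t b[4])
{
    for (int i = 0; i < 4; i++) {
        uint32_t product = (uint32_t)a[i] * (uint32_t)b[i];
        r[i] = (uint16_t)(product >> 16);
    }
}
```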


__m64 _mm_shuffle_pi16(__m64 a, int n )

Returns a combination of the four words of a. The selector n must be an immediate.

r0 := word (n&0x3) of a

r1 := word ((n>>2)&0x3) of a

r2 := word ((n>>4)&0x3) of a

r3 := word ((n>>6)&0x3) of a
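Each pair of bits in n selects the source word for one result word. A scalar sketch of that decoding, using the hypothetical name model_shuffle_pi16 and assuming that r and a do not overlap:

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm_shuffle_pi16.
   Bits [2i+1:2i] of n choose which word of a becomes r[i]. */
static void model_shuffle_pi16(uint16_t r[4], const uint16_t a[4], int n)
{
    for (int i = 0; i < 4; i++)
        r[i] = a[(n >> (2 * i)) & 0x3];
}
```

For example, n = 0x1B (binary 00 01 10 11) reverses the order of the four words, and n = 0x00 broadcasts word 0 to every position.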


void _mm_maskmove_si64(__m64 d, __m64 n, char * p)

Conditionally store byte elements of d to address p. The high bit of each byte in the selector n determines whether the corresponding byte in d will be stored.

if (sign(n0)) p[0] := d0

if (sign(n1)) p[1] := d1

...

if (sign(n7)) p[7] := d7
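The conditional store can be modeled in scalar C as shown below. The name model_maskmove_si64 and the byte-array representation of d and n are hypothetical; the sketch captures only which bytes are written, not any caching behavior of the underlying instruction.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm_maskmove_si64.
   Byte d[i] is written to p[i] only if the high bit of n[i] is set;
   all other bytes at p are left unchanged. */
static void model_maskmove_si64(const uint8_t d[8], const uint8_t n[8],
                                char *p)
{
    for (int i = 0; i < 8; i++)
        if (n[i] & 0x80)
            p[i] = (char)d[i];
}
```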


__m64 _mm_avg_pu8(__m64 a, __m64 b)

Computes the (rounded) averages of the unsigned bytes in a and b.

t = (unsigned short)a0 + (unsigned short)b0

r0 = (unsigned char)((t + 1) >> 1)

...

t = (unsigned short)a7 + (unsigned short)b7

r7 = (unsigned char)((t + 1) >> 1)
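A scalar sketch of the rounded byte average follows. It assumes the rounding of the PAVGB instruction, which adds 1 to the sum before halving so that halves round upward; the name model_avg_pu8 and the array layout are hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm_avg_pu8.
   The sum is formed in 16 bits so that 255 + 255 does not overflow,
   then rounded upward: (a + b + 1) >> 1. */
static void model_avg_pu8(uint8_t r[8], const uint8_t a[8],
                          const uint8_t b[8])
{
    for (int i = 0; i < 8; i++) {
        uint16_t t = (uint16_t)(a[i] + b[i]);
        r[i] = (uint8_t)((t + 1) >> 1);
    }
}
```

With this rounding, the average of 1 and 2 is 2 rather than 1.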


__m64 _mm_avg_pu16(__m64 a, __m64 b)

Computes the (rounded) averages of the unsigned words in a and b.

t = (unsigned int)a0 + (unsigned int)b0

r0 = (unsigned short)((t + 1) >> 1)

...

t = (unsigned int)a3 + (unsigned int)b3

r3 = (unsigned short)((t + 1) >> 1)


__m64 _mm_sad_pu8(__m64 a, __m64 b)

Computes the sum of the absolute differences of the unsigned bytes in a and b, returning the value in the lower word. The upper three words are cleared.

r0 = abs(a0-b0) + ... + abs(a7-b7)

r1 = r2 = r3 = 0
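The sum of absolute differences can be sketched in scalar C as follows. The name model_sad_pu8 and the array layout are assumptions; the largest possible sum is 8 * 255 = 2040, so it always fits in the lower word.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm_sad_pu8.
   r[0] receives the sum of |a[i] - b[i]| over all eight bytes;
   r[1], r[2] and r[3] are cleared. */
static void model_sad_pu8(uint16_t r[4], const uint8_t a[8],
                          const uint8_t b[8])
{
    uint16_t sum = 0;
    for (int i = 0; i < 8; i++) {
        /* Subtract the smaller byte from the larger to avoid wraparound. */
        int diff = (a[i] > b[i]) ? (a[i] - b[i]) : (b[i] - a[i]);
        sum = (uint16_t)(sum + diff);
    }
    r[0] = sum;
    r[1] = 0;
    r[2] = 0;
    r[3] = 0;
}
```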