Integer Intrinsics Using Streaming SIMD Extensions

The integer intrinsics are listed in the table below, followed by a description of each intrinsic using the most recent mnemonic naming convention.

Intrinsic Name      Operation                                    Corresponding Instruction
_mm_extract_pi16    Extract one of four words                    PEXTRW
_mm_insert_pi16     Insert a word                                PINSRW
_mm_max_pi16        Compute the maximum (signed words)           PMAXSW
_mm_max_pu8         Compute the maximum (unsigned bytes)         PMAXUB
_mm_min_pi16        Compute the minimum (signed words)           PMINSW
_mm_min_pu8         Compute the minimum (unsigned bytes)         PMINUB
_mm_movemask_pi8    Create an eight-bit mask                     PMOVMSKB
_mm_mulhi_pu16      Multiply unsigned words, return high bits    PMULHUW
_mm_shuffle_pi16    Return a combination of four words           PSHUFW
_mm_maskmove_si64   Conditional store                            MASKMOVQ
_mm_avg_pu8         Compute rounded average (unsigned bytes)     PAVGB
_mm_avg_pu16        Compute rounded average (unsigned words)     PAVGW
_mm_sad_pu8         Compute sum of absolute differences          PSADBW

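These intrinsics operate on the MMX registers, so you must empty the multimedia state (with _mm_empty, which generates the EMMS instruction) before any subsequent x87 floating-point code runs. See the topic "The EMMS Instruction: Why You Need It and When to Use It" for more details.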

int _mm_extract_pi16(__m64 a, int n )

Extracts one of the four words of a and zero-extends it to an int. The selector n must be an immediate.

r := (n==0) ? a0 : ( (n==1) ? a1 : ( (n==2) ? a2 : a3 ) )
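A minimal sketch, assuming a GCC- or Clang-style <xmmintrin.h> on an x86 target; the helper extract_word is hypothetical. Because the selector must be an immediate, a run-time index is dispatched to one of four constant selectors. Note the zero extension: a word holding -1 comes back as 65535.

```c
#include <xmmintrin.h> /* SSE intrinsics, including the MMX-register integer extensions */

/* Hypothetical helper: returns word n (0..3) of v, zero-extended to int. */
static int extract_word(__m64 v, int n)
{
    int r;
    switch (n & 3) {        /* each case passes a compile-time constant selector */
    case 0:  r = _mm_extract_pi16(v, 0); break;
    case 1:  r = _mm_extract_pi16(v, 1); break;
    case 2:  r = _mm_extract_pi16(v, 2); break;
    default: r = _mm_extract_pi16(v, 3); break;
    }
    _mm_empty();            /* leave the MMX state before any x87 code runs */
    return r;
}
```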

 

__m64 _mm_insert_pi16(__m64 a, int d, int n )

Inserts word d into one of four words of a. The selector n must be an immediate.

r0 := (n==0) ? d : a0;

r1 := (n==1) ? d : a1;

r2 := (n==2) ? d : a2;

r3 := (n==3) ? d : a3;
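A minimal sketch, assuming a GCC- or Clang-style <xmmintrin.h> on a little-endian x86 target; set_word2 is a hypothetical helper that replaces word 2 and writes the result out so the effect on each word is visible.

```c
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>

/* Hypothetical helper: replaces word 2 of v with value and copies the
   four resulting words into out[0..3] (word 0 is the least significant). */
static void set_word2(__m64 v, int value, uint16_t out[4])
{
    __m64 r = _mm_insert_pi16(v, value, 2); /* selector 2 is an immediate */
    memcpy(out, &r, sizeof r);              /* store r0..r3 in order */
    _mm_empty();                            /* leave the MMX state */
}
```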

 

__m64 _mm_max_pi16(__m64 a, __m64 b )

Computes the element-wise maximum of the signed words in a and b.

r0 := max(a0, b0)

r1 := max(a1, b1)

r2 := max(a2, b2)

r3 := max(a3, b3)
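A minimal sketch, assuming a GCC- or Clang-style <xmmintrin.h> on an x86 target; max_words is a hypothetical helper. The comparison is signed, so the bit pattern 0xFFFF counts as -1 and loses to 1.

```c
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>

/* Hypothetical helper: element-wise signed maximum of four 16-bit words. */
static void max_words(const int16_t a[4], const int16_t b[4], int16_t out[4])
{
    __m64 va, vb, r;
    memcpy(&va, a, sizeof va);   /* load a0..a3 */
    memcpy(&vb, b, sizeof vb);   /* load b0..b3 */
    r = _mm_max_pi16(va, vb);    /* PMAXSW: signed comparison per word */
    memcpy(out, &r, sizeof r);   /* store r0..r3 */
    _mm_empty();
}
```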

 

__m64 _mm_max_pu8(__m64 a, __m64 b )

Computes the element-wise maximum of the unsigned bytes in a and b.

r0 := max(a0, b0)

r1 := max(a1, b1)

...

r7 := max(a7, b7)
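One common use is finding the largest byte in a buffer: keep eight running maxima in one register, then reduce the eight lanes at the end. This is a minimal sketch, assuming a GCC- or Clang-style <xmmintrin.h> on an x86 target and a length that is a nonzero multiple of 8; max_u8_array is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>

/* Hypothetical helper: largest byte in p[0..n-1]; n must be a nonzero multiple of 8. */
static uint8_t max_u8_array(const uint8_t *p, size_t n)
{
    __m64 acc, v;
    uint8_t lanes[8], best = 0;
    memcpy(&acc, p, 8);                  /* start with the first 8 bytes */
    for (size_t i = 8; i < n; i += 8) {
        memcpy(&v, p + i, 8);
        acc = _mm_max_pu8(acc, v);       /* per-lane unsigned maximum */
    }
    memcpy(lanes, &acc, 8);
    _mm_empty();
    for (int i = 0; i < 8; i++)          /* reduce the 8 lanes to one value */
        if (lanes[i] > best) best = lanes[i];
    return best;
}
```

Because the comparison is unsigned, 0x80 beats 0x7F here, which a signed byte comparison would not do.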

 

__m64 _mm_min_pi16(__m64 a, __m64 b )

Computes the element-wise minimum of the signed words in a and b.

r0 := min(a0, b0)

r1 := min(a1, b1)

r2 := min(a2, b2)

r3 := min(a3, b3)
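Combining the signed maximum and minimum clamps each word to a range. This is a minimal sketch, assuming a GCC- or Clang-style <xmmintrin.h> on an x86 target and lo <= hi; clamp_words is hypothetical, and the caller is expected to call _mm_empty when done with the result.

```c
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>

/* Hypothetical helper: clamps each signed 16-bit word of v to [lo, hi]. */
static __m64 clamp_words(__m64 v, short lo, short hi)
{
    __m64 vlo = _mm_set1_pi16(lo);               /* lo broadcast to all four words */
    __m64 vhi = _mm_set1_pi16(hi);               /* hi broadcast to all four words */
    return _mm_min_pi16(_mm_max_pi16(v, vlo), vhi);
}
```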

 

__m64 _mm_min_pu8(__m64 a, __m64 b )

Computes the element-wise minimum of the unsigned bytes in a and b.

r0 := min(a0, b0)

r1 := min(a1, b1)

...

r7 := min(a7, b7)
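A per-byte unsigned minimum against a broadcast constant caps every byte at a limit. This is a minimal sketch, assuming a GCC- or Clang-style <xmmintrin.h> on an x86 target and a buffer length that is a multiple of 8; cap_bytes is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>

/* Hypothetical helper: caps every byte of buf[0..n-1] at limit, in place.
   n must be a multiple of 8. */
static void cap_bytes(uint8_t *buf, size_t n, uint8_t limit)
{
    __m64 vlim = _mm_set1_pi8((char)limit);  /* limit in all eight bytes */
    for (size_t i = 0; i < n; i += 8) {
        __m64 v;
        memcpy(&v, buf + i, 8);
        v = _mm_min_pu8(v, vlim);            /* unsigned per-byte minimum */
        memcpy(buf + i, &v, 8);
    }
    _mm_empty();
}
```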

 

int _mm_movemask_pi8(__m64 a )

Creates an 8-bit mask from the most significant bits of the bytes in a.

r := sign(a7)<<7 | sign(a6)<<6 |... | sign(a0)
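Since the mask collects the top bit of each byte, it can flag bytes outside 7-bit ASCII in an 8-byte chunk. This is a minimal sketch, assuming a GCC- or Clang-style <xmmintrin.h> on an x86 target; high_bits is a hypothetical helper.

```c
#include <string.h>
#include <xmmintrin.h>

/* Hypothetical helper: bit i of the result is set when byte i of chunk
   has its high bit set (for example, a byte that is not 7-bit ASCII). */
static int high_bits(const unsigned char chunk[8])
{
    __m64 v;
    int mask;
    memcpy(&v, chunk, sizeof v);
    mask = _mm_movemask_pi8(v);   /* gather the 8 most significant bits */
    _mm_empty();
    return mask;
}
```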

 

__m64 _mm_mulhi_pu16(__m64 a, __m64 b )

Multiplies the unsigned words in a and b, returning the upper 16 bits of the 32-bit intermediate results.

r0 := hiword(a0 * b0)

r1 := hiword(a1 * b1)

r2 := hiword(a2 * b2)

r3 := hiword(a3 * b3)
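Keeping only the high 16 bits of an unsigned product is a fixed-point multiply by factor / 65536. This is a minimal sketch, assuming a GCC- or Clang-style <xmmintrin.h> on an x86 target; scale_u16 is hypothetical. A factor of 49152 scales by 0.75, truncating toward zero.

```c
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>

/* Hypothetical helper: out[i] = (in[i] * factor) >> 16 for four unsigned words,
   i.e. scaling by the unsigned 0.16 fixed-point fraction factor / 65536. */
static void scale_u16(const uint16_t in[4], uint16_t factor, uint16_t out[4])
{
    __m64 v, r;
    memcpy(&v, in, sizeof v);
    r = _mm_mulhi_pu16(v, _mm_set1_pi16((short)factor)); /* high half of each 32-bit product */
    memcpy(out, &r, sizeof r);
    _mm_empty();
}
```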

 

__m64 _mm_shuffle_pi16(__m64 a, int n )

Returns a combination of the four words of a. The selector n must be an immediate.

r0 := word (n&0x3) of a

r1 := word ((n>>2)&0x3) of a

r2 := word ((n>>4)&0x3) of a

r3 := word ((n>>6)&0x3) of a
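The _MM_SHUFFLE(z, y, x, w) macro from <xmmintrin.h> packs the selector as (z<<6)|(y<<4)|(x<<2)|w, matching the field layout above. This minimal sketch, assuming a GCC- or Clang-style header on an x86 target, reverses the four words; reverse_words is hypothetical, and the caller is expected to call _mm_empty afterward.

```c
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>

/* Hypothetical helper: r0 = a3, r1 = a2, r2 = a1, r3 = a0.
   _MM_SHUFFLE(0, 1, 2, 3) == 0x1B, so bits 1:0 select word 3 for r0. */
static __m64 reverse_words(__m64 a)
{
    return _mm_shuffle_pi16(a, _MM_SHUFFLE(0, 1, 2, 3));
}
```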

 

void _mm_maskmove_si64(__m64 d, __m64 n, char * p)

Conditionally stores byte elements of d to address p. The high bit of each byte in the selector n determines whether the corresponding byte in d is stored.

if (sign(n0)) p[0] := d0

if (sign(n1)) p[1] := d1

...

if (sign(n7)) p[7] := d7
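A selector built with a byte comparison gives a "color-key" copy, where bytes equal to a key leave the destination untouched. This is a minimal sketch, assuming a GCC- or Clang-style <xmmintrin.h> on an x86 target; keyed_copy8 is a hypothetical name.

```c
#include <string.h>
#include <xmmintrin.h>

/* Hypothetical helper: copies src[0..7] to dst[0..7], except bytes equal to key,
   which leave dst unchanged. */
static void keyed_copy8(unsigned char dst[8], const unsigned char src[8], unsigned char key)
{
    __m64 s, eq, sel;
    memcpy(&s, src, sizeof s);
    eq  = _mm_cmpeq_pi8(s, _mm_set1_pi8((char)key)); /* 0xFF where src byte == key */
    sel = _mm_xor_si64(eq, _mm_set1_pi8(-1));        /* invert: 0xFF where it differs */
    _mm_maskmove_si64(s, sel, (char *)dst);          /* store only the selected bytes */
    _mm_empty();
}
```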

 

__m64 _mm_avg_pu8(__m64 a, __m64 b)

Computes the rounded averages of the unsigned bytes in a and b. Each sum is formed in 16 bits, so it cannot overflow, and adding 1 before the shift rounds halves upward.

t = (unsigned short)a0 + (unsigned short)b0

r0 = (unsigned char)((t + 1) >> 1)

...

t = (unsigned short)a7 + (unsigned short)b7

r7 = (unsigned char)((t + 1) >> 1)
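A scalar model of one lane makes the rounding rule testable: (a + b + 1) >> 1 in a wider type. This minimal sketch, assuming a GCC- or Clang-style <xmmintrin.h> on an x86 target, checks the intrinsic against that model for every pair of byte values; both helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>

/* Hypothetical scalar model of one PAVGB lane. */
static uint8_t avg_u8_ref(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + b + 1) >> 1);  /* widen, add 1, halve */
}

/* Hypothetical check: returns 1 if _mm_avg_pu8 matches avg_u8_ref for all 65536 pairs. */
static int avg_pu8_matches_ref(void)
{
    int ok = 1;
    for (unsigned a = 0; a < 256; a++) {
        for (unsigned b = 0; b < 256; b += 8) {
            uint8_t va[8], vb[8], vr[8];
            __m64 ma, mb, mr;
            for (int i = 0; i < 8; i++) { va[i] = (uint8_t)a; vb[i] = (uint8_t)(b + i); }
            memcpy(&ma, va, 8);
            memcpy(&mb, vb, 8);
            mr = _mm_avg_pu8(ma, mb);
            memcpy(vr, &mr, 8);
            for (int i = 0; i < 8; i++)
                if (vr[i] != avg_u8_ref(va[i], vb[i])) ok = 0;
        }
    }
    _mm_empty();
    return ok;
}
```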

 

__m64 _mm_avg_pu16(__m64 a, __m64 b)

Computes the rounded averages of the unsigned words in a and b. Each sum is formed in 32 bits, so it cannot overflow, and adding 1 before the shift rounds halves upward.

t = (unsigned int)a0 + (unsigned int)b0

r0 = (unsigned short)((t + 1) >> 1)

...

t = (unsigned int)a3 + (unsigned int)b3

r3 = (unsigned short)((t + 1) >> 1)
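A minimal sketch, assuming a GCC- or Clang-style <xmmintrin.h> on an x86 target; avg_u16_ref is a hypothetical scalar model of one lane and avg_words a hypothetical wrapper. Note that 65535 + 65535 would overflow 16 bits, which is why the sum is widened.

```c
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>

/* Hypothetical scalar model of one PAVGW lane. */
static uint16_t avg_u16_ref(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a + b + 1) >> 1);  /* widen, add 1, halve */
}

/* Hypothetical wrapper: rounded average of four unsigned words. */
static void avg_words(const uint16_t a[4], const uint16_t b[4], uint16_t out[4])
{
    __m64 va, vb, r;
    memcpy(&va, a, 8);
    memcpy(&vb, b, 8);
    r = _mm_avg_pu16(va, vb);
    memcpy(out, &r, 8);
    _mm_empty();
}
```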

 

__m64 _mm_sad_pu8(__m64 a, __m64 b)

Computes the sum of the absolute differences of the unsigned bytes in a and b, returning the value in the low word. The upper three words are cleared.

r0 = abs(a0-b0) +... + abs(a7-b7)

r1 = r2 = r3 = 0
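Block matching (for example, in motion estimation) sums these per-chunk results over a larger block. This is a minimal sketch, assuming a GCC- or Clang-style <xmmintrin.h> on an x86 target and a length that is a multiple of 8; sad_bytes is hypothetical. Each PSADBW result is at most 8 * 255 = 2040, so it fits in the low word, and the running total is kept in a scalar.

```c
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>

/* Hypothetical helper: sum of |a[i] - b[i]| over n bytes (n a multiple of 8). */
static unsigned sad_bytes(const uint8_t *a, const uint8_t *b, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i += 8) {
        __m64 va, vb;
        memcpy(&va, a + i, 8);
        memcpy(&vb, b + i, 8);
        /* the 8-byte SAD lands in word 0; words 1..3 are zero */
        total += (unsigned)_mm_extract_pi16(_mm_sad_pu8(va, vb), 0);
    }
    _mm_empty();
    return total;
}
```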