Integer Intrinsics Using Streaming SIMD Extensions

The table below lists the integer intrinsics. A description of each intrinsic follows, using the most recent mnemonic naming convention.

The prototypes for Streaming SIMD Extensions intrinsics are in the xmmintrin.h header file.

Intrinsic Name  Alternate Name      Operation                             Corresponding Instruction
_m_pextrw       _mm_extract_pi16    Extract one of four words             PEXTRW
_m_pinsrw       _mm_insert_pi16     Insert a word                         PINSRW
_m_pmaxsw       _mm_max_pi16        Compute the maximum                   PMAXSW
_m_pmaxub       _mm_max_pu8         Compute the maximum, unsigned         PMAXUB
_m_pminsw       _mm_min_pi16        Compute the minimum                   PMINSW
_m_pminub       _mm_min_pu8         Compute the minimum, unsigned         PMINUB
_m_pmovmskb     _mm_movemask_pi8    Create an eight-bit mask              PMOVMSKB
_m_pmulhuw      _mm_mulhi_pu16      Multiply, return high bits            PMULHUW
_m_pshufw       _mm_shuffle_pi16    Return a combination of four words    PSHUFW
_m_maskmovq     _mm_maskmove_si64   Conditional store                     MASKMOVQ
_m_pavgb        _mm_avg_pu8         Compute rounded average               PAVGB
_m_pavgw        _mm_avg_pu16        Compute rounded average               PAVGW
_m_psadbw       _mm_sad_pu8         Compute sum of absolute differences   PSADBW

These intrinsics use the MMX registers, so you need to empty the multimedia state after using them. See the topic The EMMS Instruction: Why You Need It and When to Use It for more details.
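As a rough illustration of that requirement, the sketch below computes a packed signed maximum with _mm_max_pi16 and then calls _mm_empty(), the intrinsic form of EMMS, before returning. The helper name max4_words_packed and the choice to pass the four words packed in a uint64_t are assumptions made for this example only. The preprocessor guard relies on the __MMX__ and __SSE__ macros that GCC and Clang define, and falls back to plain C elsewhere.

```c
#include <stdint.h>
#include <string.h>

#if defined(__MMX__) && defined(__SSE__)
#include <xmmintrin.h>   /* also brings in the MMX definitions (__m64, _mm_empty) */
#define USE_MMX_SSE 1
#else
#define USE_MMX_SSE 0
#endif

/* Hypothetical helper: signed maximum of four 16-bit words packed in 64 bits. */
uint64_t max4_words_packed(uint64_t a, uint64_t b)
{
#if USE_MMX_SSE
    __m64 va, vb, vr;
    uint64_t r;
    memcpy(&va, &a, sizeof va);
    memcpy(&vb, &b, sizeof vb);
    vr = _mm_max_pi16(va, vb);   /* PMAXSW */
    memcpy(&r, &vr, sizeof r);
    _mm_empty();                 /* EMMS: empty the multimedia state */
    return r;
#else
    /* Portable fallback with the same result, word by word. */
    uint64_t r = 0;
    for (int i = 0; i < 4; ++i) {
        int16_t x = (int16_t)(uint16_t)(a >> (16 * i));
        int16_t y = (int16_t)(uint16_t)(b >> (16 * i));
        int16_t m = x > y ? x : y;
        r |= (uint64_t)(uint16_t)m << (16 * i);
    }
    return r;
#endif
}
```

Calling _mm_empty() once at the end of a block of MMX work, rather than after every intrinsic, is the usual pattern.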

int _m_pextrw(__m64 a, int n)

Extracts one of the four words of a. The selector n must be an immediate.
r := (n==0) ? a0 : ( (n==1) ? a1 : ( (n==2) ? a2 : a3 ) )

__m64 _m_pinsrw(__m64 a, int d, int n)

Inserts word d into one of four words of a. The selector n must be an immediate.
r0 := (n==0) ? d : a0;
r1 := (n==1) ? d : a1;
r2 := (n==2) ? d : a2;
r3 := (n==3) ? d : a3;
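
The word selection used by _m_pextrw and _m_pinsrw can be sketched as a scalar model in portable C. This is not the intrinsic itself: the function names are hypothetical, the __m64 operand is represented as a uint64_t with word 0 in the low 16 bits, and the model masks n to the range 0..3.

```c
#include <stdint.h>

/* Scalar model of _m_pinsrw: replace word n of a with the low 16 bits of d. */
uint64_t insert_word(uint64_t a, int d, int n)
{
    unsigned shift = 16u * (unsigned)(n & 3);
    uint64_t mask = (uint64_t)0xFFFF << shift;
    return (a & ~mask) | ((uint64_t)(uint16_t)d << shift);
}

/* Scalar model of _m_pextrw: read word n of a, zero-extended to int. */
int extract_word(uint64_t a, int n)
{
    return (int)(uint16_t)(a >> (16u * (unsigned)(n & 3)));
}
```

With the real intrinsics n must be a compile-time constant; the model accepts a run-time value only for convenience.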

__m64 _m_pmaxsw(__m64 a, __m64 b)

Computes the element-wise maximum of the words in a and b.
r0 := max(a0, b0)
r1 := max(a1, b1)
r2 := max(a2, b2)
r3 := max(a3, b3)

__m64 _m_pmaxub(__m64 a, __m64 b)

Computes the element-wise maximum of the unsigned bytes in a and b.
r0 := max(a0, b0)
r1 := max(a1, b1)
...
r7 := max(a7, b7)

__m64 _m_pminsw(__m64 a, __m64 b)

Computes the element-wise minimum of the words in a and b.
r0 := min(a0, b0)
r1 := min(a1, b1)
r2 := min(a2, b2)
r3 := min(a3, b3)

__m64 _m_pminub(__m64 a, __m64 b)

Computes the element-wise minimum of the unsigned bytes in a and b.
r0 := min(a0, b0)
r1 := min(a1, b1)
...
r7 := min(a7, b7)
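
The unsigned byte comparisons of _m_pmaxub and _m_pminub can be modeled in scalar C as follows. The function names and the uint64_t representation of the eight bytes (byte 0 in the low 8 bits) are assumptions for illustration. The point of the sketch is that a byte such as 0x80 compares as 128, not as a negative value.

```c
#include <stdint.h>

/* Shared loop: pick the larger or smaller unsigned byte in each position. */
static uint64_t minmax_bytes(uint64_t a, uint64_t b, int want_max)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; ++i) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        uint8_t pick = want_max ? (x > y ? x : y) : (x < y ? x : y);
        r |= (uint64_t)pick << (8 * i);
    }
    return r;
}

/* Scalar model of _m_pmaxub. */
uint64_t max_bytes(uint64_t a, uint64_t b) { return minmax_bytes(a, b, 1); }

/* Scalar model of _m_pminub. */
uint64_t min_bytes(uint64_t a, uint64_t b) { return minmax_bytes(a, b, 0); }
```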

int _m_pmovmskb(__m64 a)

Creates an 8-bit mask from the most significant bits of the bytes in a.
r := sign(a7)<<7 | sign(a6)<<6 | ... | sign(a0)
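
A scalar sketch of the mask construction, assuming the eight bytes are packed in a uint64_t with byte 0 in the low 8 bits (the function name is hypothetical):

```c
#include <stdint.h>

/* Scalar model of _m_pmovmskb: bit i of the result is the top bit of byte i. */
int movemask_bytes(uint64_t a)
{
    int r = 0;
    for (int i = 0; i < 8; ++i) {
        int msb = (int)((a >> (8 * i + 7)) & 1);
        r |= msb << i;
    }
    return r;
}
```

A typical use is to test the result of a packed comparison: a mask of 0 means no byte had its high bit set.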

__m64 _m_pmulhuw(__m64 a, __m64 b)

Multiplies the unsigned words in a and b, returning the upper 16 bits of the 32-bit intermediate results.
r0 := hiword(a0 * b0)
r1 := hiword(a1 * b1)
r2 := hiword(a2 * b2)
r3 := hiword(a3 * b3)
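
The hiword step can be sketched in scalar C as below. The function name and the uint64_t packing (word 0 in the low 16 bits) are assumptions for this example. Each product is formed in 32 bits so that nothing is lost before the upper half is taken.

```c
#include <stdint.h>

/* Scalar model of _m_pmulhuw: upper 16 bits of each unsigned 16x16 product. */
uint64_t mulhi_words(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; ++i) {
        uint32_t x = (uint16_t)(a >> (16 * i));
        uint32_t y = (uint16_t)(b >> (16 * i));
        uint32_t prod = x * y;               /* fits: at most 0xFFFE0001 */
        r |= (uint64_t)(prod >> 16) << (16 * i);
    }
    return r;
}
```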

__m64 _m_pshufw(__m64 a, int n)

Returns a combination of the four words of a. The selector n must be an immediate.
r0 := word (n&0x3) of a
r1 := word ((n>>2)&0x3) of a
r2 := word ((n>>4)&0x3) of a
r3 := word ((n>>6)&0x3) of a
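
The selector encoding is easier to see in a scalar model. In the sketch below (hypothetical function name, __m64 represented as a uint64_t with word 0 in the low 16 bits), each pair of bits in n names the source word for one result word.

```c
#include <stdint.h>

/* Scalar model of _m_pshufw: result word i is source word ((n >> 2*i) & 3). */
uint64_t shuffle_words(uint64_t a, int n)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; ++i) {
        int src = (n >> (2 * i)) & 0x3;
        uint64_t w = (a >> (16 * src)) & 0xFFFF;
        r |= w << (16 * i);
    }
    return r;
}
```

Under this model, n = 0x1B reverses the four words, n = 0x00 broadcasts word 0 to every position, and n = 0xE4 leaves a unchanged.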

void _m_maskmovq(__m64 d, __m64 n, char *p)

Conditionally stores the byte elements of d to address p. The high bit of each byte in the selector n determines whether the corresponding byte in d is stored.
if (sign(n0)) p[0] := d0
if (sign(n1)) p[1] := d1
...
if (sign(n7)) p[7] := d7
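
A scalar sketch of the conditional store follows. The function name is hypothetical, and d and n are passed as uint64_t values with byte 0 in the low 8 bits. It models only which bytes are written, not any cache behavior of the MASKMOVQ instruction.

```c
#include <stdint.h>

/* Scalar model of _m_maskmovq: write byte i of d to p[i] only when the
   high bit of byte i of n is set; other bytes of p are left untouched. */
void masked_store(uint64_t d, uint64_t n, char *p)
{
    for (int i = 0; i < 8; ++i) {
        if ((n >> (8 * i + 7)) & 1)
            p[i] = (char)(uint8_t)(d >> (8 * i));
    }
}
```

The destination must have at least eight addressable bytes at p, even if the selector enables only some of them in this model.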

__m64 _m_pavgb(__m64 a, __m64 b)

Computes the (rounded) averages of the unsigned bytes in a and b.
t := (unsigned short)a0 + (unsigned short)b0
r0 := (unsigned char)((t + 1) >> 1)
...
t := (unsigned short)a7 + (unsigned short)b7
r7 := (unsigned char)((t + 1) >> 1)
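
The rounded average can be sketched in scalar C, assuming the PAVGB rounding rule of adding 1 to the sum before shifting right by one, so that halves round up. The function name and the uint64_t packing (byte 0 in the low 8 bits) are assumptions for this example.

```c
#include <stdint.h>

/* Scalar model of _m_pavgb: per-byte (x + y + 1) >> 1 on unsigned bytes. */
uint64_t avg_bytes(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; ++i) {
        unsigned x = (uint8_t)(a >> (8 * i));
        unsigned y = (uint8_t)(b >> (8 * i));
        unsigned t = x + y + 1;          /* at most 511, so no overflow */
        r |= (uint64_t)(t >> 1) << (8 * i);
    }
    return r;
}
```

The sum is formed in a wider type than a byte so that the carry out of bit 7 is kept until after the shift.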

__m64 _m_pavgw(__m64 a, __m64 b)

Computes the (rounded) averages of the unsigned words in a and b.
t := (unsigned int)a0 + (unsigned int)b0
r0 := (unsigned short)((t + 1) >> 1)
...
t := (unsigned int)a3 + (unsigned int)b3
r3 := (unsigned short)((t + 1) >> 1)

__m64 _m_psadbw(__m64 a, __m64 b)

Computes the sum of the absolute differences of the unsigned bytes in a and b, returning the value in the lower word. The upper three words are cleared.
r0 := abs(a0-b0) + ... + abs(a7-b7)
r1 := r2 := r3 := 0
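
A scalar sketch of the sum of absolute differences, with a hypothetical function name and the eight bytes packed in a uint64_t (byte 0 in the low 8 bits). The largest possible sum is 8 * 255 = 2040, so the result always fits in the lower word and the upper three words of the returned value are zero.

```c
#include <stdint.h>

/* Scalar model of _m_psadbw: sum of |a_i - b_i| over eight unsigned bytes,
   returned in the low 16 bits with the remaining bits cleared. */
uint64_t sad_bytes(uint64_t a, uint64_t b)
{
    unsigned sum = 0;
    for (int i = 0; i < 8; ++i) {
        int x = (uint8_t)(a >> (8 * i));
        int y = (uint8_t)(b >> (8 * i));
        sum += (unsigned)(x > y ? x - y : y - x);
    }
    return (uint64_t)sum;
}
```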