Multimedia Additions

The prototypes for these intrinsics are in the ia64intrin.h header file.

Intrinsic	Corresponding Instruction
__int64 _m64_czx1l(__m64 a)	czx1.l (Compute Zero Index)
__int64 _m64_czx1r(__m64 a)	czx1.r (Compute Zero Index)
__int64 _m64_czx2l(__m64 a)	czx2.l (Compute Zero Index)
__int64 _m64_czx2r(__m64 a)	czx2.r (Compute Zero Index)
__m64 _m64_mix1l(__m64 a, __m64 b)	mix1.l (Mix)
__m64 _m64_mix1r(__m64 a, __m64 b)	mix1.r (Mix)
__m64 _m64_mix2l(__m64 a, __m64 b)	mix2.l (Mix)
__m64 _m64_mix2r(__m64 a, __m64 b)	mix2.r (Mix)
__m64 _m64_mix4l(__m64 a, __m64 b)	mix4.l (Mix)
__m64 _m64_mix4r(__m64 a, __m64 b)	mix4.r (Mix)
__m64 _m64_mux1(__m64 a, const int n)	mux1 (Mux)
__m64 _m64_mux2(__m64 a, const int n)	mux2 (Mux)
__m64 _m64_padd1uus(__m64 a, __m64 b)	padd1.uus (Parallel add)
__m64 _m64_padd2uus(__m64 a, __m64 b)	padd2.uus (Parallel add)
__m64 _m64_pavg1_nraz(__m64 a, __m64 b)	pavg1 (Parallel average)
__m64 _m64_pavg2_nraz(__m64 a, __m64 b)	pavg2 (Parallel average)
__m64 _m64_pavgsub1(__m64 a, __m64 b)	pavgsub1 (Parallel average subtract)
__m64 _m64_pavgsub2(__m64 a, __m64 b)	pavgsub2 (Parallel average subtract)
__m64 _m64_pmpy2r(__m64 a, __m64 b)	pmpy2.r (Parallel multiply)
__m64 _m64_pmpy2l(__m64 a, __m64 b)	pmpy2.l (Parallel multiply)
__m64 _m64_pmpyshr2(__m64 a, __m64 b, const int count)	pmpyshr2 (Parallel multiply and shift right)
__m64 _m64_pmpyshr2u(__m64 a, __m64 b, const int count)	pmpyshr2.u (Parallel multiply and shift right)
__m64 _m64_pshladd2(__m64 a, const int count, __m64 b)	pshladd2 (Parallel shift left and add)
__m64 _m64_pshradd2(__m64 a, const int count, __m64 b)	pshradd2 (Parallel shift right and add)
__m64 _m64_psub1uus(__m64 a, __m64 b)	psub1.uus (Parallel subtract)
__m64 _m64_psub2uus(__m64 a, __m64 b)	psub2.uus (Parallel subtract)

__int64 _m64_czx1l(__m64 a)

The 64-bit value a is scanned for a zero element from the most significant element to the least significant element, and the index of the first zero element is returned. The element width is 8 bits, so the result ranges from 0 to 7. If no zero element is found, the result is 8.
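
The scan described above can be sketched in portable C. This is a minimal model, not the intrinsic itself: the name model_czx1l is hypothetical, a plain uint64_t stands in for __m64, and index 0 is assumed to name the most significant byte.

```c
#include <stdint.h>

/* Hypothetical portable model of the czx1.l scan (not the intrinsic).
   Assumption: index 0 is the most significant byte. */
int model_czx1l(uint64_t a)
{
    for (int i = 0; i < 8; i++) {
        /* Byte i, counted from the most significant end. */
        uint8_t byte = (uint8_t)(a >> (56 - 8 * i));
        if (byte == 0)
            return i;
    }
    return 8; /* no zero byte found */
}
```

Under these assumptions, model_czx1l(0x1122330044556677ULL) yields 3, because the fourth byte from the left is the first zero byte.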

__int64 _m64_czx1r(__m64 a)

The 64-bit value a is scanned for a zero element from the least significant element to the most significant element, and the index of the first zero element is returned. The element width is 8 bits, so the result ranges from 0 to 7. If no zero element is found, the result is 8.

__int64 _m64_czx2l(__m64 a)

The 64-bit value a is scanned for a zero element from the most significant element to the least significant element, and the index of the first zero element is returned. The element width is 16 bits, so the result ranges from 0 to 3. If no zero element is found, the result is 4.

__int64 _m64_czx2r(__m64 a)

The 64-bit value a is scanned for a zero element from the least significant element to the most significant element, and the index of the first zero element is returned. The element width is 16 bits, so the result ranges from 0 to 3. If no zero element is found, the result is 4.
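
The 16-bit right-to-left scan can be sketched the same way. The name model_czx2r is hypothetical, a uint64_t stands in for __m64, and index 0 is assumed to name the least significant 16-bit element.

```c
#include <stdint.h>

/* Hypothetical portable model of the czx2.r scan (not the intrinsic).
   Assumption: index 0 is the least significant 16-bit element. */
int model_czx2r(uint64_t a)
{
    for (int i = 0; i < 4; i++) {
        /* Element i, counted from the least significant end. */
        uint16_t elem = (uint16_t)(a >> (16 * i));
        if (elem == 0)
            return i;
    }
    return 4; /* no zero element found */
}
```

An element counts as zero only if all 16 bits are clear, so a value such as 0x0100 is not a match even though its low byte is zero.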

__m64 _m64_mix1l(__m64 a, __m64 b)

Interleave 64-bit quantities a and b in 1-byte groups, starting from the left, as shown in Figure 1, and return the result.
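
As a rough illustration of the byte interleave, the sketch below models mix1.l on plain 64-bit integers. It assumes the usual definition of the instruction, in which the odd-numbered bytes of a keep their positions and the odd-numbered bytes of b fill the even positions, and it assumes that a and b map to the first and second source operands. The name model_mix1l is hypothetical.

```c
#include <stdint.h>

/* Hypothetical portable model of mix1.l (not the intrinsic).
   Assumed result, most significant byte first:
   a7 b7 a5 b5 a3 b3 a1 b1 */
uint64_t model_mix1l(uint64_t a, uint64_t b)
{
    uint64_t odd_of_a = a & 0xFF00FF00FF00FF00ULL;        /* a7, a5, a3, a1 stay in place */
    uint64_t odd_of_b = (b >> 8) & 0x00FF00FF00FF00FFULL; /* b7, b5, b3, b1 move down one byte */
    return odd_of_a | odd_of_b;
}
```

With a = 0xA7A6A5A4A3A2A1A0 and b = 0xB7B6B5B4B3B2B1B0, this model returns 0xA7B7A5B5A3B3A1B1.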

__m64 _m64_mix1r(__m64 a, __m64 b)

Interleave 64-bit quantities a and b in 1-byte groups, starting from the right, as shown in Figure 2, and return the result.

__m64 _m64_mix2l(__m64 a, __m64 b)

Interleave 64-bit quantities a and b in 2-byte groups, starting from the left, as shown in Figure 3, and return the result.

__m64 _m64_mix2r(__m64 a, __m64 b)

Interleave 64-bit quantities a and b in 2-byte groups, starting from the right, as shown in Figure 4, and return the result.

__m64 _m64_mix4l(__m64 a, __m64 b)

Interleave 64-bit quantities a and b in 4-byte groups, starting from the left, as shown in Figure 5, and return the result.

__m64 _m64_mix4r(__m64 a, __m64 b)

Interleave 64-bit quantities a and b in 4-byte groups, starting from the right, as shown in Figure 6, and return the result.
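
The two 4-byte forms are the simplest to picture. The sketch below models them on plain 64-bit integers, assuming that the left form combines the upper halves of a and b and the right form combines the lower halves, with the element from a placed in the upper half of the result. The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical portable models of mix4.l and mix4.r (not the intrinsics). */

/* Assumed: upper 32 bits of a, followed by upper 32 bits of b. */
uint64_t model_mix4l(uint64_t a, uint64_t b)
{
    return (a & 0xFFFFFFFF00000000ULL) | (b >> 32);
}

/* Assumed: lower 32 bits of a, followed by lower 32 bits of b. */
uint64_t model_mix4r(uint64_t a, uint64_t b)
{
    return (a << 32) | (b & 0xFFFFFFFFULL);
}
```

With a = 0x1111111122222222 and b = 0x3333333344444444, the left model returns 0x1111111133333333 and the right model returns 0x2222222244444444.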

__m64 _m64_mux1(__m64 a, const int n)

Based on the value of n, a permutation is performed on a as shown in Figure 7, and the result is returned. Table 1 shows the possible values of n.

Table 1. Values of n for _m64_mux1 Operation

Permutation	n
@brcst	0
@mix	8
@shuf	9
@alt	0xA
@rev	0xB
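
Two of the permutations in Table 1 are easy to model in portable C. The sketch below covers only @brcst and @rev, assuming that @brcst copies the least significant byte of a into every byte of the result and that @rev reverses the byte order. The function names are hypothetical, and the remaining permutations are not modeled here.

```c
#include <stdint.h>

/* Hypothetical portable models of two mux1 permutations (not the intrinsic). */

/* @brcst (n = 0): assumed to replicate the least significant byte. */
uint64_t model_mux1_brcst(uint64_t a)
{
    return (a & 0xFF) * 0x0101010101010101ULL;
}

/* @rev (n = 0xB): assumed to reverse the order of the eight bytes. */
uint64_t model_mux1_rev(uint64_t a)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (a & 0xFF); /* append the current low byte of a */
        a >>= 8;
    }
    return r;
}
```

With a = 0x1122334455667788, the @brcst model returns 0x8888888888888888 and the @rev model returns 0x8877665544332211.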

__m64 _m64_mux2(__m64 a, const int n)

Based on the value of n, a permutation is performed on a as shown in Figure 8, and the result is returned.

__m64 _m64_pavgsub1(__m64 a, __m64 b)

The unsigned data elements (bytes) of b are subtracted from the unsigned data elements (bytes) of a and the results of the subtraction are then each independently shifted to the right by one position. The high-order bits of each element are filled with the borrow bits of the subtraction.

__m64 _m64_pavgsub2(__m64 a, __m64 b)

The unsigned data elements (double bytes) of b are subtracted from the unsigned data elements (double bytes) of a and the results of the subtraction are then each independently shifted to the right by one position. The high-order bits of each element are filled with the borrow bits of the subtraction.

__m64 _m64_pmpy2l(__m64 a, __m64 b)

Two signed 16-bit data elements of a, starting with the most significant data element, are multiplied by the corresponding two signed 16-bit data elements of b, and the two 32-bit results are returned as shown in Figure 9.

__m64 _m64_pmpy2r(__m64 a, __m64 b)

Two signed 16-bit data elements of a, starting with the least significant data element, are multiplied by the corresponding two signed 16-bit data elements of b, and the two 32-bit results are returned as shown in Figure 10.
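
A portable sketch of the right form follows. It assumes that the two elements selected are those in bits [15:0] and [47:32] of each operand, and that each 32-bit product is placed in the 32-bit half that contained its inputs. The name model_pmpy2r is hypothetical.

```c
#include <stdint.h>

/* Hypothetical portable model of pmpy2.r (not the intrinsic).
   Assumption: the elements in bits [15:0] and [47:32] are multiplied. */
uint64_t model_pmpy2r(uint64_t a, uint64_t b)
{
    /* Reinterpret the selected 16-bit fields as signed values. */
    int16_t a_lo = (int16_t)(uint16_t)a;
    int16_t b_lo = (int16_t)(uint16_t)b;
    int16_t a_hi = (int16_t)(uint16_t)(a >> 32);
    int16_t b_hi = (int16_t)(uint16_t)(b >> 32);

    /* A product of two 16-bit values always fits in 32 bits. */
    int32_t lo = (int32_t)a_lo * b_lo;
    int32_t hi = (int32_t)a_hi * b_hi;

    return ((uint64_t)(uint32_t)hi << 32) | (uint32_t)lo;
}
```

For example, with a = 0x0001FFFE00030004 and b = 0x000500030007FFFF, the model multiplies -2 by 3 and 4 by -1, giving 0xFFFFFFFAFFFFFFFC.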

__m64 _m64_pmpyshr2(__m64 a, __m64 b, const int count)

The four signed 16-bit data elements of a are multiplied by the corresponding signed 16-bit data elements of b, yielding four 32-bit products. Each product is then shifted right by count bits, and the least significant 16 bits of each shifted product form four 16-bit results, which are returned as one 64-bit word.
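
The multiply, shift, and truncate steps can be sketched in portable C as follows. The names asr32 and model_pmpyshr2 are hypothetical, the shift of the signed product is assumed to be arithmetic, and the model accepts any count from 0 to 31 even though the instruction itself may allow only a few specific values.

```c
#include <stdint.h>

/* Arithmetic right shift written without relying on how the compiler
   shifts negative values. Valid for 0 <= count < 32. */
uint32_t asr32(int32_t v, int count)
{
    uint32_t u = (uint32_t)v >> count;
    if (v < 0 && count > 0)
        u |= ~(UINT32_MAX >> count); /* refill the vacated bits with the sign */
    return u;
}

/* Hypothetical portable model of pmpyshr2 (not the intrinsic). */
uint64_t model_pmpyshr2(uint64_t a, uint64_t b, int count)
{
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        int16_t x = (int16_t)(uint16_t)(a >> (16 * i));
        int16_t y = (int16_t)(uint16_t)(b >> (16 * i));
        int32_t product = (int32_t)x * y;               /* full 32-bit product */
        uint16_t low = (uint16_t)asr32(product, count); /* keep the low 16 bits */
        result |= (uint64_t)low << (16 * i);
    }
    return result;
}
```

With every element of a and b set to 0x4000 and count set to 16, each product is 0x10000000 and each result element is 0x1000.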

__m64 _m64_pmpyshr2u(__m64 a, __m64 b, const int count)

The four unsigned 16-bit data elements of a are multiplied by the corresponding unsigned 16-bit data elements of b, yielding four 32-bit products. Each product is then shifted right by count bits, and the least significant 16 bits of each shifted product form four 16-bit results, which are returned as one 64-bit word.

__m64 _m64_pshladd2(__m64 a, const int count, __m64 b)

The four signed 16-bit data elements of a are each independently shifted to the left by count bits (the low-order bits of each element are filled with zeros); they are then added to the four signed 16-bit data elements of b. The result is returned.

__m64 _m64_pshradd2(__m64 a, const int count, __m64 b)

The four signed 16-bit data elements of a are each independently shifted to the right by count bits (the high order bits of each element are filled with the initial value of the sign bits of the data elements in a); they are then added to the four signed 16-bit data elements of b. The result is returned.

__m64 _m64_padd1uus(__m64 a, __m64 b)

a is added to b as eight separate byte-wide elements. The elements of a are treated as unsigned, while the elements of b are treated as signed. The results are treated as unsigned and are returned as one 64-bit word.
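
The mixed-sign byte addition can be sketched in portable C. This model assumes that the .uus form saturates, so that a sum below 0 becomes 0 and a sum above 255 becomes 255; the description above does not state how out-of-range sums are handled. The name model_padd1uus is hypothetical.

```c
#include <stdint.h>

/* Hypothetical portable model of padd1.uus (not the intrinsic).
   Assumption: results saturate to the unsigned range 0..255. */
uint64_t model_padd1uus(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (int i = 0; i < 8; i++) {
        int x = (uint8_t)(a >> (8 * i));         /* unsigned element, 0..255 */
        int y = (int8_t)(uint8_t)(b >> (8 * i)); /* signed element, -128..127 */
        int sum = x + y;
        if (sum < 0)
            sum = 0;   /* clamp at the low end */
        if (sum > 255)
            sum = 255; /* clamp at the high end */
        result |= (uint64_t)sum << (8 * i);
    }
    return result;
}
```

With a = 0x000000000005F010 and b = 0x0000000000F02001, the three low bytes compute 0x10 + 1, 0xF0 + 0x20 (clamped to 0xFF), and 5 - 16 (clamped to 0), so the model returns 0x000000000000FF11.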

__m64 _m64_padd2uus(__m64 a, __m64 b)

a is added to b as four separate 16-bit wide elements. The elements of a are treated as unsigned, while the elements of b are treated as signed. The results are treated as unsigned and are returned as one 64-bit word.

__m64 _m64_psub1uus(__m64 a, __m64 b)

b is subtracted from a as eight separate byte-wide elements. The elements of a are treated as unsigned, while the elements of b are treated as signed. The results are treated as unsigned and are returned as one 64-bit word.

__m64 _m64_psub2uus(__m64 a, __m64 b)

b is subtracted from a as four separate 16-bit wide elements. The elements of a are treated as unsigned, while the elements of b are treated as signed. The results are treated as unsigned and are returned as one 64-bit word.

__m64 _m64_pavg1_nraz(__m64 a, __m64 b)

The unsigned byte-wide data elements of a are added to the unsigned byte-wide data elements of b and the results of each add are then independently shifted to the right by one position. The high-order bits of each element are filled with the carry bits of the sums.

__m64 _m64_pavg2_nraz(__m64 a, __m64 b)

The unsigned 16-bit wide data elements of a are added to the unsigned 16-bit wide data elements of b and the results of each add are then independently shifted to the right by one position. The high-order bits of each element are filled with the carry bits of the sums.