Naming and Usage Syntax

Most intrinsic names follow this notational convention:

_mm_<intrin_op>_<suffix>

<intrin_op>

Indicates the intrinsic's basic operation; for example, add for addition and sub for subtraction.

<suffix>

Denotes the type of data operated on by the instruction. The first one or two letters of each suffix denote whether the data is packed (p), extended packed (ep), or scalar (s). The remaining letters denote the type:

s single-precision floating point

d double-precision floating point

i128 signed 128-bit integer

i64 signed 64-bit integer

u64 unsigned 64-bit integer

i32 signed 32-bit integer

u32 unsigned 32-bit integer

i16 signed 16-bit integer

u16 unsigned 16-bit integer

i8 signed 8-bit integer

u8 unsigned 8-bit integer
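To make the convention concrete, here is a minimal sketch that applies the same <intrin_op> (add) with three different suffixes. The helper name naming_demo is hypothetical, chosen for illustration only; the intrinsics themselves are the standard SSE/SSE2 ones from <emmintrin.h>.

```c
#include <emmintrin.h>  /* SSE2: __m128i, __m128d; also includes SSE's __m128 */

/* naming_demo is a hypothetical helper. Same <intrin_op> ("add"), different <suffix>:
 *   _mm_add_ps    packed single precision: all four floats are added
 *   _mm_add_ss    scalar single precision: only the lowest float is added,
 *                 the upper three are copied from the first operand
 *   _mm_add_epi32 extended packed 32-bit integers: all four ints are added
 */
void naming_demo(float packed[4], float scalar[4], int ints[4])
{
    __m128 a = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);      /* lowest element first */
    __m128 b = _mm_setr_ps(10.0f, 20.0f, 30.0f, 40.0f);

    _mm_storeu_ps(packed, _mm_add_ps(a, b));  /* 11, 22, 33, 44 */
    _mm_storeu_ps(scalar, _mm_add_ss(a, b));  /* 11,  2,  3,  4 */

    __m128i x = _mm_setr_epi32(1, 2, 3, 4);
    __m128i y = _mm_setr_epi32(100, 200, 300, 400);
    _mm_storeu_si128((__m128i *)ints, _mm_add_epi32(x, y));  /* 101, 202, 303, 404 */
}
```

Only the suffix changes between the three calls, and it alone tells you whether every element or just the lowest one participates.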

A number appended to a variable name indicates the element of a packed object. For example, r0 is the lowest element of r. Some intrinsics are "composites" because the compiler needs more than one instruction to implement them.
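The element numbering can be checked with a short sketch, assuming SSE is available (it is baseline on x86-64). The helpers element_of and lowest_element are hypothetical names used only here; _mm_set_ps, _mm_storeu_ps and _mm_cvtss_f32 are standard <xmmintrin.h> intrinsics.

```c
#include <xmmintrin.h>  /* SSE: __m128 and the _ps / _ss intrinsics */

/* Hypothetical helper: returns element ri of r = _mm_set_ps(4, 3, 2, 1).
 * _mm_set_ps lists its arguments from the highest element (r3) down to the
 * lowest (r0). Storing writes r0 at the lowest address. i must be 0..3. */
float element_of(int i)
{
    float out[4];
    __m128 r = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);  /* r3, r2, r1, r0 */
    _mm_storeu_ps(out, r);                           /* out[k] == rk */
    return out[i];
}

/* Hypothetical helper: _mm_cvtss_f32 returns r0, the lowest element, directly. */
float lowest_element(void)
{
    return _mm_cvtss_f32(_mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f));
}
```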

The packed values are represented in right-to-left order, with the lowest value being used for scalar operations. Consider the following example operation:

alignas(16) double a[2] = {1.0, 2.0};  // _mm_load_pd requires a 16-byte-aligned address

__m128d t = _mm_load_pd(a);

The result is the same as either of the following:

__m128d t = _mm_set_pd(2.0, 1.0);

__m128d t = _mm_setr_pd(1.0, 2.0);

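In other words, the xmm register that holds the value t looks as follows, with the high element on the left and the low element on the right:

    bits 127-64   bits 63-0
    [   2.0     |    1.0    ]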

The "scalar" element is 1.0, the lowest element.

Because of the way the underlying instructions are encoded, some intrinsics require certain arguments to be immediates (constant integer literals).
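The following sketch checks the layout claims above and shows an immediate argument. It assumes SSE2 (<emmintrin.h>); the function names same_layout, scalar_element and swapped_low are hypothetical. It uses _mm_loadu_pd (the unaligned load) rather than _mm_load_pd so that it does not depend on the array being 16-byte aligned.

```c
#include <emmintrin.h>  /* SSE2: __m128d and the _pd / _sd intrinsics */

/* Hypothetical helper: builds t three ways and returns 1 if all three
 * hold the same two doubles in the same positions. */
int same_layout(void)
{
    double a[2] = {1.0, 2.0};
    double x[2], y[2], z[2];

    _mm_storeu_pd(x, _mm_loadu_pd(a));        /* memory order: a[0] is lowest */
    _mm_storeu_pd(y, _mm_set_pd(2.0, 1.0));   /* high element listed first */
    _mm_storeu_pd(z, _mm_setr_pd(1.0, 2.0));  /* low element listed first */

    return x[0] == y[0] && y[0] == z[0] && x[1] == y[1] && y[1] == z[1];
}

/* Hypothetical helper: _mm_cvtsd_f64 returns the low ("scalar") element. */
double scalar_element(void)
{
    __m128d t = _mm_set_pd(2.0, 1.0);
    return _mm_cvtsd_f64(t);                  /* 1.0 */
}

/* Hypothetical helper: the third argument of _mm_shuffle_pd is encoded in the
 * instruction, so it must be a compile-time constant; compilers generally
 * reject a runtime variable here. Selector 1 puts t's high element in the
 * low position and t's low element in the high position. */
double swapped_low(void)
{
    __m128d t = _mm_set_pd(2.0, 1.0);
    __m128d s = _mm_shuffle_pd(t, t, 1);
    return _mm_cvtsd_f64(s);                  /* 2.0 */
}
```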

Intrinsic Syntax

To use an intrinsic in your code, insert a line with the following syntax:

data_type intrinsic_name (parameters)

Where,

data_type	Is the return data type, which can be void, int, __m64, __m128, __m128d, __m128i, or __int64. Intrinsics that can be implemented across all IA (Intel architectures) may return other data types as well, as indicated in the intrinsic syntax definitions.
intrinsic_name	Is the name of the intrinsic, which behaves like a function that you can call in your C++ code instead of writing the actual instruction as inline assembly.
parameters	Represents the parameters required by each intrinsic.
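As a sketch of this syntax, the function below calls intrinsics whose data_type is __m128, void, and int, each written like an ordinary function call. It assumes SSE (<xmmintrin.h>); syntax_demo is a hypothetical name, and the lane-counting loop is plain C added for illustration.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_movemask_ps, ... */

/* Hypothetical helper: writes a + b into sum and returns how many of the
 * four lanes satisfy a[i] < b[i]. a, b, and sum each hold four floats. */
int syntax_demo(const float a[4], const float b[4], float sum[4])
{
    __m128 va = _mm_loadu_ps(a);               /* data_type: __m128 */
    __m128 vb = _mm_loadu_ps(b);

    _mm_storeu_ps(sum, _mm_add_ps(va, vb));    /* data_type: void */

    /* _mm_cmplt_ps sets a lane to all ones where va < vb; _mm_movemask_ps
     * packs the sign bit of each lane into bits 0..3 of an int. */
    int mask = _mm_movemask_ps(_mm_cmplt_ps(va, vb));  /* data_type: int */

    int n = 0;
    for (int i = 0; i < 4; ++i)
        n += (mask >> i) & 1;                  /* count set bits */
    return n;
}
```

For example, with a = {1, 5, 2, 8} and b = {2, 4, 3, 1}, lanes 0 and 2 satisfy a[i] < b[i], so the function returns 2 and sum holds {3, 9, 5, 9}.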