This book provides a series of tables that compare intrinsics performance across architectures. Before implementing intrinsics across architectures, please note the following.
Intrinsics may generate code that does not run on all IA processors. Therefore, the programmer is responsible for using CPUID to detect the processor and for generating the appropriate code.
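The CPUID check described above might be sketched as follows. This is a minimal illustration, not the compiler's own mechanism: it assumes a GCC- or Clang-compatible compiler that provides `<cpuid.h>` and its `__get_cpuid` helper, and the names `cpu_features`, `code_path`, `detect_features`, and `select_code_path` are hypothetical. On any other compiler or target the sketch reports no features and falls back to the scalar path.

```c
/* Assumption: GCC/Clang on an x86 target; other toolchains need their own CPUID wrapper. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_CPUID_H 1
#else
#define HAVE_CPUID_H 0
#endif

/* Hypothetical record of the feature bits this sketch cares about. */
typedef struct {
    int sse;   /* CPUID leaf 1, EDX bit 25 */
    int sse2;  /* CPUID leaf 1, EDX bit 26 */
    int sse3;  /* CPUID leaf 1, ECX bit 0  */
} cpu_features;

/* Hypothetical labels for the code paths a program might ship. */
typedef enum { PATH_SCALAR, PATH_SSE, PATH_SSE2 } code_path;

/* Query CPUID leaf 1 and extract the SSE-related feature bits. */
static cpu_features detect_features(void)
{
    cpu_features f = {0, 0, 0};
#if HAVE_CPUID_H
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 when the requested leaf is unsupported. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        f.sse  = (int)((edx >> 25) & 1u);
        f.sse2 = (int)((edx >> 26) & 1u);
        f.sse3 = (int)(ecx & 1u);
    }
#endif
    return f;
}

/* Pick the most capable path the processor supports; scalar is the safe default. */
static code_path select_code_path(cpu_features f)
{
    if (f.sse2) return PATH_SSE2;
    if (f.sse)  return PATH_SSE;
    return PATH_SCALAR;
}
```

A program would typically call `detect_features()` once at startup and branch on the result of `select_code_path()`, so that code built from intrinsics runs only on processors that support the underlying instructions.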
Implement intrinsics by processor family, not by specific processor. The guiding principle for deciding which family (IA-32 or Itanium(TM) processors) an intrinsic is implemented on is performance, not compatibility. Where an intrinsic adds performance on both families, it is identical on both.
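Selecting an implementation by family rather than by specific processor could look like the compile-time sketch below. It is an illustration under stated assumptions: the predefined macros `__ia64__`, `_M_IA64`, `__i386__`, and `_M_IX86` are those commonly set by GCC-compatible and Microsoft-compatible compilers, and the names `cpu_family`, `TARGET_FAMILY`, and `target_family` are hypothetical.

```c
/* Hypothetical family labels; the sketch distinguishes families, not processor models. */
typedef enum { FAMILY_IA32, FAMILY_ITANIUM, FAMILY_OTHER } cpu_family;

/* Assumption: these predefined macros identify the target family at compile time. */
#if defined(__ia64__) || defined(_M_IA64)
#define TARGET_FAMILY FAMILY_ITANIUM
#elif defined(__i386__) || defined(_M_IX86)
#define TARGET_FAMILY FAMILY_IA32
#else
#define TARGET_FAMILY FAMILY_OTHER
#endif

/* Report the family this translation unit was compiled for. */
static cpu_family target_family(void)
{
    return TARGET_FAMILY;
}
```

Family-specific intrinsic code would then sit behind the same `#if` branches, with a portable fallback for `FAMILY_OTHER`.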