AlternativeUniversity.net
Alternative University

Compiler Intrinsics for x64

Compiler intrinsics for personal computers (x64) are a form of mixed-language programming: they are built-in functions that the compiler translates directly into specific machine instructions, so that you get the effect of assembly language without having to write assembly language code yourself.

Intrinsics are predefined by the compiler. Instead of writing assembly language statements, you include a header file that declares the intrinsic function prototypes, and then call the intrinsics as if they were ordinary functions.

In this article we look at how to use compiler intrinsics on 64-bit Microsoft Windows using the GCC compiler with the Code Blocks IDE, and using the Visual Studio compiler and IDE.


CPUID

Intrinsics compile to ordinary machine instructions, so they execute directly on the computer processor that is running the application program. A computer processor is also called a central processing unit, or CPU. Modern CPUs contain more than one processing core, and each core contains multiple processing elements (PEs), such as the arithmetic units that work in parallel on the parts of a vector register. The operating system presents each core to programs as one or more logical processors, but in this article the whole chip is simply referred to as the computer processor or CPU.

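The CPUID instruction is available on all x64 processors, and both GCC and Visual Studio provide an intrinsic for it. It returns information about the computer processor, including which instruction set extensions the processor supports, and therefore which other intrinsics can safely be used.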

The following Wikipedia page describes uses of the CPUID assembler instruction and compiler intrinsic:

https://en.wikipedia.org/wiki/CPUID

At the end of that page are links to Intel and AMD documents that further describe use of the CPUID instruction. The AMD document states the following:

“CPUID Processor Identification: Provides information about the processor and its capabilities through a number of different functions. Software should load the number of the CPUID function to execute into the EAX register before executing the CPUID instruction. The processor returns information in the EAX, EBX, ECX, and EDX registers; the contents and format of these registers depend on the function.”

The CPUID intrinsic copies each of these registers into one of its arguments (or, in Visual Studio, into one element of an array), based on position in the function prototype. You just have to pass the arguments in the correct order.

The CPUID instruction invokes one of several different “standard” functions that provide information about the computer processor (CPU). The function that is invoked depends on the function number that you provide. Function number zero (0) returns the processor's vendor identification string (such as “GenuineIntel” or “AuthenticAMD”), and also returns the maximum standard function number that is supported.

CPUID can also invoke “extended” functions (function numbers starting at 0x80000000) that may not be available on all computers. This article only covers CPUID usage for standard function numbers. Examples using CPUID with extended function numbers are given on the Wikipedia page linked above.

Before invoking any other standard function number, make sure it is not greater than the maximum standard function number returned by function 0. Function number 0 itself is always available on x64 processors.


Our first example uses Code Blocks with the GCC compiler to retrieve the processor vendor string and the maximum standard function number supported by CPUID.

Start up Code Blocks, and select “Create a new project”. For project type, select “Console application”. In the Console application window, you can choose C or C++. For this example, we will choose C. This should also work with C++.

After choosing C or C++, specify a project name and folder to store the project. For name, we are specifying “CPUID”, to be saved in a folder called “Examples”. When prompted to choose a compiler, select GCC.

When the project is created and opens, in the app menu bar (near top of app window) select View and make sure Manager is checked.

In the Management window, double-click main.c (in the project's Sources folder).

The source code should look like this:

#include <stdio.h>
#include <stdlib.h>

int main()
{
  printf("Hello world!\n");
  return 0;
}

To compile and run, click the gear (Build) and then the green arrow (Run), or click the combined gear-and-green-arrow button (Build and run). A console window should pop up, displaying the text “Hello world!”

Now, above those include statements, insert the following include statements:

#include <cpuid.h>
#include <stdint.h>

The file cpuid.h is provided by GCC and defines the following intrinsic, which is implemented as a macro rather than as a function:

__cpuid(level, a, b, c, d)

The first argument (level) is the CPUID standard function number, which specifies the CPUID function to call. Because __cpuid is a macro, the remaining four arguments are not pointers but variables (lvalues) that receive the resulting contents of the EAX, EBX, ECX and EDX registers, respectively.

Function number zero returns the vendor identification string as 12 ASCII characters packed into three 4-byte registers: EBX, EDX and ECX, in that order (arguments b, d and c of the macro above). Argument a receives EAX, which holds the maximum standard function number.

Now, in the main() function, replace the printf statement with these statements:

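uint32_t maxnum;
/* 16 zeroed bytes: 12 for the vendor string plus a terminating NUL */
uint32_t vendor[4] = {0};
/* The string runs EBX, EDX, ECX, so ECX goes to vendor[2], EDX to vendor[1] */
__cpuid(0, maxnum, vendor[0], vendor[2], vendor[1]);
printf("Vendor: %s\n", (char *)vendor);
printf("Max standard function number: %u\n", maxnum);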

Compile and run the program. A console window will pop up displaying “AuthenticAMD” if the processor is manufactured by Advanced Micro Devices (AMD), or “GenuineIntel” if it is manufactured by Intel. Processors from other vendors return their own 12-character strings, and a virtual machine normally reports the vendor string of the host processor.

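Note that each register holds its four ASCII characters in little-endian byte order, and that the string continues in EDX before ECX. That is why the ECX result is stored in the third integer and the EDX result in the second, swapped relative to the macro's argument order.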


MUL128

Microsoft Visual Studio predefines a _mul128 intrinsic (available on x64) that multiplies two signed 64-bit integers, returns the low-order 64 bits of the 128-bit product, and stores the high-order 64 bits through a pointer passed as the third argument. This example shows how to invoke that intrinsic.

Start up Visual Studio, select Create a new project, choose Console application, and provide a project name and folder. When the project is created, a source code file appears in the editor. Compile and run the file. A pop-up console window will display “Hello world!”

After making sure that source code works, replace the source code with the following source code:

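#include <stdio.h>
#include <intrin.h>   /* declares _mul128 */
int main()
{
  __int64 a, b, c, d;
  a = 0xfffffffffffffffLL;
  b = 0xf0000000LL;
  /* d receives the low 64 bits of a*b; c receives the high 64 bits */
  d = _mul128(a, b, &c);
  /* %016llx pads the low half to 16 hex digits so the halves join correctly */
  printf(
    "%#llx * %#llx\n = %#llx%016llx\n",
    (unsigned long long)a, (unsigned long long)b,
    (unsigned long long)c, (unsigned long long)d);
  return 0;
}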

Compile and run the program. A console window will display the following:

0xfffffffffffffff * 0xf0000000
= 0xeffffffffffffff10000000
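The printed result can be checked by hand. With a = 2^60 − 1 and b = 15 · 2^28, the 128-bit product splits into the two halves that _mul128 returns (the floor drops the fraction 15 · 2^−36, which lies between 0 and 1):

```latex
\begin{aligned}
a &= 2^{60} - 1 = \texttt{0xFFFFFFFFFFFFFFF}, \qquad b = 15 \cdot 2^{28} = \texttt{0xF0000000} \\
ab &= 15 \cdot 2^{88} - 15 \cdot 2^{28} \\
c = \left\lfloor ab / 2^{64} \right\rfloor &= 15 \cdot 2^{24} - 1 = \texttt{0xEFFFFFF} \\
d = ab \bmod 2^{64} &= 2^{64} - 15 \cdot 2^{28} = \texttt{0xFFFFFFFF10000000}
\end{aligned}
```

Writing c followed by the 16 hex digits of d reproduces the output above.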

For more information about this intrinsic, see:

https://learn.microsoft.com/en-us/cpp/intrinsics/mul128


AVX2 (SIMD)

AVX intrinsics can be used to perform SIMD at the CPU level.

SIMD is an abbreviation for single instruction, multiple data. In this type of processing, a single instruction operates on several data elements at the same time (data-level parallel processing).

The purpose of SIMD is to broadcast a single instruction to multiple processing elements (PEs), often called lanes, within a CPU core. Each PE performs the instruction on its own portion of the data at the same time that the other PEs perform the same instruction on theirs.

Streaming SIMD Extensions (SSE), which predate AVX, perform SIMD with a total data width of 128 bits (16 bytes). This data width can handle four 4-byte numbers at a time, two 8-byte numbers at a time, and so on. Advanced Vector Extensions (AVX) widened the vector registers to 256 bits (32 bytes), but mainly for floating-point operations.

AVX2 is an extension of AVX that also supports 256-bit (32-byte) SIMD for integer operations. This allows processing of eight 4-byte numbers at a time, four 8-byte numbers at a time, and so on.

Most x64 processors made in roughly the last decade support AVX2, but some low-power and budget processors do not. The CPUID instruction is used to determine whether AVX2 is available on the processor.

To verify that AVX2 is supported, invoke CPUID function number 7 with subfunction 0 (the value 0 in the ECX register). If the resulting EBX register has bit 5 set (equal to 1), AVX2 is supported; if that bit is clear (zero), it is not. Strictly, the operating system must also support the 256-bit registers, which all current 64-bit versions of Windows do.

Note: Before invoking CPUID function number 7, invoke function number 0, to verify that the maximum standard function number is at least 7.

The following example illustrates using CPUID with Visual Studio to determine whether AVX2 is supported:

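#include <stdio.h>
#include <intrin.h>
int main()
{
  int cpuInfo[4];   /* receives EAX, EBX, ECX, EDX */
  __cpuid(cpuInfo, 0);
  printf("Max num: %d\n", cpuInfo[0]);
  if (cpuInfo[0] >= 7) {
    /* Function 7, subfunction 0 (ECX = 0) reports AVX2 */
    __cpuidex(cpuInfo, 7, 0);
    if (cpuInfo[1] & (1 << 5))   /* EBX bit 5 */
      printf("AVX2 is supported.\n");
    else
      printf("AVX2 is not supported.\n");
  }
  return 0;
}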


One of the AVX2 intrinsics is _mm256_add_epi64, which is documented on the following page at the Intel web site:

AVX2 Intrinsic: _mm256_add_epi64

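This intrinsic performs component-wise addition of two vectors. Each vector has four components, and each component is a 64-bit integer. The following program compiles with both GCC and Visual Studio; with GCC, add the -mavx2 option to the compiler options, while Visual Studio needs no extra option.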

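#include <stdio.h>
#include <stdint.h>
#include <immintrin.h>   /* AVX and AVX2 intrinsics */
int main()
{
  int64_t vec1[4] = {1, 2, 3, 4};
  int64_t vec2[4] = {5, 6, 7, 8};
  int64_t sum[4];
  __m256i a, b, c;
  /* Load four 64-bit integers into each 256-bit register
     (loadu/storeu do not require 32-byte alignment) */
  a = _mm256_loadu_si256((__m256i*)vec1);
  b = _mm256_loadu_si256((__m256i*)vec2);
  /* Add all four pairs of components with one instruction */
  c = _mm256_add_epi64(a, b);
  _mm256_storeu_si256((__m256i*)sum, c);
  printf(
    "sum: %lld %lld %lld %lld\n",
    sum[0], sum[1], sum[2], sum[3]);
  return 0;
}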

A console window displays the following result:

sum: 6 8 10 12