How to compile a program with the built-in __bf16 type?

Hi everyone, I’m new here. I am interested in using the LLVM bfloat16 data type, which I learned about from here.

I tried the bundled version of LLVM and its subprojects from this source, but got a linker error:

/usr/bin/ld: /tmp/check_compiler-6f7408.o: in function `intend(__bf16)':
check_compiler.cpp:(.text+0x14): undefined reference to `__truncsfbf2'
clang: error: linker command failed with exit code 1 (use -v to see invocation)

The source code of check_compiler.cpp is pretty simple:

#include <cmath>
#include <stdio.h>
#include <immintrin.h>
#include <x86intrin.h>

__bf16 intend(__bf16 x) {
    return x;
}

int main() {
    __bf16 kek;
    kek = intend(kek);
    float t;
    // float r = _mm_cvtsbh_ss(kek);
}

It does essentially the same thing as the COMPILER_RT_HAS_x86_64_BFLOAT16 test from compiler-rt.

Then I tried to build llvm, clang and compiler-rt from the work tree (git sources) with ninja, using these commands:


sudo ninja -j10
sudo ninja -j10 install

The log shows that the test passed:

Performing C SOURCE FILE Test COMPILER_RT_HAS_x86_64_BFLOAT16 succeeded with the following output:
Change Dir: /home/shaprunovk/clickhouse/llvm/llvm-project/build-llvm/runtimes/builtins-bins/CMakeFiles/CMakeTmp

Run Build Command(s):/usr/bin/ninja cmTC_c0111 && [1/2] Building C object CMakeFiles/cmTC_c0111.dir/src.c.o
[2/2] Linking C static library libcmTC_c0111.a

Source file was:
__bf16 foo(__bf16 x) { return x; }

and the installed static library contains the objects for truncdfbf2.c and truncsfbf2.c:

-- Up-to-date: /usr/local/lib/clang/17/lib/x86_64-unknown-linux-gnu/libclang_rt.builtins.a

But I still can’t compile check_compiler.cpp with either clang++-16 or the clang++-17 built from the tree. Is there a header I need to include to work with __bf16, or what am I doing wrong?

# We only build BF16 files when "__bf16" is available.


 check_c_source_compiles("__bf16 foo(__bf16 x) { return x; }"
      # Build BF16 files only when "__bf16" is available.
        list(APPEND ${arch}_SOURCES ${BF16_SOURCES})

from compiler-rt/lib/builtins/CMakeLists.txt

Are you sure that __bf16 is available on your system? IIRC, Intel introduced bfloat16 with AVX512_BF16.

Hi, thanks for the suggestion!
I am sure because the test passed, and this document says that it should be supported on my system.

lscpu output:

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         40 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  10
  On-line CPU(s) list:   0-9
Vendor ID:               GenuineIntel
  Model name:            Intel Xeon Processor (Cascadelake)
    CPU family:          6
    Model:               85
    Thread(s) per core:  2
    Core(s) per socket:  5
    Socket(s):           1
    Stepping:            6
    BogoMIPS:            4190.13
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp fsgsbase bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat pku ospke avx512_vnni md_clear
Virtualization features: 
  Hypervisor vendor:     KVM
  Virtualization type:   full
Caches (sum of all):     
  L1d:                   320 KiB (10 instances)
  L1i:                   320 KiB (10 instances)
  L2:                    20 MiB (5 instances)
  L3:                    16 MiB (1 instance)
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-9
  Itlb multihit:         KVM: Mitigation: VMX unsupported
  L1tf:                  Mitigation; PTE Inversion
  Mds:                   Mitigation; Clear CPU buffers; SMT Host state unknown
  Meltdown:              Mitigation; PTI
  Mmio stale data:       Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
  Retbleed:              Mitigation; IBRS
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected

Or maybe you’re right: I found that Cascade Lake does not support the AVX-512 BF16 instruction set.
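To make the hardware question concrete, here is a minimal sketch that looks for a flag in a whitespace-separated flags string like the lscpu "Flags:" line above. The helper name `has_cpu_flag` is hypothetical, and the flag spelling `avx512_bf16` is assumed to be the name Linux reports for AVX-512 BF16; it does not appear in the output above.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns true if `name` appears as a whole
// whitespace-separated token in a CPU flags string (e.g. the "Flags:"
// line printed by lscpu). Substring matches do not count.
inline bool has_cpu_flag(const std::string& flags, const std::string& name) {
    std::istringstream in(flags);
    std::string token;
    while (in >> token) {
        if (token == name) {
            return true;
        }
    }
    return false;
}
```

With the flags listed above, `has_cpu_flag(flags, "avx512f")` would be true and `has_cpu_flag(flags, "avx512_bf16")` would be false. This only answers the hardware question; it says nothing about whether the linker can find `__truncsfbf2`.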

Your link says that Intel supports bfloat with SSE2. Could you try a simpler __bf16 test case?

I didn’t really understand what you mean by simpler.

I am a little confused about the requirements for support. SSE2 is surely available on my CPU, since the document says it “includes all 64-bit and all recent 32-bit processors.”
Plus, the __bf16 support test from compiler-rt passes.

Sorry. For whatever reason, your original program needs a truncation (the call to __truncsfbf2 in the linker error).

Understood, thank you.
But I got the same result anyway.
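For context, `__truncsfbf2` from the linker error is the runtime helper that narrows a float to bfloat16. Below is a portable sketch of that kind of conversion on raw bit patterns, assuming round-to-nearest-even. The names `float_to_bf16_bits` and `bf16_bits_to_float` are hypothetical, and this is not the compiler-rt implementation.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not compiler-rt code): narrow a float to the
// 16-bit bfloat16 bit pattern, rounding to nearest with ties to even.
inline std::uint16_t float_to_bf16_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // read the float's bit pattern
    if ((bits & 0x7fffffffu) > 0x7f800000u) {
        // NaN: keep the top bits and force a quiet NaN.
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
    }
    // Adding 0x7fff plus the lowest kept bit makes exact ties round to even.
    std::uint32_t rounding = 0x7fffu + ((bits >> 16) & 1u);
    return static_cast<std::uint16_t>((bits + rounding) >> 16);
}

// Hypothetical helper: widen a bfloat16 bit pattern back to float.
// This direction is exact because bfloat16 is the top half of a float.
inline float bf16_bits_to_float(std::uint16_t h) {
    std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

The point of the sketch is only that narrowing needs real work (rounding), which is why the compiler emits a call into the runtime library when the target has no instruction for it.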

By default, the compiler runtime from GCC (libgcc) is used.
To link against compiler-rt, you need to pass a flag to clang: