How to build compiler-rt for the new X86 half-float ABI

Reland "Reland "Reland "Reland "[X86][RFC] Enable `_Float16` type sup… · llvm/llvm-project@655ba9c · GitHub changed half floats from being passed in GPRs to XMM registers. This breaks compatibility with anything that passes half as uint16_t, including compiler-rt.

LLVM generates a call to __truncsfhf2 to cast float->half when the hardware doesn’t have an instruction for it. This now requires rebuilding compiler-rt with the new ABI by configuring it with COMPILER_RT_HAS_FLOAT16, but that is also impossible because _Float16 is only available in clang when avx512fp16 is enabled.
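For illustration, a minimal sketch (the prototypes in the comment are illustrative, not the exact compiler-rt declarations):

/* A float->half cast like this is lowered to a call to the
 * compiler-rt builtin __truncsfhf2 when the target lacks a native
 * conversion instruction: */
_Float16 to_half(float f) { return (_Float16)f; }

/* Illustrative prototypes for the two ABIs:
 *   old: uint16_t __truncsfhf2(float a);  // half returned in a GPR
 *   new: _Float16 __truncsfhf2(float a);  // half returned in XMM0
 */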

Are there any options for:

  • Building a compiler-rt that works at all?
  • Building a compiler-rt that can support multiple ABIs?

@phoebe

@d0k Yes, this is a problem. We need the front-end patch to enable support for _Float16. I think we can split that patch to enable the type support first.

Split out in ⚙ D128571 [X86] Support `_Float16` on SSE2 and up

I think I just ran into a problem with this: when building with a relatively new clang on an x86_64 machine (which also has 32-bit support), you get during the configuration phase:

-- Performing Test COMPILER_RT_HAS_FLOAT16 - Success

but when it builds the i386 variants of compiler-rt builtins, you get an error:

In file included from compiler-rt/lib/builtins/extendhfsf2.c:11:
In file included from compiler-rt/lib/builtins/fp_extend_impl.inc:38:
compiler-rt/lib/builtins/fp_extend.h:44:9: error: _Float16 is not supported on this target
typedef _Float16 src_t;
        ^

Any ideas?

@DimitryAndric I think the problem is that you run cmake for the 64-bit target, while building for the 32-bit target. I think there are two ways to solve it:

  1. Add the option -m32 to CFLAGS/CXXFLAGS/LDFLAGS etc. when you run cmake;
  2. Add the option -msse2 to CFLAGS/CXXFLAGS etc. (see the example after this list).

The latter should be better.
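For reference, the second option would mean passing the flag at the top-level cmake invocation, roughly like this (the source path is illustrative; CMAKE_C_FLAGS/CMAKE_CXX_FLAGS are the standard CMake variables):

% cmake ../llvm -DCMAKE_C_FLAGS="-msse2" -DCMAKE_CXX_FLAGS="-msse2"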

No, that’s not the problem. This is a regular build hosted on, and targeting, x86_64. What happens is that compiler-rt has a sub-CMake thing where it builds both x86_64 and i386 versions of its builtins and sanitizer libraries.

I am not sure how this works, exactly, but it does not seem to run the configure steps (that determine whether _Float16 is supported) for each of the target architectures separately. It seems to reuse the top-level CMake configuration flags for each of the x86_64 and i386 builds.

Investigating further… (btw I wonder why this doesn’t break on Linux. I’m using FreeBSD here, so maybe this is taking another path.)

I don’t have much experience with cmake, but I think the second method, i.e., adding -msse2 (maybe to the topmost cmake command), should work for both the x86_64 and i386 versions.

Regarding Linux, I think it should have the same problem if you build with the same steps. The root cause is i386 doesn’t imply SSE2 while x86_64 does. And the support of _Float16 depends on SSE2.

Sure, but the target i386 arch might not support SSE2, so I don’t always want to force it on. There needs to be some way of telling the i386-specific builds to use different compilation flags from the x86_64 builds. In my case I could patch it to add a && !defined(__i386__) but it’s better to solve it in the general case.


How about checking for SSE2 with defined(__SSE2__)? This should be a bit more general :slight_smile:
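A rough sketch of that suggestion, applied to the typedef in fp_extend.h (assuming _Float16 availability on 32-bit x86 hinges on SSE2):

#include <stdint.h>

/* Pick the source type based on whether the target has SSE2,
 * which is what _Float16 support on 32-bit x86 depends on: */
#if defined(__SSE2__)
typedef _Float16 src_t; /* target has SSE2, so _Float16 is usable */
#else
typedef uint16_t src_t; /* fall back to the integer representation */
#endif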

Ah, I think I at least understand now why Linux builds might go better: recent versions of gcc appear to default to enabling SSE2 when using -m32, e.g.:

% gcc8 -v
...
gcc version 8.5.0 (FreeBSD Ports Collection)

% gcc8 -m32 -dM -E -x c /dev/null|grep SSE
#define __SSE__ 1
#define __SSE2__ 1

while this is not the case for clang -m32; you have to explicitly enable it with -msse2.

That’s interesting. I was under the impression GCC doesn’t enable SSE2 implicitly, the same as Clang. But I found it has enabled it since 4.5.3.
Maybe we can make 32-bit imply SSE2 as well. @topperc WDYT?

For 32-bit we use a different default CPU based on the OS.

  switch (Triple.getOS()) {
  case llvm::Triple::NetBSD:
    return "i486";
  case llvm::Triple::Haiku:
  case llvm::Triple::OpenBSD:
    return "i586";
  case llvm::Triple::FreeBSD:
    return "i686";
  default:
    // Fallback to p4.
    return "pentium4";
  }

This is based on the minimum requirements of the respective operating systems. Though it could be out of date. Note that clang is the default compiler for at least FreeBSD. So if FreeBSD doesn’t require a CPU with SSE2, clang can’t require it.

I’m now building with this diff:

diff --git a/compiler-rt/lib/builtins/fp_extend.h b/compiler-rt/lib/builtins/fp_extend.h
index eee4722bf90e..bd279424394b 100644
--- a/compiler-rt/lib/builtins/fp_extend.h
+++ b/compiler-rt/lib/builtins/fp_extend.h
@@ -40,7 +40,7 @@ static __inline int src_rep_t_clz(src_rep_t a) {
 }

 #elif defined SRC_HALF
-#ifdef COMPILER_RT_HAS_FLOAT16
+#if defined(__FLT16_MAX__)
 typedef _Float16 src_t;
 #else
 typedef uint16_t src_t;

which works for me, and removes the need for a configure-time check. I.e., the compiler (either clang or gcc) only defines the __FLT16_*__ macros if _Float16 support is available, so you might as well use that.

Submitted a more complete fix in ⚙ D130718 [compiler-rt] [builtins] Detect _Float16 support at compile time

There is a fundamental problem here.

I work on another project where we use LLVM as a code generator. When dealing with half-precision floating point, the code generator may emit calls to the builtin conversion functions. We will generate an object file, which may then be linked against other object files generated by clang (i.e. the standalone compiler). If the version of clang doesn’t match the version of the code generator, we can have a situation where __extendhfsf2 expects uint16_t but gets _Float16, and we have no way of telling the user that this is happening.
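To make the hazard concrete, here is a hedged sketch (hypothetical declarations; the OLD_ABI macro is just an illustrative switch). C symbols carry no type information, so the linker cannot detect the mismatch:

#include <stdint.h>

#ifdef OLD_ABI
/* What an object built against the old ABI believes: */
float __extendhfsf2(uint16_t a); /* half argument arrives in a GPR */
#else
/* What a runtime built with the new ABI actually provides: */
float __extendhfsf2(_Float16 a); /* half argument expected in XMM0 */
#endif
/* A call across that boundary links cleanly but reads the wrong
 * register, silently corrupting the converted value. */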

Can we rename the _Float16 functions to __truncsfhf3 and __extendhfsf3 or something like that?

@kparzysz-quic raised a good point! I think we can’t have ABI ambiguity within the same function. I think providing two versions of the function in compiler-rt is a good idea. Besides, codegen is already generating different symbols for different ABIs: Compiler Explorer

So in this case, you would only emit the sfhf3 variants iff COMPILER_RT_HAS_FLOAT16 is true? I.e. it will still depend on how your configure phase went on the particular host you are building on. I think that’s fundamentally wrong too. :slight_smile:

The idea there was that compiler-rt would have all of these functions, with the sfhf2 variants using the integer ABI and the sfhf3 variants using the floating-point ABI [1] (see the sketch below). What the code generator emits wouldn’t matter as long as it conforms to the ABIs. LLVM 15+ could emit the 3 variants, and if the compiler-rt used for linking didn’t have these symbols, the linking would fail.

Given that GCC has started using the 2 variants with the floating-point ABI, this solution no longer works…

[1] We could have them written in assembly for cases where the compiler used to build compiler-rt didn’t support _Float16.
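For concreteness, a sketch of what that proposal would have looked like (the sfhf3 names are the hypothetical ones from this thread, not existing compiler-rt symbols):

#include <stdint.h>

/* Legacy integer ABI, kept for objects built by older compilers: */
float __extendhfsf2(uint16_t a);
uint16_t __truncsfhf2(float a);

/* New floating-point ABI under distinct names, so both could
 * coexist in the same compiler-rt: */
float __extendhfsf3(_Float16 a);
_Float16 __truncsfhf3(float a);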

Maybe for your system, but that’s not always true. Is your GCC configured to use -march=pentium4 or -march=prescott by default? The output of gcc -v should answer that.

If GCC is configured with e.g. --with-arch-32=i686 (which is the case for several popular linux distros) then the default for -m32 codegen is -march=i686 which doesn’t include the SSE2 instruction set.

So this definitely affects GCC builds too.
