[RFC] Explore using LLVM libc math routines for compiler-rt floating point builtins

Background

compiler-rt builtins provide essential floating-point computation routines to support runtime execution when the underlying hardware does not natively support specific floating-point types. Unfortunately, the current implementations are under-tested, leaving subtle numerical bugs dormant for years (see, for instance, PR#119449). Additionally, they are written in a mix of C and hand-written assembly, which substantially increases the maintenance burden.

On the other hand, LLVM libc implements these basic floating-point operations with rigorous and extensive testing. A recent effort (see also Issue #147386) has refactored LLVM libc math routines to be free-standing and header-only. This allows other LLVM projects to consume LLVM libc logic directly via header inclusions without introducing complex build and link dependencies.

We would like to explore the feasibility of replacing the compiler-rt floating-point builtins with LLVM libc’s equivalents (either generic C versions or specialized ASM versions, or both).

Proposed Plan

  • Ensure basic floating-point operations in LLVM libc are exposed as shared, free-standing headers suitable for inclusion in compiler-rt.
  • Add C++ generic implementations of compiler-rt builtins such as __addtf3 that simply include the corresponding LLVM libc shared headers and call equivalent functions.
  • Add a CMake configuration flag like COMPILER_RT_USE_LIBC_MATH and put all the C++ generic implementations behind this flag.
  • Optimize the code size and performance, especially for embedded targets.

Other Considerations

  • By putting everything behind a CMake configuration flag, we want to make sure that this exploration is completely transparent and non-disruptive to all current users of compiler-rt. It will also help us spin up build bots and ease future transitions if the approach proves to work well.
  • This would also make it possible to extend support for other floating-point math features in compiler-rt builtins in coordination with libc, such as emulating floating-point exceptions or rounding modes, without introducing direct or circular dependencies between the projects.

Please let me know if you have any feedback or objections. Thanks!
@compnerd @scanon @statham-arm @petrhosek


I hadn’t been aware of the LLVM libc arithmetic functions until now, so thanks for bringing those to my attention. (I wouldn’t have thought to look there, because I assumed any work in that area would have happened in compiler-rt, where the existing functions are!)

I agree that the compiler-rt functions are currently under-tested. The set of optimized AArch32 assembly functions I’m preparing to submit as a PR stack ([RFC] Improved AArch32 FP arithmetic from Arm Optimized Routines) also comes with a large set of extra test cases, which I’m hoping will be of use to non-Arm targets too. They already found one bug in a compiler-rt routine, although happily it’s one of the ones I was already going to try to replace.

Looking at the LLVM libc routines, I think probably my set of optimized assembly functions is still likely to have value for people aiming at high performance on AArch32. By my current measurements, the speed of libm functions (in a different libc, I’m afraid) is nearly doubled by running them using my functions in place of the default compiler-rt ones. But of course my functions are also AArch32-specific!

The first draft of that PR stack I mentioned is now uploaded, including all those test cases I mentioned. (I don’t want to hijack the rest of the thread with Arm-specific business, but the test cases might be of use more generally. I might make an effort to run them on the libc arithmetic functions myself.)
179918 [compiler-rt][ARM] Enable strict mode in divsf3/mulsf3 tests
179919 [compiler-rt][ARM] cmake properties for complicated builtin sources
179920 [compiler-rt][ARM] Double-precision FP support functions
179921 [compiler-rt][ARM] Optimized double precision FP add/sub
179923 [compiler-rt][ARM] Optimized double-precision FP mul/div
179925 [compiler-rt][ARM] Optimized single-precision FP comparisons
179926 [compiler-rt][ARM] Optimized FP double ↔ single conversion
179927 [compiler-rt][ARM] Optimized FP → integer conversions
179928 [compiler-rt][ARM] Optimized integer → FP conversions
179929 [compiler-rt][ARM] Optimized single precision FP add/sub


Thanks @statham-arm ! Our initial goal is to eventually be able to replace the current generic implementations in compiler-rt. And I totally agree with your initial assessment that the current target-specific optimized assembly implementations will still be there for a while, until the C++ implementations can get on par with them. Even then, I’m sure such C++ implementations will also need to specialize somewhat for the underlying hardware capabilities.

Another thing I’m worried about is that there are downstream users of compiler-rt with only C/ASM compilers in their environments. If they cannot switch to a C++17-capable compiler (which I hope won’t be the case), then we might need to keep maintaining the current generic C implementations and port any fixes to them.

Also, thanks for the upstreaming PRs of the ARM-specific optimized routines. Feel free to tag me for review if you need extra eyes on those.

(I did run my FP arithmetic tests over the libc routines. They did pretty well on the difficult cases, but, well, #181121)

Thanks @statham-arm for trying them out and finding holes in our testing. The issue is now fixed by #181231, and the tests are updated in #182131. Please let us know if you find other issues with our implementations or tests.

About other libm functions for targets without floating-point units: I think code size and performance could be improved significantly with integer-only implementations compared to floating-point implementations that go through compiler-rt routines. I’m going to work on some double-precision functions and let you try them to see whether that strategy works.

Please let us know if you find other issues with our implementations or tests.

I did find one other thing, re-running my tests just now to check your fix. But it’s an annoying edge case which not everyone cares about, concerning signs of zero: adding (−0 + −0) ought to give −0 by IEEE 754 rules, but in fact gives +0. I’ll raise a ticket for it if you like.

Thanks @statham-arm for the extra compiler-rt tests and for running them against libc!

I was able to reproduce the (-0 + -0) issue and reviewed the recent PR that fixed the other issue you raised. The fix for (-0 + -0) looks straightforward, just a small change in add_sub.h.

If @lntue thinks this is worth fixing, I’d be happy to take it as my first LLVM commit and send a patch shortly.

I’m following this thread because I’m interested in the related GSoC project. I’ve been looking into the compiler-rt and libc folders and am happy to help with any other available issues as well!

Indeed. Feel free to open a PR for it.

Hi, were you able to work on this? If not, I would love to take a look.

Hi @lntue,

I was exploring libc/shared/math/ and noticed it provides shared, header-only implementations for many operations including some float128 variants (like faddf128.h for narrowing float128→float addition). However, I couldn’t find shared headers for same-type float128 arithmetic — i.e., float128 + float128 → float128, which is what compiler-rt builtins like __addtf3, __subtf3, __multf3, and __divtf3 require.

For integrating LLVM libc math routines into compiler-rt builtins:

Should we first add same-type float128 operations to libc/shared/math/ (e.g., a header exposing add(float128, float128) → float128)?

Or should the compiler-rt wrappers directly use src/__support/FPUtil/generic/add_sub.h internally?

I’ve previously contributed to LLVM libc math code (PR #181086 — bf16divf refactor to header-only) and have experience with floating-point operations in the X86 backend (PR #183932, PR #182660), so I’m interested in contributing to this project for GSoC 2026.

I don’t remember if we posted the discussion that we had offline, but for the builtins, it would be acceptable to have a C implementation of the math routines if:

1. we can demonstrate no size increase (across all the architectures)
2. we can demonstrate no performance loss (across all the architectures)
3. we do not grow a dependency on llvm-c for the builtins (as that causes circular dependencies)
4. we do not gain any new dependencies for the C implementation support (e.g., C++)

I think that the third one might be the most challenging aspect - we might have to duplicate the implementation (or alternatively, sink the actual implementation into compiler-rt, and have llvm-c pull it from there).

Hi @lntue, @compnerd,

I’m interested in working on this idea for GSoC 2026 and have been going through the thread and related patches. I have also previously worked on one of the issues (#175360) from the above-mentioned refactor.

Regarding the constraints @compnerd mentioned above:

  • C dependency requirement (#4): My understanding is that the C++ based implementations would be gated behind COMPILER_RT_USE_LIBC_MATH, so existing C-only toolchains remain unaffected. Is the concern mainly about any optional C++ presence in compiler-rt, or specifically about scenarios where downstream users cannot opt out of it?

  • No size/performance regression (#1, #2): I expect validation here to be the main effort. Would it be sufficient to focus on a representative set like ARM (e.g., Cortex-M/AArch32) and RISC-V, or are there particular architectures/configurations that should be treated as mandatory baselines?

A couple of scoping questions to help shape the proposal:

  1. Initial builtin coverage:
    Should the exploration focus only on the core arithmetic builtins (__addtf3, __subtf3, __multf3, __divtf3), or is there interest in including conversions and comparisons as well?

  2. Integration approach with LLVM libc:
    For float128 same-type operations, would you prefer:

    • exposing them via libc/shared/math/ (to keep a clean boundary), or
    • allowing compiler-rt to include from src/__support/FPUtil/generic/ directly during the exploration phase?
  3. On avoiding dependencies (#3):
    Is the preferred direction to duplicate logic in compiler-rt, to sink the canonical implementation into compiler-rt with libc consuming it from there, or is the header-only inclusion approach sufficient to avoid the circular dependency concern entirely?

I’m currently drafting my proposal and would be happy to share a more concrete plan once I align on these points.

Thanks!

I don’t think that COMPILER_RT_USE_LIBC_MATH is helpful. We don’t want to have to worry about two different implementations that can drift apart, or about bugs that get reported where we later find that the configuration we are looking at is a different one. I would like to have a single definition. As such, there must be no optional C++ dependency; C99 is okay, though.

For size/performance, I think that ARM and X86 would be absolutely mandatory; the other architectures are likely something we can discuss, and if the difference is small enough, that might be okay.

I don’t think that we should be broadening the scope of the operations that the builtins provide. For the math routines, they stick to the ABI as defined by GCC (there are a couple of extensions, such as __isOSVersionAtLeast, which is used for availability on macOS and re-used in Swift, but those are outside of the math routines).

I would prefer to have the header library outside of libc to ensure that accidental includes are not picked up from libc.

I would be okay with sinking the canonical definition into compiler-rt and have libc consume it from there.


@compnerd (cc @0bVdnt , @lntue )
I’m not fully convinced that implementing the routines in C99 within compiler-rt and having libc consume them from there is the right direction.

To be clear, I don’t think the C99 reimplementation approach is without merit. It would still reduce overall project complexity by eliminating duplicate IEEE 754 implementations. However, I believe directly reusing libc’s verified code is the better direction.

First, this approach largely negates the core benefit of this RFC. The motivation was to leverage LLVM libc’s rigorously tested and verified implementations. If we rewrite them in C99 inside compiler-rt, we’re essentially creating a new independent implementation that must be separately verified, rather than reusing what libc has already built.

Second, the libc team would need to refactor their code to consume from compiler-rt instead of their own well-established implementations. I’m not sure this direction would be welcomed by the libc maintainers.

Third, there is existing precedent for optional CMake flags in compiler-rt that introduce dependencies on LLVM libc or alternative libraries (COMPILER_RT_BUILD_SCUDO_STANDALONE_WITH_LLVM_LIBC, COMPILER_RT_USE_ATOMIC_LIBRARY, etc.). A COMPILER_RT_USE_LIBC_MATH flag would follow the same pattern.

With a CMake flag approach, users who want the rigorously verified libc math implementations instead of the C/assembly routines can opt in, while the default build remains unchanged. The libc implementation stays the single source of truth, and there is no risk of two implementations drifting apart, since compiler-rt would directly include libc’s header-only code rather than maintaining a separate C99 copy.

Of course, there may be users who cannot use a C++ compiler to build compiler-rt. However, this would likely be limited to cases where the compiler-rt sources are extracted and built with an external compiler other than Clang (e.g., GCC or SDCC). In my understanding, such external compilers typically come with their own runtime libraries, making it uncommon in practice to cross-compile the compiler-rt builtins with them. And since the LLVM libc math routines are header-only and compiled with -ffreestanding, no C++ runtime library is required on the target side. In the long term, I think it would be better to transition the floating-point implementations to libc and make the C99/assembly implementations the optional fallback instead.

I disagree with this. Why should LLVM be bound by the implementation details of a different compiler? LLVM has a number of practically unimplementable math intrinsics that need backing compiler-rt support (and ideally would never be dependent on a host system library).


I mean, that is the current state of the world. Clang cannot create any new compiler runtime support functions, because in common configurations (e.g. Clang on GNU Linux), we don’t actually provide a runtime support library.

But I agree, the current situation is unfortunate.

If GCC needs to introduce a new function in libgcc.a for an out-of-line implementation of some code sequence, it just does so – implementing the new library routine at the same time as modifying the compiler to emit a call to it. Everything gets linked against libgcc.a.

But if Clang needs to do the same…we just have to pray that GCC implements the feature first? That’s kinda terrible. I’m not saying this is a trivial problem to fix, but it would be really nice to figure out some way out of the mess. Especially for stateless helpers, where there’s no real requirement to share an implementation when code from both compilers goes into a single binary.

I would argue that the same could be done in the reverse - those can be vended by libc and users can opt into the implementation through llvm libc.

No, compiler-rt builtins are meant to be used in a libc-free environment, and are used by other libc implementations themselves. These environments do not come with their own runtimes; compiler-rt is the runtime.

If I understand correctly, the idea is to have LLVM libc provide runtime symbols for the FP builtins, and have the Clang driver optionally link against them instead of compiler-rt’s? That doesn’t sound bad, but I have one concern: wouldn’t having libc provide compiler builtin symbols blur the boundary between the compiler runtime and libc? I’m not sure that exposing runtime symbols from libc is the right direction. (Though I suppose it wouldn’t be a major issue if it’s behind a flag.)

I know that other libc implementations use the compiler runtime as a bare-metal runtime. However, I’m not sure this is a reason to prohibit C++ in the builtins. Regardless of which libc implementation uses compiler-rt, it is ultimately Clang that links compiler-rt, not the libc itself. When building musl with GCC, for example, GCC’s own runtime will be used instead.

If a user is in a Clang environment, freestanding C++ compilation is always available. The problematic case would be limited to minimal environments where compiler-rt is built with a non-Clang compiler (e.g., GCC without G++), but I believe this is quite uncommon. Is my understanding incorrect?

Sorry, I just want to clarify a few things. The current floating-point math implementations that LLVM libc shares via the headers in the libc/shared folder are about as free-standing as possible. They do not require anything from the C or C++ standard libraries, besides a C++ compiler. So by including the headers and compiling the objects however we like in compiler-rt, it shouldn’t bring in or depend on anything from the C or C++ standard libraries.

At the moment, we do require the compilers to understand the float and double types (computations on them are not required, only bit_cast to and from uint32_t and uint64_t). But that requirement could easily be relaxed, as we demonstrated with our bfloat16 support, and our basic math operation templates work well for both smaller and larger types (bfloat16, float16, float128).

With all the header-only free-standing refactoring work, testing, and CI setup in LLVM libc for these shared functions, I don’t see much value in moving them into a separate folder and complicating the build systems of multiple projects.

And about the scope of the RFC: right now we only aim to replace the generic C implementations. The currently most prominent embedded target, ARM, seems to be using its own ASM implementations, so it shouldn’t be affected. Our C++ floating-point math implementations can rival other well-established C math libraries in both performance and code size (for instance, PR#184751), so I believe we can make it work without regressions. Ideally, in the long term, it would be best if we could also replace the ASM implementations, but I will leave that for a future RFC when we are ready.

The only constraints that might block this, as @zlfn mentioned, are whether users only have C compilers, have older C++ compilers that cannot compile our headers, or cannot bring additional LLVM folders into their builds. And that’s where the CMake control flag can give them an easy way to opt out while waiting for migrations on their side, or until we stop supporting such use cases in newer releases of LLVM.
