[Proposal] split built-ins from the rest of "compiler-rt"

As first raised in 2013, and still a problem today, our definition of “compiler runtime” has a side effect.

Currently, compiler-rt has 23 sub-projects, one of them being the builtins, which is where the problem lies.

The problem

  1. Compiler-RT implements functions that libgcc doesn’t (ex. __muloti4).
  2. LLVM lowers 128-bit arithmetic to calls to that builtin, because the target (AArch64) says it supports it.
  3. The linker fails to find the symbol because the actual runtime (libgcc) doesn’t provide it.

The heart of the problem is that LLVM thinks the builtin is supported because Compiler-RT implements it, but Compiler-RT is not linked by default on GNU platforms. The compile-time decision cannot be proven safe, because the information it depends on is only available at link time.
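
The three steps above can be modelled as a set difference. This is only a sketch: the symbol sets are illustrative (just a few of the 128-bit helpers), and `unresolved_symbols` is a hypothetical name, not part of any toolchain.

```python
# Illustrative symbol sets only; real runtimes export many more helpers.
LIBGCC_EXPORTS = {"__multi3", "__divti3"}
COMPILER_RT_EXPORTS = LIBGCC_EXPORTS | {"__muloti4"}


def unresolved_symbols(emitted_calls, linked_runtime):
    """Return the helper calls that the linked runtime cannot satisfy."""
    return sorted(set(emitted_calls) - set(linked_runtime))


# The backend picks libcalls as if compiler-rt's set were available...
emitted = ["__multi3", "__muloti4"]

# ...but a GNU platform links libgcc, so one call is left dangling.
print(unresolved_symbols(emitted, LIBGCC_EXPORTS))       # ['__muloti4']
print(unresolved_symbols(emitted, COMPILER_RT_EXPORTS))  # []
```

The compiler only ever sees `emitted`; which export set applies is decided later, on the link line.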

This disconnect causes genuine responses like:

I’m confused about why you think I should need to explicitly link with compiler-rt : if the compiler doesn’t automatically link in its own runtime functions as it needs them surely that’s a compiler bug? I don’t need to explicitly link with libgcc if I’m compiling with gcc…

The workarounds

My explanations of “LLVM is a modular compiler” over the years don’t quite cut it, because this isn’t something the affected platforms can fix. There is no build flag that allows them to assume libgcc or compiler-rt as their runtime, so no way to enable/disable this behaviour based on the platform.

All the workarounds are downright ugly:

  • Implement it in libgcc: Will take years to take effect, and may not be accepted in a compatible way.
  • Add -rtlib=compiler-rt -lgcc_s (or vice-versa) to every command line.
  • Copy RT’s implementation in a local file somewhere (like Android did for many years).

Problems reverting the code generation choice

As exposed on the Phab review, reverting that decision, while safer from a compiler’s “least surprise principle”, will regress on all platforms that currently use Compiler-RT as their runtime (FreeBSD, Android, Chromium, Apple and others I don’t know about).

Split builtins from Compiler-RT

So I’d like for us to consider my proposal from 2016 once again: let’s split the builtins from the rest of Compiler-RT and include it in the build by default. This would allow us to grow the builtins as much as we want, and would provide the certainty to the compiler for code generation decisions.

The easiest way, I think, would be not to make the builtins the default rtlib, but to always link them through Clang (and probably llc, lli, etc). It doesn’t stop people using LLVM in their tooling from having the mismatch, but there’s another mismatch there anyway, so it isn’t a big problem for developers.

If the default rtlib is indeed compiler-rt, then linking the builtins would provide zero benefit, but still wouldn’t break linkage, I think.

Another alternative would be to split out just the builtins that aren’t supported by other runtimes. That’s a bit ugly and fiddly, but would allow Compiler-RT to continue to be the runtime in the same way.

We just need to decide what to do with the conflicting symbols: Do we use the runtime first, and then our builtins or vice-versa?

To answer that question, we’d need to know the expected behaviours on multiple platforms out there and I don’t have that visibility. That’s the reason this isn’t an RFC, it’s just a proposal.

There’s also a lot of “I think” in this post, which means, I really don’t know. Hoping people more familiar with linkage tricks can help here.

@compnerd @echristo @nickdesaulniers @davidchisnall @hansw2000

Also not on Windows.

I don’t know the background and ramifications of changing this though. Why aren’t we already using our own builtins by default?

I guess since all the pieces are modular - and in most cases, you’d already be using the existing libgcc - which has worked mostly fine so far.

If we go this way, things change a bit for how one does various cross builds. Currently, I can e.g. download a GCC aarch64-linux cross sysroot, and just use a generic Clang release together with that. If the compiler_rt builtins are a mandatory part, I couldn’t use any random Clang release without first building a matching compiler_rt for my target. (The set of potential arch+OS combinations that compiler_rt builtins would have to be built for is near infinite, and if the builtins contain C code, building them probably does need that cross SDK too, so one couldn’t necessarily have all the common ones prebuilt as part of a Clang release.)

That question probably has many answers… I’m hoping to get a better idea in this thread.

It may just be that the answer is “we never got around to doing that”, in which case, we should definitely do it. But when I proposed this to various people (and companies), starting even before 2016, I got various answers that weren’t simple, and I can’t remember most of them.

The main point, IIRC, was that they wanted to use the same implementation as libgcc to avoid hard-to-find bugs in either library, if some objects are compiled with one and others with the other.

Also, potential duplication of functionality, especially for embedded targets, if the signatures are slightly different but do the same thing (ex. divmod on Arm).

I think this is a much smaller problem now, given that both have been in sync and battle-tested for many years.

But I may be missing the other reasons I can’t remember…

What about propagating the rtlib choice into the LLVM codegen? (So far I guess -rtlib has been a linktime-only option, but it could have effect on codegen too.) If unset, or set to libgcc, it’d do the safe/pessimistic choice, but if set (either explicitly, or via CLANG_DEFAULT_RTLIB, or via Clang target-specific defaults) it would give the more optimal codegen.

That way there’s no regression for the full-LLVM toolchains, while it works just as neatly in a modular fashion as it has done so far, where it’s possible to replace only one bit of a toolchain with Clang.
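
A minimal sketch of that idea, assuming the backend is simply handed the selected rtlib as a string. The function name and the returned strings are hypothetical and do not correspond to an existing LLVM interface.

```python
from typing import Optional


def lower_i128_mul_overflow(rtlib: Optional[str]) -> str:
    """Pick a lowering strategy for a checked 128-bit multiply.

    `rtlib` is None when no runtime was selected. Everything here is a
    hypothetical model of the proposal, not real LLVM code.
    """
    if rtlib == "compiler-rt":
        # The helper is known to exist, so the compact libcall is safe.
        return "call __muloti4"
    # Unset or libgcc: the helper may be missing, so stay pessimistic.
    return "expand inline"


print(lower_i128_mul_overflow("compiler-rt"))  # call __muloti4
print(lower_i128_mul_overflow("libgcc"))       # expand inline
print(lower_i128_mul_overflow(None))           # expand inline
```

Full-LLVM toolchains keep the libcall, while a Clang dropped into a GCC sysroot falls back to the conservative expansion.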

The MSVC targets have had a similar longstanding issue; Clang supports __int128_t, while MSVC doesn’t. E.g. libcxx checks for __SIZEOF_INT128__, deduces that the compiler does support __int128_t, and goes on to use it. Most simple uses of the type result in just normal generated instructions, but divisions end up in a rtlib call. For all other rtlib calls in MSVC environments, they’re directed to equivalent functions available in the MSVC default libraries. (This is currently resolved by simply explicitly ignoring __int128_t in clang-cl builds of libcxx.)

There have been various discussions in that context too, that Clang for those targets should try to use its own compiler_rt builtins. (MSVC even ships the clang_rt.builtins libraries for x86 architectures - probably because they’re building and bundling ASan, and get those as a byproduct.)

For compiler_rt builtins in MSVC/clang-cl contexts, there’s another hurdle involved though; in such contexts, one doesn’t normally link executables/libraries with clang, but by invoking the linker (link.exe or lld-link) directly - so the object files themselves would have to carry a directive, telling the linker that e.g. clang_rt.builtins-x86_64 has to be loaded. @rnk had a good suggestion on how that could be implemented here: D134912, “[libc++] Disable int128_t and ship filesystem on Windows by default”. (The point being that we wouldn’t want to always try to include this library, only when the codegen actually has generated something that touches i128.)

+1, the builtins are supposed to be a compiler implementation detail, so I don’t understand why we’re concerned with the implementation details for other compilers. Using the llvm builtins is the natural default on all platforms

What’s annoying is that you keep hitting link errors, and it turns out that LLVM has somehow generated calls to builtins that are defined only in compiler-rt.

There are two related problems:

  • On platforms that pick up compiler-rt to implement libgcc (FreeBSD, ChromeOS, others?) we are using them, just not necessarily the same version as the compiler. For extra fun, and perhaps more relevantly, other compilers are also using them. If I compile a .so with GCC and link it into a program compiled with clang, I want to have a single copy of these helper functions shared by both. This matters a lot for the atomic helpers (I lose atomicity if they use different locks) but it also matters for some of the floating-point helpers where I might get different rounding errors depending on where a particular bit of computation occurs.
  • Conversely, on platforms that don’t use compiler-rt by default, they will often provide a runtime library that implements a set of functions that intersects the set provided by compiler-rt (at least including the ones that the platform ABI spec requires, if such a thing exists, and often with some others that the default compiler needs).

Clang has a lot of custom code for each target platform, including custom code for Ubuntu and so on for finding all of the weird and wonderful places it decided to install system headers this week. I would suggest that the right solution for people packaging clang for a target that does not include all of the required builtins is to also package a shim library that includes all of the missing builtins that compiler-rt provides and their platform equivalent does not, and add that to the default linker invocation that clang produces for their target.

This would require a bit of work on the compiler-rt build system to make it easy to build a libcompiler-rt-shim.{a,so} containing a specified set of builtins. Ideally, we’d generate the list automatically by doing nm or equivalent on the platform’s default rtlib (as specified by whoever is building / packaging the clang toolchain) and generate a build that is all of the compiler-rt builtins except those.
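
Generating that list could look roughly like the sketch below. It assumes the common three-column `nm` listing (`<address> <type> <name>`) and works on captured text rather than invoking `nm`; the sample listings and function names are made up for illustration.

```python
def defined_symbols(nm_output: str) -> set:
    """Collect the names that an nm-style listing marks as defined."""
    names = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined entries look like "<address> <type> <name>";
        # undefined ones ("U name") carry only two fields.
        if len(fields) == 3 and fields[1] != "U":
            names.add(fields[2])
    return names


def shim_members(compiler_rt_nm: str, platform_nm: str) -> list:
    """Builtins that compiler-rt defines and the platform rtlib lacks."""
    return sorted(defined_symbols(compiler_rt_nm) - defined_symbols(platform_nm))


# Hypothetical listings, trimmed to three helpers.
PLATFORM_NM = """\
0000000000000000 T __divti3
0000000000000040 T __multi3
                 U abort
"""
COMPILER_RT_NM = """\
0000000000000000 T __divti3
0000000000000040 T __multi3
0000000000000080 T __muloti4
"""

print(shim_members(COMPILER_RT_NM, PLATFORM_NM))  # ['__muloti4']
```

A real build would also have to map symbols back to source files and handle per-architecture differences, which this sketch ignores.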

The LLVM Libc already has a mechanism to specify the provided functions per platform:

Nonetheless, hoisting the builtins into its own top project is a good cleanup.

I personally think that would be the best option and there is somewhat prior art for doing this with the TargetLibraryInfo analysis. That way there’s nothing in the way of making more optimized routines within compiler-rt that can be called from LLVM, while not regressing any other platforms.

I’m more in favour of propagating the rtlib choice to code generation than always linking with the builtins, especially for people that want to use libgcc’s implementations of the spec in (Libgcc (GNU Compiler Collection (GCC) Internals)) when available. The driver would need to take care to put libgcc ahead of compiler-rt on the command line so that libgcc’s implementation takes precedence.
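
The effect of that ordering can be illustrated with a simplified first-definition-wins model. Real linkers have subtler archive semantics, and the library contents below are illustrative only; `resolve` is a hypothetical helper.

```python
def resolve(undefined, libraries):
    """Map each symbol to the first library that defines it.

    `libraries` is an ordered list of (name, exported_symbols) pairs,
    standing in for the order of archives on the link line.
    """
    chosen = {}
    for symbol in undefined:
        for name, exports in libraries:
            if symbol in exports:
                chosen[symbol] = name
                break
        else:
            # No library on the line defines this symbol.
            chosen[symbol] = None
    return chosen


libgcc = ("libgcc", {"__divti3"})
builtins = ("compiler-rt", {"__divti3", "__muloti4"})

# libgcc first: shared helpers come from libgcc, the rest from compiler-rt.
print(resolve(["__divti3", "__muloti4"], [libgcc, builtins]))
# {'__divti3': 'libgcc', '__muloti4': 'compiler-rt'}
```

Swapping the two entries makes compiler-rt win for every symbol it defines, which is the precedence question the ordering has to settle.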

The following isn’t a strict analogy, but the Arm 32-bit ABI, for better or worse, standardised the runtime functions, and recommended that any private helper function not in the standard be output in a COMDAT group for portability across compilers (abi-aa/rtabi32.rst at main · ARM-software/abi-aa · GitHub). COMDAT groups would only work for ELF, and my understanding is that only armcc actually followed this; everyone else just used a helper library.

Linking order is one of the ideas…

So far, they seem to be:

  1. Link compiler-rt after the -rtlib option.
  2. Make -rtlib=compiler-rt the default for all platforms.
  3. Separate the uncommon rt builtins into a small rt-lib and always link that, regardless of which rtlib you use.
  4. Pass link options through the compiler/objects.

(1) would work, but symbol precedence would be confusing, since not all platforms behave the same.
(2) would fail if the user passes -rtlib=libgcc to the linker but not the compiler. User error, sure, but not quite obvious.
(3) would always work, but needs additional build infra that we’d have to maintain.
(4) would need to be added to every single step in the compiler (AST, IR, MIR, obj) to work, and would need changes to build environments, etc.

Personally, I think we should at least try (2), it’s long past due. If that fails, I’d go with (3).

Not sure I understand this.

  • The atomic helpers with locks are in libatomic.so. compiler-rt builtins has a CMake option to provide atomic helpers, but it’s off by default, with a very explicit warning next to the option that it will explode in environments with shared libraries.
  • The only floating-point routines compiler-rt provides are basic arithmetic; IEEE 754 requires those operations to be correctly rounded, so all implementations should return exactly the same thing.

Every target that supports C++ has a usable equivalent.


Using a command-line option to indicate the compiler runtime is fine as far as it goes, but then we’re still stuck with the issue that the set of APIs has been essentially frozen for the past 30 (?) years. This means if we want a new builtin for any reason, we can’t have it; we’re stuck emitting inline copies of the functionality everywhere in the user’s program.

I think embedding the implementations into object files is the best approach for any interface we want outside the historic libgcc ABI; it’s much less likely to cause weird issues for users compared to anything that requires prebuilt libraries.

Won’t splitting the builtins out help simplify the build system too? I seem to recall some issue in the past where having builtins as part of compiler-rt made the builds more complicated (maybe it was for cross-compilers?).

That’s what I was thinking. Record the --rtlib value (whether it was set at the command line explicitly, or implied by the target triple) as metadata in the IR. Then LLVM can make better codegen decisions based on the value of this node.

And if that results in a failure to link because folks were setting different values for --rtlib for different compilation units or between compilation vs link, well then “don’t do stupid things.”

There are multiple approaches with different trade-offs. +1 for embedding the implementation in the relocatable object file, similar to __x86.get_pc_thunk.bx and __x86.get_pc_thunk.cx emitted by GCC and __llvm_retpoline_r11/__clang_call_terminate/etc emitted by Clang. The argument against this is sometimes object file size (which is generally not a concern for such rarely used routines) and convenience in Clang/LLVM (if embedded implementation is inconvenient, we should strive to fix it).

This (a) avoids making --rtlib= codegen-affecting (and therefore avoids the complexity of introducing module flags metadata for LTO), (b) simplifies cross compilation (no prebuilt compiler-rt/lib/builtins when libgcc is available), and (c) works best with LTO with bitcode compiler-rt (the bitcode IR symbol table has some symbol resolution issues in this area).

Splitting compiler-rt/lib/builtins or a subset will lead to a multiple definition scenario which can be addressed by using archive semantics but is still not great. There will likely be a need to customize platform/compiler-rt/libgcc as we currently do for --rtlib=, leading to more complexity in the user interface.

I believe so and it’s something we’ve already discussed in the past. When bootstrapping the toolchain, builtins need to be built before any other runtimes, including the rest of compiler-rt, which creates an ordering issue. We support building builtins separately from the rest of compiler-rt to work around this issue but it’s quite convoluted. This also applies to crtbegin.o and crtend.o, which are also a part of compiler-rt.

@smithp35 pointed me at this thread. It so happens that today I was trying to build the compiler-rt builtins with a set of clang options that are incompatible with C++ (-fropi in particular). It looks to me as if that should be possible in principle, because there’s no C++ in the compiler-rt/builtins subdir, only in other unrelated compiler-rt subprojects. But the top-level compiler-rt/CMakeLists.txt checked for a C++ compiler anyway, didn’t find one, and failed hard.

So I’d be in favour of having builtins separate enough that you could build them without working C++, if for no other reason!

Probably in that case -DCMAKE_CXX_COMPILER_WORKS=ON is your friend (parts of the runtimes infrastructure already use that).

Compiling with -DCMAKE_C_COMPILER_WORKS=ON -DCMAKE_CXX_COMPILER_WORKS=ON -DCMAKE_ASM_COMPILER_WORKS=ON should be able to build compiler-rt builtins in a freestanding environment.

I saw a very similar issue when trying to support integer divisions on _BitInt(N) with N > 128. I could easily add a runtime function for that into compiler-rt, but clang links against libgcc on most Linux distros by default, and the backend didn’t know which one would be linked.

We also toyed with the idea of having a new bitint runtime library that clang would link on top of libgcc by default, but integer division seemed too small a feature to justify a new approach in that space.

In the end, we generated the appropriate IR directly.