[RFC] Add command line option for selecting C library

Summary

This is a proposal to introduce a new clang command line option --libc to specify the standard C library to use.

Background

There are multiple standard C libraries available, especially in the embedded space. Today there is a way to choose which C++ library to use (LLVM libc++ or GCC libstdc++), as well as glibc or musl for a Linux target; however, there is no standard, dedicated way to select a C library in the bare-metal driver.

Related discussion in LLVM libc monthly: Monthly LLVM libc meeting - #36 by michaelrj-google

Related existing options:

-nolibc

-stdlib=<arg>, --stdlib=<arg>, --stdlib <arg>

   C++ standard library to use. <arg> must be ‘libc++’, ‘libstdc++’ or ‘platform’.

Goal

Provide a standard way to select the C library to use for embedded use cases.

Out of Scope

Hosted use cases like glibc vs muslc on Linux (<arch>-linux-gnu and <arch>-linux-musl target triples).

Proposal

Add a new command line option --libc in the scope of the BareMetal driver:

-libc=<arg>, --libc=<arg>, --libc <arg>

    C standard library to use. <arg> can be any string, ‘system’ is the default.

Provide CLANG_DEFAULT_C_LIB, similar to CLANG_DEFAULT_CXX_STDLIB, to override the default C library.

Initially, make the string provided to --libc available to the multilib selection logic only and let multilib handle it - no hardcoded logic in the driver. Later, either generic or library-specific logic can be added to the library handling, or even library-specific optimizations to the compiler - each of these requires careful analysis and design on its own.

Note: Strings used for --libc should be aligned with the RUNTIMES_USE_LIBC option used for libc++. Today only system and llvm-libc are supported in HandleLibC.cmake; however, there is an open pull request to add picolibc and newlib.
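
The selection order implied by the proposal (explicit option, then build-time default, then system) could be modelled roughly as below. This is a minimal Python sketch, not driver code: `resolve_libc` and `build_default` are hypothetical names, `build_default` stands in for CLANG_DEFAULT_C_LIB, and last-occurrence-wins is an assumption.

```python
def resolve_libc(args, build_default=""):
    """Return the C library name selected by a list of driver arguments.

    Assumptions for illustration: only the joined form --libc=<arg> is
    modelled, the last occurrence wins, and an empty build_default
    (standing in for CLANG_DEFAULT_C_LIB) falls back to "system".
    """
    selected = build_default or "system"
    for arg in args:
        if arg.startswith("--libc="):
            selected = arg[len("--libc="):]
    return selected
```

Because the value is treated as an opaque string, nothing here needs to know the set of supported libraries; that knowledge would stay in the multilib configuration.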

Benefits

  1. A standard option to specify the C library to use, aligned with the existing options for selecting the C++ standard library (-stdlib) and for building without a C library (-nolibc).
  2. Keeping the list of library names open-ended provides minimum implementation and maximum flexibility with regard to forward compatibility, as it does not prescribe any driver behavior - it is kept in the multilib logic.

Implementation

  1. Add build time CMake option CLANG_DEFAULT_C_LIB so that toolchains can choose.
  2. Add --libc option in Options.td
  3. Pass the value provided for --libc option to multilib backend in ToolChain.cpp
  4. Make use of the value provided for the --libc option when building the path to the multilib.yaml file in the BareMetal.cpp driver. The system value is not added to the path, i.e. it does not change the current behavior.
    NOTE: Drivers that do not use multilib.yaml configuration (use custom logic in the driver) will not be affected by this change.

The rest of the library logic should be handled by the multilib, e.g. Arm Toolchain can use picolibc, llvmlibc, newlib and newlib-nano all just by pointing --sysroot to the correct multilib.yaml file via a config file. Config file adds the benefit of using a macro to get the root folder of the toolchain to create a relative path to the yaml file - otherwise the full absolute path has to be provided to --sysroot, which is not practical.

Arm Toolchain example: the path to multilib.yaml is set in getMultilibConfigPath() by appending the yaml file name to ClangRuntimesSysRoot. The default sysroot points to lib/clang-runtimes/ under the toolchain root, thus the default yaml file path is lib/clang-runtimes/multilib.yaml

If the LLVM libc overlay package is used, then --config=llvmlibc.cfg overrides sysroot to lib/clang-runtimes/llvmlibc/ under the toolchain root, thus the resulting path is lib/clang-runtimes/llvmlibc/multilib.yaml

With the proposed change, the name of the library, llvmlibc in the example, should be appended to the sysroot in getMultilibConfigPath() (unless the specified library name is system, to keep the current behavior) to get the same resulting lib/clang-runtimes/llvmlibc/multilib.yaml
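
The path construction described above can be sketched as follows. This is a Python model for illustration only, not the C++ in BareMetal.cpp; `multilib_config_path` is a hypothetical name, and paths are relative to the toolchain root as in the example.

```python
import posixpath


def multilib_config_path(sysroot, libc="system"):
    """Model of the proposed multilib.yaml lookup; not the actual driver code."""
    # "system" keeps today's behavior: nothing is appended to the sysroot.
    base = sysroot if libc == "system" else posixpath.join(sysroot, libc)
    return posixpath.join(base, "multilib.yaml")
```

With `sysroot="lib/clang-runtimes"`, the default yields lib/clang-runtimes/multilib.yaml and `libc="llvmlibc"` yields lib/clang-runtimes/llvmlibc/multilib.yaml, matching the two paths in the example.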

NOTE: This change does not impact -stdlib handling: -stdlib=libstdc++ is not supported in the BareMetal driver for Arm targets.

Alternatives

  1. ATfE now uses config files to set the --sysroot, e.g. llvmlibc.cfg. The drawback is that it is too generic: it requires adding --config=llvmlibc.cfg on the command line, which can mean anything.
  2. Provide a predefined list of libraries to support, analyze and generalize the driver logic required to support them all, implement required logic in the driver itself. A lot of upfront work and alignment, likely difficult to extend later when a new library appears that has incompatible assumptions about use.
  3. Use target triples like arm-none-picolibceabi and arm-none-llvmlibceabi similar to aarch64-linux-musl, aarch64-linux-android and aarch64-linux-gnu - this has the drawback of predefined list of supported libraries and is not really related to the ABI.

Thank you for the RFC!

Because we currently use -stdlib= for selecting the C++ standard library, I think it would be reasonable to use something like -cstdlib= or -stdclib=, etc. for parity (we may even want to consider renaming -stdlib so it’s more clear that it’s a C++-specific option, but that’s a potentially disruptive change).

I’m personally not keen on making this only work for non-hosted targets. I think it would be more intuitive for users to have parity between -stdlib in C++ and picking the C runtime library. (Picking the library may set the triple differently, but making users understand all the complexity of triples is not the most user-friendly IMO.) WDYT?

CC @MaskRay @jansvoboda11 for driver opinions
CC @michaelrj-google for llvm-libc opinions

I don’t have a strong feeling about the option name. There are a few choices: --libc= (we should avoid -libc=, which conflicts with the library option -l), --cstdlib=, --stdclib=, --clib=. We should ideally require = and not support the separate form.
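
A rough illustration of the -l conflict, as a simplified Python model of a prefix-matching option parser. This is not clang's actual option table logic, and `classify` is a hypothetical name.

```python
def classify(arg):
    """Classify one argument the way a simple prefix-matching parser might.

    Simplified model: "-l<name>" is a joined option, so any single-dash
    argument starting with "-l" is read as a request to link a library.
    """
    if arg.startswith("--libc="):
        return ("libc", arg[len("--libc="):])
    if arg.startswith("-l"):
        return ("link-library", arg[2:])
    return ("other", arg)
```

In this model `-libc=picolibc` is read as a request to link a library named `ibc=picolibc`, while `--libc=picolibc` is unambiguous.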

The default for the other options is called “platform”, can we follow that for this option rather than using “system”?

Providing the cmake option for a default ends up making some tests very fragile (as seen when updating in-tree tests for merging the riscv and bare metal toolchains, especially in the fuchsia builds) - can we avoid this option and instead ask toolchains to use a clang config file if they want to override the default?

Thank you for the review!

clang supports multiple different hosted environments, each with its own traditions and rules around supported C libraries. For example, on Linux there is a choice of glibc, musl, uClibc, etc., while Windows has completely different options. The fear is that trying to design and implement one overarching solution that addresses all of these could be too big and complex.

That said, as long as the option syntax is agreed and implemented for bare-metal, there will be the opportunity to make use of it for hosted environments where it makes sense and there is interest or benefit in doing so.

There may be scope for a simple generic implementation, like using this new option to define the ending part of the triple (-gnu or -musl) instead of spelling it in the triple. I am not sure if this creates any benefit, but it has the potential to create confusion if both the triple and the C library option are specified and disagree.

In summary, having agreed syntax for the option allows adopting it in respective drivers where there is a use case, including hosted ones.

Thank you and @AaronBallman for the naming suggestions!

I think --cstdlib= may be the most descriptive, and different enough from --stdlib= to avoid confusion.

Thank you for the review and a good point!

system comes from llvm-project/runtimes/cmake/Modules/HandleLibC.cmake (at 8226fbee4b6254bc3cd4806de623937262028883), where it is one of the options for RUNTIMES_USE_LIBC - there may be an opportunity to unify, or there might have been a good reason why system was chosen.

I hope @petrhosek may be able to clarify.

Could you please provide a bit more details of what went wrong?

There were two motivating points here:

  1. Consistency with libc++
  2. Easier migration paths for downstream toolchains that already provide some library (likely a downstream one) by default today. Without the default option, users will need to specify the library explicitly in the future, i.e. do some migration work for their projects. We can argue that this is a one-time effort though: as soon as the library is explicit, it will keep working.

Could you please provide a bit more details of what went wrong?

There are driver tests which use --unwindlib= and --rtlib=. These are affected by CLANG_DEFAULT_UNWINDLIB and CLANG_DEFAULT_RTLIB. There are also specific combinations of flags which give errors (some runtime libs are known to be incompatible with specific unwind libraries - see GetUnwindLibType - e.g. libunwind and libgcc).

In particular, the Fuchsia builds override the usual values of these two cmake variables, so in these builds a driver invocation that leaves out one of these options may error, or the flags may be different from what they would be in a standard build. The lit tests have to allow for the fact that the linker arguments differ depending on which library is chosen.
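
To make the fragility concrete, here is a small Python model of how build-time defaults change the outcome of the same test invocation. It is only a sketch: `runtime_libs` is a hypothetical name, the default values passed in are placeholders, and the rejected pair is assumed from the libunwind/libgcc example above rather than taken from GetUnwindLibType.

```python
def runtime_libs(args, default_rtlib, default_unwindlib):
    """Return (rtlib, unwindlib), or raise ValueError for a rejected pair.

    default_rtlib and default_unwindlib stand in for CLANG_DEFAULT_RTLIB
    and CLANG_DEFAULT_UNWINDLIB.
    """
    rtlib, unwindlib = default_rtlib, default_unwindlib
    for arg in args:
        if arg.startswith("--rtlib="):
            rtlib = arg[len("--rtlib="):]
        elif arg.startswith("--unwindlib="):
            unwindlib = arg[len("--unwindlib="):]
    # Assumed incompatibility, following the example in the post above.
    if rtlib == "libgcc" and unwindlib == "libunwind":
        raise ValueError("incompatible rtlib/unwindlib combination")
    return rtlib, unwindlib
```

A test that passes only `--rtlib=libgcc` succeeds when both defaults are libgcc, but is rejected in a build whose default unwind library is libunwind, even though the test itself did not change.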

I had to do one fix-forward when we landed some of the riscv+baremetal toolchain object merge because of this: [Driver] Fix Arm/AArch64 Link Argument tests by lenary · Pull Request #144582 · llvm/llvm-project · GitHub (this change means that the libc flags don’t need to be adjacent to the compiler-rt flags, because iirc the unwind library flags might come between them if you have overridden the cmake variables). This fix, however, wasn’t enough, and the toolchain merge patches had to be reverted and re-landed because some of the cases had incompatible choices of libraries on the fuchsia builder. There are more details in the discussion starting here: [Driver] Add support for GCC installation detection in Baremetal toolchain by quic-garvgupt · Pull Request #121829 · llvm/llvm-project · GitHub

I am going to push back slightly on one point made in that discussion by Petr:

the tests needs to be written in a way that correctly handles different settings of CLANG_DEFAULT_RTLIB and CLANG_DEFAULT_UNWINDLIB which we’ve already done in different tests, typically by explicitly specifying --rtlib=platform and --unwindlib=platform .

I agree this needs to be done today with these two settings, BUT it’s not good that cmake-level configurations have this much influence on in-tree lit tests, and that debugging this kind of failure effectively requires either reconfiguring your whole build directory and rebuilding, or a whole second build directory for clang with these options set.

Thank you for the detailed explanation!

I’m generally in favor of having an option for selecting which libc is being used. My question is do you plan to use this for just telling the compiler which library it should use by default or do you also plan to make ABI assumptions based on the library flag?

If this is just for picking which library to use, this seems fine as is. It seems like it matches what libc++ does already.

If this will also make ABI assumptions, that gets more complicated. As an example, if LLVM-libc provided a version of printf with no float support (I’ll call it printf_nofloat for simplicity) and clang replaced a user’s call to printf with printf_nofloat when it doesn’t use float, that could be a useful optimization. The problem is that you need to worry about library versions – say LLVM-libc version 24 doesn’t have printf_nofloat but version 25 does; just knowing that you’re using LLVM-libc isn’t enough.
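
The version problem can be sketched like this. Everything here is hypothetical and follows the example above: printf_nofloat does not exist, the version numbers are the ones from the example, and `may_rewrite_to` is an illustrative name.

```python
# Hypothetical table: first library version that ships each entry point.
FIRST_VERSION_WITH = {("llvm-libc", "printf_nofloat"): 25}


def may_rewrite_to(libc, libc_version, function):
    """True if a call may be rewritten to `function` for this library.

    libc_version is None when only the library name is known, which is
    all that a plain library-selection flag would provide.
    """
    first = FIRST_VERSION_WITH.get((libc, function))
    if first is None or libc_version is None:
        return False
    return libc_version >= first
```

With only the library name available (`libc_version=None`), the rewrite can never be enabled safely, which is why a name-only flag is not enough for this kind of optimization.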

P.S. Sorry it took me a bit to respond.

Thank you for the review and comments, Michael!

The current proposal deliberately limits the scope to providing a way to define which library to use, where the library is more or less an abstract token/id.

In the future, this can be used to extend code gen with library-specific optimizations, but you are right: each such case might be rather involved and require its own RFC to define the details.

My point is that I think the initial offering needs to support hosted as well as bare metal.

I think if users see an option that lets them pick the C standard library to be used, they’re going to presume it works “the same” as setting the C++ standard library. Getting a diagnostic about it only working for bare metal targets would not match user expectations, and it also trains users to ignore the option as not being relevant to them. Also, focusing on bare metal runs the risk of designing something that only works for bare metal, so that we need a different solution for hosted. Given that there are (presumably) far more hosted users than bare metal users, I think we should be designing for the majority use case, given the nature of the flag and the fact that -stdlib already sets a lot of user expectations.

My point is that I think the initial offering needs to support hosted as well as bare metal.

We discussed this in a recent Embedded working group call and agreed that in principle it is possible to have different libraries implement the same ABI, e.g. in the future LLVM libc may be a drop-in replacement for glibc and implement the same ABI as defined by the *-linux-gnu triples.

However, to implement support for hosted now, we need specific use cases to confirm that the design fits, and actual implementations to test.

Can you think of example use cases where this option can be useful in hosted environments today?

Speaking personally, one place I’d love to use this would be Compiler Explorer. When trying to determine how code will behave across various compilers, I sometimes need to vary which C++ standard library is being used by Clang; I have the same need for glibc vs musl vs llvm-libc. Concrete examples are “I wonder if this printf format specifier extension is broadly supported or just specific to this one standard library” or “what kind of errno behavior will I get when calling this standard library function?” kinds of things.

I was imagining the implementation of this to effectively be: use the -cstdlib to set the triple, if there’s an explicit triple which disagrees, diagnose the disagreement and go with last-flag-wins behavior. But maybe I’m misunderstanding the complexities involved?
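
A rough Python model of that idea, for illustration only. The -cstdlib spelling is the one suggested in this thread (modelled here in its double-dash form), the name-to-environment table and `effective_triple` are hypothetical, and triples are assumed to end in the environment component.

```python
# Hypothetical mapping from library name to triple environment component.
ENVIRONMENT_FOR = {"glibc": "gnu", "musl": "musl"}


def effective_triple(args, default_triple="aarch64-linux-gnu"):
    """Return (triple, warnings) after applying --target= and --cstdlib=.

    Flags are processed in order, the last one to set the environment wins,
    and a disagreement between explicit flags is recorded as a warning.
    """
    prefix, _, env = default_triple.rpartition("-")
    explicit = False  # has any flag chosen the environment yet?
    warnings = []
    for arg in args:
        if arg.startswith("--target="):
            prefix, _, new_env = arg[len("--target="):].rpartition("-")
        elif arg.startswith("--cstdlib="):
            # Unknown names raise KeyError in this sketch.
            new_env = ENVIRONMENT_FOR[arg[len("--cstdlib="):]]
        else:
            continue
        if explicit and new_env != env:
            warnings.append(f"'{arg}' overrides C library environment '{env}'")
        env, explicit = new_env, True
    return f"{prefix}-{env}", warnings
```

Passing `--target=aarch64-linux-gnu` followed by `--cstdlib=musl` yields aarch64-linux-musl plus one warning; reversing the order yields aarch64-linux-gnu plus one warning.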

Initially, make the string provided to --libc available to the multilib selection logic only and let multilib handle it - no hardcoded logic in the driver.

Hi @voltur01, have you pushed any patches regarding this?
Also, why not expose the --libc flag to toolchain drivers?
if we just let the toolchain drivers change -L path based on --libc value (hardcoded logic), it should be simple enough, right?
or am I missing something?

Hi @quic-k, thank you for reaching out! No, there are no patches yet - we are still busy with other aspects of LLVM libc library support in our toolchain.

This new command line option will need to impact --sysroot (rather than -L) so that the driver considers not only the binary library files, but also the corresponding header files and possibly the multilib yaml configuration file.

oh yeah, correct
why do you want to expose this to multilib backend only?
Hexagon toolchain may need this flag and it's neither baremetal nor does it use the multilib backend.
why not expose the flag to all toolchains and let them handle it however they want - hardcoded logic, hardcoded in multilib backend, multilib yaml configuration

In the original proposal it was suggested to limit the scope exactly because there could be so many different implementations that we do not know how to address all of them at the same time.

In the later discussion, it was suggested ([RFC] Add command line option for selecting C library - #15 by AaronBallman) that a generic implementation that overrides the library from the target triple could be a good initial implementation.

Would this also work for hexagon?

sounds good, thanks!
the scope shouldn’t matter tho, right? if a toolchain maintainer wants to use this flag, they can make the appropriate changes in clang driver later
the flag will be available for all toolchains, but if not used (in CLI flag and clang driver), it won’t affect the toolchain