[RFC] Multilib

Hi Petr,

I agree there is significant overlap with MultilibSet and yes I could replace that instead of having the two implementations exist side by side. This would be a breaking change to MultilibSet rather than extending it and indeed existing usages would need updating.

For LLVM Embedded Toolchain for Arm we’ll likely have a similar requirement for -fno-exceptions library variants. How I anticipated we would do this:

- path: yes/exceptions
  args: [...]
  attrs: [...]
- path: no/exceptions
  args: [-fno-exceptions, ...]
  attrs: [no-exceptions, ...]
- regex: -fno-exceptions
  matchAttrs: [no-exceptions]

If the user doesn’t specify -fno-exceptions then only yes/exceptions will match.
If the user does specify -fno-exceptions then both yes/exceptions and no/exceptions match. The rule is that last wins so no/exceptions is picked.

(The “last wins” rule works fine as long as you can order your optimisations along one dimension. I anticipate this won’t always be the case so in future the system could be extended with some way to score optional attributes.)
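
To make the rule concrete, here is a minimal sketch of the selection logic under a simplified model. The `Variant` and `AttrRule` types and both function names are hypothetical and are not part of the prototype; the sketch also assumes a regex must match a whole command-line argument.

```cpp
#include <cassert>
#include <regex>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model of one library variant from the YAML above.
struct Variant {
  std::string path;
  std::set<std::string> attrs; // attributes the variant requires
};

// Hypothetical regex -> attribute rule (the "regex"/"matchAttrs" entry).
struct AttrRule {
  std::string regex;
  std::vector<std::string> matchAttrs;
};

// Derive the attribute set from the command line: an argument that fully
// matches a rule's regex contributes that rule's attributes.
std::set<std::string> deriveAttrs(const std::vector<std::string> &args,
                                  const std::vector<AttrRule> &rules) {
  std::set<std::string> attrs;
  for (const auto &rule : rules) {
    const std::regex re(rule.regex);
    for (const auto &arg : args)
      if (std::regex_match(arg, re))
        attrs.insert(rule.matchAttrs.begin(), rule.matchAttrs.end());
  }
  return attrs;
}

// A variant matches when every attribute it requires is present. The last
// matching variant in declaration order wins; returns "" if none match.
std::string selectVariant(const std::vector<Variant> &variants,
                          const std::set<std::string> &attrs) {
  std::string selected;
  for (const auto &v : variants) {
    bool ok = true;
    for (const auto &a : v.attrs)
      if (!attrs.count(a))
        ok = false;
    if (ok)
      selected = v.path;
  }
  return selected;
}
```

With the two variants above, an empty attribute set selects yes/exceptions, while passing -fno-exceptions makes both match and no/exceptions is picked because it is listed last.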

The way I’ve been using the new implementation is to change the sysroot. However I think you’re suggesting that all library variants will share a single include directory and have multiple lib directories. I vaguely remember being told that this was discussed and some people have the requirement that each variant has its own include directory since header files may be different between variants. However I think the multilib selection code could be used in different ways for different ToolChain classes. Instead of the multilib code selecting a particular variant, it could provide a list of all matching variants. One ToolChain class (e.g. BareMetal) could use the last matching variant as a --sysroot argument while another ToolChain (e.g. Fuchsia) could use all the variants as -L arguments. How does that sound?


To summarise our discussion earlier in the LLVM Embedded Toolchains Working Group sync up, I will not do this. Instead I will update the prototype to not use --sysroot but instead use -isystem and -L for all matching variants. This achieves the layering effect you want and it should work regardless of whether each variant has its own include directory or if they share a common one, or something in between.
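
A rough sketch of what that layering could look like when turning the list of matching variants into arguments. The function name, the `include` and `lib` subdirectory names, and the choice to emit the last (most specific) variant first so that it is searched first are all assumptions for illustration, not the prototype's behaviour.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Build driver arguments that layer every matching variant. Assumptions: each
// variant directory has "include" and "lib" subdirectories, and later (more
// specific) variants should be searched before earlier ones.
std::vector<std::string> layeredArgs(const std::string &root,
                                     const std::vector<std::string> &matching) {
  std::vector<std::string> args;
  for (auto it = matching.rbegin(); it != matching.rend(); ++it) {
    const std::string dir = root + "/" + *it;
    args.push_back("-isystem");
    args.push_back(dir + "/include");
    args.push_back("-L" + dir + "/lib");
  }
  return args;
}
```

If a variant has no include directory of its own, the search simply falls through to the next layer, which is why this should work whether headers are shared or per-variant.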

I’ve created a new stack of changes taking into account the feedback I received on ⚙ D140959 RFC: Multilib prototype. Unlike that change which was strictly a prototype, the new changes should be suitable for detailed review and hopefully approval.

The patches I’m most interested to get feedback on are Multilib YAML parsing and Add -print-multi-selection-flags argument because they define new stable APIs for Clang, so mistakes will be costly to fix later.

I will update the prototype to not use --sysroot but instead use -isystem and -L for all matching variants.

I haven’t done this but I am working on adding further patches to the stack to achieve the layering effect in the BareMetal ToolChain.


I am working on adding further patches to the stack to achieve the layering effect in the BareMetal ToolChain

… and the patches are ready for review:

I put a design doc up for review at ⚙ D143587 [Docs] Multilib design. Hopefully that makes this whole thing more accessible to people who aren’t already familiar with the design space.

I’ll be on holiday next week but I’m hoping to start getting the patches landed the week after that unless a major issue is identified. The changes have had a good number of eyes on them so it feels like the time is approaching to understand how well it works in practice, and use those insights to update the design.

Thanks to @petrhosek and @MaskRay for your reviews. One patch landed, 7 to go! I know that reviewing this code is quite time consuming so I don’t want to take it for granted. I think I’ve addressed all the comments on the reviews so how do you want to proceed? Do you want me to wait for you to accept the changes or are you satisfied with the changes at a high level and happy to leave it to someone else to give a final OK for the finer details?
The LLVM Embedded Toolchains Working Group sync up is this Thursday so we can discuss then if not before.

First of all, thank you for all the effort you put into this. I think the changes to the driver implementation are ready to go in, especially since those changes should be largely invisible to users. I’d like to discuss the new multilib.yaml configuration format since that part is visible to users.

In comparison to GCC, the proposed configuration format offers more flexibility. In the case of GCC, the supported multilibs are configured only at build time. Having runtime configurability for multilibs is a very nice idea, but we should be mindful of unnecessary complexity.

In the proposed design, you map flags to labels using regex-based matching (the FlagMap section) and then define variants as a combination of labels (the Variants section). That’s fundamentally consistent with the GCC implementation which maps flag combinations to a chosen set of variants, but uses option parsing instead.

I think the naming could be clearer. In particular the term “flags” is used both in the Variants and FlagMap sections, but these are not actually flags, rather they’re arbitrary strings that are used for matching input flags to variants. Perhaps “label” or “tag” would be a better name?

I am concerned about regular expressions as the default mechanism for matching flags since regular expressions, while powerful, also tend to be error prone.

To use a concrete example from the review, to match the target Armv8.5 or newer, you’d have to do the following in GCC:

MULTILIB_MATCHES += target=armv8.5-none-eabi=target=armv8.6-none-eabi
MULTILIB_MATCHES += target=armv8.5-none-eabi=target=armv8.7-none-eabi
MULTILIB_MATCHES += target=armv8.5-none-eabi=target=armv8.8-none-eabi

In the proposed design, you could instead do:

- Regex: target=armv8\.([6-9]|[1-9][0-9]+)-none-eabi
  MatchFlags: [target=armv8.5-none-eabi]

This is an example where the higher expressive power of regular expressions helps with complex rules.

The downside is that regular expressions require escaping which makes simple cases more difficult. For example, if I wanted to match the -fc++-abi=itanium flag, I have to use:

- Regex: fc\+\+-abi=itanium
  MatchFlags: [itanium-abi]

This may be non-obvious and I’m worried that this will lead to unexpected and difficult to debug issues. We’ve already seen similar issues with Sanitizer special case list files which also use regular expressions (they also treat * as .* straddling the line between glob patterns and regular expressions).
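
Both points can be checked with a small sketch. It uses `std::regex` with ECMAScript syntax and whole-string matching, which is an assumption; the actual implementation may use a different regex dialect and matching mode. `flagMatches` and `escapeLiteral` are hypothetical helpers, not Clang APIs.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true if the whole flag matches the pattern (ECMAScript syntax here;
// the real implementation may use a different regex dialect).
bool flagMatches(const std::string &pattern, const std::string &flag) {
  return std::regex_match(flag, std::regex(pattern));
}

// Hypothetical helper: backslash-escape regex metacharacters so that a
// literal flag can be used safely as a pattern.
std::string escapeLiteral(const std::string &flag) {
  static const std::string meta = R"(\^$.|?*+()[]{})";
  std::string out;
  for (char c : flag) {
    if (meta.find(c) != std::string::npos)
      out += '\\';
    out += c;
  }
  return out;
}
```

The Armv8 pattern accepts 8.6 and later (including two-digit minor versions) but not 8.5 itself, and a flag such as fc++-abi=itanium only matches once each + is escaped.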

Perhaps starting with a simpler, even if more verbose, version resembling the GCC implementation would be better? If it turns out to be a problem in practice, we could introduce support for regular expressions (or glob patterns) later.

The other aspect I’m concerned about is defining variants, specifically around composition. When defining variants, you have to specify a directory name, a set of “flags” and the output of -print-multi-lib which is very flexible but also potentially error prone and might result in a lot of duplication.

In the driver, this is handled by the MultilibBuilder and MultilibSetBuilder classes which aid in constructing hierarchies of variants that we use in drivers like Gnu. To give a concrete example:

auto ArchV7A = MultilibBuilder("/armv7-a").flag("+march=armv7-a");
auto ArchV7M = MultilibBuilder("/armv7-m").flag("+march=armv7-m");
auto Hard = MultilibBuilder("/hard").flag("+mfloat-abi=hard");
auto SoftFp = MultilibBuilder("/soft").flag("+mfloat-abi=softfp");
auto Soft = MultilibBuilder("").flag("+mfloat-abi=soft");

// Compose every architecture with every float ABI.
MultilibSet Multilibs = MultilibSetBuilder()
    .Either(ArchV7A, ArchV7M)
    .Either(Hard, SoftFp, Soft)
    .makeMultilibSet();

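This code would generate the following six variants, one per combination of architecture and float ABI:

/armv7-a/hard (march=armv7-a, mfloat-abi=hard)
/armv7-a/soft (march=armv7-a, mfloat-abi=softfp)
/armv7-a (march=armv7-a, mfloat-abi=soft)
/armv7-m/hard (march=armv7-m, mfloat-abi=hard)
/armv7-m/soft (march=armv7-m, mfloat-abi=softfp)
/armv7-m (march=armv7-m, mfloat-abi=soft)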

This is quite powerful and becomes increasingly important as the number of potential combinations grows. I’m wondering if we could make the configuration file more closely resemble the usage of those classes and allow automatic composition rather than requiring users to specify all potential combinations manually?
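
For readers unfamiliar with the builder classes, the composition can be thought of as a cross product. The sketch below is a simplified stand-in, not the Clang implementation: `Segment` and `either` are hypothetical names, and the ordering of the result is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for a multilib variant; not the Clang class.
struct Segment {
  std::string dir;
  std::vector<std::string> flags;
};

// Cross every variant built so far with every alternative in `choices`,
// concatenating directory suffixes and flag lists.
std::vector<Segment> either(const std::vector<Segment> &base,
                            const std::vector<Segment> &choices) {
  if (base.empty())
    return choices;
  std::vector<Segment> out;
  for (const auto &b : base)
    for (const auto &c : choices) {
      Segment s{b.dir + c.dir, b.flags};
      s.flags.insert(s.flags.end(), c.flags.begin(), c.flags.end());
      out.push_back(s);
    }
  return out;
}
```

Two architectures crossed with three float ABIs give six variants from five declarations, and each additional `either` step multiplies the count, which is where writing every combination by hand becomes error prone.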

To make my suggestions more concrete, here’s an idea for an alternative format:

    - mfloat-abi=soft
    - march=armv7-a
    - march=armv7-m
    - mfloat-abi=hard
    - mfloat-abi=softfp

  Either: ['/armv7-a', '/armv7-m']
  Either: ['', '/soft', '/hard']

To handle different spelling of flags, we could do something like:

    - target=armv8.5-none-eabi: # used for -print-multi-lib
    - target=armv8.[6-7]-none-eabi # other matches
    - target=thumbv7m-none-eabi:
    - target=thumbv[7-9]*
    - mfpu=fpv4-sp-d16
    - target=thumbv6m-none-eabi:
    - target=thumbv6m-*

I’m fine landing the proposed format as is (and marking it as experimental) to get more practical experience, but I also want to make sure we explore other alternatives since this is a pretty significant new feature that will likely get adopted by a lot of users, especially in the embedded space. I would also be interested in hearing from others.

Thanks very much for your careful thoughts.

  1. Yes I’m happy to change the naming. “Flags” comes from the existing multilib system but if it no longer fits then we should change it.
    I will create a new patch to switch to “Tags”.
    (However for the remainder of this message I’ll stick with the “Flags” terminology just to make it easier to refer to the existing proposal).

  2. I’ve realised that the example in ⚙ D143587 [Docs] Multilib design misses out that you can do Flags: fc++abi=itanium in the Variants list, you don’t need to use FlagMap. I apologise for not making this clearer and I’ve updated the patch to make it more apparent.
    FlagMap is specifically intended for more complex cases where you want either negation or to map many possible Clang-generated flags onto another flag. Given that specific intention, I think the additional power of a regular expression is appropriate. Nevertheless I’m open to switching to the more verbose option, particularly if we start to see problems like those in sanitizer files.

  3. I think we can approach composition from a number of directions. For LLVM Embedded Toolchain for Arm my intention has been to generate both the multilib.yaml and the libraries themselves from within its build system. But I can also see the appeal of expressing how the combinations are generated within the YAML itself. Since YAML is so flexible I think we can explore many ideas side by side just by using different keys for different options, and narrow down the options before removing the “experimental” status.

  4. What do you say to landing the patches on Friday, unless we hear differently from others? I’ve updated the docs patch in various places to make the “experimental” more obvious.

I will create a new patch to switch to “Tags”.

Done: ⚙ D145567 [Driver] Rename multilib flags to tags

I’ve gone through the patches and set approved from the Arm side with the consideration that we will iterate on these in tree, including the experimental interface for the DSL.

I’ve asked that we give some time for any further comments or objections from the US time zone. If you do plan to make any more comments, or just need some more time to review, please let us know, either here or on one or more of the patches.

Thanks for the suggestion about an alternative syntax. It looks like that could work for a lot of cases. Where I’m less sure is where there needs to be a multilib variant based on some CPU feature that is inferred by one or more +feature1+feature2. In the current DSL we process the flags into tags to make that simpler. This still could be done for your suggestion, although it may need some additional complexity to make it work.

Does this mean that you can do the following?

  - Dir: itanium
    Tags: [fc++abi=itanium]
    PrintOptions: [-fc++abi=itanium]

If that’s the case then I think the original name “flags” is a better fit than “tags”. I apologize for going back and forth on this. It might also be worth including this example in the documentation.

I’d like to take another look but this week is going to be especially busy for me so I might need a bit more time if that’s possible.

Thanks Petr for adding your reviews quickly. I’ve responded to all of them except the ones on D142933 which I’ll get to tomorrow.

Yes you can do this:

  - Dir: itanium
    Flags: [fc++abi=itanium]
    PrintOptions: [-fc++abi=itanium]

I’ve renamed back to Flags which I agree works better.

Thanks, I’ll try to do another pass over your changes tomorrow.

Do you think it’d be possible to derive PrintOptions automatically (at least for the simple cases such as this one) to avoid the duplication?

I’ve also noticed that a lot of the complexity in the new implementation is due to -print-multi-lib support (such as needing the PrintOptions field) which makes me wonder whether there’s a reason why this is needed? Is the goal to support existing tools that consume the output of -print-multi-lib? If not, could we provide a different option with a simpler output that doesn’t require all this complexity?

It’s tempting but I think the cost will be an extra wrinkle in the API. In practice I expect most multilib.yaml files to be generated rather than hand-written, in which case the duplication is less of an issue and keeping the API simpler will be preferable. This prediction may prove incorrect but the best test of that is getting more people actively using the system. Therefore this question is one I’d like to explore after landing the current stack of patches.

Yes the picolibc build system does interrogate -print-multi-lib output. Also I feel we need to continue to provide -print-multi-lib functionality because it’s an API that clang shares with gcc.
I share your desire to keep things simple and fast. My latest update to ⚙ D142905 [Driver] Change multilib selection algorithm improved speed at the expense of increasing complexity slightly so I’ll take another look to see if we can have speed without compromising on simplicity.
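
For context, tools that consume -print-multi-lib expect one line per variant of the form `dir;@opt1@opt2`. A minimal sketch of producing such a line from a directory and its PrintOptions follows; the function name is hypothetical, and the details (a leading `-` replaced by `@`, `.` for the default directory) reflect my understanding of the GCC-style output rather than a specification.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Format one line of -print-multi-lib style output: "<dir>;@opt1@opt2".
// The leading '-' of each option is replaced by '@', and "." stands for the
// default (top-level) directory.
std::string printMultilibLine(const std::string &dir,
                              const std::vector<std::string> &printOptions) {
  std::string line = dir.empty() ? "." : dir;
  line += ';';
  for (const auto &opt : printOptions) {
    line += '@';
    line += (!opt.empty() && opt[0] == '-') ? opt.substr(1) : opt;
  }
  return line;
}
```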

I think we have to pick our poison:

  1. Have the callback function to enable lazily calculating the print options.
  2. Add plumbing to many functions so we can pass a NeedPrintOptions parameter to MultilibBuilder.makeMultilib() to tell it whether or not to calculate the print options at that time.
  3. Eat the 2 microsecond cost of calculating the print options every time you invoke clang.

Right now we have option 1 and that particular poison doesn’t taste too bad to me.
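
To illustrate option 1 in isolation, here is a minimal sketch of deferring the print options behind a callback so that the cost is only paid when they are requested. `LazyVariant` is a hypothetical type for illustration and does not mirror the classes in the patches.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical variant type that defers computing its print options until
// they are requested (for example, only when -print-multi-lib is passed).
class LazyVariant {
public:
  using Options = std::vector<std::string>;

  LazyVariant(std::string dir, std::function<Options()> compute)
      : dir_(std::move(dir)), compute_(std::move(compute)) {}

  const std::string &dir() const { return dir_; }

  // Runs the callback on first use and caches the result.
  const Options &printOptions() {
    if (!computed_) {
      cached_ = compute_();
      computed_ = true;
    }
    return cached_;
  }

private:
  std::string dir_;
  std::function<Options()> compute_;
  Options cached_;
  bool computed_ = false;
};
```

An ordinary compile never calls `printOptions()`, so the callback never runs; the trade-off is the extra indirection compared with a plain field.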

I’ve now responded to your comments on D142933. Over to you!

Sorry I’m late to the thread but AFAIU, Multilib::flag() can express this specific case already:

// The fourth argument is the priority; a higher value wins when both match.
auto SoftFP = MultilibBuilder("/soft", "", "", 0);
auto HardFP = MultilibBuilder("/hard", "", "", 1);
MultilibSet Multilibs = MultilibSet().Either(SoftFP, HardFP);

If you build with -mfloat-abi=hard, you should get the /hard multilib.
If you build with -mfloat-abi=softfp, both are valid and the /hard multilib is selected because it has higher priority.
If you build with -mfloat-abi=soft, you should get the /soft multilib.
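
The three cases can be modelled with a small standalone sketch. `Candidate` and `selectByPriority` are hypothetical, and the list of float ABIs each variant is valid for is an assumption read off the three cases above rather than taken from the driver.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class FloatABI { Soft, SoftFP, Hard };

// Simplified stand-in: a path, a priority, and the float ABIs the variant is
// valid for (the compatibility lists are an assumption for illustration).
struct Candidate {
  std::string path;
  int priority;
  std::vector<FloatABI> validFor;
};

// Among the candidates valid for `abi`, pick the one with highest priority.
// Returns an empty string if nothing is valid.
std::string selectByPriority(const std::vector<Candidate> &cands,
                             FloatABI abi) {
  const Candidate *best = nullptr;
  for (const auto &c : cands)
    for (FloatABI ok : c.validFor)
      if (ok == abi && (!best || c.priority > best->priority))
        best = &c;
  return best ? best->path : std::string();
}
```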

Hi Jon, I think when I wrote that I was overly attached to the idea that you should use addMultilibFlag. If you always use addMultilibFlag then that case cannot work, but get a little more creative and I agree you can express it:

auto Soft = MultilibBuilder("/soft", "", "", 0);
auto SoftFP = MultilibBuilder("/softfp", "", "", 1);
auto Hard = MultilibBuilder("/hard", "", "", 2);
MultilibSet Multilibs = MultilibSet().Either(Soft, SoftFP, Hard);
if (ABI == arm::FloatABI::Soft) {
if (ABI == arm::FloatABI::SoftFP) {
if (ABI == arm::FloatABI::Hard) {

I am aiming to change the selection criterion to “a multilib is compatible if its flags are a subset of the flags derived from the command line arguments” and then the code gets smaller:

auto Soft = MultilibBuilder("/soft", "", "");
auto SoftFP = MultilibBuilder("/softfp", "", "");
auto HardFP = MultilibBuilder("/hard", "", "");
MultilibSet Multilibs = MultilibSet().Either(Soft, SoftFP, HardFP);
if (ABI == arm::FloatABI::Soft)
if (ABI == arm::FloatABI::SoftFP) {
if (ABI == arm::FloatABI::Hard)

We’ve gone live with a version 0.1 of the multilib format in LLVM Embedded Toolchain for Arm 16.0.0. Feedback is very welcome either here or in the GitHub issues.

Thanks, seeing a real-world multilib.yaml example is actually very helpful (including the logic that was used to generate it). The part I’m still unsure about is why do we need PrintOptions separate from Flags. This aspect is adding a lot of implementation and configuration complexity, and it’s not clear to me why it’s needed. Can you explain the motivation? Why can’t we use Flags for the -print-multi-lib output?