[RFC] Multilib custom flags

Hi all,

This is a proposal to extend the current multilib system to support the selection of library variants which do not correspond to existing command-line options.

Thanks in advance for the review and your suggestions.

The current multilib mechanism supports libraries that target code generation or language options such as --target, -mcpu, -mfpu, -mbranch-protection. However, some library variants are particular to features that do not correspond to any command-line options. Examples include variants for multithreading and semihosting. We thus propose that the multilib system should be extended to support these use cases.

This proposal introduces a way to instruct the multilib system to consider these features in library selection. The proposed solution comprises two changes:

  1. A new section in multilib.yaml to declare flags for which no option exists. For clarity, this kind of flag is called a custom flag from here on.
  2. A new command-line option in the Clang driver to use these custom flags.

Multilib flags declarations

The multilib.yaml file will have a new section called Flags which contains the declarations of the target’s custom flags:

Flags:
 - Name: multithreaded
   Values:
    - no-multithreaded
    - multithreaded
   Default: no-multithreaded

 - Name: io
   Values:
    - io-none
    - io-semihosting
    - io-linux-syscalls
   Default: io-none

Each flag declaration has three fields:

  • Name: the name that categorizes the flag.
  • Values: the list of values the flag can take.
  • Default: the value the flag takes when none is given on the command line. It must be one of the entries in Values.

A Default value saves users from having to spell out a custom flag when its most commonly used value is the one they want.

The namespace of flag values is common across all flags. This means that flag values must be unique.
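The two constraints stated so far (Default must be one of Values, and value names must be unique across all flags) can be checked mechanically. Below is a minimal Python sketch of such a check, not the Clang implementation: the function name `validate_flag_declarations` and the use of plain dicts to stand in for the parsed YAML are assumptions made for illustration.

```python
def validate_flag_declarations(flags):
    """Return a list of problems found in a parsed 'Flags' section.

    `flags` is a list of dicts with the keys Name, Values and Default,
    mirroring the YAML layout shown above (hypothetical in-memory form).
    """
    problems = []
    owner_of_value = {}  # value name -> name of the flag that declared it first
    for flag in flags:
        name = flag["Name"]
        for value in flag["Values"]:
            if value in owner_of_value:
                # Value names share one namespace, so a repeat is a clash.
                problems.append(
                    f"value '{value}' of flag '{name}' is already declared "
                    f"by flag '{owner_of_value[value]}'"
                )
            else:
                owner_of_value[value] = name
        if flag["Default"] not in flag["Values"]:
            problems.append(
                f"default '{flag['Default']}' of flag '{name}' "
                "is not one of its values"
            )
    return problems


declared = [
    {"Name": "multithreaded",
     "Values": ["no-multithreaded", "multithreaded"],
     "Default": "no-multithreaded"},
    {"Name": "io",
     "Values": ["io-none", "io-semihosting", "io-linux-syscalls"],
     "Default": "io-none"},
]
print(validate_flag_declarations(declared))  # prints []
```

Whether a clash should be reported as an error or a warning is left open here; the sketch only collects the findings.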

The reasoning for n-ary flags is expanded in the Appendix.

New command-line option to use multilib flags

The driver must be informed about the multilib custom flags with a new command-line option.

-fmultilib-flag=C

Where the grammar for C is:

C -> option
option -> multithreaded | no-multithreaded | io-none | io-semihosting | io-linux-syscalls | ...

There must be one option instance for each flag specified:

-fmultilib-flag=multithreaded -fmultilib-flag=io-semihosting

Contradictory options are resolved by a last-one-wins rule.

These options are to be used exclusively by the multilib mechanism in the Clang driver. Hence they are not forwarded to the compiler frontend.
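To make the last-one-wins rule and the role of Default concrete, here is a small Python sketch of how the selected value of each flag could be computed from a command line. It illustrates the rules described above and is not the driver code; `resolve_custom_flags` and the dict-based declarations are hypothetical.

```python
PREFIX = "-fmultilib-flag="


def resolve_custom_flags(declared, argv):
    """Return ({flag name: selected value}, [unknown value names])."""
    # Value names share one namespace, so each value identifies its flag.
    value_to_flag = {v: f["Name"] for f in declared for v in f["Values"]}
    # Start from the defaults; command-line values override them.
    selected = {f["Name"]: f["Default"] for f in declared}
    unknown = []
    for arg in argv:
        if not arg.startswith(PREFIX):
            continue  # not a custom flag, ignored by this sketch
        value = arg[len(PREFIX):]
        if value in value_to_flag:
            selected[value_to_flag[value]] = value  # a later option overrides an earlier one
        else:
            unknown.append(value)
    return selected, unknown


declared = [
    {"Name": "multithreaded",
     "Values": ["no-multithreaded", "multithreaded"],
     "Default": "no-multithreaded"},
    {"Name": "io",
     "Values": ["io-none", "io-semihosting", "io-linux-syscalls"],
     "Default": "io-none"},
]
argv = ["-fmultilib-flag=multithreaded",
        "-fmultilib-flag=io-semihosting",
        "-fmultilib-flag=no-multithreaded"]
print(resolve_custom_flags(declared, argv))
# prints ({'multithreaded': 'no-multithreaded', 'io': 'io-semihosting'}, [])
```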

List supported custom flags

Another command-line option to list all flags declarations is proposed:

-fmultilib-flag-list

Output:

multithreaded:
  no-multithreaded
  multithreaded
io:
  io-none
  io-semihosting
  io-linux-syscalls

Usage of custom flags by the Variants specifications

Library variants should list their requirement on one or more custom flags like they do for any other flag. The new command-line option is passed as-is to the multilib system, therefore it should be listed in the same format as described in Section New command-line option to use multilib flags.

Moreover, a variant that does not specify a requirement on any particular flag can be matched against any value of that flag. For instance, let’s introduce a new custom flag called heap-opt that governs the selection of optimized memory allocation functions as a multilib layer. In the example below, the use of -fmultilib-flag=heap-opt-size enables the selection of the multilib layer heap_optsize, while at the same time enabling the match to the most suitable base C library (as the latter does not list any requirement on the heap-opt dimension).

Variants:
 - Dir: libc
   Group: stdlibs
   Flags:
    - -march=armv8-a
    - -mfpu=none
    - -fmultilib-flag=no-multithreaded
    - -fmultilib-flag=io-semihosting

 - Dir: libc_multithreaded
   Group: stdlibs
   Flags:
    - -march=armv8-a
    - -mfpu=none
    - -fmultilib-flag=multithreaded
    - -fmultilib-flag=io-semihosting

 - Dir: libc_nosemihosting_multithreaded
   Group: stdlibs
   Flags:
    - -march=armv8-a
    - -mfpu=none
    - -fmultilib-flag=multithreaded
    - -fmultilib-flag=io-none

 - Dir: libc_nosemihosting
   Group: stdlibs
   Flags:
    - -march=armv8-a
    - -mfpu=none
    - -fmultilib-flag=no-multithreaded
    - -fmultilib-flag=io-none

 - Dir: heap_optsize
   Flags:
    - -march=armv8-a
    - -fmultilib-flag=heap-opt-size

Flags:
 - Name: multithreaded
   Values:
    - no-multithreaded
    - multithreaded
   Default: no-multithreaded

 - Name: io
   Values:
    - io-none
    - io-semihosting
    - io-linux-syscalls
   Default: io-none

 - Name: heap-opt
   Values:
    - heap-opt-size
    - heap-opt-security
    - heap-opt-fast
   Default: heap-opt-size
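The matching rule described above (a variant matches when every flag it lists is active, and says nothing about flags it does not list) can be sketched in Python as a subset test. This is a simplification made for illustration: flags are compared as literal strings, group handling is ignored, and the active flag list is assumed to already include the defaults for any custom flag not given on the command line. The name `matching_variants` is hypothetical.

```python
def matching_variants(variants, active_flags):
    """Return the Dir of every variant whose Flags are all active."""
    active = set(active_flags)
    return [v["Dir"] for v in variants if set(v["Flags"]) <= active]


variants = [
    {"Dir": "libc",
     "Flags": ["-march=armv8-a", "-mfpu=none",
               "-fmultilib-flag=no-multithreaded",
               "-fmultilib-flag=io-semihosting"]},
    {"Dir": "libc_nosemihosting",
     "Flags": ["-march=armv8-a", "-mfpu=none",
               "-fmultilib-flag=no-multithreaded",
               "-fmultilib-flag=io-none"]},
    # This layer has no requirement on the multithreaded or io flags.
    {"Dir": "heap_optsize",
     "Flags": ["-march=armv8-a", "-fmultilib-flag=heap-opt-size"]},
]

active = ["-march=armv8-a", "-mfpu=none",
          "-fmultilib-flag=no-multithreaded",
          "-fmultilib-flag=io-semihosting",
          "-fmultilib-flag=heap-opt-size"]
print(matching_variants(variants, active))  # prints ['libc', 'heap_optsize']
```

The base C library and the heap layer are both selected because neither lists a requirement that contradicts the active flags.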

Semantic check of multilib flags

The use of unsupported flags should emit a warning such as:

warning: unsupported multilib flag ‘foo’

Appendix: design decisions

N-ary flags

The choice for n-ary flags over binary flags is motivated by some features for which the choice is more than a simple on/off. Examples include:

  • I/O:
    • stubs for IO functions;
    • semihosting-aware;
    • implemented using Linux syscalls;
  • Heap optimised for:
    • size;
    • performance;
    • security.

I assume this is out of scope, but I’ll bring it up just to get it on the record.

Did you consider whether to have combined flags? Or is this proposal only for options where you choose only one of a set of things.

Also these flags are referred to by the Value name, is it on the user to make sure that no Value overlap between different Name?

I presume that the same code used to print this could also be used to warn of shadowed names in the configuration, to help the user diagnose this.

In reality, because you have to use the full Value name, I assume most would end up “namespaced” as you have done for example with heap-opt-size.

It’s not heap::opt-size, which would be one alternative that avoids shadowing, with considerable complexity added to pay for it. So I see why you’d avoid that sort of thing.

Same question for duplicate Name, especially if these files can be combined with other multilib files (like an “overlay”).

If you could hook this up to the “did you mean” apparatus, that would be very cool: “unsupported multilib flag hap-opt-size; did you mean heap-opt-size?”

One way this might work is a key added to the flag definition to put it into an “addition” mode. This would be compatible with this proposal.

Whether anyone ever does that, only the future will tell. Just thinking about whether this proposal puts us into any corners we can’t code our way out of.

Thanks for the review @DavidSpickett .

Did you consider whether to have combined flags? Or is this proposal only for options where you choose only one of a set of things.

One way this might work is a key added to the flag definition to put it into an “addition” mode. This would be compatible with this proposal.

Whether anyone ever does that, only the future will tell. Just thinking about whether this proposal puts us into any corners we can’t code our way out of.

I don’t understand what you mean. Can you give one example?

Also these flags are referred to by the Value name, is it on the user to make sure that no Value overlap between different Name ?

Good point. I reckon we could either (a) error out at the YAML parsing stage or (b) leave it to the user and silently accept it, then follow a last-parsed-wins rule. I don’t have a preference either way, so I’m happy to hear opinions.

Same discussion applies for the Name as you mentioned.

It’s not heap::opt-size , which would be one alternative that avoids shadowing, with considerable complexity added to pay for it. So I see why you’d avoid that sort of thing.

We have considered the alternative of different namespaces for each Flag. Besides the additional complexity of handling segregated namespaces, I couldn’t think of an elegant way to encode it in the command-line option. It might look something like:

-fmultilib-flag=heap=opt-size

The presence of two = signs wasn’t a favourite of ours.

If you could hook this up to the “did you mean” apparatus, that would be very cool. unsupported multilib flag hap-opt-size did you mean heap-opt-size? .

Thanks for the suggestion. I’ll have to look this up, but if it’s doable, I can’t see any drawback.

Maybe I have libraries that can include drivers for different output devices. Serial, ethernet, wifi.

Using the proposed syntax I might write:

 - Name: output-driver
   Values:
   - output-driver-serial
   - output-driver-serial-ethernet
   - output-driver-serial-ethernet-wifi
  <...permutations continue...>
   Default: output-driver-serial

Or I could split them up into binary options like this:

 - Name: output-driver-serial
   Values:
   - output-driver-serial-enabled
   - output-driver-serial-disabled
   Default: output-driver-serial-enabled

<...repeat for ethernet and wifi...>

And now that I write that out, that seems like the better approach anyway, but let me continue so I can explain the original point.

Say I wanted to add a way to select many items from a list. I could have a new key to indicate that flag behaviour:

 - Name: output-driver
   Values:
   - output-driver-serial
   - output-driver-ethernet
   - output-driver-wifi
   Default: output-driver-serial
   Mode: additive

Where the default for Mode is “N-ary”.

So it seems likely to me that if in the future someone wanted to implement this, they would not be restricted by the implementation you propose.

And given you can already get this effect by making many binary options, there’s a good chance no one will do this anyway :)

TLDR: You don’t need to change anything, I’m just showing my thought process to come to that conclusion.

So another good reason to choose N-ary flags is that a 2 option N-ary is a binary flag too.

It depends. If there is an existing option that explains the decision making process, you wouldn’t have to have earlier warnings.

The drawback with warnings can be that if name and value shadowing is going to be common, the warnings are just spam. If you intend folks to mostly keep the names unique (it seems like you do) then the warnings would be useful.

My first instinct would be to allow the file to be parsed but warn about any shadowing. That way you can incrementally fix a file, and use files that you may not be allowed to modify, but have these problems.

C++ has taken too many of my brain cells so I’d go with :: but then we also get into defining reserved characters. So I agree with your decision to keep it a flat namespace.

Very much a “nice to have” given that the user is presumably able to look at the YAML file if they need to anyway.

(I mentioned this in the Embedded sync call today, but repeating here as a comment)

One thing that has come up with RISC-V’s adoption of YAML for multilib is that the YAML is much better at going the flags -> directories direction than other uses of multilibs.

Some build systems, like crosstool-ng and maybe newlib’s, use the gcc driver’s -print-multi-lib output to decide how to do their build ( [RISCV] Allow YAML file to control multilib selection by ArcaneNibble · Pull Request #98856 · llvm/llvm-project · GitHub ). This output prints the opposite mapping, i.e. directory -> flags, which tells the build system “use these flags to build a library into this directory”. The clang driver supports the same flag.

It would be good to understand how you expect the proposed functionality to interact with -print-multi-lib functionality.

I have uploaded drafts of the Pull Requests for this proposal:

Please bear in mind this is still a work in progress. But feel free to add comments in the mean time.

They do not address @lenary’s concerns yet. We’re still investigating how to approach this.


Thanks for your comments. Find below the changes proposed to the RFC to address them.

The Pull Requests linked previously already reflect the proposed changes.

Warning on invalid flag value name: suggestions

If the user specifies -fmultilib-flag=<name> with a name that is invalid but close, in terms of edit distance, to a valid flag value name, a warning with a suggestion is shown:

warning: unsupported option '-fmultilib-flag=invalidname'; did you mean '-fmultilib-flag=validname'?

The candidate with the smallest edit distance is chosen for the suggestion, provided the distance does not exceed a certain maximum (an implementation detail). Beyond that maximum, a warning without a suggestion is shown instead:

warning: unsupported option '-fmultilib-flag=invalidname'
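The suggestion logic can be sketched in Python as follows. This is an illustration only: it uses a plain Levenshtein distance, and the threshold `max_distance=3` is an arbitrary assumption, since the actual maximum is left as an implementation detail. The names `edit_distance` and `diagnose` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def diagnose(value, valid_values, max_distance=3):
    """Return a warning string for an invalid value, or None if it is valid."""
    if value in valid_values:
        return None
    message = f"warning: unsupported option '-fmultilib-flag={value}'"
    # Pick the closest valid name; only suggest it if it is close enough.
    best = min(valid_values, key=lambda v: edit_distance(value, v), default=None)
    if best is not None and edit_distance(value, best) <= max_distance:
        message += f"; did you mean '-fmultilib-flag={best}'?"
    return message


valid = ["heap-opt-size", "heap-opt-security", "heap-opt-fast"]
print(diagnose("hap-opt-size", valid))
# prints warning: unsupported option '-fmultilib-flag=hap-opt-size'; did you mean '-fmultilib-flag=heap-opt-size'?
```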

Covering use case of -print-multi-lib

Some build systems, for example newlib’s, use the -print-multi-lib command-line option to query what library variants are shipped with the target compiler and what command-line options were used to build the variants.

In order to cover this use case, the proposal is modified as follows.

Multilib flags declarations

In the previous proposal, each custom flag value was a simple string that corresponded to its value name. With this update, each custom flag value has the following definition:

  • Name (required): the name of the custom flag value (string). This is the string to be used in -fmultilib-flag=<string>.
  • ExtraBuildArgs (optional): a list of strings corresponding to the extra build arguments used to build a library variant that is in accordance with this specific custom flag value.

As an example, a custom flag declaration for multithreading that works for newlib:

Flags:
- Name: multithreaded
  Values:
  - Name: no-multithreaded
    ExtraBuildArgs: [-D__SINGLE_THREAD__]
  - Name: multithreaded
  Default: no-multithreaded

In newlib, multithreading is enabled by default and can be disabled by defining the __SINGLE_THREAD__ macro. Accordingly, -D__SINGLE_THREAD__ is part of the ExtraBuildArgs field for -fmultilib-flag=no-multithreaded.

Since multithreading is the default for newlib, there is no need to have ExtraBuildArgs for -fmultilib-flag=multithreaded.

-print-multi-lib output

With the proposed changes, the output of -print-multi-lib would be as follows. For the given full multilib.yaml file:

---
MultilibVersion: 1.0

Groups:
- Name: stdlib
  Type: Exclusive

Variants:
# Base libc
- Dir: arm-none-eabi/thumb/v8-m.main/nofp
  Flags: [--target=thumbv8m.main-unknown-none-eabi, -mfpu=none, -fmultilib-flag=no-multithreaded]
  Group: stdlib
- Dir: arm-none-eabi/multithreaded/thumb/v8-m.main/nofp
  Flags: [--target=thumbv8m.main-unknown-none-eabi, -mfpu=none, -fmultilib-flag=multithreaded]
  Group: stdlib
# Memalloc layers
- Dir: arm-none-eabi/thumb/v8-m.main/small_heap
  Flags: [--target=thumbv8m.main-unknown-none-eabi, -fmultilib-flag=heap-opt-size]
- Dir: arm-none-eabi/thumb/v8-m.main/hardened_heap
  Flags: [--target=thumbv8m.main-unknown-none-eabi, -fmultilib-flag=heap-opt-security]

Flags:
- Name: multithreaded
  Values:
  - Name: no-multithreaded
    ExtraBuildArgs: [-D__SINGLE_THREAD__]
  - Name: multithreaded
  Default: no-multithreaded
- Name: heap-opt
  Values:
  - Name: heap-opt-size
  - Name: heap-opt-fast
  - Name: heap-opt-security
    ExtraBuildArgs: [-D_FORTIFY_SOURCE=3]
  Default: heap-opt-size

The output would be:

arm-none-eabi/thumb/v8-m.main/nofp;@-target=thumbv8m.main-unknown-none-eabi@mfpu=none@fmultilib-flag=no-multithreaded@D__SINGLE_THREAD__
arm-none-eabi/multithreaded/thumb/v8-m.main/nofp;@-target=thumbv8m.main-unknown-none-eabi@mfpu=none@fmultilib-flag=multithreaded
arm-none-eabi/thumb/v8-m.main/small_heap;@-target=thumbv8m.main-unknown-none-eabi@fmultilib-flag=heap-opt-size
arm-none-eabi/thumb/v8-m.main/hardened_heap;@-target=thumbv8m.main-unknown-none-eabi@fmultilib-flag=heap-opt-security@D_FORTIFY_SOURCE=3
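The output lines above follow a simple pattern: the directory, a semicolon, then each argument with its leading `-` replaced by `@`, with the ExtraBuildArgs of a custom flag value placed right after that flag. A Python sketch of this formatting is shown below. It is derived only from the example output in this post, and the function name `print_multi_lib_line` and its parameters are hypothetical.

```python
PREFIX = "-fmultilib-flag="


def print_multi_lib_line(directory, flags, extra_args_by_value):
    """Format one -print-multi-lib style line for a variant.

    `extra_args_by_value` maps a custom flag value name to its
    ExtraBuildArgs list; values without extra arguments may be absent.
    """
    args = []
    for flag in flags:
        args.append(flag)
        if flag.startswith(PREFIX):
            # Append the extra build arguments tied to this custom flag value.
            args.extend(extra_args_by_value.get(flag[len(PREFIX):], []))
    # Every argument is assumed to start with '-'; that dash becomes '@'.
    return directory + ";" + "".join("@" + arg[1:] for arg in args)


extra = {
    "no-multithreaded": ["-D__SINGLE_THREAD__"],
    "heap-opt-security": ["-D_FORTIFY_SOURCE=3"],
}
print(print_multi_lib_line(
    "arm-none-eabi/thumb/v8-m.main/nofp",
    ["--target=thumbv8m.main-unknown-none-eabi", "-mfpu=none",
     "-fmultilib-flag=no-multithreaded"],
    extra))
```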

Given the existence of the use case of -print-multi-lib in library build systems, it’s useful to provide more context about what use cases we expect for the Multilib custom flags. We envision two use cases for them:

Case 1: select libraries bundled with the toolchain

In this scenario, the user wants to use the custom flags to help in selecting the most appropriate library variant/s among a list of libraries already shipped with the toolchain.

As such, the presence of the ExtraBuildArgs fields is not relevant here, since these are useful for building the library variants, not for library selection.

Case 2: build library variants using -print-multi-lib

Libraries’ build systems are already using -print-multi-lib to collect a mapping from library variant to command-line argument list for building it. Examples include newlib and picolibc.

In this use case, a build system queries the target toolchain about what library variants should be built. This query returns the mapping mentioned previously. With this information in hand, the build system may launch the build of each variant using the collected command-line arguments.

Here, the use of ExtraBuildArgs is crucial to inform the build system about extra command-line arguments required for the code generation of a particular feature. For example, newlib has multithreading enabled by default, but it can be disabled by defining the __SINGLE_THREAD__ macro.

A multilib.yaml for driving a newlib multilib build must therefore specify -D__SINGLE_THREAD__ in the ExtraBuildArgs of the no-multithreaded variant (represented by the custom flag value no-multithreaded).

Had we not proposed a mechanism such as ExtraBuildArgs, there would be no way to instruct a build system that uses -print-multi-lib on how to build, for instance, a multithreaded library differently from a singlethreaded one. The mere presence of -fmultilib-flag=no-multithreaded in the command-line argument list isn’t enough: this is not a code generation option, but simply a driver option useful only for multilib selection.
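For Case 2, the query step on the build-system side amounts to reading each output line back into a directory and an argument list. The Python sketch below shows one way this could be done; it is not how newlib or picolibc implement it, the name `parse_print_multi_lib` is hypothetical, and it assumes no argument contains a literal `@` or `;`.

```python
def parse_print_multi_lib(output):
    """Map each variant directory to the list of arguments used to build it."""
    variants = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        directory, _, encoded = line.partition(";")
        # Each '@' stands for the leading '-' of one argument.
        variants[directory] = ["-" + arg for arg in encoded.split("@") if arg]
    return variants


sample = (
    "arm-none-eabi/thumb/v8-m.main/nofp;"
    "@-target=thumbv8m.main-unknown-none-eabi@mfpu=none"
    "@fmultilib-flag=no-multithreaded@D__SINGLE_THREAD__\n"
)
for directory, args in parse_print_multi_lib(sample).items():
    print(directory, args)
```

A build system could then launch one build per directory with the recovered arguments, which is where -D__SINGLE_THREAD__ makes the single-threaded variant differ from the multithreaded one.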

This looks reasonable to me. Thanks for thinking about -print-multi-lib.


Would ExtraBuildArgs be also set by the driver when the corresponding -fmultilib-flag= is set? If yes, I think this should be called just Args to avoid prescribing a specific use case and instead making this a more generally useful feature. If no, I think the name should make it clear that this is only used for -print-multi-lib. Personally, I think these should be set by the driver because it expands the set of possible use cases.


We talked about this today in the Embedded Sync Up. Some things came up:

We think it seems reasonable to use these both in -print-multi-lib and when actually compiling for that library (so in the example, -D__SINGLE_THREAD__ would also be passed to all compiles when using -fmultilib-flag=no-multithreaded).

One clarification needed is whether the flags are driver flags or cc1 flags. Given they’re printed by -print-multi-lib, which build systems will parse and then feed back into the driver, we were reasonably in agreement that they need to be driver flags. But that raises a question: driver flags are what choose the multilib in the first place, and resolving it then yields additional driver flags that need to be processed. Without care, this could give circular conflicts.


I am investigating how to take the ExtraBuildArgs and feed them back to the driver. Since the multilib YAML parsing and processing happens after cmd-line arguments have been already parsed (and to some extent also already processed), that doesn’t look trivial to do.

Nonetheless, the current Pull Requests as building blocks of the overall solution can still be reviewed and merged independently. I kindly invite everyone to have a look at them. Thanks


I reckon that if the driver is now supposed to get the ExtraBuildArgs associated with the multilib custom flags and inject those args back into the driver invocation, these args should not be part of the -print-multi-lib output. Otherwise the args would be passed twice. I will do some experimentation to see what it looks like.

I think this makes sense, that we cannot feed them back into the driver in an obvious way.

Maybe we need a better name than ExtraBuildArgs. I think PrintAdditionalArgs (or something similar) could help make it clearer they’re for the -print-multi-lib output only. Sorry to start a “what to name this” debate/bikeshed. If you think a change here is actually an improvement, I’ll be happy with any name that gets closer to the concept of “print” and away from “build”.

I agree the name should probably be changed. I’m using “ExtraBuildArgs” for now just temporarily while we figure out a better name.

I think we might be conflating two behaviours here: 1. Feed the extra args back into the driver or 2. Don’t.

If we go with the first one, the output of -print-multi-lib would not include the “ExtraBuildArgs”, only the custom flags:

arm-none-eabi/thumb/v8-m.main/nofp;@-target=thumbv8m.main-unknown-none-eabi@mfpu=none@fmultilib-flag=no-multithreaded

Then the driver that’s consuming the output will read -fmultilib-flag=no-multithreaded from the command-line and feed -D__SINGLE_THREAD__ into the invocation.

If instead we go with the second, the output of -print-multi-lib would include both the “ExtraBuildArgs” and the custom flags:

arm-none-eabi/thumb/v8-m.main/nofp;@-target=thumbv8m.main-unknown-none-eabi@mfpu=none@fmultilib-flag=no-multithreaded@D__SINGLE_THREAD__

But in this case, the driver will not feed -D__SINGLE_THREAD__ into the invocation because it’s already there in the output argument list.

I believe either works fine, but I appreciate your sentiment that the first one might cover more use cases. Though it’s important to determine that, in that case, the “ExtraBuildArgs” would stop being part of the -print-multi-lib output.

Sorry, I misunderstood the prior replies. I agree that those are the two reasonable choices, and my reply was based on (1) not being possible, which you haven’t actually ruled out yet. Don’t mind me.

No worries @lenary.

Given that these “extra args” will be fed into the driver as if they were passed in the command line, we might as well just call them DriverArgs. It has little ambiguity in my opinion.

I’ve updated the Pull Requests to rename ExtraBuildArgs to DriverArgs. These are also now fed back into the driver.

Another change is in the diagnostics: invalid flag value names, which previously produced a warning, now lead to errors.

Lastly, I’ve updated the documentation too: Add documentation for Multilib custom flags by vhscampos · Pull Request #114998 · llvm/llvm-project · GitHub
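As a rough illustration of what “fed back into the driver” means for DriverArgs, the Python sketch below expands an argument list with the DriverArgs of every custom flag value present. It is a model of the idea only: the helper `expand_driver_args` is hypothetical, and where the real driver injects the arguments, in what order, and how defaults are treated are not specified in this thread.

```python
PREFIX = "-fmultilib-flag="


def expand_driver_args(argv, driver_args_by_value):
    """Return argv followed by the DriverArgs of each custom flag value used."""
    expanded = list(argv)
    for arg in argv:
        if arg.startswith(PREFIX):
            # Treat the DriverArgs as if the user had typed them (assumed placement: at the end).
            expanded.extend(driver_args_by_value.get(arg[len(PREFIX):], []))
    return expanded


driver_args = {"no-multithreaded": ["-D__SINGLE_THREAD__"]}
print(expand_driver_args(
    ["-mfpu=none", "-fmultilib-flag=no-multithreaded"], driver_args))
# prints ['-mfpu=none', '-fmultilib-flag=no-multithreaded', '-D__SINGLE_THREAD__']
```

With this behaviour, the -print-multi-lib output only needs to carry -fmultilib-flag=no-multithreaded, since the macro definition is recovered from the flag when the build system passes it back.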

Do we need to be able to set arbitrary driver args? The examples shown so far involve just macros. Could we simplify the implementation by allowing only macros?

Flags:
- Name: multithreaded
  Values:
  - Name: no-multithreaded
    Macros: [__SINGLE_THREAD__]
  - Name: multithreaded
  Default: no-multithreaded