First of all, thank you for all the effort you put into this. I think the changes to the driver implementation are ready to go in, especially since those changes should be largely invisible to users. I’d like to discuss the new multilib.yaml
configuration format since that part is visible to users.
In comparison to GCC, the proposed configuration format offers more flexibility. In the case of GCC, the supported multilibs are configured only at build time. Having runtime configurability for multilibs is a very nice idea, but we should be mindful of the unnecessary complexity.
In the proposed design, you map flags to labels using regex-based matching (the FlagMap
section) and then define variants as a combination of labels (the Variants
section). That’s fundamentally consistent with the GCC implementation which maps flag combinations to a chosen set of variants, but uses option parsing instead.
I think the naming could be clearer. In particular the term “flags” is used both in the Variants
and FlagMap
sections, but these are not actually flags, rather they’re arbitrary strings that are used for matching input flags to variants. Perhaps “label” or “tag” would be a better name?
I am concerned about regular expressions as the default mechanism for matching flags since regular expressions, while powerful, also tend to be error prone.
To use a concrete example from the review, to match the target Armv8.5 or newer, you’d have to do the following in GCC:
MULTILIB_MATCHES += target=armv8.5-none-eabi=target=armv8.6-none-eabi
MULTILIB_MATCHES += target=armv8.5-none-eabi=target=armv8.7-none-eabi
MULTILIB_MATCHES += target=armv8.5-none-eabi=target=armv8.8-none-eabi
...
In the proposed design, you could instead do:
- Regex: target=armv8\.([6-9]|[1-9][0-9]+)-none-eabi
MatchFlags: [target=armv8.5-none-eabi]
This is an example where the higher expressive power of regular expressions helps with complex rules.
The downside is that regular expressions require escaping which makes simple cases more difficult. For example, if I wanted to match the -fc++-abi=itanium
flag, I have to use:
- Regex: fc\+\+-abi=itanium
MatchFlags: [itanium-abi]
This may be non-obvious and I’m worried that this will lead to unexpected and difficult to debug issues. We’ve already seen similar issues with Sanitizer special case list files which also use regular expressions (they also treat *
as .*
straddling the line between glob patterns and regular expressions).
Perhaps starting with a simpler, even if more verbose, version resembling the GCC implementation would be better? If it turns out to be a problem in practice, we could introduce the support for regular expressions (or glob patterns) later.
The other aspect I’m concerned about is defining variants, specifically around composition. When defining variants, you have to specify a directory name, a set of “flags” and the output of -print-multi-lib
which is very flexible but also potentially error prone and might result in a lot of duplication.
In the driver, this is handled by the MultilibBuilder and MultilibSetBuilder classes which aid in constructing hierarchies of variants that we use in drivers like Gnu. To give a concrete example:
auto ArchV7A = MultilibBuilder("/armv7-a").flag("+march=armv7-a");
auto ArchV7M = MultilibBuilder("/armv7-m").flag("+march=armv7-m");
auto Hard = MultilibBuilder("/hard").flag("+mfloat-abi=hard");
auto SoftFp = MultilibBuilder("/soft").flag("+mfloat-abi=softfp");
auto Soft = MultilibBuilder("").flag("+mfloat-abi=soft");
MultilibSet Multilibs =
.Either(ArchV7A, ArchV7M)
.Either(Hard, SoftFp, Soft)
.makeMultilibSet();
This code would generate the following variants:
armv7-a/hard;@march=armv7-a@mfloat-abi=hard
armv7-a/soft;@march=armv7-a@mfloat-abi=softfp
armv7-a;@march=armv7-a@mfloat-abi=soft
armv7-m/hard;@march=armv7-m@mfloat-abi=hard
armv7-m/soft;@march=armv7-m@mfloat-abi=softfp
armv7-m;@march=armv7-m@mfloat-abi=soft
This is quite powerful and becomes increasingly more important as the number of potential combinations grows. I’m wondering if we could make the configuration file more closely resemble the usage of those classes and allow automatic composition rather than requiring users to specify all potential combinations manually?
To make my suggestions more concrete, here’s an idea for an alternative format:
Options:
'':
- mfloat-abi=soft
'/armv7-a':
- march=armv7-a
'/armv7-m':
- march=armv7-m
'/hard':
- mfloat-abi=hard
'/soft':
- mfloat-abi=softfp
Variants:
Either: ['/armv7-a', '/armv7-m']
Either: ['', '/soft', '/hard']
To handle different spelling of flags, we could do something like:
Options:
'/armv8.5':
- target=armv8.5-none-eabi: # used for -print-multi-lib
- target=armv8.[6-7]-none-eabi # other matches
'/thumbv7m':
- target=thumbv7m-none-eabi:
- target=thumbv[7-9]*
- mfpu=fpv4-sp-d16
'/thumbv6m':
- target=thumbv6m-none-eabi:
- target=thumbv6m-*
I’m fine landing the proposed format as is (and marking it as experimental) to get more practical experience, but I also want to make sure we explore other alternatives since this is a pretty significant new feature that will likely get adopted by a lot of users, especially in the embedded space. I would also be interested in hearing from others.