Whither/whether -mtune support?

Hi All,

I’ve gotten a few notes over the last few months, and some of the recent changes to various backends that “update” the default tunings for a generic processor made me think again about adding support for tuning for a processor rather than generating processor-specific code: hence, -mtune. I hope this is fairly uncontroversial, but I’m happy to discuss at length if anyone thinks we shouldn’t add this functionality to the compiler.

That said, I have a bit of a strawman outline for what I think needs to happen in general, and while I don’t have any concrete plans to attack this soon I thought I’d post it in case someone else was interested:

a) split out (in targets where we care) code generation features from tuning features on a per-subtarget basis into a separate set of features
b) add support for initializing them based on a tune parameter to the subtarget
c) add support to clang for generating the tune parameter on a per-function basis
d) use them in TTI and various other backend hooks rather than any code-gen-specific ones
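
As a rough illustration of steps (a), (b), and (d), here is a minimal sketch of a subtarget that keeps two separate feature sets and fills each from a different CPU name. Everything in it is hypothetical: `CpuInfo`, `cpuTable`, `ToySubtarget`, and the CPU and feature names are made up for the example and are not LLVM APIs.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical, simplified model; none of these names exist in LLVM.
struct CpuInfo {
  std::set<std::string> codegenFeatures; // which instructions are legal
  std::set<std::string> tuningFeatures;  // heuristics only, never legality
};

// Toy processor table; the CPU and feature names are invented.
inline const std::map<std::string, CpuInfo> &cpuTable() {
  static const std::map<std::string, CpuInfo> table = {
      {"generic", {{"base"}, {}}},
      {"big", {{"base", "simd"}, {"fuse-cmp-branch", "wide-issue"}}},
  };
  return table;
}

class ToySubtarget {
public:
  // Code generation features come from `cpu`, tuning features from `tune`.
  // std::map::at throws std::out_of_range for an unknown CPU name.
  ToySubtarget(const std::string &cpu, const std::string &tune)
      : codegen_(cpuTable().at(cpu).codegenFeatures),
        tuning_(cpuTable().at(tune).tuningFeatures) {}

  // Legality query: may the compiler emit instructions needing `feature`?
  bool hasInstructions(const std::string &feature) const {
    return codegen_.count(feature) != 0;
  }

  // Heuristic query: what cost-model style hooks would consult.
  bool prefers(const std::string &feature) const {
    return tuning_.count(feature) != 0;
  }

private:
  std::set<std::string> codegen_;
  std::set<std::string> tuning_;
};
```

With this split, `ToySubtarget("generic", "big")` would schedule for the "big" core while still emitting only instructions that "generic" supports.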

Simple, right? :) I’m happy to elaborate here, but I believe the work is relatively straightforward, if a lot of typing.

Every step there is likely to be a lot more complicated than it sounds, but, as with the getSubtarget<>/getSubtargetImpl changes, it should be very easy to do on any particular backend and fan out support from there. Just make sure that clang knows which targets do and don’t support the flag.

Happy to help or review work here.

Thanks!

-eric

Hi Eric,

Thanks for sharing your thoughts. My thoughts on getting -mtune working were along the same lines, and indeed, at least conceptually, it shouldn’t be too hard.
Given there are valid use cases for -mtune and it doesn’t seem like it will be hard to maintain support for it, I don’t see a reason not to support -mtune.
My gut feeling is that the hardest part about this is making sure that as target features are introduced in the future, they always get marked correctly as either a code generation or a tuning feature. (I tend to call these “architectural” vs “micro-architectural” features, but I don’t see a need to bikeshed on this naming convention here.) A tiny bit of well-written documentation and careful review should hopefully avoid those kinds of mistakes when new features get introduced.

I also don’t have concrete plans myself to work on this soon, but I’m also happy to help review where I can if someone picks this up.

Thanks,

Kristof

Hi Eric,

This is interesting, and we (Hexagon) were thinking about something that would allow passing extra information aside from the CPU. The main (but not the only) motivation was related to scheduling: single-threaded vs multi-threaded.
Another case could be specifying cache configuration: sizes of L1, L2, and some additional info, like TCM size for example.

What are your thoughts on allowing features to have non-boolean values?
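
To make the question concrete, here is a minimal sketch of what non-boolean tuning values could look like if they were passed as a comma-separated string. The `name=value` syntax, the function `parseTuningParams`, and the parameter names in the usage note are all assumptions made up for this example; nothing like this is claimed to exist in LLVM or clang.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical: parses "name=value" entries separated by commas.
// A bare "name" with no '=' is treated as a boolean and stored as 1.
// std::stol throws std::invalid_argument if a value is not a number.
inline std::map<std::string, long> parseTuningParams(const std::string &spec) {
  std::map<std::string, long> params;
  std::istringstream in(spec);
  std::string item;
  while (std::getline(in, item, ',')) {
    if (item.empty())
      continue; // tolerate stray commas
    const auto eq = item.find('=');
    if (eq == std::string::npos) {
      params[item] = 1; // plain boolean feature
    } else {
      params[item.substr(0, eq)] = std::stol(item.substr(eq + 1));
    }
  }
  return params;
}
```

For example, `parseTuningParams("multi-threaded,l1-size=32768,tcm-size=0")` yields three entries, with the boolean stored as 1 and the sizes stored as integers.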

-Krzysztof

Hello Everyone,

I am planning to post a fix for the AArch64-specific
https://bugs.llvm.org/show_bug.cgi?id=34625 next week. In summary, on
AArch64, -mtune is using architectural features from the -mtune CPU that
aren't present in the base architecture when -mcpu isn't used. The fix
restores correctness but is somewhat blunt: it removes all the
non-hard-coded effects of -mtune, as we can't distinguish between
architectural and micro-architectural features at present.

I'm willing to work [*] on an implementation of -mtune for AArch64 so
that it only affects micro-architectural features. I think that this
can be done with the approach that Eric suggests. If this is
successful then it can be adapted or generalised for other targets.
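
The idea of letting -mtune affect only micro-architectural features can be sketched as a simple filter, assuming each feature carries a tag saying which kind it is. `FeatureKind`, `Feature`, and `tuneContribution` are hypothetical names for this illustration, not the actual fix or any existing LLVM interface.

```cpp
#include <string>
#include <vector>

// Hypothetical classification of a target feature.
enum class FeatureKind { Architectural, MicroArchitectural };

struct Feature {
  std::string name;
  FeatureKind kind;
};

// Returns the names of the features a tune CPU may contribute: only the
// micro-architectural ones. Architectural features are dropped so that
// tuning can never enable instructions the base architecture lacks.
inline std::vector<std::string>
tuneContribution(const std::vector<Feature> &tuneCpuFeatures) {
  std::vector<std::string> out;
  for (const Feature &f : tuneCpuFeatures) {
    if (f.kind == FeatureKind::MicroArchitectural)
      out.push_back(f.name);
  }
  return out;
}
```

This only works if every feature is tagged correctly, which is exactly the distinction that cannot be made at present.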

Peter

[*] I can't promise to make this priority number 1 so this may take
some time to complete.

Hi Eric,

Thanks for sharing your thoughts on that feature.

FWIW, I am in favor of this work.
If you need some backend people to do reviews, don’t hesitate to add me!

Cheers,
-Quentin

Sounds sensible. I’d be happy to look at detailed patches.

I’d like to point out that target features are already used for tuning settings today (and that’s fine I think). So maybe this doesn’t need to be much more than just an extra way to initialize target features.

-Matthias

It is. I’d like to move the tuning parameters aside and basically initialize tune to arch/cpu (why ARM? why?) if it’s not already set :)
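
The defaulting rule described here could look roughly like the following sketch, which requires C++17 for `std::optional`. `ResolvedCpus` and `resolveCpus` are hypothetical names used only for illustration.

```cpp
#include <optional>
#include <string>

// Hypothetical result of resolving the two CPU names.
struct ResolvedCpus {
  std::string codegenCpu; // decides which instructions may be emitted
  std::string tuneCpu;    // decides scheduling and other heuristics
};

// If no tune CPU was given, fall back to the code generation CPU, so
// behaviour without an explicit tune setting is unchanged.
inline ResolvedCpus resolveCpus(const std::string &cpu,
                                const std::optional<std::string> &tune) {
  return ResolvedCpus{cpu, tune.value_or(cpu)};
}
```

So `resolveCpus("a", std::nullopt)` tunes for "a", while `resolveCpus("a", std::string("b"))` generates code for "a" but tunes for "b".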

-eric