Option -mtune

Hello,

Quick question: what do you think we should do about the -mtune option? The problem is that it looks like we support it, because it is documented and can be supplied on the command line, but it is silently ignored:

// FIXME: Handle -mtune=.
(void)Args.hasArg(options::OPT_mtune_EQ);

Giving users the false impression that it is doing something is probably the worst of the options we have (we regularly get questions about this).
We could simply remove it or, if that is too radical, issue a diagnostic saying that this is an unsupported option. Any thoughts/preferences?

Cheers,
Sjoerd.

giving the false impression to users it is doing something is probably the worst of options we have

Not /the worst/ as such: many options are added to Clang so that it is command-line compatible with GCC (in the sense that you’ll get a running program that behaves correctly). I imagine the commit history of this feature justifies the addition with that sort of reason.

Seems quite reasonable for --help and web documentation to mention that it’s a no-op/supported-for-compatibility flag.

As for adding a warning for these sort of no-op flags, maybe? Probably opt-in, though.
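
A minimal sketch of what an opt-in warning for accepted-but-ignored flags could look like. This is a toy model and not Clang's driver code: `parseArgs`, `DriverResult`, and the flag name `-Wignored-compat-options` are all hypothetical and exist only for illustration.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical driver model; none of these names are Clang APIs.
struct DriverResult {
  std::vector<std::string> warnings;  // diagnostics shown to the user
  std::vector<std::string> effective; // options that actually influence codegen
};

// Options accepted only for GCC command-line compatibility (assumed list).
static const std::set<std::string> kIgnoredPrefixes = {"-mtune="};

// Hypothetical opt-in switch, named here for illustration only.
static const std::string kWarnFlag = "-Wignored-compat-options";

DriverResult parseArgs(const std::vector<std::string> &args) {
  DriverResult result;

  // First pass: find out whether the user opted in to the warning.
  bool warnOnIgnored = false;
  for (const std::string &arg : args)
    if (arg == kWarnFlag)
      warnOnIgnored = true;

  // Second pass: drop no-op options, optionally telling the user about it.
  for (const std::string &arg : args) {
    if (arg == kWarnFlag)
      continue;
    bool ignored = false;
    for (const std::string &prefix : kIgnoredPrefixes) {
      if (arg.compare(0, prefix.size(), prefix) == 0) {
        ignored = true;
        break;
      }
    }
    if (!ignored) {
      result.effective.push_back(arg);
    } else if (warnOnIgnored) {
      result.warnings.push_back("warning: '" + arg +
                                "' is accepted for compatibility but ignored");
    }
  }
  return result;
}
```

With this shape, a build that passes `-mtune=...` keeps working silently by default, and only users who add the opt-in switch see the diagnostic.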

Thanks for your reply, and I can see that this is the only benefit of this no-op: if you move to Clang and still have -mtune set in your build environment, you don’t get an “unknown option” error. But for me personally, silently ignoring things and giving the impression it works is worse than a simple Makefile fix, but maybe I am wrong, which is why I checked here on the list. I was guessing that a diagnostic in this case would only be of value if it’s enabled by default, as I’m afraid many users won’t enable it? And if you need to add a flag to get this diagnostic, you might as well get rid of -mtune?

Cheers.

Thanks for your reply, and I can see that this is the only benefit of this no-op: if you move to Clang and still have -mtune set in your build environment, you don’t get an “unknown option” error. But for me personally, silently ignoring things and giving the impression it works is worse than a simple Makefile fix, but maybe I am wrong, which is why I checked here on the list.

If it were a matter of correctness rather than performance, I’d agree - silently accepting a flag & then not doing the thing would be problematic. (eg: accepting a language version flag for an unsupported language version & instead using a previous/other language version)

I was guessing that a diagnostic in this case would only be of value if it’s enabled by default, as I’m afraid many users won’t enable it? And if you need to add a flag to get this diagnostic, you might as well get rid of -mtune?

Generally, yeah. /maybe/ other folks’ll think it’s worth a warning, I don’t really - but I don’t have much experience with the flag in particular, or users of it.

Perhaps you could speak to the problems this causes your users, to help motivate the discussion/change?

  • Dave

I don't think that we could remove -mtune without breaking some builds that support both GCC and Clang with the same script, as -mtune is supported in GCC. As David points out -mtune isn't needed for correctness so ignoring it for at least these types of project seems to be reasonable. An opt-in warning for accepted but ignored options could be an option though.

IIRC We did use to have some support for mtune in AArch64, unfortunately we had to take it out (mea culpa https://reviews.llvm.org/D39179) as it wasn't just affecting micro-architectural features it was including architecture features as well. I think that there is some scope for putting back support for mtune, it would need us to more cleanly separate out the micro-architectural features. The most recent post about this was http://lists.llvm.org/pipermail/llvm-dev/2017-October/118520.html

Peter

I understand the possible benefits of -mtune, but I just don’t see how we benefit from a half baked implementation, or a no-op, and leaving the no-op in hoping that one day it will be implemented properly.

I also understand that removing the option could break some builds, but hey, compiler options and defaults change sometimes. In my opinion the cost of migrating would be worth the benefits. But I understand the different angles here, and hey, at least I’ve tried now… :)

This is addressing the hard problem of setting optimal compiler options with -target, -march, -mcpu, and -mtune. In this equation -mtune is just a minor annoyance, but if we could get rid of this part of the confusion then that would be a good change to me and avoid the regular question we get what the -mtune options should be.

Cheers.

My naïve opinions FWIW:

  • Personally I have never understood the difference between -target, -march, -mcpu, and -mtune (particularly -march and -mtune). There are a lot of blog posts out there (e.g. https://lemire.me/blog/2018/07/25/it-is-more-complicated-than-i-thought-mtune-march-in-gcc/ http://sdf.org/~riley/blog/2014/10/30/march-mtune/ ) indicating that I’m not alone. (Anyone got a really good resource on this topic?)
  • It sounds like -mtune’s behavior is not “falsifiable”; it can never produce “wrong” code; it’s purely an optimization option, like -O2. So there is no practical difference, but there may be a psychological difference, between saying “Clang permanently treats -mtune as a no-op” versus “Clang has a remarkably low quality of implementation for -mtune, and has no immediate plans to either improve or regress it.”
  • I don’t think Clang should do anything actively to break scripts/Makefiles that invoke both GCC and Clang as “$(CXX) -mtune=…”. That would be a downside with no upside.

my $.02,
–Arthur

My naïve opinions FWIW:
- Personally I have never understood the difference between -target, -march, -mcpu, and -mtune (particularly -march and -mtune).
There are a lot of blog posts out there (e.g. https://lemire.me/blog/2018/07/25/it-is-more-complicated-than-i-thought-mtune-march-in-gcc/
http://sdf.org/~riley/blog/2014/10/30/march-mtune/ ) indicating that I'm not alone. (Anyone got a really good resource on this topic?)

I haven't got a good reference in documentation unfortunately. The model our GCC team for Arm and AArch64 use (not sure if this applies to other Targets) is:
-mcpu=<cpu> == -march=<architecture CPU uses> -mtune=<cpu> where the mtune does not affect compatibility with architecture such as use of instructions, it could affect the scheduling model.

The reason for separating them is that you can in theory optimise code for running on a particular CPU but still have it be compatible with earlier CPUs of an earlier architecture. For example (sorry I only know Arm off the top of my head) -march=armv8-a -mtune=cortex-a76 . This would in GCC produce code optimised for cortex-a76, but would be compatible with other Arm processors that supported v8.0 (Cortex-A76 is v8.2).
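
The model described above can be sketched as a small resolver. This is an illustration under stated assumptions, not GCC's or Clang's implementation: `resolve`, `Codegen`, the `"generic"` default, and the rule that explicit -march/-mtune override the -mcpu expansion are assumptions of the sketch. The only table entry taken from this message is Cortex-A76 being v8.2.

```cpp
#include <map>
#include <string>

struct Codegen {
  std::string arch; // instruction set the binary may use (compatibility)
  std::string tune; // CPU the code is scheduled for (performance only)
};

// Illustrative table; only the Cortex-A76 entry comes from the thread.
static const std::map<std::string, std::string> kCpuToArch = {
    {"cortex-a76", "armv8.2-a"},
};

// An empty string means "option not given".
// -mcpu=<cpu> expands to -march=<architecture of cpu> -mtune=<cpu>.
// Explicit -march / -mtune win over the expansion (assumption of this sketch).
Codegen resolve(const std::string &mcpu, const std::string &march,
                const std::string &mtune) {
  Codegen out{"generic", "generic"}; // hypothetical defaults
  if (!mcpu.empty()) {
    auto it = kCpuToArch.find(mcpu);
    if (it != kCpuToArch.end())
      out.arch = it->second;
    out.tune = mcpu;
  }
  if (!march.empty())
    out.arch = march;
  if (!mtune.empty())
    out.tune = mtune;
  return out;
}
```

For example, `resolve("", "armv8-a", "cortex-a76")` models `-march=armv8-a -mtune=cortex-a76`: the architecture stays at v8.0 for compatibility while the tuning targets Cortex-A76.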

Hope this helps, and hope I have that right.

Peter

Hi Arthur,

from the GCC manual for the x86 target ( https://gcc.gnu.org/onlinedocs/gcc-9.3.0/gcc/x86-Options.html#x86-Options ):

`-march=cpu-type`

Generate instructions for the machine type cpu-type. In contrast to -mtune=cpu-type, which merely tunes the generated code for the specified cpu-type, -march=cpu-type allows GCC to generate code that may not run at all on processors other than the one indicated. Specifying -march=cpu-type implies -mtune=cpu-type.

`-mtune=cpu-type`

Tune to cpu-type everything applicable about the generated code, except for the ABI and the set of available instructions. While picking a specific cpu-type schedules things appropriately for that particular chip, the compiler does not generate any code that cannot run on the default machine type unless you use a -march=cpu-type option. For example, if GCC is configured for i686-pc-linux-gnu then -mtune=pentium4 generates code that is tuned for Pentium 4 but still runs on i686 machines.

`-mcpu=cpu-type`

A deprecated synonym for -mtune.

So, -march tells the compiler what instructions it can use, and thus sets a kind of minimum requirements for running the generated binary; while -mtune directs optimisation choices, but does not impact which CPUs can run the binary.

Looking at the documentation for other targets like ARM and RISC-V, it seems that the behaviour of -march and -mtune is consistent across them, while in some cases -mcpu may specify additional flags.

  • It sounds like -mtune’s behavior is not “falsifiable”; it can never produce “wrong” code; it’s purely an optimization option, like -O2.

Indeed.

So there is no practical difference, but there may be a psychological difference, between saying “Clang permanently treats -mtune as a no-op” versus “Clang has a remarkably low quality of implementation for -mtune, and has no immediate plans to either improve or regress it.”

True… I’d document it as a “no-op” :)

Ciao,

.Andrea

I understand the possible benefits of -mtune, but I just don’t see how we benefit from a half baked implementation, or a no-op, and leaving the no-op in hoping that one day it will be implemented properly.

“I don’t see how we benefit from <what’s there today>” - I think we’ve already covered the practical benefit of command line compatibility. That’s a fairly real/practical benefit & something that Clang does for lots of options, not only this one.

I also understand that removing the option could break some builds, but hey, compiler options and defaults change sometimes.

Pretty rarely are flags outright removed or semantics changed in ways that would break builds, so far as I know.

In my opinion the cost of migrating would be worth the benefits. But I understand the different angles here, and hey, at least I’ve tried now… :)

This is addressing the hard problem of setting optimal compiler options with -target, -march, -mcpu, and -mtune. In this equation -mtune is just a minor annoyance, but if we could get rid of this part of the confusion then that would be a good change to me and avoid the regular question we get what the -mtune options should be.

It’d be interesting to understand the path your users go through that leads to confusion - what documentation (command line --help, websites (ones maintained/owned by your company, or Clang’s open source/general website, etc), etc) & perhaps where more clarity could be provided there.

  • Dave

Fair enough, here’s a draft patch to document this as a no-op: https://reviews.llvm.org/D78511

In this particular case, the latest question, the one that led to my post to the list, actually came from the ChromeOS team :) and was about the best options for a particular big-LITTLE configuration, including what the -mtune setting should be. But as I mentioned earlier, this is tricky in general: it requires knowledge of the different CPUs, their optional extensions, and what is actually implemented, and all of that then needs translating into the different compiler flags and options. So this is a general problem that many users struggle with. It is further complicated by Clang’s option handling, which for example accepts invalid architecture combinations, so users don’t get proper feedback on their option settings. In this context, I found silently ignoring -mtune not really helpful and thought feedback to the user would be better. But I accept your argument, so I have put up this little doc change for review. And you’re absolutely right that other things can be improved here too, such as documentation and option handling.

Cheers.

Following up on the -mtune discussion yesterday, I have prepared a draft patch (https://reviews.llvm.org/D78565) that shows how to target different ARM CPU implementations and architecture combinations, for now only 2 M-profile cores.

In the -mtune thread, I mentioned that setting options is non-trivial and I think the Cortex-M55 as documented with examples in D78565 is a good example of that. I.e., it only shows 5 architecture combinations and their corresponding CLI options. Many more architecture combinations are possible, but these 5 combinations are the most likely to be implemented. In the V8-A architecture space, the challenge of many architecture combinations is similar.

My thinking was that it would be valuable to have a listing of architecture combinations and options, so that users can quickly look up an architecture configuration and find the flags to target it. These examples show usage of this tool (i.e. open-source clang), put documentation/examples in the same place as the source, and allow easier edits/reviews, also from non-Arm people. On D78565, Peter raised the question whether this is suited for the Clang documentation, or whether we should move this to some place at arm.com.

So my question is if there are any ideas/preferences/objections for this?

Cheers.

Is the intention to have an exhaustive list, or target common use cases?

I intend to restart work on (http://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html) which proposed teaching Clang more about the options so it could list and validate them itself. So there’s scope to eventually generate a page that has all the combinations.
(it’s a long way off of being done though so don’t take this as reason to stop documentation efforts, in fact examples would be useful for me too)

I think documenting recent M and A cores would be an excellent start. An exhaustive list covering every combination for each core is not very doable, I think, simply because of the many combinations. At least, not when generating the doc is a manual job. But even if we could auto-generate the exhaustive list, I don’t know yet how useful that is if we only expect a handful of implementations per core. Too much info maybe? A curated list could be better? Again, I think the M55 is a good example. Saying this from memory, I thought there were 20+ architecture combinations, but we expect these 5 to be possibly implemented…

Looking forward to the option work. So I am guessing at this point this doc work could coexist with that, and in the mean time it can help users and also us when we add new cores.

Cheers.

David Blaikie via cfe-dev <cfe-dev@lists.llvm.org> writes:

Thanks for your reply, and I can see that this is the only benefit of this
no-op: if you move to Clang and if you still have -mtune set in your build
environment, you don't get an "unknown option" error. But for me
personally, silently ignoring things and giving the impression it works is
worse than a simple Makefile fix, but maybe I am wrong, which is why I
checked here on the list.

If it were a matter of correctness rather than performance, I'd agree -

For some of us, performance *is* correctness. :)

IMO this option should emit a warning at least.

                 -David

For some of us, performance is correctness. :)

IMO this option should emit a warning at least.

I obviously agree with that, but I don’t see the benefit when that is emitted by using an opt-in flag, which was the consensus if I’m not mistaken.
So the absolute minimum we can do is to document this behaviour, which is what I did with rG35cf2f42dda4.

Sjoerd Meijer via cfe-dev <cfe-dev@lists.llvm.org> writes:

This is addressing the hard problem of setting optimal compiler
options with -target, -march, -mcpu, and -mtune. In this equation
-mtune is just a minor annoyance, but if we could get rid of this part
of the confusion then that would be a good change to me and avoid the
regular question we get what the -mtune options should be.

Honestly, the whole system needs an overhaul:

http://clang-developers.42468.n3.nabble.com/Behavior-of-mcpu-td4064178.html

I noticed this odd difference in behavior based on target. In the end
the answer was, "We want to behave like gcc," but that is not a very
compelling argument to me. Yes "-m" options are machine-specific, but
giving the same option with the same name different behaviors is
non-intuitive.

It doesn't help that these options aren't very well documented:

https://clang.llvm.org/docs/UsersManual.html#target-specific-features-and-limitations

-target is barely mentioned for CPUs and there is certainly no
indication that one needs -target to make -mcpu/-mtune do something
useful.

Contrast that with gcc's extensive documentation of tuning parameters:

Given the way -target works, we're already incompatible with gcc (gcc
will happily cross-compile/tune with -mcpu), so why not just do the
Right Thing and make these options behave uniformly across targets? In
my mind it should be something like this:

* -mcpu implies -target (based on host machine), -march and -mtune

  Example: -mcpu=skylake-avx512 sets
    -target=x86_64-unknown-linux-gnu (when run on a Debian system)
    -march=skylake-avx512
    -mtune=skylake-avx512

  Note that if the host system is, say, an AArch64 Debian machine,
  -target would still be implied as x86_64-unknown-linux-gnu (i.e. we
  are cross-compiling).
  
* -mtune implies -triple (based on host machine as with -mcpu)

* -march implies -triple (based on host machine as with -mcpu)

Of course one could always pass -triple (or other options) explicitly to
suppress the implied behaviors. We still want -mtune and -march to
operate independently of each other (i.e. neither implies the other) so
that one can generate backward-compatible binaries while still tuning
for recent microarchitectures.
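
A rough sketch of the implication rules proposed above, under stated assumptions. `impliedTarget`, `applyMcpu`, `Options`, and the CPU-to-architecture table are hypothetical names invented for this illustration; the table assumes each CPU name maps to exactly one architecture, which a real implementation would have to justify.

```cpp
#include <map>
#include <string>

// Assumed CPU -> architecture table (illustration only). Whether such a
// one-to-one mapping exists is an open design question for the proposal.
static const std::map<std::string, std::string> kCpuArch = {
    {"skylake-avx512", "x86_64"},
};

// Keep the vendor/OS/environment part of the host triple and replace only
// the architecture component with the one implied by the CPU.
std::string impliedTarget(const std::string &cpu,
                          const std::string &hostTriple) {
  auto it = kCpuArch.find(cpu);
  if (it == kCpuArch.end())
    return hostTriple; // unknown CPU: fall back to the host target
  std::string::size_type dash = hostTriple.find('-');
  if (dash == std::string::npos)
    return it->second;
  return it->second + hostTriple.substr(dash);
}

// Empty fields mean "not given on the command line".
struct Options {
  std::string target, march, mtune;
};

// -mcpu fills in -target, -march and -mtune unless they were given explicitly.
Options applyMcpu(const std::string &cpu, const std::string &hostTriple,
                  Options explicitOpts) {
  if (explicitOpts.target.empty())
    explicitOpts.target = impliedTarget(cpu, hostTriple);
  if (explicitOpts.march.empty())
    explicitOpts.march = cpu;
  if (explicitOpts.mtune.empty())
    explicitOpts.mtune = cpu;
  return explicitOpts;
}
```

On an AArch64 Debian host, `applyMcpu("skylake-avx512", "aarch64-unknown-linux-gnu", Options{})` yields the x86_64 target from the example above, while an explicitly passed target is left untouched.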

                     -David

Arthur O'Dwyer via cfe-dev <cfe-dev@lists.llvm.org> writes:

My naïve opinions FWIW:
- Personally I have never understood the difference between -target,
-march, -mcpu, and -mtune (particularly -march and -mtune). There are a lot
of blog posts out there (e.g.
https://lemire.me/blog/2018/07/25/it-is-more-complicated-than-i-thought-mtune-march-in-gcc/
http://sdf.org/~riley/blog/2014/10/30/march-mtune/ ) indicating that I'm
not alone. (Anyone got a really good resource on this topic?)

It's confusing because it differs by target. I just sent a message
about how *I* think it should work, based on how it works for gcc on
X86:

-march specifies an instruction set to use (for example
-march=skylake-avx512 would enable AVX512F extensions, among others).

-mtune specifies tuning for a particular microarchitecture (for example
-mtune=broadwell would optimize for that chip's microarchitecture).

Note that "nonsensical" combinations like -march=skylake-avx512
-mtune=broadwell should be permitted. This would mean generate code for
AVX512F and other extensions but, for example, schedule based on the
broadwell microarchitecture. More often this would be used to generate
backward-compatible code but tuned for a newer microarchitecture (for
example -march=haswell -mtune=icelake).
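
One way to picture the split described here, as a toy model only: -march decides which instruction-set extensions the binary may use, and -mtune is carried along without affecting where the binary can run. The feature sets below are deliberately tiny and illustrative, and `Build`, `Features`, and `canRunOn` are hypothetical names.

```cpp
#include <algorithm>
#include <set>
#include <string>

using Features = std::set<std::string>;

struct Build {
  Features required;  // from -march: extensions the binary may contain
  std::string tuning; // from -mtune: scheduling choice, no compatibility impact
};

// A binary can run on a CPU when every extension it may use is available.
// std::includes needs sorted ranges, which std::set iteration provides.
bool canRunOn(const Build &build, const Features &cpuFeatures) {
  return std::includes(cpuFeatures.begin(), cpuFeatures.end(),
                       build.required.begin(), build.required.end());
}
```

In this model, a build like `-march=haswell -mtune=icelake` would carry a Haswell-level `required` set and still run on Haswell-class CPUs, whereas `-march=skylake-avx512 -mtune=broadwell` would require AVX512F regardless of the tuning choice.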

Currently gcc defines -mcpu as an alias for -mtune on x86. I would
propose that clang's -mcpu imply -target, -march and -mtune for all
targets, as outlined in my other message.

- It sounds like -mtune's behavior is not "falsifiable"; it can never
produce "wrong" code; it's purely an optimization option, like -O2. So
there is no practical difference, but there may be a psychological
difference, between saying "Clang permanently treats -mtune as a no-op"
versus "Clang has a remarkably low *quality of implementation* for -mtune,
and has no immediate plans to either improve or regress it."

-mtune is very important to our customers and I suspect others'
customers as well. Having it behave as a no-op is not ideal and having
it do so without at least a warning is very nearly considered broken.

- I don't think Clang should do anything actively to break
scripts/Makefiles that invoke both GCC and Clang as "$(CXX) -mtune=...".
That would be a downside with no upside.

It depends on what "break" means. As I wrote in my other message, the
behavior of -march/-mtune also depends on -target which to me is very
non-intuitive. gcc has no -target option so we have already broken new
ground. I agree that using -march/-mtune should not abort the compiler
but I think it's fair game to have them behave differently from gcc,
especially given gcc's (and currently clang's because clang follows gcc
in this area) non-uniform behavior across targets.

                     -David

David Greene via cfe-dev <cfe-dev@lists.llvm.org> writes:

* -mtune implies -triple (based on host machine as with -mcpu)

* -march implies -triple (based on host machine as with -mcpu)

Of course one could always pass -triple (or other options) explicitly to
suppress the implied behaviors. We still want -mtune and -march to
operate independently of each other (i.e. neither implies the other) so
that one can generate backward-compatible binaries while still tuning
for recent microarchitectures.

s/-triple/-target/g

                 -David

* -mcpu implies -target (based on host machine), -march and -mtune

  Example: -mcpu=skylake-avx512 sets
    -target=x86_64-unknown-linux-gnu (when run on a Debian system)
    -march=skylake-avx512
    -mtune=skylake-avx512

This is just wrong. The CPU name has no 1:1 mapping to target
architectures. skylake-avx512 can still be happily used for
i386-unknown-linux-gnu to complement your example. The reverse is
somewhat true: the target triple can provide the default CPU
(-march/-mcpu).

Of course one could always pass -triple (or other options) explicitly to
suppress the implied behaviors. We still want -mtune and -march to
operate independently of each other (i.e. neither implies the other) so
that one can generate backward-compatible binaries while still tuning
for recent microarchitectures.

Please submit patches then for making scheduling independent of the
architecture flags. Until then, this whole discussion seems to be a
waste of time to me. Always using the/a default scheduling is IMO a
perfectly sensible behavior and more often than not, what GCC is doing
anyway.

Joerg