Hi CFE Devs,
For out-of-tree target developers, what is the best practice advice for when to use ‘-mcpu’ and when to use ‘-march’ to identify variants or various generations of a target?
Thanks,
MartinO
I guess Clang command line options tend to be compatible with GCC.
What does GCC say about those options?
Hi Martin,
Sorry about the delay here, but I’ve got some advice even though a lot of things aren’t solid anywhere.
a) Does this match an out of tree target for gcc? If so, I’d match that.
b) Typically -march/-mtune (though the latter isn’t supported in llvm at the moment, see an earlier post from me to llvm-dev on how to support that) are what I’d use for new ports. We used it in a number of new/out of tree ports in gcc as well.
c) -mcpu is occasionally used to mean “-march + -mtune”, whereas -march by itself generally means “this architecture, but generic tuning”.
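A minimal sketch of the convention in (b) and (c), written as a small option resolver. Everything here is hypothetical and for illustration only: the names `resolve`, `Target`, `GENERIC_TUNING` and the default architecture `"base"` come from no real driver, and the rule that an explicit -march or -mtune takes precedence over -mcpu is an assumption.

```python
from typing import NamedTuple, Optional

GENERIC_TUNING = "generic"  # hypothetical name for the default tuning model


class Target(NamedTuple):
    arch: str  # ISA the generated code may use (what -march selects)
    tune: str  # CPU the generated code is optimised for (what -mtune selects)


def resolve(march: Optional[str] = None,
            mtune: Optional[str] = None,
            mcpu: Optional[str] = None,
            default_arch: str = "base") -> Target:
    """Combine the three options under the convention in (b) and (c)."""
    # -mcpu=X is treated as shorthand for -march=X -mtune=X.
    # Assumption: an explicit -march or -mtune takes precedence over -mcpu.
    arch = march or mcpu or default_arch
    # -march on its own leaves the tuning generic.
    tune = mtune or mcpu or GENERIC_TUNING
    return Target(arch, tune)
```

With this sketch, `resolve(march="v2")` selects the `v2` ISA with generic tuning, while `resolve(mcpu="v2")` selects `v2` for both fields.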
Hope this helps and happy to elaborate if you need it anywhere.
-eric
Does clang differ from g++ in this respect?
This is from the man page for g++ 4.8.5 for Intel x86 and x86-64 processors:
-march=cpu-type
Generate instructions for the machine type cpu-type. In contrast to -mtune=cpu-type, which merely tunes the generated code for the specified cpu-type, -march=cpu-type allows GCC to generate code that may not run at all on processors other than the one indicated. Specifying -march=cpu-type implies -mtune=cpu-type.
...
-mtune=cpu-type
Tune to cpu-type everything applicable about the generated code, except for the ABI and the set of available instructions. While picking a specific cpu-type schedules things appropriately for that particular chip, the compiler does not generate any code that cannot run on the default machine type unless you use a -march=cpu-type option. ...
...
-mcpu=cpu-type
A deprecated synonym for -mtune.
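To make the quoted x86 wording concrete, here is a rough sketch of the same rules as a function. It is not how GCC or clang implement option handling. The function name, the placeholder `"default-machine"`, and the choice to let an explicit -mtune win over the deprecated -mcpu spelling are all assumptions.

```python
from typing import Optional, Tuple


def resolve_x86_style(march: Optional[str] = None,
                      mtune: Optional[str] = None,
                      mcpu: Optional[str] = None,
                      default_arch: str = "default-machine") -> Tuple[str, str]:
    """Return (instruction_set, tuning) following the quoted man page text."""
    # -mcpu is only a deprecated synonym for -mtune; it never changes the ISA.
    # Assumption: an explicit -mtune wins over the deprecated spelling.
    tune_request = mtune or mcpu
    # Without -march, code must still run on the default machine type.
    arch = march or default_arch
    # -march=X implies -mtune=X unless tuning was requested separately.
    tune = tune_request or arch
    return arch, tune
```

Here `resolve_x86_style(mtune="core2")` keeps the default instruction set and changes only the tuning, whereas `resolve_x86_style(march="core2")` changes both.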
Hi Richard,
Does clang differ from g++ in this respect?
It shouldn’t, we strive for compatibility here.
This is from the man page for g++ 4.8.5 for Intel x86 and x86-64 processors:
That is because the page describes the in-tree x86 target. ARM is different. MIPS is more like what I described, as is mn10300. Power is different again, and in a different way.
However, Martin was asking about a non-specific out of tree target and the advice there should still hold.
-eric
Thanks Eric,
After the original reply to my query I had a good look at the GCC documentation for these options, and what I discovered is that “there is no consensus” in GCC. Basically, saying do what GCC does was a non-answer as it clarified nothing.
X86 has deprecated ‘-mcpu’ in favour of ‘-mtune’, and it uses ‘-mtune’ to mean that the scheduling, etc. should be biased in favour of more performant code for the processor identified by ‘-mtune’, but that the code is still 100% functional on all other processors in the family identified by ‘-march’.
MIPS and ARM seem to have a very different notion of what they mean, and at the end of my examination of these options, I came to the conclusion that there is no common convention for usage and it all seems very haphazard.
With the processor that I am targeting, the notion of tuning schedules for a particular variant is meaningless, because the schedule is critical in the absence of interlocked instruction execution (and VLIW). This effectively means that ‘-mtune’ is not appropriate to my platform and leaves me with the toss-up of ‘-mcpu’ versus ‘-march’, and to tell you the truth, I can see little reason to choose one over the other.
Most of the time, I want to differentiate based on ISA versions. That is, the code should be generated for one version of the ISA or another, though for the same basic architecture (ISA can vary a lot from version to version). ARM seems to use ‘-march’ and ‘-mcpu’ more or less interchangeably, and uses ‘+’ to add features to the value specified.
The majority of GCC targets use only ‘-mcpu’, so I think that, in the absence of any real and meaningful best practice, I will just stick with the triples and ‘-mcpu’.
All the best,
MartinO
Hi Martin,
Thanks Eric,
After the original reply to my query I had a good look at the GCC documentation for these options, and what I discovered is that “there is no consensus” in GCC. Basically, saying do what GCC does was a non-answer as it clarified nothing.
That’s why my comments were about a generic new target; that advice is the suggested consensus.
X86 has deprecated ‘-mcpu’ in favour of ‘-mtune’, and it uses ‘-mtune’ to mean that the scheduling, etc. should be biased in favour of more performant code for the processor identified by ‘-mtune’, but that the code is still 100% functional on all other processors in the family identified by ‘-march’.
This is correct.
MIPS and ARM seem to have a very different notion of what they mean, and at the end of my examination of these options, I came to the conclusion that there is no common convention for usage and it all seems very haphazard.
MIPS is very specific about using -march and -mtune; -mcpu is an alias for the pair (-march/-mtune).
This is what I suggest to people wanting to have new ports.
With the processor that I am targeting, the notion of tuning schedules for a particular variant is meaningless, because the schedule is critical in the absence of interlocked instruction execution (and VLIW). This effectively means that ‘-mtune’ is not appropriate to my platform and leaves me with the toss-up of ‘-mcpu’ versus ‘-march’, and to tell you the truth, I can see little reason to choose one over the other.
-mtune is used for more than scheduling.
Most of the time, I want to differentiate based on ISA versions. That is, the code should be generated for one version of the ISA or another, though for the same basic architecture (ISA can vary a lot from version to version). ARM seems to use ‘-march’ and ‘-mcpu’ more or less interchangeably, and uses ‘+’ to add features to the value specified.
That’s not quite true either; the behavior is subtle. Overall, I think the ARM strategy is the worst approach to compiler arguments to copy.
The majority of GCC targets use only ‘-mcpu’, so I think that, in the absence of any real and meaningful best practice, I will just stick with the triples and ‘-mcpu’.
I’d really suggest you not do this, but since it’s going to be an out of tree target you may, of course, do what you want.
-eric
Thanks very much Eric for taking the time to carefully explain this to me.
So if I am the author of the backend for a new processor technology, or willing to modernise my existing implementation, you would recommend that the ‘-mcpu’ option is deprecated and probably best not used at all, or perhaps just as a synonym for ‘-march + -mtune’?
The first part of the target triple identifies the overall high-level processor architecture, essentially the CPU, and I would be better off using ‘-march’ to distinguish groups of variants of this architecture that are not binary compatible with each other; effectively my ISA generations. ‘-mtune’ would then tune code generation for the performance of a particular version of my processor within a single group whose members are otherwise binary compatible.
For the past few years, our processor has had essentially the same conceptual hardware architecture, but each generation has had significant changes to the ISA. Sometimes this is just instructions added or removed, but sometimes it is changes in the binary encoding of instructions, or the assembly syntax used for expressing the instructions, or the scheduling constraints. I expect that ‘-march’ would be appropriate for differentiating between these incompatible variations of the architecture.
At the moment I can only think of one use for ‘-mtune’ though for my target, and that would be to allow fine-tuning of the instructions chosen to handle some kinds of hardware bug fixes; though this would be a rare scenario as usually HW fixes require different code generation, not just schedule tuning. At the moment my backend is not cache-aware, but if I did implement the prefetch ISDs, then ‘-mtune’ would I suppose be appropriate for this as the caches can vary from one configuration to another.
When you say “-mtune is used for more than scheduling”, what kinds of use-case would be good examples?
Would this be a reasonable summary? My original question was because I found that I could not determine a consistent usage of these options in GCC supported targets to guide my own decisions.
Of course, for targets common to LLVM and GCC, it is essential that LLVM follows the GCC uses of these switches for compatibility; but it is guidance for new targets that I think is missing that could establish best practices for both compiler development communities.
All the best and thanks again,
MartinO
Hi Martin,
Sorry about the delay.
Thanks very much Eric for taking the time to carefully explain this to me.
So if I am the author of the backend for a new processor technology, or willing to modernise my existing implementation, you would recommend that the ‘-mcpu’ option is deprecated and probably best not used at all, or perhaps just as a synonym for ‘-march + -mtune’?
The first part of the target triple identifies the overall high-level processor architecture, essentially the CPU, and I would be better off using ‘-march’ to distinguish groups of variants of this architecture that are not binary compatible with each other; effectively my ISA generations. ‘-mtune’ would then tune code generation for the performance of a particular version of my processor within a single group whose members are otherwise binary compatible.
Yes, I agree.
For the past few years, our processor has had essentially the same conceptual hardware architecture, but each generation has had significant changes to the ISA. Sometimes this is just instructions added or removed, but sometimes it is changes in the binary encoding of instructions, or the assembly syntax used for expressing the instructions, or the scheduling constraints. I expect that ‘-march’ would be appropriate for differentiating between these incompatible variations of the architecture.
Absolutely.
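As a sketch of what that split could look like for a target with incompatible ISA generations: -march names the generation, and -mtune may only name a part inside it. The generation names, variant names and encoding numbers below are invented for illustration. Restricting -mtune to the selected generation is an assumption that suits this kind of target, not a rule every port follows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Generation:
    name: str            # value accepted by -march (hypothetical)
    encoding: int        # revision of the binary instruction encoding
    variants: frozenset  # binary-compatible parts, valid values for -mtune


GENERATIONS = {
    g.name: g for g in (
        Generation("gen1", 1, frozenset({"gen1-a", "gen1-b"})),
        Generation("gen2", 2, frozenset({"gen2-a"})),
    )
}


def check_options(march: str, mtune: Optional[str] = None) -> Generation:
    """Validate -march/-mtune for a target with incompatible ISA generations."""
    if march not in GENERATIONS:
        raise ValueError(f"unknown -march value: {march}")
    gen = GENERATIONS[march]
    # Assumption: -mtune may only pick a variant inside the selected
    # generation, so tuning never changes which processors can run the code.
    if mtune is not None and mtune not in gen.variants:
        raise ValueError(f"-mtune={mtune} is not a variant of -march={march}")
    return gen


def can_link(a: str, b: str) -> bool:
    """Objects built for different encodings are not binary compatible."""
    return GENERATIONS[a].encoding == GENERATIONS[b].encoding
```

In this sketch `check_options("gen1", "gen1-b")` is accepted, `check_options("gen1", "gen2-a")` raises `ValueError`, and `can_link("gen1", "gen2")` is false.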
At the moment I can only think of one use for ‘-mtune’ though for my target, and that would be to allow fine-tuning of the instructions chosen to handle some kinds of hardware bug fixes; though this would be a rare scenario as usually HW fixes require different code generation, not just schedule tuning. At the moment my backend is not cache-aware, but if I did implement the prefetch ISDs, then ‘-mtune’ would I suppose be appropriate for this as the caches can vary from one configuration to another.
When you say “-mtune is used for more than scheduling”, what kinds of use-case would be good examples?
I’ll give you a few examples from the X86 backend that are currently subtarget features because -mtune support really doesn’t exist right now: mostly the FeatureFast* ones. FeatureSlowLEA is a particularly good one to look at, along with a few of the others. These are mostly tuning flags that live in a “generic” x86-64 CPU target because we don’t have decent tuning support.
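A rough illustration of why a flag in the spirit of FeatureSlowLEA is a tuning matter rather than an ISA matter: either instruction sequence is legal, and the hint only changes which one is preferred. The tables, CPU names and the function below are hypothetical and are not LLVM’s data structures or API.

```python
# Hypothetical tables: ISA features come from -march, tuning hints from -mtune.
ISA_FEATURES = {"base": {"lea"}, "v2": {"lea", "fma"}}
TUNING_HINTS = {"generic": set(), "cpu-a": {"slow-lea"}, "cpu-b": set()}


def pick_address_sequence(march: str, mtune: str = "generic") -> str:
    """Choose how to compute base + index*4; every choice is legal for march."""
    isa = ISA_FEATURES[march]
    hints = TUNING_HINTS[mtune]
    # The ISA decides what is allowed; the tuning hint decides what is preferred.
    if "lea" in isa and "slow-lea" not in hints:
        return "lea"        # single instruction, used when it is not slow
    return "shift+add"      # same result, assumed legal on every ISA here
```

Tuning for `cpu-a` changes the chosen sequence but not the set of processors that can run the result, which is the property that separates -mtune from -march.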
Would this be a reasonable summary? My original question was because I found that I could not determine a consistent usage of these options in GCC supported targets to guide my own decisions.
Of course, for targets common to LLVM and GCC, it is essential that LLVM follows the GCC uses of these switches for compatibility; but it is guidance for new targets that I think is missing that could establish best practices for both compiler development communities.
Very much so. When I was last in the GCC community, this was the guidance we were giving to new ports.
Thanks!
-eric