Currently the X86 backend does CPU auto-detection and subtarget feature detection when the TargetMachine is created if no explicit CPU was specified. That's counterintuitive for a low-level tool like 'llc': the same .ll file compiled on heterogeneous machines generates different results from the same 'llc' command line. It is still useful to be able to opt in to such behavior, for example to replicate clang's behavior when -mcpu=native is supplied. My thought is to teach 'llc' to recognize -mcpu=native and probe the host CPU only when that is given; the subtarget features are then filled in according to the feature string for that CPU. This (a) changes the auto-detection from opt-out to opt-in, and (b) moves the logic out of the core target backend and into the tool drivers.
Attached are draft patches that do this for X86. Similar but smaller cleanups can also be done for SystemZ and PowerPC if it's agreed this is a good idea.
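Concretely, the proposed behavior would look like this (illustrative invocations, not output of the draft patches; 'foo.ll' is a placeholder):

```shell
$ llc foo.ll                # proposed: always use the target's fixed default CPU
$ llc -mcpu=native foo.ll   # proposed: opt in to probing the host CPU,
                            # mirroring clang's -mcpu=native
```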
I'm not a huge fan of this because then you get to decide on a default
for all the ports, but I can understand if people want to move this
way to reduce uncertainty.
Alternately have a way for the backend to pretty print out the results
of the auto-detection when asked?
> I'm not a huge fan of this because then you get to decide on a default
> for all the ports, but I can understand if people want to move this
> way to reduce uncertainty.
FWIW, since it's one of the three targets Jim mentioned, -march=z10 is
the obvious default for SystemZ.
On the other hand, I think it's good to run the tests on both z10 and z196,
since there were significant additions to the architecture in z196.
Until now we've been getting that by implicitly using -mcpu=z10 for
cross testing on the usual x86_64 buildbots and implicitly using
-mcpu=native (-mcpu=z196) on the z buildbot.
We could do it instead by having two llcs and FileChecks per
CodeGen/SystemZ test, which would give better cross-toolchain
coverage, but would almost double the test time. Would that
still be OK?
We already decide on a default. For these three targets it's “whatever the user is running on.” For all the other targets it's something the backend selects. I'm proposing we, as you say, remove the uncertainty and have the default always be the same from one run to the next even (especially) when those runs are on different machines.
For X86, there are two options: the “generic” target the backend already supports, or “core2”, which is what clang typically defaults to for x86_64h. I lean towards the former for simplicity. We also don't have to change all three of the ports. I'm personally motivated primarily by x86, and was intending to leave it to the maintainers of the other two targets to decide what they want to do here, perhaps with some strong encouragement.
It's very important that a run of “llc” on one machine produce the same output on two heterogeneous machines given the same input and command lines*. That's not true right now, leading to lots of bot failures that patch originators can't reproduce because they're getting different code locally due to the auto-detection. See the recent “Test failures with 3.4.1” thread for examples.
-Jim
* With the caveat of platform differences like ELF vs. Mach-O. A similar argument can be made there, but unlike CPU, there's no sensible way to define a least-common-denominator default, so it's harder. I'm explicitly avoiding opening that can of worms.
>> I'm not a huge fan of this because then you get to decide on a default
>> for all the ports, but I can understand if people want to move this
>> way to reduce uncertainty.
> FWIW, since it's one of the three targets Jim mentioned, -march=z10 is
> the obvious default for SystemZ.
Makes sense.
> On the other hand, I think it's good to run the tests on both z10 and z196,
> since there were significant additions to the architecture in z196.
> Until now we've been getting that by implicitly using -mcpu=z10 for
> cross testing on the usual x86_64 buildbots and implicitly using
> -mcpu=native (-mcpu=z196) on the z buildbot.
> We could do it instead by having two llcs and FileChecks per
> CodeGen/SystemZ test, which would give better cross-toolchain
> coverage, but would almost double the test time. Would that
> still be OK?
If you really want to run both variations of the tests, yes, absolutely, and that’s the right way to do it. Having the tests run differently in a cross-compile vs. a host-compile situation is exactly the sort of thing I’m hoping to move away from.
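For concreteness, the doubled RUN lines in a CodeGen/SystemZ test might look like this (a sketch; the triple is illustrative, and any test whose output differs between the two CPUs would need separate --check-prefix lines per RUN):

```llvm
; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z10  | FileCheck %s
; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z196 | FileCheck %s
```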
> We already decide on a default. For these three targets it's “whatever the user is running on.” For all the other targets it's something the backend selects. I'm proposing we, as you say, remove the uncertainty and have the default always be the same from one run to the next even (especially) when those runs are on different machines.
> For X86, there are two options: the “generic” target the backend already supports, or “core2”, which is what clang typically defaults to for x86_64h. I lean towards the former for simplicity. We also don't have to change all three of the ports. I'm personally motivated primarily by x86, and was intending to leave it to the maintainers of the other two targets to decide what they want to do here, perhaps with some strong encouragement.
> It's very important that a run of “llc” on one machine produce the same output on two heterogeneous machines given the same input and command lines*. That's not true right now, leading to lots of bot failures that patch originators can't reproduce because they're getting different code locally due to the auto-detection. See the recent “Test failures with 3.4.1” thread for examples.
As one of the contributors to pocl <http://pocl.sourceforge.net/>, I want to emphatically chime in here. Before our 0.9 release, we discovered that running clang, llc, opt, etc. often gives results that differ depending on the machine they are run on, and that also differ between the tools themselves. We ended up specifying exactly what kind of machine is to be targeted, via various -triple, -march, and -mcpu options.
The fact that these options have different syntaxes across the LLVM tools didn't help...
Any step towards more unification among different machines and different LLVM-based tools would be a great help.
I think we should do this, but only to make llc's behavior more
deterministic and predictable. I don't think we should start checking in
tests that rely on the default subtarget features without explicitly
requesting the relevant features.
Consider somebody who works on an ARM or x86 variant like atom. Probably
what they'll want to do is set up a bot that runs the LLVM test suite in
such a way that their subtarget features are on by default. Our test suite
currently "supports" that if the host CPU features happen to be the ones
you want. Instead, we should probably move to a world where bots can set
different defaults by configuring the tools appropriately. For example,
they could rewrite 'llc' to 'llc -mcpu=blah' in lit if no mcpu flags exist.
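The rewriting step could be a simple regex substitution installed by a bot's local lit configuration. Here is a minimal sketch of the regex's effect in plain Python (the bot-side lit hook itself is an assumption, not an existing lit feature; `pin_default_cpu` is a hypothetical helper name):

```python
import re

# Match 'llc' only when the rest of the command line does not already
# pass -mcpu, so tests that pin an explicit CPU are left untouched.
LLC_WITHOUT_MCPU = re.compile(r'\bllc\b(?!.*-mcpu)')

def pin_default_cpu(run_line, cpu):
    """Rewrite 'llc' to 'llc -mcpu=<cpu>' unless the line already sets a CPU."""
    return LLC_WITHOUT_MCPU.sub('llc -mcpu=' + cpu, run_line)

print(pin_default_cpu('llc < %s | FileCheck %s', 'atom'))
# -> llc -mcpu=atom < %s | FileCheck %s
print(pin_default_cpu('llc -mcpu=z196 < %s | FileCheck %s', 'atom'))
# -> llc -mcpu=z196 < %s | FileCheck %s  (unchanged)
```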
This would be similar to what we do in Clang for the default C++ ABI. If
the test actually needs the MSVC or Itanium C++ ABI, they ask for it
explicitly. Otherwise they test one or the other depending on the default
target triple. I don't think we should double the number of RUN lines to
keep that test coverage, and I don't think the cost of Windows-bot-only
test failures is too high.
>> On the other hand, I think it's good to run the tests on both z10 and z196,
>> since there were significant additions to the architecture in z196.
>> Until now we've been getting that by implicitly using -mcpu=z10 for
>> cross testing on the usual x86_64 buildbots and implicitly using
>> -mcpu=native (-mcpu=z196) on the z buildbot.
>> We could do it instead by having two llcs and FileChecks per
>> CodeGen/SystemZ test, which would give better cross-toolchain
>> coverage, but would almost double the test time. Would that
>> still be OK?
> If you really want to run both variations of the tests, yes, absolutely,
> and that’s the right way to do it. Having the tests run differently in a
> cross-compile vs. a host-compile situation is exactly the sort of thing
> I’m hoping to move away from.
OK, in that case I'll do the SystemZ side if the x86 change goes in.
> It's very important that a run of "llc" on one machine produce the same
> output on two heterogeneous machines given the same input and command lines*.
> That's not true right now, leading to lots of bot failures that patch
> originators can't reproduce because they're getting different code locally
> due to the auto-detection. See the recent "Test failures with 3.4.1" thread
> for examples.
> I think we should do this, but only to make llc's behavior more
> deterministic and predictable. I don't think we should start checking in
> tests that rely on the default subtarget features without explicitly
> requesting the relevant features.
So this is largely my concern here. I hate breaking the Atom bots all
the time. That said, breaking the Atom bots all the time keeps me
aware that I want to make my tests resilient against various different
cpu types or that I should be picking explicit options to test a
particular backend feature.
>> It's very important that a run of "llc" on one machine produce the same
>> output on two heterogeneous machines given the same input and command lines*.
>> That's not true right now, leading to lots of bot failures that patch
>> originators can't reproduce because they're getting different code locally
>> due to the auto-detection. See the recent "Test failures with 3.4.1" thread
>> for examples.
>> I think we should do this, but only to make llc's behavior more
>> deterministic and predictable. I don't think we should start checking in
>> tests that rely on the default subtarget features without explicitly
>> requesting the relevant features.
> So this is largely my concern here. I hate breaking the Atom bots all
> the time. That said, breaking the Atom bots all the time keeps me
> aware that I want to make my tests resilient against various different
> cpu types or that I should be picking explicit options to test a
> particular backend feature.
Won’t this make that even easier? Now those same tests will break locally for you and you won’t have to wait for the bots to yell at you. And even when there’s no bot at all that would have ever noticed.
Now, one additional thing we could do (outside the scope of this proposal) is implement a fuzzer of some kind that looks for target tests without an explicit arch/feature/whatever setting on the RUN line and runs them with every available -mcpu= value to see if they still pass (which is what the lack of such settings on the RUN line implies).
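The scanning half of such a fuzzer is straightforward. A minimal sketch (the RUN-line and flag patterns are assumptions about typical test formatting; actually rerunning the candidates under each -mcpu via llc/FileCheck is elided):

```python
import re

# A lit RUN line in an .ll test looks like '; RUN: <command>'.
RUN_LINE = re.compile(r';\s*RUN:\s*(.*)')
# Flags that pin the code generation target explicitly.
PINNED = re.compile(r'-(?:mcpu|mattr|march)[=\s]')

def underconstrained_runs(test_text):
    """Return llc RUN lines that pin no CPU, attributes, or arch.

    These implicitly claim to pass under *any* -mcpu value, so the
    fuzzer would rerun each of them once per supported CPU.
    """
    return [m.group(1) for m in RUN_LINE.finditer(test_text)
            if 'llc' in m.group(1) and not PINNED.search(m.group(1))]

sample = '''\
; RUN: llc < %s -mtriple=x86_64-linux | FileCheck %s
; RUN: llc < %s -mcpu=atom | FileCheck %s --check-prefix=ATOM
'''
# Only the first RUN line lacks an explicit CPU/feature setting.
print(underconstrained_runs(sample))
```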