Hi Quentin,
Hi Kristof,
I’ve been digging a little bit deeper into the biggest performance regressions I’ve observed.
What I’ve observed so far is:
- A lot of the biggest regressions are caused by unnecessarily moving floating point values through general purpose registers. I’ve raised http://bugs.llvm.org/show_bug.cgi?id=32550 for this. I think this one definitely needs fixing before enabling GlobalISel by default at -O0.
I commented in the PR. This is a known problem and we have a solution. Given this is an optimization in the sense that it does not affect the correctness of the program, we didn’t push for fixing it now.
For O0 we wanted to focus ourselves on generating correct code. Unless the regressions you are seeing are preventing debugging/running of the program, I wouldn’t block the flip of the switch on that.
What do you think?
For O0, I think most users care about fast compile time and an excellent debug experience.
Next to that, I am told that some users also use -O0 to have a straightforward, simple, mapping between source code and the generated assembly.
Out of the issues I’ve seen so far, I think this is the single “optimization” issue that I feel gives a negative first impression of GlobalISel.
If users look at the generated assembly for floating point code it looks more like active de-optimization rather than “no optimization”.
My guess is also that this may be the biggest reason for the about 20% performance slow-down and 11% code size increase I measured earlier.
OTOH, I clearly agree this is an optimization issue, not a correctness issue.
Combining the above, I think it would be best if this issue got fixed before we turn on GlobalISel by default at -O0, or we may end up hearing quite a few (non-critical) complaints about this from users.
Basically I think this is a tradeoff between giving a better first impression of GlobalISel vs getting more people to use and test it earlier.
Thanks for the write-up on the PR, that is very useful.
Do you have any rough idea how much effort would be involved in getting this fixed? I got the impression Daniel is making good progress on the tablegen generation, which is key to getting this issue fixed?
I think that no matter whether we decide to switch the default before or after fixing this issue, this is one of the most urgent issues to fix as far as I can see.
If there is some way I can help contribute to fixing PR32550, I would like to help; but with the dependency on tablegen generation, I’m not sure what the best way is to help make that PR get fixed faster?
I think FastISel tries to generate the best code it can no matter what. For GISel O0 however, not doing this optimization sounds sensible to me.
Now, I would say that the same remark as the previous bullet point apply: we shouldn’t do it unless it gets in the way of running/debugging the program.
I agree that these optimizations should not be done at -O0. I think not doing them is actually an improvement: you give the user what they asked, i.e. “no optimization”, and an as-straigthforward-as-possible mapping from source to assembly.
- FastISel doesn’t\ seem to handle functions with switch statements, so it falls back to DAGISel. DAGISel produces code that’s a lot better than GlobalISel for switch statement at -O0. I’m not sure if we need to do something here before enabling GlobalISel by default. I’m thinking we may need to add a smarter way to lower switch statements rather than just a cascaded sequence of conditional branches.
Sounds optimization-ish to me. Same remark.
Agreed, optimization-ish. In comparison to the above point on FastISel “peepholes”, I think that lowering switch statements to something else than a cascaded sequence of conditional branches doesn’t make the generated code harder to map to the source. So, in comparison to the above point on FastISel “peepholes”, I’m not actively against being smarter about lowering switch statements at -O0.
But I agree, this shouldn’t hold up turning on GlobalISel by default at -O0.
Other than the above remarks, before turning on GlobalISel by default, we’d better test/verify that debug info quality remains good.
I haven’t looked into that at all, but am hoping to start looking into that soon, with the help of the DIVA tool (https://github.com/SNSystems/DIVA) presented at last EuroLLVM (http://llvm.org/devmtg/2017-03//assets/slides/diva_debug_information_visual_analyzer.pdf). I don’t recall anyone so far making any statements about the quality of the debug info they’ve measured or experienced with GlobalISel enabled?
Other than the all of the above, I just wanted to mention that Oliver recently started running csmith on AArch64 GlobalISel and found one issue so far that already got fixed (https://bugs.llvm.org//show_bug.cgi?id=32733).
If he finds other correctness issues, I’m sure he’ll keep on reporting them.