In the following example we miss an optimization that would make the generated assembly shorter: Compiler Explorer. GCC doesn’t miss this optimization.
This was also recognized in this post: How to copy propagate physical register introduced before RA
I have created a draft PR: [MachineCopyPropagation] Detect and fix suboptimal instruction order to enable optimizations by spaits · Pull Request #98087 · llvm/llvm-project · GitHub. As you can see in the tests, this PR brings lots of improvements on almost all targets, and not just in load-related contexts. In the PR comments it was suggested that I move this optimization to the new post-RA scheduler. I also think it would fit better there (since I have more or less re-implemented the scheduler DAG).
To move this optimization there, I would first need to introduce anti-dependency breaking to the new post-RA scheduler.
Then I would need to develop the anti-dependency breaker further so that it works in the context I want. (I know that it currently doesn’t handle the cases my PR fixes: I was able to get the old post-RA scheduler working and tried out the dep breaker. See Compiler Explorer.)
My first question is: how should I introduce the dependency breaker to the new post-RA scheduler? Can I do it the same way as in the old one, that is, extend the scheduler class and insert the needed calls to the anti-dep breaker? Would that be okay with you, or do you have other suggestions?
Also, I don’t want this optimization to be hidden behind a back-end flag. Could I enable a dep breaker by default at -O2 or -O3?
To make the optimizations I want possible, the dep breaker would need to move instructions around. Currently the dep breaker only renames registers. Would it be fine if it also moved instructions?
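To make the "move instead of rename" idea concrete, here is a minimal toy sketch in Python. It is not how the PR or LLVM's anti-dependency breakers are implemented: the `Instr` type, the helper names, and the three-instruction block are all hypothetical, and the model ignores memory dependencies, flags, and sub-registers. It only shows how an instruction that reads a register sitting between a definition and a copy blocks folding the copy, and how hoisting that reader removes the blocker.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical toy instruction: one register written, some registers read."""
    name: str
    dst: str
    srcs: Tuple[str, ...]


def anti_dependencies(block):
    """Pairs (reader, writer) where a later instruction overwrites a register
    that an earlier instruction reads (write-after-read)."""
    return [(i, j)
            for i, early in enumerate(block)
            for j in range(i + 1, len(block))
            if block[j].dst in early.srcs]


def blockers(block, def_idx, copy_idx):
    """Instructions strictly between the def and the copy that read or write
    the copy's destination, so the def cannot write that register directly."""
    target = block[copy_idx].dst
    return [k for k in range(def_idx + 1, copy_idx)
            if target in block[k].srcs or block[k].dst == target]


def independent(a, b):
    """True if a and b have no register dependency, so they may be swapped
    (assumption: registers are the only thing that orders them)."""
    return a.dst not in b.srcs and b.dst not in a.srcs and a.dst != b.dst


# Hypothetical block:  r1 = load [r2];  r3 = add r0, r4;  r0 = copy r1
block = [
    Instr("load", "r1", ("r2",)),
    Instr("add", "r3", ("r0", "r4")),
    Instr("copy", "r0", ("r1",)),
]

print(anti_dependencies(block))   # the add reads r0, the copy overwrites it
print(blockers(block, 0, 2))      # the add sits between the load and the copy

# The load and the add are independent, so the add can be hoisted above the load.
if independent(block[0], block[1]):
    reordered = [block[1], block[0], block[2]]
    print(blockers(reordered, 1, 2))  # nothing left between the load and the copy
```

After the reordering, nothing between the load and the copy touches `r0`, so in this toy model the load could write `r0` directly and the copy could be dropped. Renaming alone would not produce this order, which is why the question about moving instructions matters.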
I would appreciate any input and suggestions.