FYI, planning to enable nontrivial loop unswitch in the new PM at O3

One of the last big missing pieces for the new PM is enabling non-trivial loop unswitch at O3.

The pass is now working well and passing all the testing I have done as well as some others’ testing (thanks Fedor!) so it should be ready to be enabled.

I’ve done preliminary benchmarking on the test suite and SPEC and haven’t seen any interesting regressions and quite a few improvements. Still, there may be some regressions out there. Not sure how many folks are using the new PM widely, but if you see regressions, don’t hesitate to send a note my way.

Anyways, just wanted to send a heads-up. Not expecting this to be disruptive so will probably land it next week unless someone gives a shout.

-Chandler

Is there any written description of what “non-trivialness” means here?

There is some in the comments in both old and new passes.

Short version is that a trivial unswitch does not require duplicating any part of the loop body. A non-trivial unswitch requires duplicating part of the loop body.
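To make the distinction concrete, here is a rough source-level sketch of the two cases. The real pass works on LLVM IR, not C++, and every function name below is made up purely for illustration; treat it as a picture of the idea rather than what the pass literally emits.

```cpp
#include <cassert>
#include <vector>

// Trivial case: the loop-invariant condition `stop` only guards a loop exit,
// and nothing with side effects runs before it in the loop body.
int sum_until_stop(const std::vector<int>& v, bool stop) {
    int total = 0;
    for (int x : v) {
        if (stop) break;  // invariant test, re-evaluated every iteration
        total += x;
    }
    return total;
}

// After a trivial unswitch: the test is hoisted and the loop body exists
// exactly once. No part of the loop is duplicated.
int sum_until_stop_unswitched(const std::vector<int>& v, bool stop) {
    int total = 0;
    if (!stop) {
        for (int x : v) total += x;
    }
    return total;
}

// Non-trivial case: the invariant condition `scale` selects between two
// paths that both stay inside the loop.
int sum_scaled(const std::vector<int>& v, bool scale) {
    int total = 0;
    for (int x : v) {
        if (scale)
            total += 2 * x;
        else
            total += x;
    }
    return total;
}

// After a non-trivial unswitch: the test is hoisted, but the loop now exists
// twice, one copy specialized for each value of `scale`.
int sum_scaled_unswitched(const std::vector<int>& v, bool scale) {
    int total = 0;
    if (scale) {
        for (int x : v) total += 2 * x;
    } else {
        for (int x : v) total += x;
    }
    return total;
}
```

In the second pair the loop appears twice in the output, which is where the code size cost comes from.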

The term “trivial” refers to the cost model needed to decide whether the transform is a good idea. If no part of the loop body needs to be duplicated, unswitching is considered “trivially” beneficial (IOW, we always do it). But duplicating some part of the loop body has a code size hit at least, and may in some cases worsen performance (icache, etc.), so it is no longer trivial to decide.

LLVM only does “non-trivial” unswitching at O3 because of the risk of code size growth without any specific performance gains.

Also, the cost modeling for non-trivial loop unswitch remains an area of active development. In particular, there has been some work on the old unswitch pass that hasn’t been ported to the new one and should be. There are also important differences between the old PM and the new PM here. The old PM’s unswitch has a budget for non-trivial unswitching that is managed in a non-obvious way (IIRC, it is per-module, but I’d have to double check). The new PM doesn’t use this kind of budget. This makes it riskier for code size growth in some senses, but also makes it much more predictable overall. The old pass’s behavior could be very hard to predict, because boring refactorings or code movement could dramatically change the budget and therefore the behavior.

-Chandler