I’m prototyping a software pipeliner for our architecture, and I am running into some inconveniences with the current PipelinerLoopInfo interface. I have small changes in the MachinePipeliner and ModuloScheduler that fix it, and I would appreciate feedback.
1- At the point where adjustTripCount is called, the original code has already been modified. That means that I can no longer find some of the Phi nodes that I tagged during analysis. The first workaround I had was to make the change from shouldUseSchedule, just before returning true.
2- The modulo schedule expansion creates a brand new copy of PipelinerLoopInfo, redoing all the analysis. Since I have already been modifying the loop in shouldUseSchedule, that second run doesn’t always give the same answer as the earlier one.
My minimal change would be to have an extra ‘prepareUpdate’ callback, invoked at the start of the modulo expander. Perhaps it could be done lazily from the prolog block callbacks, but that sounds iffy.
Additionally, I am wondering whether the MachinePipeliner could pass a non-owning reference to the PipelinerLoopInfo to the constructor of ModuloScheduleExpander. I think this would be more transparent for the target implementation, and would avoid computing the same thing twice. Note that it would also have made my first implementation work out of the box.
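To make the proposal concrete, here is a minimal stand-alone sketch of the ownership pattern I have in mind. It deliberately does not use the real LLVM classes: `MockLoopInfo`, `MockExpander`, `analyzeLoop` and `runPipeliner` are hypothetical names for illustration only, and the trip count adjustment is just a placeholder value. The point is that the analysis runs once, the pipeliner keeps ownership, and the expander only borrows the object and calls the proposed ‘prepareUpdate’ hook before changing anything.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the target's loop-info object (not the LLVM class).
struct MockLoopInfo {
  bool Prepared = false;
  int TripCountAdjust = 0;

  // Proposed hook: called once, before the expander modifies the loop, so the
  // target can still find the instructions it tagged during analysis.
  void prepareUpdate() { Prepared = true; }

  // Records the requested adjustment; a real target would rewrite code here.
  void adjustTripCount(int Delta) { TripCountAdjust += Delta; }
};

// Counts how often the (expensive) loop analysis is performed.
static int AnalysisRuns = 0;

// Hypothetical analysis entry point, run on the unmodified loop.
std::unique_ptr<MockLoopInfo> analyzeLoop() {
  ++AnalysisRuns;
  return std::make_unique<MockLoopInfo>();
}

// Hypothetical expander: borrows the loop info instead of re-creating it.
class MockExpander {
  MockLoopInfo &LI; // non-owning; the pipeliner keeps ownership

public:
  explicit MockExpander(MockLoopInfo &LI) : LI(LI) {}

  void expand(int NumStages) {
    LI.prepareUpdate();                   // before any code is changed
    LI.adjustTripCount(-(NumStages - 1)); // placeholder adjustment
  }
};

// Pipeliner side: analyze once, then hand the same object to the expander.
int runPipeliner(int NumStages) {
  std::unique_ptr<MockLoopInfo> LI = analyzeLoop();
  MockExpander Expander(*LI);
  Expander.expand(NumStages);
  assert(LI->Prepared && "hook must have run before the adjustment");
  return LI->TripCountAdjust;
}
```

With this shape the target sees a single loop-info object for the whole transformation, so anything cached during analysis is still there when the trip count is adjusted.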
I can’t speak for the general infrastructure, but our downstream compiler has been using the MachinePipeliner for a while (longer than ARM, shorter than Hexagon/PPC, but not upstream).
For our back-end, adjustTripCount can be called at any time, really. We extract and keep around a special branch instruction (HardwareLoop pass) in the LoopPipelinerInfo that gets modified. Because of this design, we’re guaranteed that this instruction can be found, even if we have to regenerate the LPI after modifying the loop.
Thus, your suggestion to pass around the LPIs won’t cause any pain on our end. In fact it always did seem weird to me that it was regenerated using a modified, in-progress pipelined loop. You’d think the information about the original source loop would be more important to keep.
Thanks for this insight and kudos for the ‘always did seem weird to me’, which truly reassures me.
FYI, my current implementation recognizes a vanilla downcounting loop from the init-phi-decr-condbranch chain/cycle; it finds the stage in which the decrement is scheduled and with that compensates the incoming tripcount value for the prologue blocks that don’t contain a decrement. If I find the decrement in stage0 it’s trivial to create the guards for the prologue blocks, which takes away the need for static minimum tripcounts except for the code expansion consideration. Hence, I don’t have to force my decrement to sit in stage0 but can still use it if it happens to be there.
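As a rough illustration of the compensation, here is a small stand-alone model. It rests on assumptions that are mine and not taken from any target: prologue block K (counting from 0) executes stages 0..K, the decrement lives in stage `DecStage`, and the guard after block K has to check whether more than K+1 iterations exist. All function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Number of decrements already executed once prologue blocks 0..K have run,
// assuming block J contains stages 0..J and the decrement sits in DecStage.
int decrementsSeen(int K, int DecStage) {
  return std::max(0, K - DecStage + 1);
}

// Amount to subtract from the live counter after prologue block K so that it
// equals the number of iterations remaining beyond the K+1 already started.
int compensation(int K, int DecStage) {
  return std::min(K + 1, DecStage);
}

// Guard after prologue block K: continue only if another iteration exists.
bool guardTakesNextBlock(int TripCount, int K, int DecStage) {
  int Counter = TripCount - decrementsSeen(K, DecStage); // value in the register
  return Counter - compensation(K, DecStage) > 0;
}
```

In this model the compensation is zero for every prologue block when the decrement is in stage 0, which is why the guards are trivial in that case; for a later stage the counter has to be corrected by a small constant per block.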