I think there is a mistake in the MachinePipeliner interface. In TargetInstrInfo.h, the class PipelinerLoopInfo has a function "bool shouldIgnoreForPipelining(const MachineInstr *MI)". The description says that if this function returns true for a given MachineInstr, it will not be pipelined.
However, in reality it is not ignored and is still considered for pipelining. I implemented this function in my own backend and listed an instruction there that I want ignored, but that instruction still ends up being pipelined. I implemented it the same way as in PPCInstrInfo.cpp, so I think it has the same bug. Is this a bug, or am I forgetting something?
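For reference, my override looks roughly like the sketch below. The types here are stand-ins for illustration only (the real code subclasses TargetInstrInfo::PipelinerLoopInfo and takes llvm::MachineInstr); the opcode names are made up:

```cpp
#include <cassert>
#include <vector>

// Stand-in types for illustration; the real code uses llvm::MachineInstr and
// overrides TargetInstrInfo::PipelinerLoopInfo. Opcode names are hypothetical.
enum Opcode { Add, Load, LoopDec, CondBranch };

struct MachineInstr {
  Opcode Op;
};

// Modeled on the PPCInstrInfo-style PipelinerLoopInfo subclass: report the
// loop-control instructions as "ignorable" for pipelining.
struct MyPipelinerLoopInfo {
  bool shouldIgnoreForPipelining(const MachineInstr *MI) const {
    // The loop decrement and the backedge branch should not be pipelined;
    // they are expected to be recreated when the prolog/epilog are emitted.
    return MI->Op == LoopDec || MI->Op == CondBranch;
  }
};

// The behavior I expected: the set of instructions actually handed to the
// scheduler, with the ignored ones filtered out.
std::vector<MachineInstr>
pipelineCandidates(const std::vector<MachineInstr> &Body,
                   const MyPipelinerLoopInfo &PLI) {
  std::vector<MachineInstr> Out;
  for (const MachineInstr &MI : Body)
    if (!PLI.shouldIgnoreForPipelining(&MI))
      Out.push_back(MI);
  return Out;
}
```

That filtering step is what I assumed the in-tree pipeliner would perform; the instructions marked ignorable still reach the scheduler instead.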
It may be confusing, but it is not a bug.
And you are right: if you are using the in-tree MachinePipeliner, it won't have any effect,
because there is NO in-tree reference to this API at all.
You can see the comments from when James introduced this API in https://reviews.llvm.org/D67167
Jinsong Ji (纪金松), PhD.
XL/LLVM on Power Compiler Development
Sander via llvm-dev — 06/02/2020 03:47:53 PM
Sorry to bring this thread from 3 months ago back, but I’m running into this issue too.
I do see that shouldIgnore is not called in the MachinePipeliner, however, James’ comment doesn’t really resolve the issue or make the story any clearer.
My summary of the comment is: “Hexagon and PPC9 do not need to ignore any instructions. However, in the case that you do, such as when the indvar update is explicit, this function is provided to allow the target to strip those instructions from the pipelined kernel.”
However, the reality is that the implementation seems incomplete, and there are no instructions on how to achieve the desired result. Is it left as an exercise to the reader/implementer? Is there something I'm missing?
As I mentioned before,
this API was introduced by James, mostly for his out-of-tree implementation;
no in-tree implementation has ever implemented it.
If it is causing confusion, I think we have two choices:
- See whether James would like to upstream part of his code, or whether any other target that wants to use this can post a patch that exercises it.
- Remove this confusing API from the tree.
“Nagurne, James” — 09/02/2020 01:43:29 PM
Ah, I apologize for not seeing the meaning of your first email. I had not considered that he was working on an out-of-tree target that utilizes the ignore capability. You’ve made things very clear, thank you!
Since he’s on the email thread now:
James, do you plan on upstreaming any portion of the ignore capability? If not, do you have any pointers for a target that may want to implement it? The one issue I see is that the scheduler wants a region begin and end as iterators, meaning contiguous instructions. It doesn't really support the removal of instructions between those two points, so you'd have to synthesize a region, create a meta-iterator, or make some other intrusive modification.
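To illustrate the meta-iterator idea, here is a sketch with stand-in types (not LLVM's actual MachineBasicBlock iterators): a skipping iterator can present a contiguous [begin, end) region to a scheduler while hiding the ignored instructions, without physically removing them:

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Stand-in instruction type; in LLVM this would be an iterator over
// MachineInstrs in a MachineBasicBlock.
struct Instr {
  int Opcode;
};

// A minimal "meta-iterator": wraps an underlying contiguous range but skips
// any instruction the predicate says to ignore, so a consumer that only
// understands [begin, end) never sees the ignored instructions.
class SkippingIter {
  const Instr *Cur, *End;
  std::function<bool(const Instr &)> Ignore;

  void advancePastIgnored() {
    while (Cur != End && Ignore(*Cur))
      ++Cur;
  }

public:
  SkippingIter(const Instr *B, const Instr *E,
               std::function<bool(const Instr &)> Ig)
      : Cur(B), End(E), Ignore(std::move(Ig)) {
    advancePastIgnored();
  }
  const Instr &operator*() const { return *Cur; }
  SkippingIter &operator++() {
    ++Cur;
    advancePastIgnored();
    return *this;
  }
  bool operator!=(const SkippingIter &O) const { return Cur != O.Cur; }
};
```

The underlying block stays contiguous and intact; only the view handed to the scheduler is filtered. Retrofitting SwingSchedulerDAG to consume such a view is, of course, the intrusive part.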
Adding Hendrik, who has taken over ownership of the downstream code involved.
I can also add background about the rationale, if that helps? It was added to ignore induction variable update code (scalar code) that is rewritten when we unroll/peel the prolog and epilog anyway.
Targets like Hexagon or PPC with dedicated loop control instructions for pipelined loops don’t need this, but our target was simple RISC. It was for that reason that I felt the feature would be useful for other targets.
We have a target that behaves similarly to yours in this regard, it seems. Specifically, our target utilizes the HardwareLoop pass with CounterInReg true, and then treats loops augmented by this pass as software pipelining candidates. It seems PPC does this as well, but with CounterInReg false. Our loop body ends up looking like this (in mildly simplified pseudocode):
%indvar = PHI(%init, %preheader, %dec, %body)
%dec = subtract %indvar, count
branch-compare (%dec > 0), %body
The PHI, subtract, and branch-compare instructions above cause problems in the pipeliner, as you are well aware. They are the induction variable updates that shouldn't be pipelined and will be re-inserted or updated post-scheduling. When I saw "ignoreForPipelining" I was excited that the problem was already solved, but alas! :)
I’m in the process of trying to come up with a workaround pass before/after pipelining that temporarily hides these instructions inside the branch, but I’m always on the lookout for better alternatives.
Thanks for the response! I would appreciate anything that Hendrik could add to the discussion as well. If it's just that the implementation isn't robust enough to upstream, our team might be amenable to helping out. I do, however, wonder if the implementation was used in a custom scheduler rather than the default SMS and expander. That would make generalizing quite a bit tougher.
Having not worked on this for about a year, I've gone and refreshed my memory.
We have a pretty capable implementation of swing modulo scheduling downstream, distinct from the MachinePipeliner implementation. Historically, MachinePipeliner had very tight coupling between the finding of a suitable schedule and emitting the code that adheres to that schedule.
I spent quite a bit of time separating the two; this led to the “ModuloSchedule” and “ModuloScheduleExpander” classes, which we use downstream too (we actually have a minor variant, where we use predication for the prolog and epilog). However, the code that analyzes loops and determines a good schedule is completely custom. It is in this code that we use “shouldIgnoreForPipelining”.
All of this code is designed to go upstream. There is nothing stopping us putting it upstream (modulo some non-upstreamable features that may have crept in, but those can be removed). The major hurdle is that the target architecture is NOT upstream, and no existing upstream architecture looks similar enough apart from Hexagon (RISC, VLIW, predicate register file), and we don’t have enough knowledge of Hexagon to use it as a testing target.
We toyed with the idea of coming up with a toy architecture just so we could have something to test against upstream, but the personpower required wasn’t available.
As always, it was with the best of intentions, and I do think that the cleanup of MachinePipeliner was worth it (the PeelingScheduleExpander is much easier to reason about, IMHO).
I greatly appreciate you going back to gather that intel, James. It actually helps my understanding of the whole pipeliner puzzle quite a bit!
I did identify, like you, that the MachinePipeliner pass (more precisely, SwingSchedulerDAG) was fairly rigid in that the target doesn’t get much of a say in some heuristics or in the generation of the result loop. I’m definitely still in a learning phase and am poking around to see where we can customize some things. It might very well come down to the same decision to write our own pass that implements our own scheduler, and then utilize ModuloSchedule and an Expander to make changes. One interesting topic that came up was unrolling to get more utilization of the FUs. Someone had already identified this optimization in https://reviews.llvm.org/D53005, but it seems to have gone stale. I’m also curious about customizing the expander, since we too have some ways to make the prolog and epilog more efficient.
My current solution hides the PHI and IndVar update instructions added by HardwareLoops inside the branch and rematerializes them after the stock MachinePipeliner runs. Not having to do that would be great, but now that it's implemented, I think I can stop bothering you guys for historical data! :)
One last thing: is your target upstream, or are you working on a downstream target?
The target is downstream.
Just seeing this; nothing to add to what James said. FWIW, I agree the name is somewhat misleading (we use it more like s/shouldIgnoreForPipelining/shouldFailWhenPipelined/), and it is not called by anything upstream, so the API is somewhat unnecessary, IMO.