Hi all!
I’m currently looking into extending the post-regalloc MachineLICM pass in our LLVM fork for AMD’s AI engines. I have seen some recent activity in MachineLICM, so I would be happy to hear some thoughts on the changes I’m planning. Especially because I’m getting the impression that the post-RA MachineLICM is left “feature-weak” on purpose.
I have three small patches:
-
[CodeGen][NFC] Properly split MachineLICM and EarlyMachineLICM
Both are based on MachineLICMBase, and the functionality there is
“switched” based on aPreRegAlloc
flag. This commit is simply about
trusting the original value of that flag, instead of overwriting it
based on MRI.isSSA(), which is un-reliable. I see that we can now defineisSSA
in MIR (thanks @gargaroff ), meaning the fix isn’t really needed anymore, but redefining that flag still feels wrong. -
[CodeGen] MachineLICM: Do not consider “constant loop liveins” as loop defs
This allows to hoist instructions using registers that are not re-defined/clobbered in the loop. Previous MachineLICM basically could not hoist any instruction using register inputs. There are a couple of test updates in AArch64, AMDGPU and AVR targets. They make sense to me, but I’m no expert. -
[CodeGen] MachineLICM: Use RegUnits and look at lane masks
This moves away from MCRegAliasIterator and uses RegUnits instead to
track interferences. As a side effect, it’s also easier to look at lane
masks coming from BB liveins. That means we see less registers as “clobbered”, and can hoist more instructions. I actually notice @Pierre-vh merged a similar change upstream in this PR, so the main difference now is that I look at lane masks for BB liveins. (cc @arsenm you had a question about this on the PR)
For our target, more late hoisting makes sense because it only appears after RA that some lanes of physical registers are constant. But generally, I am wondering if the changes would be accepted as-is upstream. In particular, MachineLICM has no cost-model and will hoist anything it can. I’m sure one can build scenarios where that extra hoisting can hurt performance. Obviously I could hide those new features behind a target hook, or CL option, but that’s sub-optimal.
TLDR: Would there be any objection in dropping this conservative check in MachineLICM?
// Conservatively treat live-in's as an external def.
// FIXME: That means a reload that're reused in successor block(s) will not
// be LICM'ed.
for (const auto &LI : BB->liveins()) {
for (MCRegAliasIterator AI(LI.PhysReg, TRI, true); AI.isValid(); ++AI)
PhysRegDefs.set(*AI);
}
and instead consider loop liveins as clobbered only if they are re-defined in the loop?