Extending post-regalloc MachineLICM

gbossu · October 24, 2024, 7:09am

Hi all!

I’m currently looking into extending the post-regalloc MachineLICM pass in our LLVM fork for AMD’s AI engines. I have seen some recent activity in MachineLICM, so I would be happy to hear some thoughts on the changes I’m planning. Especially because I’m getting the impression that the post-RA MachineLICM is left “feature-weak” on purpose.

I have three small patches:

[CodeGen][NFC] Properly split MachineLICM and EarlyMachineLICM
Both are based on MachineLICMBase, and the functionality there is
“switched” based on a PreRegAlloc flag. This commit is simply about
trusting the original value of that flag, instead of overwriting it
based on MRI.isSSA(), which is un-reliable. I see that we can now define isSSA in MIR (thanks @gargaroff ), meaning the fix isn’t really needed anymore, but redefining that flag still feels wrong.
[CodeGen] MachineLICM: Do not consider “constant loop liveins” as loop defs
This allows to hoist instructions using registers that are not re-defined/clobbered in the loop. Previous MachineLICM basically could not hoist any instruction using register inputs. There are a couple of test updates in AArch64, AMDGPU and AVR targets. They make sense to me, but I’m no expert.
[CodeGen] MachineLICM: Use RegUnits and look at lane masks
This moves away from MCRegAliasIterator and uses RegUnits instead to
track interferences. As a side effect, it’s also easier to look at lane
masks coming from BB liveins. That means we see less registers as “clobbered”, and can hoist more instructions. I actually notice @Pierre-vh merged a similar change upstream in this PR, so the main difference now is that I look at lane masks for BB liveins. (cc @arsenm you had a question about this on the PR)

For our target, more late hoisting makes sense because it only appears after RA that some lanes of physical registers are constant. But generally, I am wondering if the changes would be accepted as-is upstream. In particular, MachineLICM has no cost-model and will hoist anything it can. I’m sure one can build scenarios where that extra hoisting can hurt performance. Obviously I could hide those new features behind a target hook, or CL option, but that’s sub-optimal.

TLDR: Would there be any objection in dropping this conservative check in MachineLICM?

    // Conservatively treat live-in's as an external def.	
    // FIXME: That means a reload that're reused in successor block(s) will not	
    // be LICM'ed.	
    for (const auto &LI : BB->liveins()) {	
      for (MCRegAliasIterator AI(LI.PhysReg, TRI, true); AI.isValid(); ++AI)	
        PhysRegDefs.set(*AI);	
    }

and instead consider loop liveins as clobbered only if they are re-defined in the loop?

jayfoad · October 24, 2024, 8:57am

All of these changes sound good to me in principle. It is definitely worth putting them up for upstream review.

gbossu · October 24, 2024, 2:26pm

Thanks for your answer, I have opened the first PR here: [CodeGen][NFC] Properly split MachineLICM and EarlyMachineLICM by gbossu · Pull Request #113573 · llvm/llvm-project · GitHub

Topic		Replies	Views
Post-register allocation MachineLICM does not optimize loops LLVM Dev List Archives	0	117	February 23, 2017
Machine LICM and cheap instructions? LLVM Dev List Archives	5	107	January 12, 2015
Why does Machine LICM pass only examine outermost loops before RA? LLVM Dev List Archives	1	78	July 1, 2019
MachineLICM register pressure calculation Code Generation	3	142	May 13, 2024
[MachineLICM] how to add target-special constraints LLVM Dev List Archives	0	79	November 1, 2020

Extending post-regalloc MachineLICM

Related topics