Extending post-regalloc MachineLICM

Hi all!

I’m currently looking into extending the post-regalloc MachineLICM pass in our LLVM fork for AMD’s AI engines. I have seen some recent activity in MachineLICM, so I would be happy to hear some thoughts on the changes I’m planning. Especially because I’m getting the impression that the post-RA MachineLICM is left “feature-weak” on purpose.

I have three small patches:

  • [CodeGen][NFC] Properly split MachineLICM and EarlyMachineLICM
    Both are based on MachineLICMBase, and the functionality there is
    “switched” based on a PreRegAlloc flag. This commit is simply about
    trusting the original value of that flag, instead of overwriting it
    based on MRI.isSSA(), which is un-reliable. I see that we can now define isSSA in MIR (thanks @gargaroff ), meaning the fix isn’t really needed anymore, but redefining that flag still feels wrong.

  • [CodeGen] MachineLICM: Do not consider “constant loop liveins” as loop defs
    This allows to hoist instructions using registers that are not re-defined/clobbered in the loop. Previous MachineLICM basically could not hoist any instruction using register inputs. There are a couple of test updates in AArch64, AMDGPU and AVR targets. They make sense to me, but I’m no expert.

  • [CodeGen] MachineLICM: Use RegUnits and look at lane masks
    This moves away from MCRegAliasIterator and uses RegUnits instead to
    track interferences. As a side effect, it’s also easier to look at lane
    masks coming from BB liveins. That means we see less registers as “clobbered”, and can hoist more instructions. I actually notice @Pierre-vh merged a similar change upstream in this PR, so the main difference now is that I look at lane masks for BB liveins. (cc @arsenm you had a question about this on the PR)

For our target, more late hoisting makes sense because it only appears after RA that some lanes of physical registers are constant. But generally, I am wondering if the changes would be accepted as-is upstream. In particular, MachineLICM has no cost-model and will hoist anything it can. I’m sure one can build scenarios where that extra hoisting can hurt performance. Obviously I could hide those new features behind a target hook, or CL option, but that’s sub-optimal.

TLDR: Would there be any objection in dropping this conservative check in MachineLICM?

    // Conservatively treat live-in's as an external def.	
    // FIXME: That means a reload that're reused in successor block(s) will not	
    // be LICM'ed.	
    for (const auto &LI : BB->liveins()) {	
      for (MCRegAliasIterator AI(LI.PhysReg, TRI, true); AI.isValid(); ++AI)	
        PhysRegDefs.set(*AI);	
    }

and instead consider loop liveins as clobbered only if they are re-defined in the loop?

All of these changes sound good to me in principle. It is definitely worth putting them up for upstream review.

Thanks for your answer, I have opened the first PR here: [CodeGen][NFC] Properly split MachineLICM and EarlyMachineLICM by gbossu · Pull Request #113573 · llvm/llvm-project · GitHub