Why does the Machine LICM pass examine only outermost loops before RA?

Hello,

The Machine LICM pass has this comment:

// If this is done before regalloc, only visit outer-most preheader-sporting
// loops.

I was wondering why there is such a restriction. It prevents hoisting from deeply nested inner loops into the outer loop body when the hoisted instruction is invariant in every loop except the outermost one. I’ve encountered a problem that such hoisting would solve, but I couldn’t figure out the rationale behind LLVM’s MLICM decision. The change appears to have been introduced in 2009 (https://github.com/llvm/llvm-project/commit/79618d1de89e76b1a23a02e8146057a6a21260db), but the only reason stated in the commit message seems to be simplification and performance.
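To make the situation concrete, here is a minimal source-level sketch of the loop shape in question. It is only an analogy: the function names, the array, and the arithmetic are hypothetical, and in the real case the invariant computation is a machine instruction that only appears after instruction selection, so the IR-level LICM pass never sees it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: `scale` depends on i (the outermost induction
// variable) but not on j or k, so it is invariant in both inner loops
// and variant only in the outermost loop.
long sum_unhoisted(const std::vector<long>& a, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k) {
                long scale = a[i] * 3 + 7;  // recomputed on every k iteration
                total += scale + static_cast<long>(j + k);
            }
    return total;
}

// The placement I would like: the computation sits in the body of the
// outermost loop, i.e. in the preheader of the j loop.
long sum_hoisted(const std::vector<long>& a, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        long scale = a[i] * 3 + 7;  // computed once per i iteration
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k)
                total += scale + static_cast<long>(j + k);
    }
    return total;
}
```

Both functions compute the same result; the second simply evaluates `scale` once per outer iteration instead of once per innermost iteration.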

Has anyone tried enabling MLICM for nested loops? Is it a good idea? What are the possible issues?

Thanks

Hoisting from inner loops into outer loops certainly seems desirable in general. Aside from the compile-time downsides that the commit you referenced was intended to avoid, one issue is that MLICM's hoisting capabilities are currently more powerful than our rematerialization capabilities (we can hoist multiple instructions but only rematerialize single instructions), and that could cause some issues (e.g., the hoisting can increase register pressure in a way that the regalloc framework can't undo).

-Hal
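
As a rough illustration of the register-pressure point above, the following source-level sketch shows a two-instruction dependent chain hoisted out of an inner loop. The function names and the arithmetic are hypothetical, and this is an analogy rather than actual MIR: the idea is that once both instructions are hoisted, `t2` stays live across the whole inner loop, and recomputing `t2` alone inside the loop would still need `t1` to be available, so undoing the hoist means sinking the whole chain rather than one instruction.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: a dependent chain (t1, then t2) hoisted together.
long chain_hoisted(const std::vector<long>& a, long base, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        long t1 = a[i] + base;  // first instruction of the chain
        long t2 = t1 * 5;       // second instruction, depends on t1
        for (std::size_t j = 0; j < n; ++j)
            total += t2 + static_cast<long>(j);  // t2 is live across the loop
    }
    return total;
}

// What undoing the hoist would require: both instructions back in the loop.
long chain_in_loop(const std::vector<long>& a, long base, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < n; ++j) {
            long t1 = a[i] + base;  // recomputed at the use
            long t2 = t1 * 5;       // short live range, more work per iteration
            total += t2 + static_cast<long>(j);
        }
    return total;
}
```

The two versions return the same value; they differ only in how long `t2` has to occupy a register versus how much work is repeated per iteration.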
