I would like to restrict LICM in the number of memory operations it hoists out of a loop. Blindly hoisting all instructions out of the loop could result in a lot of spilling/reloading, which is what I’d like to avoid. We see this for example on (inner) loops with small constant bounds that are unrolled. I am drafting something in https://reviews.llvm.org/D92488, which contains a reduced version of my motivating example.
It was brought to my attention that LICM might not have any restrictions by design because hoisting out all instructions could be some sort of canonical form that other passes depend on, so sinking back defs closer to their user(s) might be a back-end problem. I was wondering if there are any opinions on this.
I think the issue is not specific to LICM and can be observed with other transforms like the SCEV expander. This was also brought up before in one of the Loop Optimization WG calls (http://lists.llvm.org/pipermail/llvm-dev/2019-September/135058.html). The general feedback at the time was that it would be better to deal with the issue in the back-end, perhaps through rematerialization of the hoisted instructions in RA using LiveRangeEdit.
I don’t have much in-depth knowledge in this area, but the blocker issue at that time appeared to be that the register allocator could only rematerialize single instructions and we needed groups of instructions to be moved/copied. Does anyone know if any progress is under way in that area?
Bardia Mahjour
Compiler Optimizations
IBM Toronto Software Lab
I think one key benefit of hoisting as much as possible is that it enables/simplifies subsequent optimizations in the middle-end (especially for memory operations). Limiting hoisting is likely to have a bad knock-on effect on other transformations.
Accurately estimating the number of spills in LICM is probably going to be tricky and/or brittle. Another thing to consider is that there are plenty of other transformations that extend live ranges of values, so they would also need updating.
Sinking in the backend would catch those cases naturally, while making our lives easier in the middle-end. We already have plenty of infrastructure to reason about register pressure & co in CodeGen. As Bardia mentioned, currently there is limited support for rematerializing instructions.
Beyond that, I don’t think there are dedicated passes to move defs to uses to reduce register pressure, besides the MachineScheduler. But the MachineScheduler currently only operates on sub-regions of basic blocks.
I think there would be potential for a dedicated pass to move defs across basic blocks to reduce register pressure. I think we have most of the building blocks (RegisterPressureTracker, MachineTraceMetrics for selecting a likely trace through the function). But reasoning about moving memory operations is unfortunately probably going to be more difficult in the backend than in the middle-end.
Many thanks for the feedback! I will investigate MachineLICM first.
FYI, I have documented that LICM is also a canonicalization transform in https://reviews.llvm.org/rG1e260f955d31.
I'd explored the notion of doing register-pressure-aware global code motion a while back. There are a bunch of hard cases, but there's also a bunch of obviously profitable stuff. I think this is entirely realistic if someone wanted to devote time to it.
I doubt this would solve the triggering case for this thread though. Despite the framing as being about LICM, the transform actually involved was store promotion. I suspect the issue is that we simply haven't considered undoing store promotion in MachineLICM. (I haven't looked, so this may be a false statement.)
+1
Assuming the transformation is not irreversible, I would continue to perform normalization steps like LICM and opt (pun intended) for a dedicated, target-specific pass for register pressure minimization.
Years ago I tried to estimate register pressure/usage in the vectorizer, in addition to the heuristic the vectorizer uses, and it turned out that you might get close but won't get it right at that level anyway... at least that is my experience.
~ Johannes
P.S.
I know I'm late and people seem to have already converged on this conclusion anyway.
Note that we have exactly the “focus on easy cases” pass you’re all discussing: LoopSink. It is scheduled late in the optimization pipeline, after canonicalization is complete. It could probably use some extra work, but it was definitely targeting exactly the kinds of cases raised here.
To add to the passes that increase register pressure in loops: GVN, partial redundancy elimination (including LoadPRE), and LoopLoadElimination.
The current pass design seems to rely on rematerialization/reloading to reduce register pressure at a later stage, such as register allocation or the LoopSink pass. The latter has the following code in it:
// Enable LoopSink only when runtime profile is available.
// With static profile, the sinking decision may be sub-optimal.
if (!Preheader->getParent()->hasProfileData())
continue;
I.e., it is not a general-purpose pass to decrease register pressure in loops.