Loop Opt WG Meeting Minutes for Sep 11, 2019

Hi,


Wed, Sep 11, 2019:

  • LICM vs Loop Sink Strategy (Whitney)
  • LICM and SCEV expander host code with no regards to increased
    live-ranges. This is a long standing issue where historically
    preference has been to keep LICM more aggressive.

This issue also motivated adding metadata to disable LICM (llvm.loop.licm.disable) recently. https://reviews.llvm.org/D64557

Thanks Florian.

Tim you said:

Some cases can be undone by rematerialization, but not all, and it can involve a lot of effort which increases compile time.

Do you have examples of cases where rematerialization is not possible? We are interested in learning about any previous attempts at trying to address the issue in RA. Have you tried it?

Bardia Mahjour
Compiler Optimizations
IBM Toronto Software Lab
bmahjour@ca.ibm.com (905) 413-2336

graycol.gif

Sorry for reviving this old thread.
Is this the case that you are talking about?
void use(int *);
void f(int *p) {
  for (int i = 0; i < 1000; ++i) {
    use(p);
    use(p + 1);
    use(p + 2);
    use(p + 3);
  }
}

LICM hoists all the (p + N) computations out of the loop, and there is
nothing that could sink them back.
entry:
  %add.ptr = getelementptr inbounds i32, i32* %p, i64 1
  %add.ptr1 = getelementptr inbounds i32, i32* %p, i64 2
  %add.ptr2 = getelementptr inbounds i32, i32* %p, i64 3
...
for.body:
...
  tail call void @_Z3usePi(i32* %p)
  tail call void @_Z3usePi(i32* nonnull %add.ptr)
  tail call void @_Z3usePi(i32* nonnull %add.ptr1)
  tail call void @_Z3usePi(i32* nonnull %add.ptr2)

With more calls to use(), these common expressions will be
pre-computed, spilled and then reloaded inside the loop. Each
individual instruction is not profitable to sink or rematerialize in
the loop, because that would simply reduce the liverange of (p+N) at
the cost of extending the liverange of (p).

I see this problem in ARM MTE stack instrumentation. We use a virtual
frame pointer there which makes all local variable access look like
(p+N) in the above example.

Hi Evgenii,

The specific issue that we ran into turned out to be related to expansion of a remainder instruction which caused it to not be considered by RA rematerialization. However the example you provided falls into the general category of problem with LICM and live range extension, which is where we started from. I don’t know the details but looks like when determining the cost of a sink or rematerialization we need to take a more holistic view than doing it on an instruction by instruction bases. Is that possible?

Adding Hussain to the discussion as well.

Bardia Mahjour
Compiler Optimizations
IBM Toronto Software Lab

graycol.gifEvgenii Stepanov —2020/01/07 02:15:52 PM—Sorry for reviving this old thread. Is this the case that you are talking about?

Hi Evgenii,

As Bardia mentioned, I had started work in the direction of extending LiveRangeEdit to allow grouped remat decisions, in response to an srem/urem remat issue which turned out to be more easily solved by modifying the SDAG combiner.

This is now done for the most part, so I am going back to complete the LRE work, going off of the discussion here: http://lists.llvm.org/pipermail/llvm-dev/2016-December/107718.html and followed up here: http://llvm.1065342.n5.nabble.com/llvm-dev-Register-Rematerialization-td119906.html.

The goal is to allow LRE to make decisions that (i) take into account the live ranges of multiple independent values in the same basic block, or (ii) remat a group of dependent instructions in one basic block.

The former should solve the case you are dealing with, and I would be happy to work with you to test the solution with ARM MTE once I’m done implementing it.

Cheers,
Hussain