Missing InstCombine optimization.

Bader_Aleksey_A · June 4, 2013, 1:27pm

Hi Jakob,

I’ve a problem related to the commit #155362.

Consider the following snippet:

void bar(float* f) {

…

}

void foo(float* f, int idx) {

int hi = idx>>3;

int lo = idx&7;

bar(&f[hi8+lo]); // hi8 + lo == idx

bar(&f[hi*10+lo]);

}

Before 155362 revision InstCombine was able to optimize hi*8+lo to idx by applying following patterns:

hi*8 → hi << 3
((idx >> 3) << 3) → idx & -8
hi8+lo → hi8 | lo
(idx & -8) | (idx & 7) → idx & (-8 | 7) → idx

After 155362 pattern #2 is deferred to DAGCombine stage, so InstCombine is unable to apply pattern #4:

4*. ((idx >> 3) << 3) | (idx & 7) → idx // SimplifyOr can’t handle it.

So now LLVM IR contains a couple of redundant operations:

%mul312 = shl nsw i32 %shr, 3 ; hi*8

%add313 = or i32 %mul312, %and ; hi*8 + lo == idx

These few additional operations over index prevent our analysis pass from recognizing memory access patterns and I believe could harm not only us.

I think 4* optimization can be safely done at LLVM IR level.

Can you suggest the best way to fix this issue? I suppose adding new pattern just for particular 4* case is not the best way.

Thanks,

Aleksey

Rafael_Avila_de_Espi · June 4, 2013, 7:57pm

1. hi*8 -> hi << 3

2. ((idx >> 3) << 3) -> idx & -8

3. hi*8+lo -> hi*8 | lo

4. (idx & -8) | (idx & 7) -> idx & (-8 | 7) -> idx

After 155362 pattern #2 is deferred to DAGCombine stage, so InstCombine is
unable to apply pattern #4:

        4*. ((idx >> 3) << 3) | (idx & 7) -> idx // SimplifyOr can’t handle
it.

So now LLVM IR contains a couple of redundant operations:

  %mul312 = shl nsw i32 %shr, 3 ; hi*8

  %add313 = or i32 %mul312, %and ; hi*8 + lo == idx

These few additional operations over index prevent our analysis pass from
recognizing memory access patterns and I believe could harm not only us.

I think 4* optimization can be safely done at LLVM IR level.

Can you suggest the best way to fix this issue? I suppose adding new pattern
just for particular 4* case is not the best way.

We used handle arbitrary ((idx >> A) << B), right? Maybe it is enough
to handle ((idx >> A) << A) in instcombine?

Cheers,
Rafael

Bader_Aleksey_A · June 4, 2013, 8:39pm

Hi Rafael,

We used handle arbitrary ((idx >> A) << B), right?

We used to handle three patterns, which were deferred:
(from Jakob's log message)
  These transformations are deferred:

    (X >>? C) << C --> X & (-1 << C) (When X >> C has multiple uses) <--- my case
    (X >>? C1) << C2 --> X << (C2-C1) & (-1 << C2) (When C2 > C1)
    (X >>? C1) << C2 --> X >>? (C1-C2) & (-1 << C2) (When C1 > C2)

So ((idx >> A) << A) (when idx >> A has multiple uses) was intentionally deferred.

Maybe it is enough to handle ((idx >> A) << A) in instcombine?

It would be enough for my particular case, but apparently is not be enough for some other cases.
I don't quite understand the motivation behind the original commit, that is why I need your assistance on fixing this issue.
It seems like applying "((X >> C) << C) -> X & -C" rule in instcombine is not efficient in some other cases.
Does anyone know such cases?
Otherwise the fix is very simple - just add back a few deleted lines of code.

Thanks,
Aleksey

Jakob_Stoklund_Olese · June 4, 2013, 10:38pm

Actually, your own code illustrates the problem with this transformation. Suppose you were using SCEV to analyze the behavior of the to f memory references:

f[hi*8 + lo]
f[hi*10 + lo]

If you rewrite 'hi*8 + lo' to ‘idx’, the affine relationship between the two memory references is no longer visible, and SCEV will probably tell you that the offsets are unrelated.

The fundamental problem is that there is no such thing as a canonical form of an expression DAG. Some expression graphs, like this one, can take different forms that each enable different analyses. Which one is correct?

I think that the right approach in this case is to preserve the relationships that were already exposed in the original code. That is why pattern 4* is disabled when the inner expression has multiple uses. It preserves the relationships that are expressed through the ‘hi’ variable.

Thanks,
/jakob

Topic		Replies	Views
optimizer problem, possibly involving instcombine LLVM Dev List Archives	1	78	November 16, 2012
InstCombine wrongful (?) optimization on BinOp with SameOperands LLVM Dev List Archives	2	78	October 26, 2015
Issue with instcombine pass inserting llvm.assume IR & Optimizations	2	215	January 17, 2023
InstCombine adds bit masks, confuses self, others LLVM Dev List Archives	10	73	April 17, 2012
How to add optimizations to InstCombine correctly? LLVM Dev List Archives	20	250	September 22, 2017

Missing InstCombine optimization.

Related topics