Hi,
I've stumbled upon a case where I think gvn does a bad (wrong) optimization. It's a bit messy to debug though so I'm not sure if I should just write a PR about it a let someone who knows the code look at it instead.
Anyway, for the bug to trigger I need to run the following passes in the same opt invocation:
-sroa -instcombine -simplifycfg -instcombine -gvn
The problem seems to be that GVN::PerformLoadPRE triggers and I see a
GVN REMOVING PRE LOAD: %_tmp79 = load i16, i16* %_tmp78, align 2
printout.
If I instead first run
-sroa -instcombine -simplifycfg -instcombine
and then
-gvn
I don't get the
GVN REMOVING PRE LOAD
printout, and the resulting code looks ok to me.
Is this expected? Should the output differ in these two cases?
The input to gvn looks the same when running all passes and just gvn, but I see a difference in how GVN::processNonLocalLoad(LoadInst *LI) behaves:
I get different results from
// Step 1: Find the non-local dependencies of the load.
LoadDepVect Deps;
MD->getNonLocalPointerDependency(LI, Deps);
So we get different results from MemoryDependenceResults when invoking gvn alone or after a bunch of other passes.
Is this expected or not?
I tried to dig into MemoryDependenceResults::getNonLocalPointerDependency but I have to say I'm quite lost.
I ran some git bisect and as far as I can tell it starts going wrong with commit
[SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd
2016-09-01, but that commit doesn't change GVN/MemoryDependenceResults so I suppose it just changes the input to GVN so the problem suddenly surfaces.
Finally, the problem that I see is:
In the input we have something like
for (int step1 = 0; step1 < LOOP_AMOUNT; step1++)
{
lb[step1] = step1 + 7;
ub[step1] = (step1 == 0 ? 10: ub[step1-1]*10);
switch(lb[step1]) {
case 7: CVAL_VERIFY(step1 == 0);
CVAL_VERIFY(ub[step1] == 10);
__label(511);
break;
case 8: CVAL_VERIFY(step1 == 1);
CVAL_VERIFY(ub[step1] == 100);
__label(512);
break;
case 9: CVAL_VERIFY(step1 == 2);
CVAL_VERIFY(ub[step1] == 1000);
__label(513);
break;
default:;
}
[...]
}
int i, j;
for ( j = 10, i = 0; j < 101; j=j*10, i++ )
{
CVAL_VERIFY(ub[i] == j);
}
So we have two loops. In the first one the array ub is written (low index first, high index last), and then in the second loop the content of ub is checked (low index first, high index last).
After gvn this is changed into something like
int tmp;
for (int step1 = 0; step1 < LOOP_AMOUNT; step1++)
{
lb[step1] = step1 + 7;
ub[step1] = (step1 == 0 ? 10: ub[step1-1]*10);
switch(lb[step1]) {
case 7: CVAL_VERIFY(step1 == 0);
tmp = ub[step1];
CVAL_VERIFY(tmp == 10);
__label(511);
break;
case 8: CVAL_VERIFY(step1 == 1);
tmp = ub[step1];
CVAL_VERIFY(tmp == 100);
__label(512);
break;
case 9: CVAL_VERIFY(step1 == 2);
tmp = ub[step1];
CVAL_VERIFY(tmp == 1000);
__label(513);
break;
default:;
}
[...]
}
int i, j;
for ( j = 10, i = 0; j < 101; j=j*10, i++ )
{
CVAL_VERIFY((i == 0 ? tmp : ub[i]) == j);
}
Note the introduced variable tmp.
Also note that with this rewrite, after the first loop tmp will contain the value of the element at the highest index in ub, and then we use tmp in the first round of the second loop, but there we expect the value of the lowest index element in ub.
Any ideas? Should I file a PR?
Thanks,
Mikael
tr12053_nophx.ll (5.24 KB)