IndVarSimplify too aggressive?

Hi all,

The IndVarSimplify pass seems to be too aggressive when it enlarges the induction variable type; this can pessimize the generated code when the new induction variable size is not natively supported by the target. This is probably not an issue for x86_64, which natively supports all types, but it is a real one for several embedded targets that have very few native types.
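
As a rough illustration of the transformation being discussed (a hypothetical sketch, not the attached test.c; the names `sum_narrow` and `sum_wide` are made up here), the two C functions below compute the same result. The first keeps a 32-bit induction variable and widens it only at its use; the second is roughly what the loop looks like once the induction variable itself has been promoted to 64 bits, which on a 32-bit target typically means a register pair and a two-instruction increment per iteration.

```c
#include <stdint.h>

/* Narrow form: the induction variable stays 32 bits wide and is
 * zero-extended only where a 64-bit value is needed. */
uint64_t sum_narrow(uint32_t n)
{
    uint64_t acc = 0;
    for (uint32_t i = 0; i < n; i++)
        acc += (uint64_t)i;
    return acc;
}

/* Wide form: the induction variable itself is 64 bits wide, so the
 * increment and the exit compare are 64-bit operations. */
uint64_t sum_wide(uint32_t n)
{
    uint64_t acc = 0;
    for (uint64_t i = 0; i < (uint64_t)n; i++)
        acc += i;
    return acc;
}
```

Both forms are equivalent at the source level; the question raised in this thread is only which one the target can execute more cheaply.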

I attached a patch to address this issue: if TargetData is available, the patch attempts to keep the induction variable in a native type when going through the induction variable users.

I have also attached my test case in C, as well as the resulting assembly output, with and without the patch applied, for the arm and x86_32 targets. You will note that the loop instruction count can be reduced by 30% in several cases.

The patch could probably be made smarter; I welcome all suggestions.

Best Regards,

IndVarSimplify-nativeType.patch (2.04 KB)

test.c (256 Bytes)

test.s.patch.arm (898 Bytes)

test.s.patch.x86_32 (2.03 KB)

test.s.wo_patch.arm (1.08 KB)

test.s.wo_patch.x86_32 (2.25 KB)

It's worth pointing out that LoopStrengthReduce is doing essentially
the same transformation. The only reason the generated code is
improved at all with your change is that ISel has a longstanding issue
where it can't conclude that the upper half of zext i32 %x to i64 is
zero if the zext is in a different block from the user of the zext.

-Eli
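
The situation Eli describes can be sketched in C as follows. This is a hypothetical example (the function name `high_half_after_branch` is made up), and mid-level passes may well fold it before instruction selection ever sees it; it only shows a zero-extension and its user ending up in different basic blocks.

```c
#include <stdint.h>

uint64_t high_half_after_branch(uint32_t x, int take_branch)
{
    uint64_t wide = x;          /* zext i32 -> i64, done before the branch */
    if (take_branch)
        return wide >> 32;      /* user in another block; always 0 */
    return wide;
}
```

A selector that works one basic block at a time sees only `wide >> 32` in the branch target, without the information that the upper half of `wide` is known to be zero.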

Arnaud,

I also noticed that IndVarSimplify increases variable size, and in some cases pessimizes the program. I just wanted to add that I have seen cases where i64 types were converted to i65 types, for which there is no native support. In the case of i65 multiplication, on some platforms there is not even a library call to perform a 128-bit multiplication. So, I welcome your change and I will test your patch locally.

Nadav
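
One plausible source of such odd widths (an assumption; the message above does not say where the i65 came from) is an iteration count that does not fit in the original type. The hypothetical helper below counts the iterations of `i = lo; do { ... } while (i++ != hi);` over 64-bit unsigned bounds. For `lo = 0` and `hi = UINT64_MAX` the count is 2^64, which needs 65 bits to represent.

```c
#include <stdint.h>

/* A 65-bit count, stored as the low 64 bits plus one extra top bit. */
struct count65 {
    uint64_t low;
    unsigned top;   /* 0 or 1 */
};

/* Number of values in the closed range [lo, hi], assuming lo <= hi. */
struct count65 inclusive_trip_count(uint64_t lo, uint64_t hi)
{
    struct count65 c;
    c.low = hi - lo + 1;        /* wraps to 0 when the range is the full 2^64 */
    c.top = (c.low == 0);       /* the wrapped case is exactly 2^64 */
    return c;
}
```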

Thanks Eli,

After digging through the mail archives and Bugzilla, it seems that properly fixing this issue would require a major change in the SelectionDAG code: having it operate on a per-function basis instead of per basic block.

This, however, does not seem to be the only issue. The following C code does not produce an efficient assembly sequence either.

extern void f(unsigned long long v);

void test2()
{
  for (unsigned i=0; i<512; i++)
    f(i);
}

The resulting .ll out of clang looks reasonable (with and without the patch), but the arm assembly output looks ugly, though marginally better with my patch: the induction variable should be counting up, and it could be zero-extended before the call to f. This again points to ISel, but to a different area, as everything is taking place in the same BB.

Is this a known issue? I could not find a bug report matching this.
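
For reference, the loop shape described above can be written out by hand. This is only a sketch under assumptions: `record` is a hypothetical stand-in for the external `f`, and `make_u64` models the two 32-bit halves in which a 32-bit target typically holds a 64-bit argument. The counter stays 32 bits wide and counts up, and the high half is the constant 0.

```c
#include <stdint.h>

static unsigned long long g_sum;    /* sum of all values passed to record() */
static unsigned g_calls;            /* number of calls to record() */

/* Hypothetical stand-in for the external f() of the test case. */
static void record(unsigned long long v)
{
    g_sum += v;
    g_calls++;
}

/* Build a 64-bit value from its low and high 32-bit halves. */
static uint64_t make_u64(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

void test2_by_hand(void)
{
    for (uint32_t i = 0; i < 512; i++)
        record(make_u64(i, 0));     /* zero-extend at the call site */
}
```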

test2.s.wo_patch.arm (447 Bytes)

test2.c (98 Bytes)

test2.ll.w_patch.arm (744 Bytes)

test2.ll.wo_patch.arm (676 Bytes)

test2.s.w_patch.arm (440 Bytes)

Andy is working on gutting indvarsimplify.

Evan

Arnaud,

I've been investigating whether it's safe to apply your patch. I still need to understand why our generated code is slower in some cases. I noticed a particularly bad regression in
MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow that I documented here:
http://llvm.org/bugs/show_bug.cgi?id=9490

We would like to avoid generating canonical induction variables in IndVarSimplify. Once that work is complete, your patch should no longer be needed. In the meantime, though, it would be nice to understand why promoting IVs to wider types is sometimes required for codegen.

-Andy

Hi Andy,

Thanks for looking into this.

I tried today to make a reduced test case from the value function, but as I do not have any arm hardware available to measure the real cycle count, it can be quite error-prone, especially with all those loops. Maybe I should give QEMU a try.

Best regards,

Hi Arnaud,

This should be fixed in r127884. In some cases, your patch could result in multiple IVs for the same recurrence, and LSR was not able to clean up afterward. Dan Gohman proposed an alternative, which seems to work great. See http://llvm.org/bugs/show_bug.cgi?id=9490.

-Andy

Thanks Andy & Dan for looking into this.

I tested it on my own backends, and it works great!

Best regards,