Loop elimination with floating point counter.

Hi LLVM-ers,

I'd like to eliminate dead loop with floating point counter using
LLVM, but the following loop wasn't optimized by opt.

void
func() {
    float i;
    for (i = 0.0f; i < 1000.0f; i += 1.2f) {
    }
}

$ clang -emit-llvm-bc floop.c
$ opt -std-compile-opts floop.bc | llvm-dis

define void @func(...) nounwind {
entry:
  br label %forinc

forinc: ; preds = %forinc, %entry
  %i.0.reg2mem.0 = phi float [ 0.000000e+00, %entry ], [ %add, %forinc
] ; <float> [#uses=1]
  %add = add float %i.0.reg2mem.0, 0x3FF3333340000000 ; <float> [#uses=2]
  %cmp = fcmp olt float %add, 1.000000e+03 ; <i1> [#uses=1]
  br i1 %cmp, label %forinc, label %afterfor

afterfor: ; preds = %forinc
  ret void
}

What I expected is just one instruction "ret void" in function "func".

Should I specify some specific optimizer pass for opt?
If so, what is the right optimization pass to specify for opt to
remove dead loop with floating point counter?

I've tested some loop optimization pass, e.g. -licm, -loop-deletion,
but nothing takes effect.

FYI, gcc -O3 also can't optimize such a loop, but icc -O3 can.

LLVM and clang used are recent version from svn.

Thanks in advance.

FWIW, LLVM optimizer can eliminate this loop if i is incremented by 1.0f

Pl. file a bugzilla report.
Thanks,

Hi Devang,

Thanks. Yes, in the case variable i incremented by 1.0f is optimized.
I don't know why...
Anyway, I've filed this problem into bugzilla(Bug 3299)

It's because with 1.0f, the loop index is simplified into an integer. With 1.2f, it isn't. The loop deletion pass is dependent on the loop analyses being able to determine that the loop is finite, which they don't attempt to do for loops with floating point indices. Attempting to do so would require additional reasoning about floating point precision issues.

--Owen

Isn't "simplifying" the loop index to an integer also dependent on precision issues? The following loop is infinite:

for (float i=0.0; i<...; i+=1.0) {}

where ... is a large "integral" float literal, e.g. 2^40. At some point, adding 1.0 to the loop index would not cause any changes because of precision issues. However, if the float type is replaced by some sufficiently large integer type, the loop terminates.

The necessary reasoning may be simpler though.

Greetings,
Martin Geisse

Martin,

Isn't "simplifying" the loop index to an integer also dependent on
precision issues? The following loop is infinite:

for (float i=0.0; i<...; i+=1.0) {}

where ... is a large "integral" float literal, e.g. 2^40.

it is. But luckily at least llvm-gcc appears to be smart enough to check against the end
value. I've just tested the following snippet (compiled using llvm-gcc -c -O3 --emit-llvm)

void foo() {
     float f;
     for( f = 0.0; f < 100.0f; f += 1.0f ) {
     }
}
=>
define void @foo() nounwind readnone {
entry:
  ret void
}

void foo() {
     float f;
     for( f = 0.0; f < 101.0f; f += 1.0f ) {
     }
}
=>
define void @foo() nounwind readnone {
entry:
  br label %bb

bb: ; preds = %bb, %entry
  %f.0.reg2mem.0 = phi float [ 0.000000e+00, %entry ], [ %0, %bb ] ; <float> [#uses=1]
  %0 = add float %f.0.reg2mem.0, 1.000000e+00 ; <float> [#uses=2]
  %1 = fcmp olt float %0, 1.010000e+02 ; <i1> [#uses=1]
  br i1 %1, label %bb, label %return

return: ; preds = %bb
  ret void
}

So everything is fine here

Jan

I assume it checks that the end condition and the increment can both be represented precisely with integer types.

--Owen

FWIW, I believe icc -O3 turns on the equivalent of -ffast-math by default.
I could be misremembering which compilers do this though :slight_smile:
This flag allows you to make all kinds of nice simplfiying assumptions
about floating point.

Thanks for many comments.

The loop with finite fp values(which could be representable in IEEE754
fp format) such like,

void
foo() {
    float i;

    for (i = 0.0f; i < 1000.0f; i += 1.2f) {
    }
}

could reach the end condition under any fp rounding mode,
and eliminating the loop has no side effects.
(for example, floating point control register does not change because
the increment does not cause the overflow)

Thus I believe the loop as shown above could be removed safely.

FYI, gcc -O3 turns fp loop counter into integer, but gcc does not
optimize further(eliminate the loop).

$ gcc -O3 floop.c

_foo:
  pushl %ebp
  movl $834, %eax
  movl %esp, %ebp
  .align 4,0x90
L2:
  decl %eax
  jne L2

Sure, LLVM could definitely do this. If you're interested, I'd suggest starting by extending the existing code that we have to do this. The existing code just handles increments by unit constants, so it doesn't trigger with 1.2.

-Chris

Yes, I’d like to commit the patches as much as possible.
Would you tell me the source code location where I have to investigate?