missed optimizations

Hi,

I have two questions about optimizations performed by llvm.

Consider these simple functions:
int x(int b) { return b?4:6; }
int y() { return x(0); }

int x2() { return 5; }
int y2() { return x2(); }

the optimized bitcode (with clang + opt -std-compiler-opts) is:
define i32 @y(...) nounwind {
entry:
ret i32 6
}

define i32 @y2(...) nounwind {
entry:
%call = call i32 (...)* @x2( ) ; <i32> [#uses=0]
ret i32 5
}

So why does LLVM optimizes the more difficult case, but leaves behind the function call in the easiest case? :slight_smile:

Second question:
int f();

int g() {
if (f() == 5) return 3;
return 4;
}

int h() {
if (g() == 6) return 2;
return 1;
}

gives the following optimized bc:

define i32 @g(...) nounwind {
entry:
%call = call i32 (...)* @f( ) ; <i32> [#uses=1]
%cmp = icmp eq i32 %call, 5 ; <i1> [#uses=1]
%retval = select i1 %cmp, i32 3, i32 4 ; <i32> [#uses=1]
ret i32 %retval
}

define i32 @h(...) nounwind {
entry:
%call = call i32 (...)* @g( ) ; <i32> [#uses=1]
%cmp = icmp eq i32 %call, 6 ; <i1> [#uses=1]
%retval = select i1 %cmp, i32 2, i32 1 ; <i32> [#uses=1]
ret i32 %retval
}

In function h(), llvm doesn't realize that g() never returns 6, and thus it could reduce that function to { g(); return 1; }. Doesn't llvm produce some sort of summary for functions with e.g. their possible return values? If not, is there any pass where I could implement this? (I have some code that has many opportunities for this kind of optimization)
Also, shouldn't the function g() get inlined in the h() function? It is inlined only if I change the g() function to be static. So isn't llvm being too conservative when inlining global functions?

Thanks,
Nuno

Nuno Lopes a écrit :

Hi,

I have two questions about optimizations performed by llvm.

Consider these simple functions:
int x(int b) { return b?4:6; }
int y() { return x(0); }

int x2() { return 5; }
int y2() { return x2(); }

the optimized bitcode (with clang + opt -std-compiler-opts) is:
define i32 @y(...) nounwind {
entry:
ret i32 6
}

define i32 @y2(...) nounwind {
entry:
%call = call i32 (...)* @x2( ) ; <i32> [#uses=0]
ret i32 5
}

So why does LLVM optimizes the more difficult case, but leaves behind the function call in the easiest case? :slight_smile:

I don't know why vararg would inhibit optimisation but maybe it is the problem.
You should try with:

int x2(void) { return 5; }
int y2(void) { return x2(); }

it may solve the problem (or use C++ as langage)

Cédric

The LLVM browser test optimizes both examples as expected... What
optimization options are different? Although that optimization could
be gcc-llvm level and not actual llvm level...

Hi,

I have two questions about optimizations performed by llvm.

Consider these simple functions:
int x(int b) { return b?4:6; }
int y() { return x(0); }

int x2() { return 5; }
int y2() { return x2(); }

the optimized bitcode (with clang + opt -std-compiler-opts) is:
define i32 @y(...) nounwind {
entry:
ret i32 6
}

define i32 @y2(...) nounwind {
entry:
%call = call i32 (...)* @x2( ) ; <i32> [#uses=0]
ret i32 5
}

So why does LLVM optimizes the more difficult case, but leaves behind the
function call in the easiest case? :slight_smile:

Pretty simple: LLVM doesn't know how to eliminate the call.

There are a couple of ways it could be eliminated: DCE or inlining.
The current DCE infrastructure isn't clever enough to do
interprocedural analysis, so that doesn't eliminate it. And the
inliner immediately gives up on varargs functions, so that doesn't
eliminate it. And

That said, clang really should be turning int x2() { return x(0); }
into "define i32 @x2()" rather than "define i32 @x2(...)"; the
function isn't varargs, and marking it as such could lead to wrong
code for exotic calling conventions.

Second question:
int f();

int g() {
if (f() == 5) return 3;
return 4;
}

int h() {
if (g() == 6) return 2;
return 1;
}

gives the following optimized bc:

define i32 @g(...) nounwind {
entry:
%call = call i32 (...)* @f( ) ; <i32> [#uses=1]
%cmp = icmp eq i32 %call, 5 ; <i1> [#uses=1]
%retval = select i1 %cmp, i32 3, i32 4 ; <i32> [#uses=1]
ret i32 %retval
}

define i32 @h(...) nounwind {
entry:
%call = call i32 (...)* @g( ) ; <i32> [#uses=1]
%cmp = icmp eq i32 %call, 6 ; <i1> [#uses=1]
%retval = select i1 %cmp, i32 2, i32 1 ; <i32> [#uses=1]
ret i32 %retval
}

In function h(), llvm doesn't realize that g() never returns 6, and thus it
could reduce that function to { g(); return 1; }. Doesn't llvm produce some
sort of summary for functions with e.g. their possible return values? If
not, is there any pass where I could implement this? (I have some code that
has many opportunities for this kind of optimization)

No, there isn't any such infrastructure at the moment. The LLVM pass
that might do that sort of thing is predsimplify, but it isn't an
interprocedural pass. It's not necessarily difficult to implement,
depending on how precise you want it to be, but nobody's implemented
it; as far as I know, it doesn't provide significant benefits for
normal C/C++ code. If there's some specific pattern in your code, you
could probably throw together a specialized pass pretty quickly.

Also, shouldn't the function g() get inlined in the h() function? It is
inlined only if I change the g() function to be static. So isn't llvm being
too conservative when inlining global functions?

The inliner doesn't like varargs; see above.

-Eli

Hi Eli,

That said, clang really should be turning int x2() { return x(0); }
into "define i32 @x2()" rather than "define i32 @x2(...)"; the
function isn't varargs, and marking it as such could lead to wrong
code for exotic calling conventions.

I always understood that this is correct per C language specification. For
functions that are internal (static), the DeadArgElim pass removes the vararg
spec if it's unused, but that's not applicable in this case.

Any particular reason the inliner doesn't touch varargs? I think that when you
call a varargs function without any varargs, it should be able to inline it
just fine? Or at least when the function in question also doesn't call the
va_* functions.

> Also, shouldn't the function g() get inlined in the h() function? It is
> inlined only if I change the g() function to be static. So isn't llvm being
> too conservative when inlining global functions?

The inliner doesn't like varargs; see above.

And the fact that it is inlined when it's static, is probably due to the
DeadArgElim I mentioned above.

Gr.

Matthijs

Hi Eli,

That said, clang really should be turning int x2() { return x(0); }
into "define i32 @x2()" rather than "define i32 @x2(...)"; the
function isn't varargs, and marking it as such could lead to wrong
code for exotic calling conventions.

I always understood that this is correct per C language specification.

What's the question here? The C standard says that int(*)() is not a
varargs function, but rather a function without an explicitly declared
argument list, and is actually incompatible with any varargs function.

As far as I can tell, though, the LLVM varargs specifier is more
forgiving; llvm-gcc seems to be willing to call int(*)() using an llvm
varargs type.

Any particular reason the inliner doesn't touch varargs? I think that when you
call a varargs function without any varargs, it should be able to inline it
just fine? Or at least when the function in question also doesn't call the
va_* functions.

I think it's just unimplemented.

-Eli

Hi,

As a follow up of this thread I've made a patch that implements a simple approach to propagate the function return values as described previously.

It can transform e.g. the following program:

define i32 @f(...) nounwind {
(...)
%cond = select i1 %tobool, i32 2, i32 3 ; <i32> [#uses=1]
ret i32 %cond
}

define i32 @g(...) nounwind {
entry:
%call = call i32 (...)* @f() ; <i32> [#uses=1]
switch i32 %call, label %sw.default [
  i32 5, label %sw.bb
  i32 2, label %sw.bb1
  i32 3, label %sw.bb2
]

sw.bb: ; preds = %entry
ret i32 -1

sw.bb1: ; preds = %entry
ret i32 3

sw.bb2: ; preds = %entry
ret i32 55

sw.default: ; preds = %entry
ret i32 0
}

into:

define i32 @g(...) nounwind {
entry:
%call = call i32 (...)* @f() ; <i32> [#uses=1]
%cond = icmp eq i32 %call, 2 ; <i1> [#uses=1]
%retval = select i1 %cond, i32 3, i32 55 ; <i32> [#uses=1]
ret i32 %retval
}

This kind of transformation isn't currently done by LLVM (note that here this pass is only removing case statements. other transformations are not of my responsibility :).
The patch is available at http://web.ist.utl.pt/nuno.lopes/llvm_function_ret_infer.txt
I would love to ear some feedback, so that this pass can eventually be merged to the LLVM tree. In particular I'm not sure if there's a better way to track the possible return values of each function (e.g. is there any code that I can reuse?)

Thanks,
Nuno

AnalyzeFunction can use BB->getTerminator() rather than iterating over
all instructions to find the terminator.

After you Analyze all the functions, why not just iterator on the uses
of the functions you found you could transform rather than on ever
instruction in the module?

It seems you should be able to handle any comparison with a constant
by using the min or max of your set as appropriate to the cmp.

Andrew

Hi,

Thank you for your review. I've updated the patch to reflect your suggestions.
I've also completed the ICmp instruction folding.

Regards,
Nuno