Constexpr prevents optimization?

Dear all,

I was just playing around with a toy example when I noticed an oddity in
the code generated by clang-5.0.0 (and also in clang-5.0.1) regarding
constexpr.

Given the code:

int fib(int i) { if (i <= 0) return i; else return (fib(i - 1) + fib(i - 2)) % 100; }
int main()
{
    int ret = 0;
    for (int i = 0; i < 10; ++i)
        ret += fib(39);
    return ret;
}

Compile it with clang++ -O3 and what you get is (gdb disassembly of "main"):

7 {
8 int ret = 0;
9 for (int i = 0; i < 10; ++i)
10 ret += fib(39);
   0x00000000004004e0 <+0>: push rax
   0x00000000004004e1 <+1>: mov edi,0x27
   0x00000000004004e6 <+6>: call 0x400490 <fib(int)>

9 for (int i = 0; i < 10; ++i)
   0x00000000004004eb <+11>: add eax,eax
   0x00000000004004ed <+13>: lea eax,[rax+rax*4]

11 return ret;
   0x00000000004004f0 <+16>: pop rcx
   0x00000000004004f1 <+17>: ret

A call to fib(39) once followed by a multiplication with 10.

Now, if you make "fib" constexpr, i.e.:

constexpr int fib(int i) { if (i <= 0) return i; else return (fib(i - 1) + fib(i - 2)) % 100; }

And, again, compile it with -O3 and disassemble "main":

7 {
8 int ret = 0;
9 for (int i = 0; i < 10; ++i)
10 ret += fib(39);
   0x0000000000400490 <+0>: push rbp
   0x0000000000400491 <+1>: push rbx
   0x0000000000400492 <+2>: push rax
   0x0000000000400493 <+3>: mov edi,0x27
   0x0000000000400498 <+8>: call 0x400530 <fib(int)>
   0x000000000040049d <+13>: mov ebx,eax
   0x000000000040049f <+15>: mov edi,0x27
   0x00000000004004a4 <+20>: call 0x400530 <fib(int)>
   0x00000000004004a9 <+25>: mov ebp,eax
   0x00000000004004ab <+27>: add ebp,ebx
   0x00000000004004ad <+29>: mov edi,0x27
   0x00000000004004b2 <+34>: call 0x400530 <fib(int)>
   0x00000000004004b7 <+39>: mov ebx,eax
   0x00000000004004b9 <+41>: add ebx,ebp
   0x00000000004004bb <+43>: mov edi,0x27
   0x00000000004004c0 <+48>: call 0x400530 <fib(int)>
   0x00000000004004c5 <+53>: mov ebp,eax
   0x00000000004004c7 <+55>: add ebp,ebx
   0x00000000004004c9 <+57>: mov edi,0x27
   0x00000000004004ce <+62>: call 0x400530 <fib(int)>
   0x00000000004004d3 <+67>: mov ebx,eax
   0x00000000004004d5 <+69>: add ebx,ebp
   0x00000000004004d7 <+71>: mov edi,0x27
   0x00000000004004dc <+76>: call 0x400530 <fib(int)>
   0x00000000004004e1 <+81>: mov ebp,eax
   0x00000000004004e3 <+83>: add ebp,ebx
   0x00000000004004e5 <+85>: mov edi,0x27
   0x00000000004004ea <+90>: call 0x400530 <fib(int)>
   0x00000000004004ef <+95>: mov ebx,eax
   0x00000000004004f1 <+97>: add ebx,ebp
   0x00000000004004f3 <+99>: mov edi,0x27
   0x00000000004004f8 <+104>: call 0x400530 <fib(int)>
   0x00000000004004fd <+109>: mov ebp,eax
   0x00000000004004ff <+111>: add ebp,ebx
   0x0000000000400501 <+113>: mov edi,0x27
   0x0000000000400506 <+118>: call 0x400530 <fib(int)>
   0x000000000040050b <+123>: mov ebx,eax
   0x000000000040050d <+125>: add ebx,ebp
   0x000000000040050f <+127>: mov edi,0x27
   0x0000000000400514 <+132>: call 0x400530 <fib(int)>
   0x0000000000400519 <+137>: add eax,ebx

11 return ret;
   0x000000000040051b <+139>: add rsp,0x8
   0x000000000040051f <+143>: pop rbx
   0x0000000000400520 <+144>: pop rbp
   0x0000000000400521 <+145>: ret

That's 10 calls to function "fib" (for which the assembly is essentially
the same as in the example above).

Regardless of whether the function is evaluated at compile time or not,
it seems odd to me that using constexpr here prohibits clang from
emitting the very same code as in the non-constexpr example. Note
however, that if you declare "fib" to be "static constexpr" clang,
again, emits the multiplication code.

Is there something keeping clang from producing the multiplication code
for a non-static constexpr example that I don't see? And why is the
optimization possible again if one makes "fib" static?

Greetings,
Steffen

Dear all,

a while ago I posted odd (in the sense that I cannot explain it)
constexpr behavior to the cfe-user mailing list and never received a
reply. Since my observations are still valid for clang-6.0.0, I am
reposting my original message to cfe-dev.

tl;dr: It seems that the use of constexpr in this stupid example I ran
across back in February prohibits a certain type of optimization that
clang does. I cannot think of a reason for this behavior, therefore, I
ask you.

Greetings,
Steffen

P.S.: This also happens if one defines "fib" correctly (i <= 1). :slight_smile:

The problem is not that constexpr prevents optimizations. The problem is
that constexpr implies inline, and inline prevents optimizations. For
details, please see

https://www.playingwithpointers.com/blog/ipo-and-derefinement.html

The problem here is that we cannot deduce that 'fib' is side-effect-free,
and use that information to call it only once, if it's an inline function,
because we don't know that it was "originally" side-effect-free, and it
could be derefined to a version with side-effects in a way that makes the
transformation to call it only once be somehow non-conforming. However, if
'fib' is not inline, or if it's file-static, then we can transfer that
information from 'fib' to its caller, because we know the version of 'fib'
we can see is the same one that's actually going to be used at runtime.

Thanks for the explanation and the link! This is very interesting.

Greetings,
Steffen