How to configure clang to get const functions hoisted out of the loop (like on FreeBSD)?

I have a pretty severe performance problem.

I would have thought that clang would hoist my `__attribute__((const))` functions out of a loop if the input to that function is constant too. But out of the box it rarely seems to do it.

Here is a simple example:

```c
extern __attribute__(( const))  int  foo( int i);

extern int  bar( int i);

int  foobar( void)
{
     int   i;
     int   x;

     x = 0;
     for( i = 0; i < 100; i++)
     {
        x += foo( 0x2373);
        x  = bar( x);
     }
     return( x == 1848);
}
```

When I put this into https://godbolt.org/ for x86-64 trunk with options `-O3 -S -emit-llvm` I get:

```llvm
; ...
; <label>:1: ; preds = %1, %0
   %2 = phi i32 [ 0, %0 ], [ %6, %1 ]
   %3 = phi i32 [ 0, %0 ], [ %7, %1 ]
   tail call void @llvm.dbg.value(metadata i32 %3, metadata !12, metadata !14), !dbg !16
   tail call void @llvm.dbg.value(metadata i32 %2, metadata !13, metadata !14), !dbg !15

   %4 = tail call i32   @foo(int)(i32 9075)   #4, !dbg !19

   %5 = add nsw i32 %4, %2, !dbg !22
   tail call void @llvm.dbg.value(metadata i32 %5, metadata !13, metadata !14), !dbg !15
   %6 = tail call i32 @bar(int)(i32 %5), !dbg !23
   tail call void @llvm.dbg.value(metadata i32 %6, metadata !13, metadata !14), !dbg !15
   %7 = add nuw nsw i32 %3, 1, !dbg !24
   tail call void @llvm.dbg.value(metadata i32 %7, metadata !12, metadata !14), !dbg !16
   tail call void @llvm.dbg.value(metadata i32 %6, metadata !13, metadata !14), !dbg !15
   tail call void @llvm.dbg.value(metadata i32 %7, metadata !12, metadata !14), !dbg !16
   %8 = icmp eq i32 %7, 100, !dbg !25
   br i1 %8, label %9, label %1, !dbg !17, !llvm.loop !26
; ...
!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !1, producer: "clang version 6.0.0 (trunk 310909)", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2)
```

My const function is clearly in the loop. Somehow the presence of `bar` is the problem. If I remove the bar call, the optimizer can even collapse the loop. (I observe the same not just with godbolt, but also with my own clang 4.0.0 derivative and Apple's Xcode 8.3.3 clang.)

So I tried a few other versions of clang (3.4.5, for instance) in the godbolt explorer, but all exhibited the same behaviour.

But now comes the crazy part: when I do it on FreeBSD with clang-3.4.5, it works and produces:

```llvm
   %1 = tail call i32 @foo(i32 9075) #3
   br label %2

; <label>:2                                       ; preds = %2, %0
   %x.02 = phi i32 [ 0, %0 ], [ %4, %2 ]
   %i.01 = phi i32 [ 0, %0 ], [ %5, %2 ]
   %3 = add nsw i32 %1, %x.02
   %4 = tail call i32 @bar(i32 %3) #4
   %5 = add nsw i32 %i.01, 1
   %exitcond = icmp eq i32 %5, 100
   br i1 %exitcond, label %6, label %2
...
!0 = metadata !{metadata !"FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512"}
```

So how do I get the desirable FreeBSD clang behaviour? Do I have to configure clang in a special way?

Ciao
    Nat!

I don't know of any reason why this would differ between platforms, and I can verify that hacking a .ll file to pretend to be FreeBSD doesn't convince LLVM to hoist it. Much more likely, the FreeBSD version is a very old version of Clang, and maybe there is a more recent change that prevents hoisting of const calls in some situations. At any rate, your best bet is to file a bug.

John.

Hi John,

Thanks for the response.

In the meantime, I've tried various FreeBSD clang versions. The regression happens between 3.8.1 and 3.9.0. I also tried 3.8.1 on my own Linux machine, and that compiler hoisted `foo` out of the loop as well.

So it seems that the godbolt explorer is somehow fooling me, since 3.8.1 does not hoist the call there. That is so curious, though, that I still believe some difference in the configuration options must be responsible. (I tried with both Firefox and Chrome.)

I filed a bug report on this matter: https://bugs.llvm.org/show_bug.cgi?id=34208. But filing a bug is likely not my best bet. I say that from experience with another optimizer bug that pains me a lot (https://bugs.llvm.org/show_bug.cgi?id=24448), which has been open for about 2 years :slight_smile:

So is there an easy way for me to narrow this bug down further, to the commit level? I read about llvmlab somewhere once, but http://lab.llvm.org/ seems closed...

Ciao
    Nat!

Well, you could certainly just check out the source code and use 'git bisect'.

John.

I think I now know what the problem is. This summary is also in https://bugs.llvm.org/show_bug.cgi?id=34208#c4, preceded by a lot of other comments on how I got there.

Note: The loop header is essentially the loop body in this test case.

Summary

I did, and it apparently regressed because of this commit:

https://reviews.llvm.org/rL272489 ("[LICM] Make isGuaranteedToExecute more accurate")

Apparently this was a fix for <https://bugs.llvm.org/show_bug.cgi?id=27857>.

-Dimitry

This seems like an unintentional regression, then.

John.