I have a pretty severe performance problem.
I would have thought that clang would hoist calls to my `__attribute__((const))` functions out of a loop when the input to the function is constant too. But out of the box it rarely seems to do so.
Here is a simple example:
    extern __attribute__(( const)) int foo( int i);
    extern int bar( int i);

    int foobar( void)
    {
       int i;
       int x;

       x = 0;
       for( i = 0; i < 100; i++)
       {
          x += foo( 0x2373);
          x = bar( x);
       }
       return( x == 1848);
    }
When I put this into https://godbolt.org/ using x86-64 clang (trunk) with the options `-O3 -S -emit-llvm`, I get:
; ...
; <label>:1: ; preds = %1, %0
%2 = phi i32 [ 0, %0 ], [ %6, %1 ]
%3 = phi i32 [ 0, %0 ], [ %7, %1 ]
tail call void @llvm.dbg.value(metadata i32 %3, metadata !12, metadata !14), !dbg !16
tail call void @llvm.dbg.value(metadata i32 %2, metadata !13, metadata !14), !dbg !15
%4 = tail call i32 @foo(int)(i32 9075) #4, !dbg !19
%5 = add nsw i32 %4, %2, !dbg !22
tail call void @llvm.dbg.value(metadata i32 %5, metadata !13, metadata !14), !dbg !15
%6 = tail call i32 @bar(int)(i32 %5), !dbg !23
tail call void @llvm.dbg.value(metadata i32 %6, metadata !13, metadata !14), !dbg !15
%7 = add nuw nsw i32 %3, 1, !dbg !24
tail call void @llvm.dbg.value(metadata i32 %7, metadata !12, metadata !14), !dbg !16
tail call void @llvm.dbg.value(metadata i32 %6, metadata !13, metadata !14), !dbg !15
tail call void @llvm.dbg.value(metadata i32 %7, metadata !12, metadata !14), !dbg !16
%8 = icmp eq i32 %7, 100, !dbg !25
br i1 %8, label %9, label %1, !dbg !17, !llvm.loop !26
; ...
!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !1, producer: "clang version 6.0.0 (trunk 310909)", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2)
The call to my const function is clearly still inside the loop. Somehow the presence of `bar` is the problem: if I remove the `bar` call, the optimizer can even collapse the loop entirely. (I observe the same not just with godbolt, but also with my own clang 4.0.0 derivative and Apple's Xcode 8.3.3 clang.)
So I tried a few other versions of clang in the godbolt explorer (3.4.5, for instance), but all exhibited the same behaviour.
But now comes the crazy part: when I do it on FreeBSD with clang-3.4.5, it works and produces:
%1 = tail call i32 @foo(i32 9075) #3
br label %2
; <label>:2 ; preds = %2, %0
%x.02 = phi i32 [ 0, %0 ], [ %4, %2 ]
%i.01 = phi i32 [ 0, %0 ], [ %5, %2 ]
%3 = add nsw i32 %1, %x.02
%4 = tail call i32 @bar(i32 %3) #4
%5 = add nsw i32 %i.01, 1
%exitcond = icmp eq i32 %5, 100
br i1 %exitcond, label %6, label %2
...
!0 = metadata !{metadata !"FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512"}
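To make the expected transformation concrete: the FreeBSD output corresponds roughly to hoisting the call by hand, as in the sketch below. The bodies of `foo` and `bar` here are made-up stand-ins (the real ones are external), and `foobar_hoisted` is a hypothetical name. They exist only so the sketch compiles on its own.

```c
/* Made-up stand-ins for the external functions (assumptions, not the real code). */
__attribute__(( const)) int foo( int i)
{
   return( (i & 0xff) + 1);      /* result depends only on the argument */
}

int bar( int i)
{
   return( (i * 7 + 3) % 1000);  /* arbitrary non-const work */
}

/* Loop as written in the question: foo is called on every iteration. */
int foobar( void)
{
   int i;
   int x;

   x = 0;
   for( i = 0; i < 100; i++)
   {
      x += foo( 0x2373);
      x = bar( x);
   }
   return( x == 1848);
}

/* Hand-hoisted variant: foo is called once, before the loop. */
int foobar_hoisted( void)
{
   int i;
   int x;
   int f;

   f = foo( 0x2373);             /* loop-invariant, so computed once */
   x = 0;
   for( i = 0; i < 100; i++)
   {
      x += f;
      x = bar( x);
   }
   return( x == 1848);
}
```

Both variants return the same value because `foo` has no side effects and always gets the same argument. Hoisting by hand is only a workaround, though, since it has to be repeated at every call site.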
So how do I get the desired FreeBSD clang behaviour? Do I have to configure clang in a special way?
Ciao
Nat!