clang -O2 versus opt -O2 | llc | clang

I’m investigating a miscompilation bug (http://llvm.org/bugs/show_bug.cgi?id=19823), but I’ve run into a problem: the program’s output differs depending on whether I compile the IR with clang or with opt | llc | clang. Any clues on how to resolve this difference?

$ ./opt -O1 19823.ll | ./llc | ./clang -x assembler - -o a.out ; ./a.out ; echo $?
1
$ ./opt -O2 19823.ll | ./llc | ./clang -x assembler - -o a.out ; ./a.out ; echo $?
1
$ ./clang -O1 19823.ll ; ./a.out ; echo $?
1
$ ./clang -O2 19823.ll ; ./a.out ; echo $?
0   <-- that’s bad!

The IR is pretty simple:
@a = common global i16 0, align 2
@c = global i16* @a, align 8
@d = global i8 0, align 1
@b = common global i16 0, align 2

; Function Attrs: nounwind ssp uwtable
define i32 @main() #0 {
entry:
  %0 = load i16** @c, align 8, !tbaa !1
  %d.promoted = load i8* @d, align 1, !tbaa !5
  br label %lbl

lbl:                                              ; preds = %lbl, %entry
  %1 = phi i8 [ %dec, %lbl ], [ %d.promoted, %entry ]
  %dec = add i8 %1, -1
  %conv = sext i8 %1 to i16
  store i16 %conv, i16* @b, align 2, !tbaa !6
  store i16 1, i16* %0, align 2, !tbaa !6
  %tobool = icmp eq i8 %dec, 0
  br i1 %tobool, label %if.end, label %lbl

if.end:                                           ; preds = %lbl
  store i8 0, i8* @d, align 1, !tbaa !5
  %2 = load i16* @a, align 2, !tbaa !6
  %conv1 = sext i16 %2 to i32
  ret i32 %conv1
}
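For readers who prefer C, the IR above can be read roughly as follows. This is a hand-written reconstruction from the IR, not the original test case from the bug report; the function name `reduced_case` is made up for illustration, and `signed char` is assumed because the IR sign-extends the i8 value.

```c
/* Hypothetical C reading of the IR above; not the bug report's source. */
short a, b;          /* @a and @b */
short *c = &a;       /* @c points at @a */
signed char d;       /* @d, sign-extended in the IR, so assumed signed */

/* Stands in for @main; the name is an assumption for illustration. */
int reduced_case(void) {
    do {
        b = d;       /* store the sign-extended old value of d to b */
        d--;         /* %dec = add i8 %1, -1 (assumes the usual wraparound
                        when the result is converted back to signed char) */
        *c = 1;      /* store 1 through the pointer loaded from c, i.e. to a */
    } while (d != 0);
    return a;        /* a was set to 1 in the loop */
}
```

Under this reading the store to `a` happens on every iteration, so a correct compilation returns 1, which matches the exit status reported for every build except `clang -O2`.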

Hi Sanjay,

> I'm investigating a miscompilation bug (19823 – wrong code at -O2 and -O3
> on x86_64-linux-gnu in 64-bit mode (runtime unroller)), but I've run into
> a problem: the output of the program is different when I compile the IR
> with clang compared to opt | llc | clang. Any clues on how to resolve
> this difference?

From the bug, it looks like there might be a problem with loop
unrolling. Running "opt -loop-unroll -unroll-runtime" on its own
performs the dodgy transformation and changes the output.

I've not tracked down quite how Clang sets that extra
"-unroll-runtime" option (I'd be interested to know myself, actually,
having failed).

Cheers.

Tim.

From: "Tim Northover" <t.p.northover@gmail.com>
To: "Sanjay Patel" <spatel@rotateright.com>
Cc: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Friday, May 23, 2014 2:22:55 PM
Subject: Re: [LLVMdev] clang -O2 versus opt -O2 | llc | clang

> I've not tracked down quite how Clang sets that extra
> "-unroll-runtime" option (I'd be interested to know myself, actually,
> having failed).

It doesn't, but the backend can by overriding TTI::getUnrollingPreferences or setting LoopMicroOpBufferSize in the processor scheduling model.

-Hal

I don't think that explains what I'm seeing either: only PPC and R600
seem to mention that function. All code I'm compiling is x86_64.

Cheers.

Tim.

(Incidentally, "opt -loop-unroll -unroll-runtime" may need an already
optimised .ll file. I'm attaching my copy here in case it helps
anyone).

tmp1.ll (1.19 KB)

From: "Tim Northover" <t.p.northover@gmail.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>, "Sanjay Patel" <spatel@rotateright.com>
Sent: Friday, May 23, 2014 2:49:16 PM
Subject: Re: [LLVMdev] clang -O2 versus opt -O2 | llc | clang

> I don't think that explains what I'm seeing either: only PPC and R600
> seem to mention that function. All code I'm compiling is x86_64.

You missed my "or" :-) -- The x86_64 processor models (for Haswell, Sandy Bridge, etc.) do set LoopMicroOpBufferSize and will runtime unroll small loops.

-Hal
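As a rough illustration of the mechanism Hal describes, here is a toy model in C. It assumes that a nonzero LoopMicroOpBufferSize in the scheduling model is what turns on runtime unrolling by default. Every type, field, and function name below is hypothetical and is not LLVM's actual API, and the buffer size used in the usage note is an arbitrary example value.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a processor scheduling model. */
struct sched_model {
    unsigned loop_micro_op_buffer_size;  /* 0 means "not set" in this sketch */
};

/* Hypothetical stand-in for the unroller's tunable preferences. */
struct unroll_prefs {
    bool partial;                /* allow partial unrolling */
    bool runtime;                /* allow runtime unrolling */
    unsigned partial_threshold;  /* size budget for the unrolled loop */
};

/* Toy default hook: if the model advertises a loop buffer, enable
   partial and runtime unrolling, bounded by the buffer size.
   Otherwise leave the preferences untouched. */
void default_unroll_prefs(const struct sched_model *sm,
                          struct unroll_prefs *up) {
    if (sm->loop_micro_op_buffer_size == 0)
        return;
    up->partial = true;
    up->runtime = true;
    up->partial_threshold = sm->loop_micro_op_buffer_size;
}
```

In this sketch, a model with `loop_micro_op_buffer_size = 28` ends up with `runtime == true` without any command-line flag, whereas a model that leaves the field at 0 keeps the defaults. That would be consistent with the difference seen in this thread between a run that knows the target CPU and one that does not.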

Ah, that does explain it: "opt -mtriple=x86_64-linux-gnu -mcpu=core-i7
-unroll-loops" at least does it.

Thanks Hal!

Tim.