bug of tail call optimization on x86 target

Dear LLVM developers,

I am a developer of SML#, an ML-style functional programming language
developed at Tohoku University. Currently we are intending to use
LLVM as the backend of our SML# compiler in our upcoming release, and
have rewritten our frontend and runtime so that they can cooperate
with LLVM. LLVM works extremely fine with our SML# compiler. We are
grateful to LLVM community for providing such an excellent software.

However, when generating x86 code with enabling tail call optimization,
unfortunately I found a bug that destroys callee-save registers.
The following is the bug I found and the attached file is the fix of
this bug.

The following is the LLVM IR code that causes the bug:

; bug.ll
target triple = "i686-apple-darwin12.4.0"
declare fastcc void @foo(i32, i32, i32, i32, i32, i32)
declare i32* @bar(i32*)
define fastcc void @hoge(i32 %b) nounwind {
  %a = alloca i32
  store i32 0, i32* %a
  %d = tail call i32* @bar(i32* %a) nounwind
  store i32 %b, i32* %d
  tail call fastcc void @foo(i32 1, i32 2, i32 3, i32 4, i32 5, i32 6) nounwind
  ret void
}

LLVM produces the following x86 code against the above source:

% llc -tailcallopt -filetype=asm -disable-cfi -o - bug.ll
...
_hoge: ## @hoge
## BB#0:
        subl $16, %esp
        pushl %esi ;;;; this preserves %esi, but
        subl $40, %esp
        movl %ecx, %esi
        movl $0, 40(%esp) ;;;; this overwrites preserved %esi
        leal 40(%esp), %eax
        movl %eax, (%esp)
        calll _bar
        movl %esi, (%eax)
        movl 60(%esp), %eax
        movl $6, 60(%esp)
        movl $5, 56(%esp)
        movl $4, 52(%esp)
        movl $3, 48(%esp)
        movl %eax, 44(%esp)
        movl $1, %ecx
        movl $2, %edx
        addl $40, %esp
        popl %esi ;;;; this always fails to restore %esi
        jmp _foo ## TAILCALL

In the above code, 2nd instruction "pushl %esi" saves a callee-save
register %esi to the stack, but 5th instruction "movl $0, 40(%esp)"
overwrites the saved %esi. Eventually at the epilogue "popl %esi"
fails to restore the previous %esi.

This bug is due to wrong computation of stack object indices by the
x86 backend. The attached patch indicates the wrong points. Due to
integral promotion, explicit conversion to signed integer is needed
at those points.

Eliminating tail calls (with arbitrary number of arguments) is mandatory
for practical functional languages. I would appreciate it if this bug
would be fixed in the next release.

Best regards,

Katsuhiro Ueno

x86-tailcallopt-fix.diff (1.75 KB)

Hi Katsuhiro,

Thanks very much for taking the time to dig into this and produce a
patch. Is there any chance you could make a couple of tweaks?

This bug is due to wrong computation of stack object indices by the
x86 backend. The attached patch indicates the wrong points. Due to
integral promotion, explicit conversion to signed integer is needed
at those points.

I'm not convinced that's the best solution, at least conceptually.
SlotSize really is an unsigned quantity, and though it's unlikely we'd
like 0x80000000 to be interpreted as positive, rather than negative if
it ever does occur.

Also, CreateFixedObject seems to take an int64_t

I'd suggest:
+ ISelLowering line 2459: cast FPDiff to int64_t.
+ ISelLowering line 3327: cast SlotSize to int64_t.
+ FrameLowering: declare TailCallReturnAddrDelta as int64_t and
subtract SlotSize?

It would also be really good if you could convert your IR into a
test-case (like the others in "test/CodeGen/X86/"). We like to have a
test-case for every commit if it's humanly possible.

Cheers.

Tim.

Hi Tim,

Thank you for your quick response.

I'm not convinced that's the best solution, at least conceptually.
SlotSize really is an unsigned quantity, and though it's unlikely we'd
like 0x80000000 to be interpreted as positive, rather than negative if
it ever does occur.

I totally agree with you.
I rewrote my fix and made a test case according to your suggestion.
All of them are included in the attached file.

Thanks and regards,

Katsuhiro Ueno

x86-tailcallopt-fix-2.diff (2.88 KB)

Hi Katsuhiro,

I rewrote my fix and made a test case according to your suggestion.
All of them are included in the attached file.

Thanks very much for doing that and taking the time in the first place
to make the patch. It looked good to me so I committed it in r187703.

It'll automatically make its way into the 3.4 release later this year.

Cheers.

Tim.

Hi Tim,

Thanks very much for doing that and taking the time in the first place
to make the patch. It looked good to me so I committed it in r187703.

It'll automatically make its way into the 3.4 release later this year.

Thank you very much for your cooperation and merging my patch.
We are looking forward to the 3.4 release.

Best regards,

Katsuhiro Ueno