segv inside loop on x86_64

Hi,

One of my test cases is throwing a segv on x86_64 Linux using LLVM 3.7.
I can't see what is wrong unless it's an alignment problem causing stack corruption. It's a simple, naive absolute value calculation inside a loop, which crashes after about 500000 iterations.

Here is the IR.

; Function Attrs: nounwind uwtable
define void @Main__TestProb() #0 {
entry:
%i = alloca i64, align 8
%j = alloca i64, align 8
store i64 0, i64* %j, align 8
store i64 1, i64* %i, align 8
br label %label_1

label_1: ; preds = %label_2, %entry
%v.87 = load i64, i64* %i, align 8
%abs_icmp = icmp slt i64 %v.87, 0
%itetmp = alloca i64
br i1 %abs_icmp, label %abs_then, label %abs_else

abs_then: ; preds = %label_1
%abs_ineg = sub nsw i64 0, %v.87
store i64 %abs_ineg, i64* %itetmp
br label %abs_end

abs_else: ; preds = %label_1
store i64 %v.87, i64* %itetmp
br label %abs_end

abs_end: ; preds = %abs_else, %abs_then
%abs_load = load i64, i64* %itetmp
store i64 %abs_load, i64* %j, align 8
%v.871 = load i64, i64* %i, align 8
%add = add nsw i64 1, %v.871
store i64 %add, i64* %i, align 8
br label %label_2

label_2: ; preds = %abs_end
%v.872 = load i64, i64* %i, align 8
%icmp = icmp sge i64 600000, %v.872
br i1 %icmp, label %label_1, label %else_1

else_1: ; preds = %label_2
br label %label_3

label_3: ; preds = %else_1
ret void
}

It compiles OK with stock llc.

Here's the generated assembly:

.globl Main__TestProb
.align 16, 0x90
.type Main__TestProb,@function
Main__TestProb: # @Main__TestProb
.cfi_startproc

# BB#0: # %entry
pushq %rbp
.Ltmp5:
.cfi_def_cfa_offset 16
.Ltmp6:
.cfi_offset %rbp, -16
movq %rsp, %rbp
.Ltmp7:
.cfi_def_cfa_register %rbp
subq $16, %rsp
movq $0, -16(%rbp)
movq $1, -8(%rbp)
.align 16, 0x90
.LBB8_1: # %label_1
# =>This Inner Loop Header: Depth=1
movq -8(%rbp), %rcx
movq %rsp, %rax
addq $-16, %rax
movq %rax, %rsp
testq %rcx, %rcx
jns .LBB8_3

# BB#2: # %abs_then
# in Loop: Header=BB8_1 Depth=1
negq %rcx
.LBB8_3: # %abs_else
# in Loop: Header=BB8_1 Depth=1
movq %rcx, (%rax)
movq (%rax), %rax
movq %rax, -16(%rbp)
movq -8(%rbp), %rax
incq %rax
movq %rax, -8(%rbp)
cmpq $600001, %rax # imm = 0x927C1
jl .LBB8_1

# BB#4: # %label_3
movq %rbp, %rsp
popq %rbp
retq
.Lfunc_end8:
.size Main__TestProb, .Lfunc_end8-Main__TestProb
.cfi_endproc

It crashes at movq %rcx,(%rax)

Any clues as to what I am doing wrong?

Regards Peter

You are supposed to have all allocas in the entry block, although I’m not sure whether an alloca elsewhere is expected to work.

This will work, but without a stacksave / stackrestore pair, each loop iteration allocates another 8 bytes of space on the stack. The default stack size on x86-64 is likely to be about 8MB, so if your loop runs for a million iterations, it’s pretty much guaranteed to run out of stack space and segfault. The same applies if it runs for fewer iterations but the function is called deep in the call stack, where less stack space remains.
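For illustration, a rough sketch of what bracketing the in-loop alloca with stacksave / stackrestore could look like, in LLVM 3.7 syntax. This is a condensed version of the function from the original post, not the frontend's actual output: the absolute value is computed with a select instead of branches, and the names %sp, %abs, %next, %again and %exit are made up for the example.

```llvm
declare i8* @llvm.stacksave()
declare void @llvm.stackrestore(i8*)

define void @Main__TestProb() {
entry:
  %i = alloca i64, align 8
  %j = alloca i64, align 8
  store i64 0, i64* %j, align 8
  store i64 1, i64* %i, align 8
  br label %label_1

label_1:                                ; preds = %label_1, %entry
  ; Remember the stack pointer before the dynamic alloca.
  %sp = call i8* @llvm.stacksave()
  %itetmp = alloca i64, align 8
  %v = load i64, i64* %i, align 8
  %isneg = icmp slt i64 %v, 0
  %neg = sub nsw i64 0, %v
  %abs = select i1 %isneg, i64 %neg, i64 %v
  store i64 %abs, i64* %itetmp, align 8
  %abs_load = load i64, i64* %itetmp, align 8
  store i64 %abs_load, i64* %j, align 8
  %next = add nsw i64 %v, 1
  store i64 %next, i64* %i, align 8
  ; Release %itetmp so the next iteration reuses the same stack space.
  call void @llvm.stackrestore(i8* %sp)
  %again = icmp sle i64 %next, 600000
  br i1 %again, label %label_1, label %exit

exit:                                   ; preds = %label_1
  ret void
}
```

With the restore in place the stack pointer is back at its saved value at the end of every iteration, so stack use stays constant however long the loop runs. For a fixed-size temporary like this one, an alloca in the entry block is the simpler option.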

David

And because this is rounded up to 16 bytes, it runs out in half as many
iterations: around 500k (assuming an 8MB stack, 8MB / 16 bytes = 524,288
iterations).

Not sure why the compiler generates such awkward code here:

movq %rsp, %rax
addq $-16, %rax
movq %rax, %rsp

but you can see that it subtracts 16 from the stack pointer on each iteration,
and will eventually run out of stack space.

Move the alloca out to the beginning of the function.
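A minimal sketch of that change, in LLVM 3.7 syntax and keeping the block and value names from the original IR. The only differences are that %itetmp is allocated once in the entry block and that its alloca, loads and stores carry an explicit align 8, which is an addition for consistency with %i and %j rather than something the original had.

```llvm
define void @Main__TestProb() #0 {
entry:
  %i = alloca i64, align 8
  %j = alloca i64, align 8
  ; Allocated once here instead of on every pass through label_1.
  %itetmp = alloca i64, align 8
  store i64 0, i64* %j, align 8
  store i64 1, i64* %i, align 8
  br label %label_1

label_1:                                ; preds = %label_2, %entry
  %v.87 = load i64, i64* %i, align 8
  %abs_icmp = icmp slt i64 %v.87, 0
  br i1 %abs_icmp, label %abs_then, label %abs_else

abs_then:                               ; preds = %label_1
  %abs_ineg = sub nsw i64 0, %v.87
  store i64 %abs_ineg, i64* %itetmp, align 8
  br label %abs_end

abs_else:                               ; preds = %label_1
  store i64 %v.87, i64* %itetmp, align 8
  br label %abs_end

abs_end:                                ; preds = %abs_else, %abs_then
  %abs_load = load i64, i64* %itetmp, align 8
  store i64 %abs_load, i64* %j, align 8
  %v.871 = load i64, i64* %i, align 8
  %add = add nsw i64 1, %v.871
  store i64 %add, i64* %i, align 8
  br label %label_2

label_2:                                ; preds = %abs_end
  %v.872 = load i64, i64* %i, align 8
  %icmp = icmp sge i64 600000, %v.872
  br i1 %icmp, label %label_1, label %else_1

else_1:                                 ; preds = %label_2
  br label %label_3

label_3:                                ; preds = %else_1
  ret void
}

attributes #0 = { nounwind uwtable }
```

An alloca in the entry block becomes part of the fixed stack frame set up in the prologue, so the loop body should no longer adjust %rsp at all.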

label_1: ; preds = %label_2, %entry
  %v.87 = load i64, i64* %i, align 8
  %abs_icmp = icmp slt i64 %v.87, 0
  %itetmp = alloca i64
  br i1 %abs_icmp, label %abs_then, label %abs_else

You are supposed to have all allocas in the entry block, although I’m not
sure whether an alloca elsewhere is expected to work.

You can have allocas outside the entry block *but* they are not statically
allocated, they are dynamically allocated.

This loop is running out of stack because it allocates a new 8-byte slot
(16 bytes of stack in the generated code, per the addq $-16) on each
iteration and executes ~500000 iterations.

Apologies for the noise, I didn't see the follow-on mails. My mail client
must be misconfigured.