Non "folding" Stack Allocation

Following a question on Stack Overflow [1], I was wondering whether, for big allocations, LLVM would “delay” the allocation or perform it upfront.

The following code was thus submitted to the LLVM Try Out page:

void doSomething(char*, char*);

void function(bool b)
{
    char b1[1 * 1024];

    if (b) {
        char b2[1 * 1024];

        doSomething(b1, b2);
    } else {
        char b3[512 * 1024];

        doSomething(b1, b3);
    }
}


Certainly nothing spectacular.

I was however quite surprised by the output:

; ModuleID = '/tmp/webcompile/_28066_0.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"

define void @_Z8functionb(i1 zeroext %b) {
entry:
  %b1 = alloca [1024 x i8], align 1               ; <[1024 x i8]*> [#uses=1]
  %b2 = alloca [1024 x i8], align 1               ; <[1024 x i8]*> [#uses=1]
  %b3 = alloca [524288 x i8], align 1            ; <[524288 x i8]*> [#uses=1]
  %arraydecay = getelementptr inbounds [1024 x i8]* %b1, i64 0, i64 0 ; <i8*> [#uses=2]
  br i1 %b, label %if.then, label %if.else

if.then:                                          ; preds = %entry
  %arraydecay2 = getelementptr inbounds [1024 x i8]* %b2, i64 0, i64 0 ; <i8*> [#uses=1]
  call void @_Z11doSomethingPcS_(i8* %arraydecay, i8* %arraydecay2)
  ret void

if.else:                                          ; preds = %entry
  %arraydecay6 = getelementptr inbounds [524288 x i8]* %b3, i64 0, i64 0 ; <i8*> [#uses=1]
  call void @_Z11doSomethingPcS_(i8* %arraydecay, i8* %arraydecay6)
  ret void
}

declare void @_Z11doSomethingPcS_(i8*, i8*)

(Compiled with “Standard” optimizations as C++ code)

My surprise stems from the fact that Clang/LLVM seems to reserve (at least in its IR) space for all temporary variables, without taking into account that some of them are mutually exclusive. I would have expected the space to be folded. However, since this is LLVM IR and not the final assembly, and since LLVM IR is strongly typed, it makes sense to keep them separate.

Therefore I was wondering whether these would be folded in the x86 representation (say), and if so, what is the name of the optimization/CodeGen pass responsible?
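
To put numbers on the folding in question, here is a small sketch that compares the two frame layouts for the buffers in the example above. The constant and function names are made up for illustration, and alignment, padding, and any other stack contents are ignored.

```cpp
#include <cstddef>

// Sizes of the three buffers from the example, in bytes.
constexpr std::size_t kB1 = 1 * 1024;
constexpr std::size_t kB2 = 1 * 1024;
constexpr std::size_t kB3 = 512 * 1024;

// One slot per alloca, as in the IR shown above.
constexpr std::size_t unfoldedFrameBytes() {
    return kB1 + kB2 + kB3;
}

// b2 and b3 are declared in different branches, so they could share a
// single slot sized for the larger of the two.
constexpr std::size_t foldedFrameBytes() {
    return kB1 + (kB2 > kB3 ? kB2 : kB3);
}
```

Under these assumptions the unfolded layout needs 526336 bytes and the folded one 525312 bytes. Since b2 is the smaller of the two exclusive buffers, folding saves only its 1024 bytes here; the gain would be larger if both branches had big buffers.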

– Matthieu

[1] http://stackoverflow.com/questions/7089035/at-what-moment-is-memory-typically-allocated-for-local-variables-in-c

I commented on Stack Overflow. The rough plan of record is captured here:
http://nondot.org/sabre/LLVMNotes/MemoryUseMarkers.txt

The basic idea is that we capture the lifetime of each memory object in the IR, then have the code generator assign multiple allocas with non-overlapping lifetimes to the same stack offset.
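
A minimal sketch of that idea, and not LLVM's actual implementation: assume each stack object carries a size and a half-open lifetime [begin, end) over some linear instruction numbering, then greedily place each object in the first slot whose current occupants are never live at the same time. StackObject, Layout, assignSlots, and the lifetime numbers below are all hypothetical names and values chosen for illustration; alignment is ignored.

```cpp
#include <cstddef>

// Hypothetical model of an alloca: size in bytes plus a half-open
// lifetime [begin, end) in instruction indices.
struct StackObject {
    std::size_t size;
    int begin;
    int end;
};

constexpr bool overlaps(const StackObject& a, const StackObject& b) {
    return a.begin < b.end && b.begin < a.end;
}

template <std::size_t N>
struct Layout {
    std::size_t slot[N];     // slot index chosen for each object
    std::size_t offset[N];   // stack offset of each object
    std::size_t frameBytes;  // total bytes reserved
};

template <std::size_t N>
constexpr Layout<N> assignSlots(const StackObject (&objs)[N]) {
    Layout<N> out{};
    std::size_t slotSize[N] = {};
    std::size_t slotCount = 0;

    for (std::size_t i = 0; i < N; ++i) {
        std::size_t chosen = slotCount;  // default: open a new slot
        for (std::size_t s = 0; s < slotCount && chosen == slotCount; ++s) {
            bool conflict = false;
            for (std::size_t j = 0; j < i; ++j) {
                if (out.slot[j] == s && overlaps(objs[i], objs[j])) {
                    conflict = true;
                }
            }
            if (!conflict) chosen = s;  // reuse the first compatible slot
        }
        if (chosen == slotCount) ++slotCount;
        out.slot[i] = chosen;
        // A shared slot must be as large as its largest occupant.
        if (objs[i].size > slotSize[chosen]) slotSize[chosen] = objs[i].size;
    }

    // Lay the slots out back to back, then map objects to slot offsets.
    std::size_t slotOffset[N] = {};
    for (std::size_t s = 0; s < slotCount; ++s) {
        slotOffset[s] = out.frameBytes;
        out.frameBytes += slotSize[s];
    }
    for (std::size_t i = 0; i < N; ++i) {
        out.offset[i] = slotOffset[out.slot[i]];
    }
    return out;
}

// The three buffers from the example, with made-up lifetimes: b1 spans
// the whole function, b2 and b3 live in different branches.
constexpr StackObject kExample[] = {
    {1024, 0, 10},   // b1
    {1024, 2, 5},    // b2
    {524288, 6, 9},  // b3
};
constexpr auto kLayout = assignSlots(kExample);
```

With these assumed lifetimes, b2 and b3 both land at offset 1024 and the frame shrinks to 525312 bytes. The greedy first-fit choice is only one possible policy; a real code generator would also need to derive lifetimes from control flow rather than from a single linear numbering.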

-Chris

2011/8/17 Chris Lattner <clattner@apple.com>