Does Clang/LLVM not reuse memory locations for local variables with disjoint scopes?

I am interested in learning more about how compilers work, and one topic I wanted to check was how to compute an optimal stack layout. It was my understanding that to LLVM, a “static alloca” is an alloca in the entry block with a static size, and statically known stack allocations should be normalized to static allocas when a compiler frontend generates LLVM IR. Therefore, I was curious how LLVM recorded information about alloca lifetimes so it could reuse stack memory for local variables of disjoint lifetimes.

Consider the following two functions:

char* foo(bool b) {
    char* dangling = nullptr;
    if (b) {
        char s[50];
        *s = '9';
        dangling = &s[12]; 
    } else {
        int i[100];
        *i = 123;
    }
    return dangling;
}

char* bar(bool b) {
    char* dangling = nullptr;
    char s[50];
    int i[100];
    if (b) {
        *s = '9';
        dangling = &s[12]; 
    } else {
        *i = 123;
    }
    return dangling;
}

The only difference is the scope in which the arrays are declared. The LLVM IR that Clang 15 emits for both functions can be viewed on Godbolt: Compiler Explorer

Both functions result in virtually the same IR. The only difference is the location of the @llvm.dbg.declare intrinsic recording variable metadata. When I looked up what this intrinsic means, I found Source Level Debugging with LLVM — LLVM 16.0.0git documentation, which states:

In many languages, the local variables in functions can have their lifetimes or scopes limited to a subset of a function. In the C family of languages, for example, variables are only live (readable and writable) within the source block that they are defined in. In functional languages, values are only readable after they have been defined. Though this is a very obvious concept, it is non-trivial to model in LLVM, because it has no notion of scoping in this sense, and does not want to be tied to a language’s scoping rules.

In order to handle this, the LLVM debug format uses the metadata attached to llvm instructions to encode line number and scoping information.

So, I assumed that the stack frame layout phase probably used this information to give local variables of disjoint lifetime the same stack memory.

However, when I checked the assembly output, it appears that memory is not shared: Compiler Explorer. Notice that both functions contain mov byte ptr [rbp - 80], 57 and mov dword ptr [rbp - 480], 123, indicating that LLVM computed the same frame layout for both. However, I am not fluent in assembly, so perhaps I made a mistake.

Am I misunderstanding something? Do I need to turn on an optimization flag?

Source Level Debugging with LLVM — LLVM 16.0.0git documentation states:

In the example above, every variable assignment uniquely corresponds to a memory store to the variable’s position on the stack. However in heavily optimized code LLVM promotes most variables into SSA values, which can eventually be placed in physical registers or memory locations. To track SSA values through compilation, when objects are promoted to SSA values an llvm.dbg.value intrinsic is created for each assignment, recording the variable’s new location.

Does this mean that local variables with disjoint lifetime are only given the same stack memory if they are promoted to registers, then register allocated? If the local variable cannot be promoted to a register, does LLVM not try to minimize the frame size by reusing stack memory?

The @llvm.dbg.* intrinsics are there to guide generating debugging information; they (should) have no effect on optimization. The IR for the two different functions shows all the alloca instructions at the top because that’s where Clang puts all the statically sized allocas; that’s done so that the stack frame will have a fixed size throughout the function.

Analyzing lifetimes in order to do things like reuse stack slots is an optimization, and your Compiler Explorer example did not specify any optimization (clang’s default is -O0). Adding -O1 to the options, I see that the else clauses are correctly optimized away entirely, so this isn’t a great example for reusing stack slots. You’ll need an example where the effects of the else clause are used in a way that is “visible” outside of the function.
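For what it's worth, here is a minimal sketch of what such an example could look like. The names consume and baz are made up for illustration, and __attribute__((noinline)) is a GCC/Clang extension used here only on the assumption that it keeps the optimizer from folding the arrays away. Whether the two arrays actually end up sharing a slot depends on the compiler version and flags, so this is something to paste into Compiler Explorer and inspect rather than a guaranteed result.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical opaque sink: because it is not inlined, the caller has to
// materialize the array in memory before the call.
__attribute__((noinline)) int consume(const void* p, std::size_t n) {
    assert(p != nullptr);
    const unsigned char* bytes = static_cast<const unsigned char*>(p);
    int sum = 0;
    for (std::size_t k = 0; k < n; ++k) {
        sum += bytes[k];  // read every byte so the contents matter
    }
    return sum;
}

// The two arrays live in disjoint scopes and both escape to consume(),
// so neither branch can simply be deleted.
int baz(bool b) {
    if (b) {
        char s[50] = {};
        s[0] = '9';
        return consume(s, sizeof s);
    } else {
        int i[100] = {};
        i[0] = 123;
        return consume(i, sizeof i);
    }
}
```

Compiling this with -O1 or higher and comparing the stack adjustment in the prologue against sizeof(s) + sizeof(i) should show whether the slots were merged.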

Here’s my half-hearted attempt to get some stack reuse: Compiler Explorer. GCC does seem to reuse the same stack slot for both a and b, whereas Clang doesn’t. I’m not sure why. I believe Clang does do stack reuse in some cases, but I don’t know which ones or why not this one. (Or I’m misunderstanding something about the assembly, etc.)

There are definitely codegen passes that do stack slot coloring. We even do that for the unsafe stack now too in LLVM IR.

Tweaking @dblaikie’s Compiler Explorer example to use long causes the slots to be reused. It may be that an alignment requirement prevents the slot reuse in the previous example. But it does feel like this should work out of the box on such a simple program.

I see that when -O1 is added, Clang emits the intrinsics @llvm.lifetime.start.p0 and @llvm.lifetime.end.p0 to mark where the lifetime of each static alloca starts and ends. So, even though all the allocas are in the entry block, the lifetime intrinsics tell LLVM when each one is live. I don’t know what prevents the optimization from kicking in here, but @pogo59’s suggestion to add -O1 answers my initial question of how LLVM records lifetime information for stack slot reuse when all allocas are placed at the start.
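To make the idea concrete for myself, here is a toy sketch of how lifetime ranges could drive slot sharing. This is not LLVM's actual pass: StackObject, Slot, assign_slots and frame_size are names I made up, lifetimes are simplified to half-open integer ranges instead of being computed over the control-flow graph, and alignment is ignored entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one stack object: its size in bytes and the
// half-open range [start, end) between its lifetime start and end markers.
struct StackObject {
    std::size_t size;
    int start;
    int end;
};

// A slot is as large as its biggest occupant.
struct Slot {
    std::size_t size;
    std::vector<int> occupants;  // indices into the object list
};

static bool overlaps(const StackObject& a, const StackObject& b) {
    return a.start < b.end && b.start < a.end;
}

// Greedy first-fit: place each object in the first slot whose current
// occupants all have lifetimes disjoint from it, else open a new slot.
std::vector<Slot> assign_slots(const std::vector<StackObject>& objs) {
    std::vector<Slot> slots;
    for (int i = 0; i < static_cast<int>(objs.size()); ++i) {
        assert(objs[i].start <= objs[i].end);
        bool placed = false;
        for (Slot& slot : slots) {
            bool conflict = false;
            for (int j : slot.occupants) {
                if (overlaps(objs[i], objs[j])) { conflict = true; break; }
            }
            if (!conflict) {
                slot.occupants.push_back(i);
                if (objs[i].size > slot.size) slot.size = objs[i].size;
                placed = true;
                break;
            }
        }
        if (!placed) slots.push_back(Slot{objs[i].size, {i}});
    }
    return slots;
}

// Total frame bytes needed for the slots (no padding, no alignment).
std::size_t frame_size(const std::vector<Slot>& slots) {
    std::size_t total = 0;
    for (const Slot& s : slots) total += s.size;
    return total;
}
```

With a 50-byte array live over [1, 2) and a 400-byte array (100 ints, assuming 4-byte int) live over [3, 4), both land in one 400-byte slot. If both were live over the whole function, as they would be without lifetime markers, the toy needs 450 bytes.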
