Added AllocaInsts are relocated in stack

Hi there,

I am wondering how I can prevent LLVM from reordering the local variables that I add during instrumentation.
During instrumentation, I add metadata for some local variables, placing each piece of metadata directly next to its variable, and the generated bitcode looks right. However, when the program is executed, the stack is laid out with all the original local variables next to each other, followed by all the metadata. In other words, I was expecting "data, metadata, data, metadata", but I see "data, data, metadata, metadata" on the actual stack.

Investigating this problem, I noticed that printing an instruction (errs() << AI << "\n") shows a number for each AllocaInst. The AllocaInsts newly added during instrumentation have much higher numbers than the original AllocaInsts, and I guess that is why all the original local variables are placed on the stack first, followed by the metadata that I added.

Is there a way to prevent LLVM from reordering them, or even to reset the AllocaInst numbers, so that the newly added AllocaInsts are placed between the original local variables?

Look forward to hearing from you.
Best regards,
Saman

As far as I know you can't. LLVM makes no guarantees about stack
layout at any point.

What you almost certainly need to do is replace the original alloca
with one big enough to contain both the data and the metadata. You can
call replaceAllUsesWith on the original alloca to point the existing
uses at the new one, and then your metadata handling would GEP into it
before access to get to the right part.

Cheers.

Tim.

Hi Tim,

Thanks for your reply. However, I have seen that AddressSanitizer does this by placing redzones around each local variable, although I have not yet figured out how. Is there a switch or a method by which I can reset the slot numbering given to each instruction? I guess LLVM would then place them in the expected order.

Best regards,
Saman

Hi Sam,

> Thanks for your reply. However, I have seen that AddressSanitizer has done this by placing redzones around each local variable.

Maybe conceptually, but as far as I can see from the IR, ASAN maintains
a completely separate stack via calls to runtime support (like
__asan_stack_malloc).

> By doing so, LLVM would place them in the expected order I guess.

I doubt it. Allocating objects on the stack involves a reasonably
sophisticated algorithm to try and minimize space consumed. Some of
that involves reordering variables with different sizes so that
they're contiguous. Some involves lifetime tracking to try and share
slots if two variables are live at different points.

Combined, it means LLVM isn't going to make any guarantees that the
order you write your allocas (or anything else you have access to)
dictates the order they get laid out in memory.

Why do you think the single-allocation approach isn't appropriate?
It's really the only way to guarantee you get a block, whether done
via alloca or via callbacks like ASAN.

Cheers.

Tim.

Oh, I see. Regarding the single stack allocation, I am not sure how it is possible.
For each local variable, I need to maintain 4 bytes of metadata. For example, if it is a 4-byte variable (e.g. an int), I need to allocate 8 bytes, or if it is a struct of, say, 26 bytes, I need to allocate 30 bytes.
How is this possible using AllocaInst?

Thanks in advance

Hi Sam,

> For example for each local variable, I need to maintain a 4-byte metadata. for example, if it is a 4 bytes variable (e.i. int), I need to allocate 8, or if it is a struct, let's say 26 bytes, I need to allocate 30 bytes.
> How it is possible using AllocaInst?

Your pass would look through the IR, find an alloca it needs to
instrument, say (in the most general case):

    %var = alloca %complex.type, i32 %some.arr.len, align whatever

It would then calculate the total number of bytes allocated (which
might not necessarily be statically known), and emit a new alloca (in
the same place) for that plus 4 bytes. In the most general case[*]:

    %n.elts = zext i32 %some.arr.len to i64
    %alloc.size = mul i64 COMPLEX_TYPE_SIZE, %n.elts ; COMPLEX_TYPE_SIZE *is* statically known
    %alloc.plus.metadata.size = add i64 %alloc.size, 4
    %var.as.i8ptr = alloca i8, i64 %alloc.plus.metadata.size, align whatever
    %var = bitcast i8* %var.as.i8ptr to %complex.type*

At this point you'd call replaceAllUsesWith to change everything that
referred to the original alloca to refer to the new bitcast instead
and all code that existed before your pass should work fine (after all
it's just a slightly bigger alloca, and well behaved code doesn't
overflow buffers). Then, when you wanted to access your metadata (or
at the beginning here to do it just once) you'd emit:

    %metadata.as.i8ptr = getelementptr inbounds i8, i8* %var.as.i8ptr,
        i64 %alloc.plus.metadata.size
    %metadata = bitcast i8* %metadata.as.i8ptr to i32*

Then anything that wanted the metadata could access that last bitcast.

A lot of that would be simplified in the common case, where the total
size of the alloca is known statically. The mul+add could be done by
your pass and never appear in the IR.

Oh, and delete the original alloca from the basic block so you don't
waste space!

Cheers.

Tim.

[*] I'm assuming your 4-byte metadata doesn't need to be aligned to 4
bytes. If it does then the calculations get a little more complicated
but not fundamentally different. You'd just round the original size up
to a multiple of 4 before adding your metadata bit and make sure the
new alloca requested an alignment of at least 4.
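
The rounding described in this footnote can be sketched in plain C++,
independent of LLVM. This is only an illustration of the arithmetic a
pass might perform when the size is statically known; the names
roundUpTo, PaddedLayout and layoutWithAlignedMetadata are hypothetical
and not part of any LLVM API, and the 4-byte metadata size and
alignment are taken from the thread.

```cpp
#include <cassert>
#include <cstdint>

// Round `size` up to the next multiple of `align`.
// Assumption: `align` is a non-zero power of two.
inline std::uint64_t roundUpTo(std::uint64_t size, std::uint64_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

// Hypothetical helper type: where the metadata starts inside the
// enlarged allocation, and how many bytes the allocation needs in total.
struct PaddedLayout {
    std::uint64_t metadataOffset;
    std::uint64_t totalSize;
};

// Layout for an object of `originalSize` bytes followed by 4 bytes of
// metadata that must itself be 4-byte aligned.
inline PaddedLayout layoutWithAlignedMetadata(std::uint64_t originalSize) {
    constexpr std::uint64_t kMetadataSize = 4;   // from the thread
    constexpr std::uint64_t kMetadataAlign = 4;  // from the thread
    const std::uint64_t offset = roundUpTo(originalSize, kMetadataAlign);
    return {offset, offset + kMetadataSize};
}
```

For the 26-byte struct from the question, this gives a metadata offset
of 28 and a total of 32 bytes rather than 30, because two padding bytes
are needed before the metadata. The new alloca would also have to
request an alignment of at least 4 for the offset to be meaningful.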

>     %metadata.as.i8ptr = getelementptr inbounds i8, i8* %var.as.i8ptr,
>         i64 %alloc.plus.metadata.size

Sorry, typo here. The offset should obviously just try to get you past
the original object rather than the whole allocation:

    %metadata.as.i8ptr = getelementptr inbounds i8, i8* %var.as.i8ptr,
        i64 %alloc.size
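
For the statically sized case, the sizes from the question and the
corrected offset can be checked with a few lines of plain C++. This is
a rough sketch of the arithmetic only, assuming unaligned 4-byte
metadata placed directly after the object; the function names and
kMetadataSize are hypothetical and do not come from LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Assumption from the thread: each variable carries 4 bytes of metadata.
constexpr std::uint64_t kMetadataSize = 4;

// Bytes occupied by the original object (the `mul` in the IR above).
inline std::uint64_t originalAllocSize(std::uint64_t typeSize,
                                       std::uint64_t numElts) {
    // Reject inputs whose total size (object plus metadata) would overflow.
    assert(numElts == 0 ||
           typeSize <= (UINT64_MAX - kMetadataSize) / numElts);
    return typeSize * numElts;
}

// Bytes the replacement alloca must request (the `add` in the IR above).
inline std::uint64_t totalAllocSize(std::uint64_t typeSize,
                                    std::uint64_t numElts) {
    return originalAllocSize(typeSize, numElts) + kMetadataSize;
}

// Byte offset of the metadata from the start of the new allocation:
// just past the original object, not past the whole allocation.
inline std::uint64_t metadataOffset(std::uint64_t typeSize,
                                    std::uint64_t numElts) {
    return originalAllocSize(typeSize, numElts);
}
```

With these helpers, a single 4-byte int needs 8 bytes and a single
26-byte struct needs 30 bytes, matching the numbers in the question,
and the struct's metadata starts at byte offset 26.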

Tim.