Is it possible to query type information from a malloc in optimized code?

Hello,

I am working on a pass that tries to extract type information from, say, all malloc statements in LLVM-IR (source language is C).

For debug code, this can be achieved by looking up the respective bitcast instruction and extracting the type from it.

However, in optimized code, the LLVM-IR omits these direct bitcasts in different scenarios (see example after the question).

My question: is there any way to use, e.g., debug info or some use-def search to reliably extract the correct type information for such a malloc?

For one instance, consider the following C code:

#include <stdlib.h>

typedef struct {
    int nvars;
    int* vars;
} struct_grid;

void set(struct_grid* pgrid, int nvars, int* vars_n) {
    int* new_vars;
    new_vars = (int*)malloc(nvars * sizeof(int));
    for (int i = 0; i < nvars; i++) {
        new_vars[i] = vars_n[i];
    }
    pgrid->vars = new_vars;
}

Compiled with -g, we get the expected bitcast. With optimizations, we get:

%6 = tail call i8* @malloc(i64 %5) ; the malloc, no subsequent bitcast

call void @llvm.memcpy.p0i8.p0i8.i64(i8* %6, i8* %10, i64 %12, i32 4, i1 false)

Thus, %6 is never cast, as it is passed directly to the memcpy.

Only later, through some indirection when new_vars is assigned to pgrid->vars, can we recover the real type (the field address is an i32** before it is cast to i8** for the store):

%14 = getelementptr inbounds %struct.struct_grid, %struct.struct_grid* %0, i64 0, i32 1, !dbg !38
%15 = bitcast i32** %14 to i8**, !dbg !39
store i8* %6, i8** %15, align 8, !dbg !39, !tbaa !40
ret void

Thanks in advance.

Hi Alexander. Not an LLVM-flavoured answer, but in case it's useful,
this is something that the tooling from my liballocs project can do for
C source code. <https://github.com/stephenrkell/liballocs>

Looking at bitcasts is at best heuristic since even in debug code there
need not be a bitcast in all circumstances. My approach -- also
heuristic, I admit -- has been to analyse the use of "sizeof" in C
source code. This works pretty well, with the caveat that if you have
malloc wrappers in the mix, since the sizeof occurs at the wrapper
call, not the malloc call, you have to declare such wrappers to the
tool.
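
To illustrate the wrapper caveat, here is a minimal sketch. It is not taken from liballocs; the names xmalloc and make_vars are made up for this example. The sizeof that hints at the element type appears where the wrapper is called, while the malloc call inside the wrapper only sees an opaque byte count.

```c
#include <stdlib.h>

/* Hypothetical malloc wrapper. The malloc call in here has no sizeof
   attached, so it carries no hint about the allocated type. */
void *xmalloc(size_t size) {
    void *p = malloc(size);
    if (p == NULL) {
        abort();
    }
    return p;
}

/* The sizeof(int) appears at the call to the wrapper. A sizeof-based
   analysis has to know that xmalloc allocates in order to treat this
   line as the allocation site of an array of int. */
int *make_vars(int nvars) {
    return xmalloc((size_t)nvars * sizeof(int));
}
```

A tool that only inspects direct malloc calls would find one untyped allocation site in xmalloc, which is why such wrappers need to be declared.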

(I agree with you that allocation sites could usefully be described in
debugging information; at present I'm not aware of any toolchains that
do this.)

Feel free to mail me off-list if you have questions about
building/using liballocs... it's not mega-friendly as yet, though I am
interested in improving that.

Stephen

Type information isn’t preserved in LLVM IR. Debug info provides a best-effort description, but optimizations might pull apart structures, collapse values across different variables, etc., so it’s potentially lossy.

You can either use debug info (which carries as much type information as is available right now, though there are many quality-of-implementation issues and areas for improvement where it is lossy) or potentially insert your own intrinsics in the frontend to track the properties you care about.
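
As a rough source-level stand-in for the second option, the sketch below records the type name at each allocation site in a side table. This is not an LLVM intrinsic and not an existing API: TYPED_MALLOC, typed_malloc_impl, lookup_type and has_type are all hypothetical names, and a real frontend change would attach the information to the IR instead.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical side table: one record per tracked allocation. */
struct alloc_record {
    void *ptr;
    const char *type_name;
    size_t count;
};

#define MAX_RECORDS 64
struct alloc_record records[MAX_RECORDS];
size_t num_records = 0;

/* Allocate count elements and remember the spelled-out type name. */
void *typed_malloc_impl(size_t count, size_t elem_size, const char *type_name) {
    void *p = malloc(count * elem_size);
    if (p != NULL && num_records < MAX_RECORDS) {
        records[num_records].ptr = p;
        records[num_records].type_name = type_name;
        records[num_records].count = count;
        num_records++;
    }
    return p;
}

/* #T turns the type argument into a string at the allocation site. */
#define TYPED_MALLOC(T, n) ((T *)typed_malloc_impl((n), sizeof(T), #T))

/* Return the recorded type name for p, or NULL if p is not tracked. */
const char *lookup_type(const void *p) {
    for (size_t i = 0; i < num_records; i++) {
        if (records[i].ptr == p) {
            return records[i].type_name;
        }
    }
    return NULL;
}

/* Convenience check: was p allocated as an array of the named type? */
int has_type(const void *p, const char *type_name) {
    const char *recorded = lookup_type(p);
    return recorded != NULL && strcmp(recorded, type_name) == 0;
}
```

With this in place, new_vars = TYPED_MALLOC(int, nvars) would keep the type attached to the allocation regardless of what the optimizer does to the casts. The table here never removes entries on free and is capped at 64 records, which is enough for a demonstration only.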

- Dave