Efficient way to identify an instruction

Hi all,
I would like to understand if there is an efficient way to identify the instruction that “created” a specific variable. For example,

define i32 @main() #0 {
%1 = alloca i32, align 4
%2 = alloca %struct._IO_FILE*, align 8
%3 = alloca [40 x i8], align 16
store i32 0, i32* %1, align 4
%4 = call %struct._IO_FILE* @fopen(i8* getelementptr inbounds ([51 x i8], [51 x i8]* @.str, i32 0, i32 0), i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str.1, i32 0, i32 0))
store %struct._IO_FILE* %4, %struct._IO_FILE** %2, align 8
%5 = getelementptr inbounds [40 x i8], [40 x i8]* %3, i32 0, i32 0
%6 = load %struct._IO_FILE*, %struct._IO_FILE** %2, align 8
%7 = call i64 @fread(i8* %5, i64 2, i64 1, %struct._IO_FILE* %6)
%8 = load %struct._IO_FILE*, %struct._IO_FILE** %2, align 8
%9 = call i32 @fclose(%struct._IO_FILE* %8)
%10 = getelementptr inbounds [40 x i8], [40 x i8]* %3, i64 0, i64 0
%11 = load i8, i8* %10, align 16
%12 = sext i8 %11 to i32
%13 = icmp eq i32 %12, 66
br i1 %13, label %14, label %25
}

Given a reference I to the instruction in bold, can I efficiently determine that the variable %11 was “created” by %3 = alloca [40 x i8], align 16?

I understand that the word “created” is not the most appropriate…

Any suggestion?

Thanks

Hi Tim,
As always, thanks for your help. Unfortunately, I made a mistake in my email, but apart from that I still have problems.

Hi Alberto,

Given a reference I to the instruction in bold, can I efficiently determine that the variable %11 was “created” by %3 = alloca [40 x i8], align 16?

Yes, I.getOperand(0) is the AllocaInst in this case. So, for example,
isa<AllocaInst>(I.getOperand(0)) will return true. And if you care
about more details you can dyn_cast<AllocaInst> it and check any other
properties you want.

I would like to use the approach you described, taking I to be a reference to the icmp instruction ( %13 = icmp eq i32 %12, 66 ). From what I understood, I should do something like:

Instruction *source;

// Taken only if operand 0 of I is itself an alloca instruction.
if ((source = dyn_cast<AllocaInst>(I.getOperand(0)))) {
  cout << "Alloca Inst" << endl;
  I.dump();
  getchar();
}

I thought I.getOperand(0) was a reference to the instruction that created %12. What am I missing?

Cheers.

Tim.

Thanks again
Alberto

Hi,
I was thinking that maybe there is already some sort of dependency analysis that would allow me to walk a tree from the AllocaInst to the ICmpInst. Is that possible?

Thanks

Hi Alberto,

I have not used this myself, but I think you should be able to iterate over your instruction's ‘users()’. I find the name of this function a bit confusing, but it returns an iterator range you can walk to find the instructions that depend on a given one.

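Similarly, you have the function ‘uses()’, which walks the same edges but yields Use objects (each one pairs the user with the operand slot it occupies). To go the other way, from an instruction to the values it reads, iterate over its ‘operands()’. Both directions work while the instructions are still in SSA form.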

See also the related functions ‘user_begin()’, ‘user_end()’, ‘use_begin()’ and ‘use_end()’.
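
To make the idea of following users concrete, here is a minimal sketch in plain C++. It is not LLVM code: the Inst struct, addOperand, reaches and checkThreadExample are hypothetical stand-ins invented for illustration, and the "users" vector only plays the role that the users() range plays on a real llvm::Value. It models only the %3 -> %10 -> %11 -> %12 -> %13 chain from the IR above and ignores constant operands.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an IR instruction; this is not llvm::Instruction.
struct Inst {
    std::string name;             // e.g. "%3"
    std::vector<Inst*> operands;  // values this instruction reads
    std::vector<Inst*> users;     // instructions that read this one
};

// Record one def-use edge in both directions.
void addOperand(Inst& user, Inst& def) {
    user.operands.push_back(&def);
    def.users.push_back(&user);
}

// Worklist walk over users: returns true if `target` is `from` itself or
// depends on it, directly or transitively.
bool reaches(const Inst& from, const Inst& target) {
    std::unordered_set<const Inst*> seen;
    std::vector<const Inst*> work{&from};
    while (!work.empty()) {
        const Inst* cur = work.back();
        work.pop_back();
        if (cur == &target) return true;
        if (!seen.insert(cur).second) continue;  // already expanded
        for (const Inst* u : cur->users) work.push_back(u);
    }
    return false;
}

// Rebuild the chain from the thread and check both allocas against the icmp.
void checkThreadExample() {
    Inst a1{"%1"}, a3{"%3"}, g10{"%10"}, l11{"%11"}, s12{"%12"}, c13{"%13"};
    addOperand(g10, a3);   // %10 = getelementptr ... %3
    addOperand(l11, g10);  // %11 = load ... %10
    addOperand(s12, l11);  // %12 = sext %11
    addOperand(c13, s12);  // %13 = icmp eq %12, 66
    assert(reaches(a3, c13));   // the icmp depends on %3
    assert(!reaches(a1, c13));  // but not on %1
}
```

Note that a walk like this only follows SSA def-use edges. A value that is stored to memory and loaded back later (as with %4 and %2 above) is not connected to the load by such an edge.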

I hope this helps.

Joan

Hi Alberto,

It seems to me that what you are trying to achieve falls into a bucket of Points-To Analysis. Given a pointer (or a pointer + memory instruction), PTA identifies its allocation sites – all places that could have created the pointed-to object [1, 2].

In the simplest case, you can take a value, strip casts, and get to the load. Once you have the load, you can strip pointer casts of the pointer operand until you either reach an allocation site or an instruction you don't know how to look through. I implemented something similar here [3], while BasicAA should (internally) have something more robust [4]. If you are interested in cross-function cases (interprocedural), you may be interested in external PTAs for LLVM like SVF [5] or SeaDsa [6]. Note that in the interprocedural case, you have to be careful about what you consider an allocation site, especially with external functions that return pointers.
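
The simple intraprocedural walk described above can be sketched as follows. This is a toy model in plain C++, not LLVM code: Op, Inst, stripCasts, findAllocationSite and checkThreadExample are hypothetical names, and each instruction keeps only its instruction operands (constant GEP indices are left out). In real LLVM code the corresponding steps would go through getOperand(), isa<AllocaInst>() and Value::stripPointerCasts(). The sketch also looks through GEPs, which is an assumption made here so that the %10 -> %3 step in the example resolves.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical opcode set, just large enough for the example in this thread.
enum class Op { Alloca, GEP, Load, Cast, ICmp, Call };

// Hypothetical stand-in for an IR instruction; this is not llvm::Instruction.
struct Inst {
    std::string name;                   // e.g. "%11"
    Op op;
    std::vector<const Inst*> operands;  // instruction operands only
};

// Look through value casts (sext, zext, bitcast, ...) to the value being cast.
const Inst* stripCasts(const Inst* v) {
    while (v && v->op == Op::Cast && !v->operands.empty()) v = v->operands[0];
    return v;
}

// Strip casts, expect a load, then follow the pointer operand through
// casts and GEPs. Returns the alloca, or nullptr when the walk hits
// something it does not know how to look through.
const Inst* findAllocationSite(const Inst* v) {
    v = stripCasts(v);
    if (!v || v->op != Op::Load || v->operands.empty()) return nullptr;
    const Inst* ptr = v->operands[0];
    while (ptr && (ptr->op == Op::Cast || ptr->op == Op::GEP) &&
           !ptr->operands.empty())
        ptr = ptr->operands[0];
    return (ptr && ptr->op == Op::Alloca) ? ptr : nullptr;
}

// Operand 0 of the icmp is the sext (%12), not the alloca, so the walk
// has to pass through %12, %11 and %10 before it reaches %3.
void checkThreadExample() {
    Inst a3{"%3", Op::Alloca, {}};
    Inst g10{"%10", Op::GEP, {&a3}};
    Inst l11{"%11", Op::Load, {&g10}};
    Inst s12{"%12", Op::Cast, {&l11}};
    Inst c13{"%13", Op::ICmp, {&s12}};
    assert(c13.operands[0] == &s12);
    assert(findAllocationSite(c13.operands[0]) == &a3);
}
```

A pointer that comes from a call, such as the result of fopen above, makes this walk return nullptr; that is the point where a real points-to analysis has to decide what counts as an allocation site.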

Sincerely,
Jakub

Hi all,
Thanks for the great answers you gave me. Yes, it sounds like an interprocedural PTA. I’ll try to understand how other projects did it and go from there.

Thanks again
Alberto