How to analyze where the address comes from?

Hi,

I want to find out where the address of a load/store comes from.
For example, in the load instruction below, %152 may come from a getelementptr, or from a gep+ptrtoint+add+inttoptr… sequence. What’s the recommended way to find the original memory pointer?

%153 = load <2 x i16> addrspace(1)* %152, align 2

Going through the use-def chain does not seem easy, because the ‘add’ operation has two operands: one comes from a pointer and the other is an integer offset. I cannot know in advance which is operand 0 and which is operand 1.

Thanks!
Ruiling

Previously, I started from the load/store instructions and tried to find the original definitions by going through the defs recursively. That is where I ran into the problem described in my last mail.
It seems that I have to do things in reverse, that is, start from the definitions of pointers/memory objects and check their uses recursively. I don’t know whether this is a good idea.

Or is there any other good suggestion?
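
The reverse direction described above (start at the pointer definitions and follow their uses) can be sketched on a toy model. This is only an illustration, not the LLVM API: the `PROGRAM` dictionary, the `PASS_THROUGH` and `ADDRESS_OPERAND` tables, the `store0` label, and the function `accesses_reached_from` are all hypothetical names chosen for the example, and the set of opcodes treated as address arithmetic is an assumption.

```python
from collections import defaultdict, deque

# Toy IR, not the LLVM API: result name -> (opcode, operand names).
# "store0" is a made-up label, since a store produces no result.
PROGRAM = {
    "%gep": ("gep", ["%base", "%idx"]),
    "%int": ("ptrtoint", ["%gep"]),
    "%sum": ("add", ["%off", "%int"]),      # pointer side is operand 1 here
    "%152": ("inttoptr", ["%sum"]),
    "%153": ("load", ["%152"]),
    "store0": ("store", ["%153", "%out"]),  # (stored value, address)
}

# Assumed opcode sets for this sketch.
PASS_THROUGH = {"gep", "bitcast", "ptrtoint", "inttoptr",
                "add", "sub", "phi", "select"}
ADDRESS_OPERAND = {"load": 0, "store": 1}   # index of the address operand


def accesses_reached_from(origins, program):
    """Map each load/store label to the origins that reach its address operand."""
    # Build the def -> users map once, remembering the operand position.
    users = defaultdict(list)
    for name, (_, operands) in program.items():
        for index, operand in enumerate(operands):
            users[operand].append((name, index))

    reached = defaultdict(set)
    for origin in origins:
        seen = {origin}              # also guards against phi cycles
        worklist = deque([origin])
        while worklist:
            current = worklist.popleft()
            for user, index in users[current]:
                opcode = program[user][0]
                if ADDRESS_OPERAND.get(opcode) == index:
                    reached[user].add(origin)       # used as an address
                elif opcode in PASS_THROUGH and user not in seen:
                    seen.add(user)
                    worklist.append(user)           # keep following uses
    return dict(reached)


result = accesses_reached_from(["%base", "%out"], PROGRAM)
print(result)
```

In this direction the position of the pointer within the ‘add’ does not matter: the walk simply follows whichever operand slot the tainted value occupies.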

Thanks!
Ruiling

To find the source of the pointer for the LoadInst, you’ll need to climb up the def-use chain. In the case of an add instruction, you will have to search back through both operands to figure out which one originates from a pointer. You will also have to handle phi-nodes, so you’ll probably need a list of processed phi-nodes to ensure that you don’t iterate indefinitely.

The only other way to do it is find all the definitions that you consider to be pointer “origins” (e.g., function arguments, the results of load instructions, etc.) and iterate through their uses until you find the load instruction that uses the pointer (in this case, %153). In other words, instead of starting at a use and searching for the definition, you start at all possible definitions and look for the use. If you’re searching for a lot of pointers, this may end up being more efficient as you won’t be traversing the same definitions over and over again.

In short, you’re attacking the problem in the right way, and I don’t think there’s really any better way of doing it.

Regards,
John Criswell
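
The backward walk described in this reply, searching through both operands of an add and remembering visited values so that phi cycles terminate, might look roughly like the sketch below. It uses a toy `Value` class rather than the LLVM API; `Value`, `PASS_THROUGH`, and `find_pointer_origins` are hypothetical names, and treating exactly these opcodes as address arithmetic is an assumption of the example.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based hashing, so values can live in a set
class Value:
    """Toy stand-in for an SSA value; not the LLVM API."""
    name: str
    opcode: str                 # e.g. "arg", "const", "load", "gep", "add", "phi"
    is_pointer: bool = False    # True when the value has pointer type
    operands: list = field(default_factory=list)


# Assumption: opcodes that merely forward or adjust an address.
PASS_THROUGH = {"gep", "bitcast", "ptrtoint", "inttoptr",
                "add", "sub", "phi", "select"}


def find_pointer_origins(value, visited=None):
    """Walk operands backwards; return names of pointer values where the walk stops."""
    if visited is None:
        visited = set()
    if value in visited:        # already processed, e.g. reached again via a phi
        return set()
    visited.add(value)
    if value.opcode not in PASS_THROUGH:
        # A leaf for this analysis: an origin only if it is pointer-typed.
        return {value.name} if value.is_pointer else set()
    origins = set()
    for operand in value.operands:  # both operands of an add are searched
        origins |= find_pointer_origins(operand, visited)
    return origins


base = Value("%base", "arg", is_pointer=True)
idx = Value("%idx", "arg")
gep = Value("%gep", "gep", True, [base, idx])
as_int = Value("%int", "ptrtoint", False, [gep])
total = Value("%sum", "add", False, [idx, as_int])   # pointer side is operand 1
addr = Value("%152", "inttoptr", True, [total])
print(find_pointer_origins(addr))
```

The integer side of the add ends at `%idx`, which is not pointer-typed and therefore contributes nothing, so the operand order does not need to be known up front.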

Hi John,

Thank you so much for the comments! I will do it as suggested.

Thanks!

Ruiling

To find the source of the pointer for the LoadInst, you'll need to climb up
the def-use chain. In the case of an add instruction, you will have to
search back through both operands to figure out which one originates from a
pointer. You will also have to handle phi-nodes, so you'll probably need a
list of processed phi-nodes to ensure that you don't iterate indefinitely.

I tried the above idea, and I find it is easy to find out whether an operand
comes from a pointer. But for the other operand, which comes from an
integer, it is hard to determine that it does not come from a pointer: an
integer may come from various kinds of instructions, so the stop condition to
prevent further search is not obvious. As for the two operands of 'add', I
don't know which one comes from a pointer, so obviously I have to go through
both of them. I am not sure whether I understand your idea fully.
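
One possible way around the stop-condition question, offered only as a sketch: compute the set of pointer-derived values once by propagating forward from known pointer roots until nothing changes, then classify the operands of an add by set membership. The loop is bounded by the number of values, so no separate stop condition is needed on the integer side. The toy `PROGRAM`, the `PROPAGATES` set, and the functions `pointer_derived` and `pointer_operands` are hypothetical and not part of LLVM.

```python
# Toy IR, not the LLVM API: result name -> (opcode, operand names).
PROGRAM = {
    "%gep": ("gep", ["%base", "%i"]),
    "%int": ("ptrtoint", ["%gep"]),
    "%scaled": ("mul", ["%i", "%four"]),
    "%sum": ("add", ["%scaled", "%int"]),
    "%152": ("inttoptr", ["%sum"]),
}

# Assumption: opcodes through which "comes from a pointer" is propagated.
PROPAGATES = {"gep", "bitcast", "ptrtoint", "inttoptr",
              "add", "sub", "phi", "select"}


def pointer_derived(program, pointer_roots):
    """Fixed-point propagation: every value reachable from a root via PROPAGATES."""
    derived = set(pointer_roots)
    changed = True
    while changed:              # each pass adds at least one name or stops
        changed = False
        for name, (opcode, operands) in program.items():
            if (name not in derived and opcode in PROPAGATES
                    and any(op in derived for op in operands)):
                derived.add(name)
                changed = True
    return derived


def pointer_operands(name, program, derived):
    """Indices of the operands of `name` that are derived from a pointer."""
    return [i for i, op in enumerate(program[name][1]) if op in derived]


derived = pointer_derived(PROGRAM, {"%base"})
print(pointer_operands("%sum", PROGRAM, derived))
```

An empty list means neither operand is pointer-derived, and a list with both indices flags the case where two pointers are added together.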

Random thought: what if both operands of an add come from pointers? e.g. in a naive coding of binary search as p = (p1+p2)/2

Is that simply illegal in LLVM? Of course it is in most languages, including C, without a lot of casts, though once you compile it that’s what the machine instructions will be. And it will silently (but usually harmlessly) overflow if you’re in the top half of the address space…

So of course you should write it as p = p1 + (p2-p1)/2, which type checks, doesn’t need casts, doesn’t overflow, and produces an add of a pointer and an integer. But does all code a compiler encounters actually do this?
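
The difference between the two midpoint formulas can be checked with plain integer arithmetic. The sketch below assumes 64-bit addresses with wraparound on overflow and models them with a mask; `MASK64`, `naive_midpoint`, and `safe_midpoint` are names made up for the illustration.

```python
MASK64 = (1 << 64) - 1   # assumption: 64-bit addresses, wraparound on overflow


def naive_midpoint(p1, p2):
    """(p1 + p2) / 2 with the sum truncated to 64 bits, as a machine add would."""
    return ((p1 + p2) & MASK64) // 2


def safe_midpoint(p1, p2):
    """p1 + (p2 - p1) / 2; the difference cannot overflow when p1 <= p2."""
    return (p1 + (p2 - p1) // 2) & MASK64


low, high = 0x1000, 0x2000                                  # bottom half
top1, top2 = 0xFFFF_FFFF_FFFF_0000, 0xFFFF_FFFF_FFFF_1000   # top half

print(hex(naive_midpoint(low, high)), hex(safe_midpoint(low, high)))
print(hex(naive_midpoint(top1, top2)), hex(safe_midpoint(top1, top2)))
```

For the first pair both formulas agree. For the second pair the carry out of the 64-bit sum is lost, so under this model the naive result is off by 2^63.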