Accessing instruction/operand names

Hello everyone,

I'm currently constructing a graph from LLVM bitcode, and I have a question
about accessing the names of the variables shown in the .ll assembly file,
assuming it's possible...

For example, with

%2 = load i32* %x_addr, align 4 ; <i32> [#uses=1]

I can retrieve the opcodeName() from the Instruction object, which is
"load". I can also access the operand and use getName() to retrieve
"x_addr". However, this instruction is storing into %2 - how do I access the
name of that?

Also, when an operand is a numbered temporary such as

%3 = add i32 %2, %1 ; <i32> [#uses=1]

I'm also unable to access the name of it. Are these numberings no longer
present in the bitcode? If not, is there any way to find out the name of
which local variable it is referencing?

Thanks in advance - I've been stuck on this for a while.

Best,

James

Hi James,

Values may have no name at all, so you shouldn't use names to keep track of values or to map between them. If you need a map, the best approach is a std::map keyed by pointers to the Value objects themselves.

Names are mostly useful for debugging.
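
A minimal sketch of the pointer-keyed map idea, so it is clear what "map by pointer" means here. The Value struct below is a hypothetical stand-in, not the real llvm::Value, and NodeTable and nodeFor are names made up for illustration. With the real API you would key the map on llvm::Value* in the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::Value (assumption: not the real class).
struct Value {
    std::string name;                    // may be empty, like an unnamed temporary
    std::vector<const Value*> operands;  // the values this one uses
};

// Maps each Value, identified by its address, to a graph node id.
// NodeTable and nodeFor are illustrative names only.
class NodeTable {
public:
    // Returns the node id for v, creating a fresh id the first time v is seen.
    int nodeFor(const Value* v) {
        assert(v != nullptr);
        auto it = ids_.find(v);
        if (it != ids_.end()) return it->second;
        int id = static_cast<int>(ids_.size());
        ids_.emplace(v, id);
        return id;
    }

    std::size_t size() const { return ids_.size(); }

private:
    std::map<const Value*, int> ids_;
};
```

Two unnamed values get distinct node ids because their addresses differ, so an empty name never causes a collision.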

Milos.

2009/4/15 James Stanier <j.stanier@sussex.ac.uk>

This question seems a little bit confusing as written, given that LLVM IR is in SSA form. The actual names aren't really relevant to anything.

Maybe you could be more specific about the task you are trying to accomplish to get a good answer?

Luke

Hi Luke,

That's no problem - I understand why it would seem confusing.

The graph I'm constructing is a dataflow graph, and when working with the
bitcode, I'm iterating through each Instruction object and then generating
some nodes and edges.

Take these instructions as an example:

%x_addr = alloca i32 ; <i32*> [#uses=2]
; Instructions omitted...
%2 = load i32* %x_addr, align 4 ; <i32> [#uses=1]

The first instruction generates an "alloca" node, which has an edge to a
node specifying its type, which is i32*.

When the "load" node is generated, it has an edge to its operand, which
would be the previously generated "alloca" node (i.e. this is where %x_addr
"came from"). This is the reason I was curious about the names; I'm
currently just generating one cluster of nodes per instruction, and need to
be able to map them to something to say "we've just generated this edge,
which needs to point here".

Does this make any sense? I can elaborate with some pictures if it doesn't.

Best,

James

P.S. Milos - thanks for your answer.

Luke Dalessandro-2 wrote:

James, the "variables" named %number don't really exist in the IR; they exist only in the textual representation. In memory, all you have is interconnected Value objects.

To understand this, imagine you have the following code:

%3 = add i32 %2, %1
%5 = add i32 %3, %2

and you have a pointer to the Value that holds the "%5 = ..." add instruction. If you call getOperand(0) on it, it will not return a "variable" named %3; it will return another Value, namely the Instruction "add i32 %2, %1".

However, if you're trying to do what I think you are, you can write a recursive function exprToString() that keeps following the operand pointers up the tree until it hits a constant, a function call, or a named variable, and rebuilds the expression in a human-readable way.
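
Roughly, such a function could look like the sketch below. This is only an illustration on a hypothetical Value struct (the fields name, opcode, constant and operands are assumptions, not the real llvm::Value interface), and it assumes the operand graph has no cycles. Real IR with phi nodes can contain cycles, so a real version would need a visited set.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::Value; all field names are illustrative.
struct Value {
    std::string name;      // empty for unnamed temporaries
    std::string opcode;    // e.g. "add", "call"; empty for leaves
    std::string constant;  // printed text of a constant, empty otherwise
    std::vector<const Value*> operands;
};

// Rebuilds a readable expression by following operand pointers until it
// reaches a named value, a constant, or a call.
// Assumes an acyclic operand graph (no phi nodes).
std::string exprToString(const Value* v) {
    if (v == nullptr) return "<null>";
    if (!v->name.empty()) return "%" + v->name;    // named variable: stop
    if (!v->constant.empty()) return v->constant;  // constant: stop
    if (v->opcode == "call") return "call";        // function call: stop
    std::string out = "(" + v->opcode;
    for (const Value* op : v->operands) {
        out += " " + exprToString(op);
    }
    return out + ")";
}
```

Note that a shared operand is printed once per use, since the function walks pointers rather than looking anything up by name.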

Anthony

The other replies are right that you probably want to use
Value*s rather than string names in constructing your dependency
graph, but I wanted to clear up a second possible point of confusion.
When you see %2 in the assembly, that's an indication that the
instruction's name is empty. That is, value->getName() == "". As far
as I know, llvm-dis just generates numbers in order for unnamed
instructions. When the instruction has a name (value->getName() ==
"the_name"), you get %the_name instead of the number. Does that make
sense?
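
To illustrate the numbering idea, here is a toy sketch. It is not the actual llvm-dis code; Value and printedNames are hypothetical names, and the real printer may also number other unnamed entities, which this simplification ignores.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::Value (assumption: only the name matters here).
struct Value {
    std::string name;  // empty means the value is unnamed
};

// Produces the text a printer might show for each value: "%name" when a
// name exists, otherwise "%N" with N counting unnamed values in order.
std::map<const Value*, std::string>
printedNames(const std::vector<const Value*>& inOrder) {
    std::map<const Value*, std::string> out;
    int next = 0;
    for (const Value* v : inOrder) {
        if (out.count(v)) continue;  // already labelled; keep the first label
        if (v->name.empty()) {
            out[v] = "%" + std::to_string(next++);
        } else {
            out[v] = "%" + v->name;
        }
    }
    return out;
}
```

The numbers exist only in the output of this function, not in the Value objects, which is why getName() returns an empty string for them.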

I agree with everyone else that you should not be using names to track anything. But if you want names, you can run 'opt -instnamer' to assign names to anonymous values. I would only recommend using that for debugging purposes, though.

-Tanya

Everything that has been said is correct -- I'm using a std::map to index the
pointers to Values, and I think what I'm doing is going to work now although
it'll need a few more hours before I find out for sure!

Thanks ever so much for the help.

Jeffrey Yasskin wrote: