distinguish program and temporary variables

Hi,

I need to check whether a variable belongs to the original program. Consider the following line of code:

y = x + 4

and its corresponding LLVM IR (roughly):

%16 = load i32, i32* %x
%add = add i32 %16, 4
store i32 %add, i32* %y

I need to distinguish between %16, %add and %x, %y.

Any help is appreciated.

Best,
Mohammad

You might be able to use the Debug information embedded within the LLVM IR to determine what is an original variable and what is a temporary added by LLVM. However, I think that such an approach is fragile.

It sounds like you need to be analyzing Clang ASTs instead of LLVM IR. The Clang AST represents a program in its original source form, so you can tell what is a program variable, in what file it was defined, its original source type, etc.

Regards,
John Criswell

Yes, debug info provides a lot of useful information, and I think it should
cover this case. But I don't know which API or method exposes it.

I'm not sure about Clang ASTs, but this is part of an LLVM pass that
analyzes LLVM IR, so I doubt I can use the Clang AST.

Best,
Mohammad

Look at the Doxygen documentation on the LLVM web site for documentation on the Value and Instruction classes. They probably have methods for retrieving the Debug Metadata (and if they don’t, one of their subclasses/superclasses does). You can search through the LLVM source code for examples as well, plus I think there’s a document that describes the format of the LLVM Debug Metadata on the LLVM web page.

Why are you restricted to using LLVM IR?

Regards,
John Criswell

Thanks for your time and reply.

What about the DILocalVariable class? Do you think it would provide any
information for this case?
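
A minimal sketch of the debug-info approach discussed above, working on the printed IR rather than on the in-memory classes. It assumes unoptimized IR in which each source variable is an alloca described by a `call void @llvm.dbg.declare(...)` whose second operand refers to a `!DILocalVariable(name: "...")` node; other LLVM versions may print debug info differently. The function name `sourceNamesFromDbgDeclare` and the helper `readIdent` are hypothetical and not part of LLVM. Inside a real pass, the same link should be available from the debug intrinsic classes (for example `DbgDeclareInst::getVariable()` returning a `DILocalVariable`), but check the Doxygen pages for the LLVM version in use.

```cpp
#include <cctype>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: reads the run of LLVM identifier characters at `pos`.
// Quoted names such as %"my var" are not handled in this sketch.
static std::string readIdent(const std::string &s, std::size_t pos) {
  std::size_t end = pos;
  while (end < s.size() &&
         (std::isalnum(static_cast<unsigned char>(s[end])) || s[end] == '.' ||
          s[end] == '_' || s[end] == '-' || s[end] == '$'))
    ++end;
  return s.substr(pos, end - pos);
}

// Maps IR value names (without '%') to the source variable name recorded in
// the DILocalVariable that an llvm.dbg.declare call attaches to them.
std::map<std::string, std::string>
sourceNamesFromDbgDeclare(const std::string &irText) {
  const std::string::size_type npos = std::string::npos;
  std::map<std::string, std::string> varNameById; // "12" -> "x"
  std::map<std::string, std::string> idByValue;   // "x"  -> "12"
  std::istringstream lines(irText);
  std::string line;
  while (std::getline(lines, line)) {
    // Metadata definition: !12 = !DILocalVariable(name: "x", ...)
    std::size_t first = line.find_first_not_of(" \t");
    std::size_t di = line.find("= !DILocalVariable(");
    if (first != npos && line[first] == '!' && di != npos) {
      std::size_t key = line.find("name: \"", di);
      if (key == npos)
        continue;
      std::size_t begin = key + 7; // length of: name: "
      std::size_t close = line.find('"', begin);
      if (close != npos)
        varNameById[readIdent(line, first + 1)] =
            line.substr(begin, close - begin);
      continue;
    }
    // Intrinsic call: @llvm.dbg.declare(metadata <ty> %v, metadata !N, ...)
    std::size_t call = line.find("@llvm.dbg.declare(");
    if (call == npos)
      continue;
    std::size_t percent = line.find('%', call);
    std::size_t comma = line.find(',', call);
    if (percent == npos || comma == npos || percent > comma)
      continue; // first operand is not an SSA value
    std::size_t meta = line.find("metadata !", comma);
    if (meta != npos)
      idByValue[readIdent(line, percent + 1)] = readIdent(line, meta + 10);
  }
  // Join the two tables on the metadata id.
  std::map<std::string, std::string> result;
  for (const auto &entry : idByValue) {
    auto it = varNameById.find(entry.second);
    if (it != varNameById.end())
      result[entry.first] = it->second;
  }
  return result;
}
```

For the `y = x + 4` example, the allocas `%x` and `%y` would appear in the returned map, while the loaded value and `%add` would not, because no `llvm.dbg.declare` describes them. This only holds before optimization; the rest of the thread explains why.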

Why are you restricted to using LLVM IR?

I'm working on one part of a larger project, and the whole project works on LLVM IR.

Best,
Mohammad

This looks like it will quickly degrade under optimization, to the point where it is no longer meaningful.
Why do you need this? What are you trying to accomplish?

I’m writing a pass that eliminates some variables. To show the effect of the pass, I need to show that I deleted variables that originally appear in the user code, not temporary variables added by LLVM.

Why?

The notion of a “variable from the user code” is meaningless after even a minimal amount of optimization.

For example:

int foo(int a, int b) {
  int c = a + 1;
  int d = b + 2;
  return c + d;
}

Just running mem2reg and reassociate leads to:

; Function Attrs: nounwind ssp uwtable
define i32 @_Z3fooii(i32 %a, i32 %b) #0 !dbg !4 {
entry:
  call void @llvm.dbg.value(metadata i32 %a, i64 0, metadata !12, metadata !13), !dbg !14
  call void @llvm.dbg.value(metadata i32 %b, i64 0, metadata !15, metadata !13), !dbg !16
  call void @llvm.dbg.value(metadata i32 %add, i64 0, metadata !17, metadata !13), !dbg !18
  call void @llvm.dbg.value(metadata !2, i64 0, metadata !19, metadata !13), !dbg !20
  %add = add i32 %a, 3, !dbg !21
  %add2 = add i32 %add, %b, !dbg !22
  ret i32 %add2, !dbg !23
}

You still have two values (%add and %add2), but neither really matches a source variable.
%add is still attached to an llvm.dbg.value, but %add2 is not.

And this is just a very simple example, with only one transformation (besides mem2reg).
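
The observation above can be checked mechanically on the printed IR. The sketch below is only an illustration under stated assumptions: it scans text in the `call void @llvm.dbg.value(...)` form shown in the example, and the names `valuesWithoutDbgValue` and `readIdent` are hypothetical, not LLVM APIs. It reports the named values that are defined in the function body but are never the first operand of an `llvm.dbg.value` call.

```cpp
#include <cctype>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: reads the run of LLVM identifier characters at `pos`.
static std::string readIdent(const std::string &s, std::size_t pos) {
  std::size_t end = pos;
  while (end < s.size() &&
         (std::isalnum(static_cast<unsigned char>(s[end])) || s[end] == '.' ||
          s[end] == '_' || s[end] == '-' || s[end] == '$'))
    ++end;
  return s.substr(pos, end - pos);
}

// Returns the values defined as "%name = ..." that no llvm.dbg.value call
// describes. Function arguments are not definitions in this sense.
std::set<std::string> valuesWithoutDbgValue(const std::string &irText) {
  const std::string::size_type npos = std::string::npos;
  std::set<std::string> defined, described;
  std::istringstream lines(irText);
  std::string line;
  while (std::getline(lines, line)) {
    std::size_t call = line.find("@llvm.dbg.value(");
    if (call != npos) {
      // Only the first operand names the described value; it may also be
      // plain metadata (e.g. "metadata !2"), in which case there is no '%'.
      std::size_t comma = line.find(',', call);
      std::size_t percent = line.find('%', call);
      if (percent != npos && (comma == npos || percent < comma))
        described.insert(readIdent(line, percent + 1));
      continue;
    }
    std::size_t first = line.find_first_not_of(" \t");
    if (first != npos && line[first] == '%') {
      std::string name = readIdent(line, first + 1);
      if (line.compare(first + 1 + name.size(), 2, " =") == 0)
        defined.insert(name);
    }
  }
  std::set<std::string> undescribed;
  for (const std::string &name : defined)
    if (described.count(name) == 0)
      undescribed.insert(name);
  return undescribed;
}
```

Run on the function above, this reports only `add2`. The value `%add` is described, yet it computes `a + 3`, which corresponds to no variable in the source, so even a described value is not necessarily a "program variable".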

That REALLY sounds like something that should be done at an earlier stage in the compilation. At the LLVM IR level, you can’t really know whether something was produced by the compiler itself or as a consequence of something in the source code.

Unless there is some very specific pattern to those variables (e.g. “they are always called XYZ_abc_kerflunk_billy_bob_*” - it is unlikely that the compiler will give a generated variable such a name).

I would expect the compiler to know this, since it is the one inserting
these new variables into the IR. Also, the naming pattern for temporaries
in LLVM is somewhat instruction-dependent, e.g. the results of "add"
instructions get names like %add, %add2, etc.
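
As a rough illustration of how little the names alone reveal, here is a small sketch (the enum `NameKind` and the function `classifyValueName` are hypothetical). It assumes only that values without a name are printed as sequential numbers such as %16. A numeric name therefore marks a value the front end never named, but a textual name such as %add or %x says nothing about whether it came from the source or from the compiler, and value names may be absent altogether depending on how the IR was produced.

```cpp
#include <cctype>
#include <string>

enum class NameKind { Unnamed, Named };

// Classifies an IR value name given without the leading '%'.
// All-digit (or empty) names are treated as unnamed temporaries;
// anything else is merely "named", which does NOT imply a source variable.
NameKind classifyValueName(const std::string &name) {
  for (char c : name)
    if (!std::isdigit(static_cast<unsigned char>(c)))
      return NameKind::Named;
  return NameKind::Unnamed;
}
```

With the example from the first message, `16` is classified as unnamed, while `add`, `x` and `y` all land in the same "named" bucket, so this test cannot separate %add from %x and %y.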

Best,
Mohammad