Name of Function's original module during link-time optimization

Hi,

During link-time optimization using llvm-ld, I occasionally need to find the name/ID of the original module/bytecode file a Function belonged to. In order to do this I added a nameOfPreviousModule-attribute to Function and some getters/setters (see attached patch). However, I can't find out how to read in the ModuleID of a Module during bytecode loading in lib/bytecode/Reader/Reader.cpp.

How can one do this? Is there another (better) way to accomplish what I intend to do here (without extra attribute e.g.)?

Kind regards,

Bram Adams
GH-SEL, INTEC, Ghent University (Belgium)

patch.modulename-1.8 (2.15 KB)

Hi Bram,

Hi,

Reid Spencer wrote:

Call getBytecodeModuleProvider (see Reader.h).

The problem is that one needs to provide the filename of the original module as the argument of getBytecodeModuleProvider, whereas this is unknown (it's exactly what we're trying to find out).

But by looking at where this method is called in the original bytecode loading process, I figured out a way to set the attribute I added at the point where all Functions of a Module are linked into the sole link-time Module (see patch). The reason I pass an extra argument (defaulting to "") to so many methods is that I'd like the full path name of the original modules, not only the file name. For that reason, using someFunction->getParent()->getModuleIdentifier() in lib/Linker/LinkModules.cpp:650 did not suffice. However, I noticed that File.toString() in lib/Linker/LinkItems.cpp:150 does not yield the full path name either.

Could this patch (in a revised form) be added to LLVM?

Kind regards,

Bram Adams
GH-SEL, INTEC, Ghent University (Belgium)

patch.modulename-1.8 (6.62 KB)

What are you trying to accomplish? Why not use location records from debug info?

-Chris

Hi,

Yes, that is true. What are you trying to do?

-Chris

Hi,

Haven't tried them extensively yet, so I'm
wondering whether the remark in your mail of 09/04/2006 to Nikhil Patil about
"-g currently disables many optimizations" still holds.

Yes, that is true. What are you trying to do?

Basically, I'm working on an aspect weaver (next version of http://users.ugent.be/~badams/aspicere/) where the weaving functionality is performed at link-time by some LLVM passes I'm writing. As advice frequently needs join point-specific context information like the name of the woven advice, of the current method, of the current compilation unit, ... I provide this info by storing it in an LLVM struct passed at run-time to the advice (which has been transformed into a Function earlier). That's why I need to know the name of the module to which a Function originally belonged.

About the debugging intrinsics: I did some small tests using "-g", but I got a segmentation fault in one of my passes. The error wasn't there before (i.e. before using -g), but I need to delve a bit deeper before I can give meaningful details. However, the debugging code in the LLVM bytecode seems huge, so I'm wondering whether some more "limited" forms of the "-g" flag exist? Finally, as I'm heavily relying on LLVM's optimization passes to clean up the woven code (remove unused context, ...), the "-g currently disables many optimizations" remark is an important concern to me.

Maybe an initial InstVisitor pass attaching the debug info as Annotations to Functions, ... while discarding the debugging instructions could solve these problems, as both the debugging info would be available in the IR to subsequent passes and the existing optimization passes (ignoring the Annotations) could remain unchanged. Could this approach be feasible? If so, I could put something together tomorrow.

Kind regards,

Bram Adams
GH-SEL, INTEC, Ghent University (Belgium)

Haven't tried them extensively yet, so I'm
wondering whether the remark in your mail of 09/04/2006 to Nikhil
Patil about
"-g currently disables many optimizations" still holds.

Yes, that is true. What are you trying to do?

Basically, I'm working on an aspect weaver (next version of http://
users.ugent.be/~badams/aspicere/) where the weaving functionality is
performed at link-time by some LLVM passes I'm writing. As advice

ok.

frequently needs join point-specific context information like the
name of the woven advice, of the current method, of the current
compilation unit, ... I provide this info by storing it in an LLVM
struct passed at run-time to the advice (which has been transformed
into a Function earlier). That's why I need to know the name of the
module to which a Function originally belonged.

ok.

About the debugging intrinsics: I did some small tests using "-g",
but I got a segmentation fault in one of my passes. The error wasn't
there before (i.e. before using -g), but I need to delve a bit deeper
first before I can give meaningful details. However, the debugging
code in the LLVM bytecode seems huge, so I'm wondering whether some
more "limited" forms of the "-g"-flag exist? Finally, as I'm heavily
relying on LLVM's optimization passes to clean up the woven code
(remove unused context, ...), the "-g currently disables many
optimizations" is an important concern to me.

Right. It seems like you would be well served by building with debug info (which captures a variety of source-level information), running your instrumentation pass, running a pass to strip out the debug info, and then running optimizers as needed.

Note that file-level information is not all you're losing. When/if the inliner (or any other interprocedural pass) runs, it will mess up the usual notion of a function, so you can't rely on function names either.

Maybe an initial InstVisitor pass attaching the debug info as
Annotations to Functions, ... while discarding the debugging
instructions could solve these problems, as both the debugging info
would be available in the IR to subsequent passes and the existing
optimization passes (ignoring the Annotations) could remain
unchanged. Could this approach be feasible? If so, I could put
something together tomorrow.

I'd suggest writing a little pass that strips out debug intrinsics.

-Chris

Hi,

Chris Lattner wrote:

I'd suggest writing a little pass that strips out debug intrinsics.
  

OK, I did this and it works (the strange seg fault also disappears after all declared debug variables are gone)! In a first phase, all intrinsic instructions are discarded after extracting their data into annotations attached to the relevant Function. Then, a second phase wipes out the intrinsic instruction Functions themselves as well as all debug variables in the declaration section. This 2-phase approach is necessary as I implemented the pass as an InstVisitor.

A limitation here is that only Functions' debug data can be kept, as other Values (i.e. Instructions) are not Annotable. Is this an explicit design decision?

Kind regards,

Bram Adams
GH-SEL, INTEC, Ghent University

PS: Could this pass be interesting to others? If so, I could send the code to the list or to Bugzilla.
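
The two-phase stripping described above can be illustrated with a small, self-contained sketch. It does not use the LLVM API: ToyInst, ToyFunction, ToyModule, DebugInfoMap and both phase functions are hypothetical stand-ins, and treating every function whose name starts with "llvm.dbg." as a debug intrinsic is an assumption made for the example. The debug data is kept in a plain map here rather than in Annotations. Phase 1 copies the payload of each debug call into the map and erases the call; phase 2 removes the now-unused intrinsic declarations.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy stand-ins for IR objects. These are NOT LLVM classes.
struct ToyInst {
    std::string callee;   // name of the called function ("" if not a call)
    std::string payload;  // for debug calls: the source file they record
};

struct ToyFunction {
    std::string name;
    bool isDeclaration;
    std::vector<ToyInst> body;
};

struct ToyModule {
    std::vector<ToyFunction> functions;
};

// Assumption for this sketch: debug intrinsics share the "llvm.dbg." prefix.
inline bool isDebugIntrinsicName(const std::string &name) {
    return name.rfind("llvm.dbg.", 0) == 0;
}

// Extracted debug data: function name -> source file.
using DebugInfoMap = std::map<std::string, std::string>;

// Phase 1: record the payload of each debug call, then erase the call.
inline DebugInfoMap extractAndDropDebugCalls(ToyModule &m) {
    DebugInfoMap info;
    for (ToyFunction &f : m.functions) {
        auto isDbg = [](const ToyInst &i) {
            return isDebugIntrinsicName(i.callee);
        };
        for (const ToyInst &i : f.body)
            if (isDbg(i) && !i.payload.empty())
                info.emplace(f.name, i.payload);  // keeps the first payload seen
        f.body.erase(std::remove_if(f.body.begin(), f.body.end(), isDbg),
                     f.body.end());
    }
    return info;
}

// Phase 2: drop the intrinsic declarations, which no call refers to anymore.
inline void dropDebugDeclarations(ToyModule &m) {
    // Precondition: phase 1 has already removed every debug call.
    for (const ToyFunction &f : m.functions)
        for (const ToyInst &i : f.body)
            assert(!isDebugIntrinsicName(i.callee) && "run phase 1 first");

    auto &fs = m.functions;
    fs.erase(std::remove_if(fs.begin(), fs.end(),
                            [](const ToyFunction &f) {
                                return f.isDeclaration &&
                                       isDebugIntrinsicName(f.name);
                            }),
             fs.end());
}
```

In this toy model the order matters because a declaration can only be removed safely once nothing calls it; that is one possible reading of why the pass needs two phases, not a description of the actual implementation.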

Yes, we intentionally do not want things to be annotatable. In fact, Function being annotatable is a wart due to the way the code generator currently works. In general, we prefer to keep data in on-the-side maps instead of attached to the IR. The mailing list has archives of extensive discussion about this sort of thing.

-Chris
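
Applied to the original question, the "on-the-side map" idea might look like the following minimal sketch. It is not LLVM code: SketchFunction, SketchModule, OriginTable and linkInto are hypothetical names, and the toy linker simply moves function pointers between modules. A real linker may create new function objects in the destination module, in which case pointer identity would not survive and a different key would be needed.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Toy stand-ins for IR objects. These are NOT LLVM classes.
struct SketchFunction {
    std::string name;
};

struct SketchModule {
    std::string identifier;                   // e.g. full path of the file
    std::vector<SketchFunction *> functions;  // non-owning in this sketch
};

// Side table: remembers each function's original module without
// adding any attribute to the function itself.
class OriginTable {
public:
    // Call once per input module, before it is linked away.
    void recordModule(const SketchModule &m) {
        for (const SketchFunction *f : m.functions) {
            assert(f != nullptr);
            origin_.emplace(f, m.identifier);
        }
    }

    // Returns the recorded identifier, or "" for an unknown function.
    std::string originOf(const SketchFunction *f) const {
        auto it = origin_.find(f);
        return it == origin_.end() ? std::string() : it->second;
    }

private:
    std::unordered_map<const SketchFunction *, std::string> origin_;
};

// Toy "linker" (assumption): moves all functions of src into dest.
inline void linkInto(SketchModule &dest, SketchModule &src) {
    dest.functions.insert(dest.functions.end(),
                          src.functions.begin(), src.functions.end());
    src.functions.clear();
}
```

A later pass would then call originOf(f) on a function of the linked module instead of reading an attribute stored on the function.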

I found a bug in the documentation at llvm/docs/WritingAnLLVMPass.html#debughints

Since the PassManager class is in the llvm namespace, we should use the command

(gdb) break llvm::PassManager::run

to set the breakpoint. Otherwise we get this error message:

(gdb) break PassManager::run
Can't find member of namespace, class, struct, or union named "PassManager::run"
Hint: try 'PassManager::run<TAB> or 'PassManager::run<ESC-?>
(Note leading single quote.)

The patch is:

--- WritingAnLLVMPass.html 2006-03-14 13:39:39.000000000 +0800
+++ WritingAnLLVMPass-new.html 2006-09-28 21:06:36.000000000 +0800
@@ -1475,7 +1475,7 @@
want:</p>

<pre>
-(gdb) <b>break PassManager::run</b>
+(gdb) <b>break llvm::PassManager::run</b>
Breakpoint 1 at 0x2413bc: file Pass.cpp, line 70.
(gdb) <b>run test.bc -load $(LLVMTOP)/llvm/Debug/lib/[libname].so -[passoption]</b>
Starting program: opt test.bc -load $(LLVMTOP)/llvm/Debug/lib/[libname].so -[passoption]

I've taken care of this. Documentation has been corrected.

Reid.