Getting LLVM Instructions

Hi,

I am working on a project where I need to get a list of the LLVM functions that were called during an execution (for further analysis).
To do this I maintain a std::vector<llvm::Function*>, which I print out to a .ll file at the end. However, this takes a lot of time since the number of call instructions is HUGE.
I suspect the bottleneck is the conversion from llvm::Function to std::string.

How can I speed this up?

I don’t necessarily need it in .ll format. If there is a way to dump the entire llvm::Function object as a byte stream to a .dat file and read it back as objects in a separate script, that would work too. I’m not sure how to do this (I tried a few things that didn’t work), so any help would be appreciated!

Thanks!

If you're trying to serialize LLVM IR and read it back again later -
yeah, probably best to use the binary serialization rather than the
textual. If I were doing this I'd try building something using clang
with -emit-llvm (that'll produce LLVM IR bitcode in the .o file) and
debug that to see which APIs are used to do that.

Replicating what clang -emit-llvm does sounds like the better way to do it.

I was looking under IRPrintingPasses but couldn’t find anything specific that would allow me to print out, say, a std::vector<llvm::Instruction*>.

What do you think would be the easiest way to do this? Can I do some hack where I can get away without writing my own llvm pass? I’m not even sure what the right question to ask is, since this is the first time I’m working with llvm.

Thanks!

I'm not sure that LLVM's bitcode format would natively support just a
handful of Instructions, rather than a whole llvm::Module.

If you really want just a handful of instructions, maybe text is the
way to go - it sounded like you were serializing whole functions, at
least - which could be copied/cloned/moved into a standalone
llvm::Module and serialized from there. If it's only select
instructions, then maybe text is fine? Or maybe you can summarize the
information you want from the call more succinctly than LLVM's textual
representation.
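
One way to summarize more succinctly, as a rough sketch rather than anything LLVM provides: convert each distinct callee to text only the first time it is seen, and store the trace itself as a list of small integer IDs. The code below uses only the standard library; CallSummary, record, and the describe callback are hypothetical names, and describe stands in for whatever expensive conversion you use today (for example printing an llvm::Function into a std::string).

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: gives each distinct object a small integer ID the
// first time it is seen, and runs the expensive string conversion only then.
class CallSummary {
public:
    // `describe` is a stand-in for the slow conversion to text.
    explicit CallSummary(std::function<std::string(const void*)> describe)
        : describe_(std::move(describe)) {}

    // Record one executed call. The slow path runs once per distinct callee;
    // every later occurrence costs one hash lookup and one push_back.
    void record(const void* callee) {
        auto it = ids_.find(callee);
        if (it == ids_.end()) {
            auto id = static_cast<std::uint32_t>(names_.size());
            it = ids_.emplace(callee, id).first;
            names_.push_back(describe_(callee));
        }
        order_.push_back(it->second);
    }

    // Text for each distinct callee, indexed by ID.
    const std::vector<std::string>& names() const { return names_; }
    // IDs in execution order; this is the per-trace part.
    const std::vector<std::uint32_t>& order() const { return order_; }

private:
    std::function<std::string(const void*)> describe_;
    std::unordered_map<const void*, std::uint32_t> ids_;
    std::vector<std::string> names_;
    std::vector<std::uint32_t> order_;
};
```

With this shape, the amount of text printed grows with the number of distinct functions rather than with the length of the trace, which should help if the conversion to std::string really is the bottleneck.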

Maybe I did not state my use case correctly, I apologise for the confusion.

I have two different use cases -

  1. I have a list of function call instructions from which I can get a list of functions, and subsequently print them out instruction by instruction.

  2. I have a list of llvm::Instruction* directly (obtained by storing each instruction into a vector as it was executed).

For the second case, the number of instructions is well over 10,000. (Even for the first case, going through each Instruction of each function call, the total number of instructions to be printed is huge).

I have 25-30 such traces, and printing everything with llvm::Value::print takes a couple of hours, so a textual representation produced that way is not practical.

Binary dumps would include a lot of handling (since I need to resolve pointers of all objects I want to dump).

In the best case, it would be nice if I could club together the instructions into some container that I can use the clang -emit-llvm method on. I am inclined to think this cannot be an llvm::Module because (as I understand) just a list of instructions cannot be clubbed together to create a valid Module.

Does that offer more clarity for my use case (and why I am disinclined to use llvm::Value::print)?

Thanks!

Somewhat - though perhaps it's easier to emit the whole Module you
already have, then? & go hunting for the desired instructions from
scratch when you parse that Module back in? Rather than filtering
before writing, write IR unfiltered, and filter when reading it.

Yes, it would be difficult to create any coherent representation of a
collection of unrelated Instructions to write out to a Module.

It would have been easier to do that. The only thing is that the order of the instructions is important and unique for each trace.
The llvm::Module is essentially the same program, just in LLVM IR rather than C.
I am interested in preserving the order in which instructions and function calls happen in each trace.
I am not sure if I can create a module that would fit this requirement.

Thanks!

If you have the Module, you could record in a separate table just a
series of numbers - like "5th function, 10th instruction"?