Question about llvm::Value::print performance

Hi,

I want to use llvm::Value::print to output the assembly strings for llvm::Instructions
inside a rather large llvm::Module (linked module with lots of types/...).

I started with plain ::print and switched over to

http://llvm.org/docs/doxygen/html/classllvm_1_1Value.html#a04e6fc765eeb0c4c90ac5d55113db116

with a ModuleSlotTracker I pass in myself to avoid some complexity.

Still, the TypeFinder::run call and the related work that each print does internally
accounts for 99% of my program's runtime (at least that is what perf tells me).

Is there any clever workaround for this?

I looked a bit through the sources but it seems that no API is exported that would allow a faster print
call.

Given I want the output only more or less as "debugging" annotation to my graph, is there some other
"fast" way to print instructions, even if some info is missing then?

Greetings
Christoph

Hi Christoph,

maybe there is a way of caching the print outputs and writing them out at the end of the program execution?
Then your real application would not have this kind of bottleneck.

Best regards,
Thomas

Dear Thomas,

this is a valid idea, though the problem is: I output everything only "once", and I even
output it like this:

1) load module
2) go over functions
3) output all blocks with instructions in the current function

That allows me, e.g., to use ModuleSlotTracker to compute the <label> comments of blocks fast,
as each function is only visited once.

Still, the performance drop from just printing each instruction once is "large": without instruction printing,
my control flow graph construction takes ~1 second; with printing, I aborted the execution after some minutes.

(the module is really large, linked full graph of some medium sized application, around 5000 functions and 500k instructions
and a lot of debug/type info)

If there is some way that I precompute things on my own, I would really like to use that, but I don't see a way
to do that.

Greetings
Christoph

Hi Christoph,

is it possible to reduce the amount of "work" for every process and iterate over all functions in different processes?
I mean: debug the first 100 functions, then in the second step debug functions 101 to 200, and so on. So you collect and complete your result over several standalone debug steps.

Best regards,
Thomas

Hi,

not really, I need the graph with instruction labels in one go, as it will be used to visualize things,
e.g. like this example that works on PPC assembly (a bit like writing a dot graph of the IR):

https://www.absint.com/stackanalyzer/shot5.png

Besides, as the printing really seems to "redo all type importing and co." per instruction,
it won't even scale if I let all cores do things.

The internal AssemblyWriter has rather clever caching for that, which I can't access
from the outside, if I am not wrong.

But perhaps somebody has a clever way around this issue.

Greetings
Christoph
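
As a rough illustration of the "dot graph of the IR" output mentioned above, here is a minimal sketch in plain C++ without LLVM headers. BlockNode, escapeDotLabel and toDot are hypothetical names chosen for this example. The sketch assumes the instruction strings were already rendered elsewhere (for example by Value::print) and only shows how they could be turned into labeled DOT nodes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical input: one node per basic block, holding instruction strings
// that were already rendered elsewhere (for example by Value::print).
struct BlockNode {
  std::string Name;
  std::vector<std::string> Instructions;
  std::vector<std::size_t> Successors; // indices into the block list
};

// Escape the characters that are special inside a double-quoted DOT label.
std::string escapeDotLabel(const std::string &Text) {
  std::string Out;
  for (char C : Text) {
    if (C == '"' || C == '\\')
      Out += '\\';
    Out += C;
  }
  return Out;
}

// Emit a DOT digraph; "\l" ends each label line and left-justifies it.
std::string toDot(const std::vector<BlockNode> &Blocks) {
  std::string Out = "digraph CFG {\n  node [shape=box];\n";
  for (std::size_t I = 0; I < Blocks.size(); ++I) {
    Out += "  n" + std::to_string(I) + " [label=\"" +
           escapeDotLabel(Blocks[I].Name) + ":\\l";
    for (const std::string &Inst : Blocks[I].Instructions)
      Out += escapeDotLabel(Inst) + "\\l";
    Out += "\"];\n";
    for (std::size_t Succ : Blocks[I].Successors)
      Out += "  n" + std::to_string(I) + " -> n" + std::to_string(Succ) + ";\n";
  }
  Out += "}\n";
  return Out;
}
```

The escaping matters because printed IR regularly contains double quotes (string constants, quoted names), which would otherwise end the label early.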

Hi,

> not really, I need the graph with instruction labels in one go, as it will be
> used to visualize things,
> e.g. like this example that works on PPC assembly. (a bit like writing a dot
> graph of the IR)
>
> https://www.absint.com/stackanalyzer/shot5.png
>
> Besides, as the printing seems really like "per instruction redo all type
> importing and co."
> it won't even scale if I let all cores do things.
>
> The AssemblyWriter that is internal has rather clever caching for that which I
> can't access
> from the outside, if I am not wrong.
>
> But perhaps somebody has a clever way around this issue.

Would a patch be acceptable that allows passing a TypePrinting object to

void Value::print(raw_ostream &ROS, ModuleSlotTracker &MST,
                  bool IsForDebug) const

like it was done for the ModuleSlotTracker &MST parameter?

TypeFinder is already exposed; would it be possible to expose the

class TypePrinting {
  TypePrinting(const TypePrinting &) = delete;
  void operator=(const TypePrinting&) = delete;
public:

  /// NamedTypes - The named types that are used by the current module.
  TypeFinder NamedTypes;

  /// NumberedTypes - The numbered types, along with their value.
  DenseMap<StructType*, unsigned> NumberedTypes;

  TypePrinting() = default;

  void incorporateTypes(const Module &M);

  void print(Type *Ty, raw_ostream &OS);

  void printStructBody(StructType *Ty, raw_ostream &OS);
};

class for that purpose, too?

Greetings
Christoph
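
To illustrate the idea behind the proposal above, here is a minimal toy sketch in plain C++ without LLVM headers. All names in it (ToyModule, ToyTypeCache, printAllRebuilding, printAllShared) are hypothetical stand-ins, not LLVM APIs. The assumption, taken from the perf observation earlier in the thread, is that the type collection is redone for every print call. The sketch contrasts that with a caller-owned cache that is built once and passed in, analogous to how the ModuleSlotTracker is passed in.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in (NOT an LLVM class): a "module" is a list of named types plus
// instructions that each reference one type by index.
struct ToyModule {
  std::vector<std::string> TypeNames;
  std::vector<std::size_t> InstTypeIndex;
};

// Hypothetical cache, playing the role TypePrinting plays in the thread.
struct ToyTypeCache {
  std::vector<std::string> Printed; // pre-rendered type strings
  std::size_t TypeVisits = 0;       // how many type entries were walked

  void incorporateTypes(const ToyModule &M) {
    Printed.clear();
    for (const std::string &Name : M.TypeNames) {
      Printed.push_back("%" + Name);
      ++TypeVisits;
    }
  }
};

// Slow variant: rebuilds the cache for every instruction (assumed behavior).
// Returns the total number of type entries walked.
std::size_t printAllRebuilding(const ToyModule &M, std::string &Out) {
  std::size_t Visits = 0;
  for (std::size_t Idx : M.InstTypeIndex) {
    ToyTypeCache Cache;
    Cache.incorporateTypes(M);
    Visits += Cache.TypeVisits;
    Out += Cache.Printed[Idx] + "\n";
  }
  return Visits;
}

// Fast variant: the caller owns the cache, so it is filled only once.
std::size_t printAllShared(const ToyModule &M, ToyTypeCache &Cache,
                           std::string &Out) {
  if (Cache.Printed.empty())
    Cache.incorporateTypes(M);
  for (std::size_t Idx : M.InstTypeIndex)
    Out += Cache.Printed[Idx] + "\n";
  return Cache.TypeVisits;
}
```

With T types and N instructions, the first variant walks T * N type entries and the second only T, while both produce the same text.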