Profiling JIT-compiled code with perf

Hello,

Is there any support in LLVM for perf's jitdump format [1]?

It enables perf report to also "zoom in" and annotate the JIT-compiled
code at the assembly level with runtime percentages. It helps a lot to
understand which parts of the generated code are the bottleneck.

I recently did a proof-of-concept for the JIT assembler asmjit [2]. It
just dumps the generated code in the right format and mmaps the file to
let perf record know about it. perf report picks it up automatically.

So, is there any profiling support for JIT compiled code?

Best regards,
Frank

[1] https://raw.githubusercontent.com/torvalds/linux/master/tools/perf/Documentation/jitdump-specification.txt
[2] WIP: added initial support for perf record by tetzank · Pull Request #197 · asmjit/asmjit · GitHub

> Hello,
>
> is there any support in LLVM for the jitdump format [1] of perf?
>
> It enables perf report to also "zoom in" and annotate the JIT compiled
> code on assembly level with runtime percentage. It helps a lot to
> understand which parts of the generated code is the bottleneck.
>
> I recently did a proof-of-concept for the JIT assembler asmjit [2]. It
> just dumps the generated code in the right format and mmaps the file
> to let perf record know about it. perf report picks it up
> automatically.
>
> So, is there any profiling support for JIT compiled code?

I guess that's a no.

Is there a simple way to get a pointer to the assembled code and its
exact size in bytes? Or should I just use the function pointer? Where
would I get the size from?

Then I could at least dump the native code. Profiling on assembly level
would already help a lot.
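To make "dump the native code" concrete, here is a minimal sketch. It assumes the start address and the size in bytes are already known, which is exactly the open question above; the function name dumpNativeCode and the file name are made up for illustration.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Writes `size` raw bytes starting at `code` to the file `path`.
// Returns true if the file was written successfully.
bool dumpNativeCode(const void *code, std::size_t size,
                    const std::string &path) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out)
        return false;  // could not open the output file
    out.write(static_cast<const char *>(code),
              static_cast<std::streamsize>(size));
    out.flush();
    return out.good();
}
```

The resulting file can then be disassembled offline, for example with objdump -D -b binary -m i386:x86-64 code.bin on x86-64. That gives the assembly, but no per-instruction sample counts.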

Regards,
Frank

Have you seen https://reviews.llvm.org/D44892? we are using it in Julia to use perf on jitted code.

-Valentin

> Have you seen https://reviews.llvm.org/D44892? we are using it in
> Julia to use perf on jitted code.

No, I did not see this patch before. Thanks a lot for the pointer.

It seems to be doing exactly what I want. Let's see if I get it working.

Best regards,
Frank

Hello,

> Hi,
>
> > > Have you seen https://reviews.llvm.org/D44892? we are using it in
> > > Julia to use perf on jitted code.
> >
> > No, I did not see this patch before. Thanks a lot for the pointer.
> >
> > It seems to be doing exactly what I want. Let's see if I get it
> > working.
>
> FWIW, I just merged this. Did you have any luck getting it to work?

Thanks for working on perf profiling support for the JIT engines.

Sadly, I still could not find the time to try it out, not even the
simple example from the commit message.

Also, I'm quite new to LLVM and only have some toy examples at the
moment, which use the ORC JIT with the SimpleCompiler class. I'm not sure
where to add the perf event listener in all that.

Best regards,
Frank

> > > Have you seen https://reviews.llvm.org/D44892? we are using it in
> > > Julia to use perf on jitted code.
> >
> > No, I did not see this patch before. Thanks a lot for the pointer.
> >
> > It seems to be doing exactly what I want. Let's see if I get it
> > working.
>
> FWIW, I just merged this. Did you have any luck getting it to work?

I have trouble getting it to work. I tried the example in the commit
message, but perf report does not cooperate, or something else is going
wrong.

Do I have to enable anything other than LLVM_USE_PERF when building
LLVM? Or is some other configuration needed?

I had the impression that it's on by default in lli for now. Is there a
special switch I have to use? What am I missing?

Best regards,
Frank

Hi,

> > > > Have you seen https://reviews.llvm.org/D44892? we are using it in
> > > > Julia to use perf on jitted code.
> > >
> > > No, I did not see this patch before. Thanks a lot for the pointer.
> > >
> > > It seems to be doing exactly what I want. Let's see if I get it
> > > working.
> >
> > FWIW, I just merged this. Did you have any luck getting it to work?
>
> I have trouble getting it to work. I tried the example in the commit
> message, but perf report doesn't like to cooperate, or something else.

Yeah, as mentioned in the commit message, that doesn't work out of the
box unless you merge the additional patch mentioned therein. The reason
is that MCJIT, in contrast to ORC, doesn't call the handlers at the
right time.

If you have an ORC-based application and you register the handler, then
it'll work out of the box.

Greetings,

Andres Freund