debugging LLVM-JITted code

Hello,

I’m interested in debugging code JITted by LLVM at runtime. For that, I need a way to emit DWARF that faithfully describes the JITted code into memory, alongside the JITted code itself, and to point the debugger to it. Let’s assume that the bridge with the debugger is taken care of (e.g. http://llvm.org/docs/DebuggingJITedCode.html); my concern in this question is solely with the LLVM side.

AFAIU, LLVM currently has only very partial support for emitting DWARF from the JIT, in the JITDwarfEmitter class. All it offers is a way to emit stack frame information (symbols and DWARF CFA), so that core dumps from JITted code carry meaningful stack information. This is useful, but obviously very far from full debugging support.

So where should I look for adding such support? Is MC JIT the direction? Does it purport to emit DWARF as well as executable code? Any pointers to relevant places in the code would be most appreciated.

Note: I’m aware there’s an alternative approach - generating a “true” shared lib with LLVM’s toolchain dynamically and loading it. It has its pros and cons vs. generating DWARF directly with the JITted code, and in this question I’m focusing on the latter.

Thanks in advance,
Eli

> AFAIU, LLVM currently includes very partial support for emitting DWARF info with JIT, in the JITDwarfEmitter class. What it has is just a way to emit stack frame information (symbols and DWARF CFA), to allow meaningful core dumps from JITted code, with stack information. This is useful, but obviously very far from full debugging support.

Quite.

> So where should I look for adding such support? Is MC JIT the direction? Does it purport to emit DWARF as well as executable code? Any pointers to relevant places in the code would be most appreciated.

MC JIT is the direction that the JIT will be going in the future. Right now any debugging emission is minimal at best. My idea here would be to emit the debug info as a section in memory and point the debugger at that.

> Note: I’m aware there’s an alternative approach - generating a “true” shared lib with LLVM’s toolchain dynamically and loading it. It has its pros and cons vs. generating DWARF directly with the JITted code, and in this question I’m focusing on the latter.

*nod* Ultimately I think the two will be similar; the only difference is the destination of the jitted code, static in memory or a shared library on disk. It’s just a matter of communicating to the debugger where the debug information resides and having the code emitter output the debug information.
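For reference, "communicating to the debugger where the debug information resides" can be done through the JIT Compilation Interface documented in the GDB manual, which is what the DebuggingJITedCode page linked above relies on. The sketch below assumes that interface: the struct layouts and the names `__jit_debug_descriptor` and `__jit_debug_register_code` come from it, while `register_jit_object` and `unregister_jit_object` are hypothetical helpers, and `__attribute__((noinline))` / `__asm__` assume GCC or Clang.

```cpp
#include <cassert>
#include <cstdint>

// Layout and names follow the GDB "JIT Compilation Interface". The debugger
// finds these symbols by name, so they need C linkage and exactly these names.
extern "C" {

typedef enum {
  JIT_NOACTION = 0,
  JIT_REGISTER_FN,
  JIT_UNREGISTER_FN
} jit_actions_t;

struct jit_code_entry {
  struct jit_code_entry *next_entry;
  struct jit_code_entry *prev_entry;
  const char *symfile_addr; // in-memory object file, including its .debug_* sections
  uint64_t symfile_size;
};

struct jit_descriptor {
  uint32_t version;
  uint32_t action_flag; // holds a jit_actions_t value
  struct jit_code_entry *relevant_entry;
  struct jit_code_entry *first_entry;
};

// The debugger sets a breakpoint here. noinline plus the empty volatile asm
// keep the compiler from inlining or dropping calls to it (GCC/Clang syntax).
__attribute__((noinline)) void __jit_debug_register_code() {
  __asm__ __volatile__("");
}

// version is set statically: the debugger may check it before any code runs.
struct jit_descriptor __jit_debug_descriptor = {1, 0, nullptr, nullptr};

} // extern "C"

// Hypothetical helper: publish one in-memory object file to the debugger.
jit_code_entry *register_jit_object(const char *obj, uint64_t size) {
  jit_code_entry *e = new jit_code_entry{nullptr, nullptr, obj, size};
  e->next_entry = __jit_debug_descriptor.first_entry;
  if (e->next_entry)
    e->next_entry->prev_entry = e;
  __jit_debug_descriptor.first_entry = e;
  __jit_debug_descriptor.relevant_entry = e;
  __jit_debug_descriptor.action_flag = JIT_REGISTER_FN;
  __jit_debug_register_code();
  return e;
}

// Hypothetical helper: withdraw an object, e.g. before freeing the JITted code.
void unregister_jit_object(jit_code_entry *e) {
  assert(e != nullptr);
  if (e->prev_entry)
    e->prev_entry->next_entry = e->next_entry;
  else
    __jit_debug_descriptor.first_entry = e->next_entry;
  if (e->next_entry)
    e->next_entry->prev_entry = e->prev_entry;
  __jit_debug_descriptor.relevant_entry = e;
  __jit_debug_descriptor.action_flag = JIT_UNREGISTER_FN;
  __jit_debug_register_code();
  delete e;
}
```

The JIT would call `register_jit_object` once the object's code and debug sections are at their final addresses, and `unregister_jit_object` before releasing that memory.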

With MC JIT I don’t think getting it to emit the debug information will be particularly hard so you might want to look into it.

-eric

> MC JIT is the direction that the JIT will be going in the future. Right now any debugging emission is minimal at best. My ideas here would be to emit the debug info as a section in memory and point the debugger at that.

Hi Eric, thanks for answering.
Suppose I’d like to prototype adding such functionality to LLVM: where in the code would be the best place to start? Are there existing interfaces that should be reused, implemented, or extended here?

> *nod* Ultimately I think the two will be similar, just the destination of the jitted code - static in memory or shared on disk. It’s just a matter of communicating to the debugger where the debug information resides and having the code emitter output the debug information.
>
> With MC JIT I don’t think getting it to emit the debug information will be particularly hard so you might want to look into it.

AFAIU, one aspect in which the two approaches differ is the relocation of debug information (since DWARF seems to need relocation). When generating a shared lib, run-time relocation is handled by the dynamic loader. To generate DWARF into memory with MC JIT, one essentially has to perform this relocation manually, since the generated DWARF has to point at the final addresses where code and data are located. What are your thoughts on this?
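To make the manual-relocation point concrete: in a relocatable object, an address stored in .debug_info (say a DW_AT_low_pc) is typically left unresolved and paired with a relocation record, and whoever places the code must write the final value. A minimal sketch, assuming a hypothetical `DebugReloc` record that behaves like an absolute 64-bit RELA relocation (the S + A computation of ELF's R_X86_64_64) and a little-endian target; the 32-bit section-offset relocations between debug sections (e.g. for DW_AT_stmt_list) are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical relocation record for a debug section: "store the address of
// section TargetSection plus Addend, as 64 bits, at Offset" (S + A).
struct DebugReloc {
  uint64_t Offset;
  unsigned TargetSection;
  int64_t Addend;
};

// Little-endian store/load, independent of host byte order.
void writeLE64(uint8_t *P, uint64_t V) {
  for (int I = 0; I < 8; ++I)
    P[I] = static_cast<uint8_t>(V >> (8 * I));
}

uint64_t readLE64(const uint8_t *P) {
  uint64_t V = 0;
  for (int I = 0; I < 8; ++I)
    V |= static_cast<uint64_t>(P[I]) << (8 * I);
  return V;
}

// Patch a debug section in place once the final (load) address of every
// code/data section is known. Negative addends wrap correctly mod 2^64.
void applyDebugRelocs(std::vector<uint8_t> &DebugSection,
                      const std::vector<DebugReloc> &Relocs,
                      const std::vector<uint64_t> &SectionLoadAddrs) {
  for (const DebugReloc &R : Relocs) {
    assert(R.Offset + 8 <= DebugSection.size());
    assert(R.TargetSection < SectionLoadAddrs.size());
    uint64_t S = SectionLoadAddrs[R.TargetSection];
    writeLE64(DebugSection.data() + R.Offset,
              S + static_cast<uint64_t>(R.Addend));
  }
}
```

With a shared library the dynamic loader does this step for allocated sections; with an in-memory object, something on the JIT side has to run an equivalent pass over the debug sections.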

Eli

> Hi Eric, thanks for answering.
> Suppose I would like to prototype adding such functionality to LLVM - where in the code would be the best place to start? Any existing interfaces that should be reused/implemented/extended here?

Start with lib/ExecutionEngine/MCJIT and lib/ExecutionEngine/RuntimeDyld. The old JIT (lib/ExecutionEngine/JIT) probably has some bits in it for telling the debugger where the debug info lives that could be repurposed, or at minimum would be good background reading on what's involved.

If there's debug information in the input IR, the backend should generate the appropriate debug info sections in the MCJIT'ed object file (in memory), just like a normal object file would. The loader and dynamic linker will need to be taught what to do with them.
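One concrete thing the loader has to be taught: in ELF, DWARF sections such as .debug_info and .debug_line do not carry the SHF_ALLOC flag, so a loader that only copies allocatable sections into memory silently drops them. A sketch of a section filter that keeps them, assuming a hypothetical `SectionInfo` view of a section header and an `LoadAction` enum of my own; only the flag values come from the ELF specification.

```cpp
#include <cstdint>
#include <string>

// sh_flags bits from the ELF specification (k prefix avoids clashing with
// the SHF_* macros in <elf.h>).
constexpr uint64_t kSHF_ALLOC = 0x2;
constexpr uint64_t kSHF_EXECINSTR = 0x4;

// Hypothetical, simplified view of one section header.
struct SectionInfo {
  std::string Name;
  uint64_t Flags; // sh_flags
};

enum class LoadAction { Code, Data, Debug, Skip };

// Decide what an in-memory loader should do with a section.
LoadAction classify(const SectionInfo &S) {
  if (S.Flags & kSHF_ALLOC)
    return (S.Flags & kSHF_EXECINSTR) ? LoadAction::Code : LoadAction::Data;
  // DWARF sections are not SHF_ALLOC, so a loader that copies only
  // allocatable sections would drop them. Keep them for the debugger.
  if (S.Name.compare(0, 7, ".debug_") == 0)
    return LoadAction::Debug;
  // Everything else (e.g. .comment, .rela.* sections) is not copied as-is.
  return LoadAction::Skip;
}
```

Sections classified as `Debug` would still need their own relocations applied before the debugger is told about them.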

One thing to keep in mind, though, is that unlike the old JIT, the MCJIT explicitly supports environments where the compilation address space is not the same as the execution address space: for example, a debugger inserting code into a target process, possibly on a remote machine. This is why, for example, there's some extra copying going on, and why addresses are explicitly assigned rather than taken in place from the compiled object file.
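The separate-address-space design means every address that ends up in code or in DWARF must be the one assigned in the target, not the address of the host-side buffer the compiler wrote into. A sketch with a hypothetical `SectionMap` class (not RuntimeDyld's actual interface) that keeps the two apart.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical model: the compiler writes each section into a host-side
// buffer, but the code will execute at an address chosen in the target
// process (which may be another process, or another machine).
struct EmittedSection {
  std::vector<uint8_t> LocalBytes; // compilation address space (host)
  uint64_t TargetAddr = 0;         // execution address space (target)
  bool Mapped = false;
};

class SectionMap {
  std::map<unsigned, EmittedSection> Sections;

public:
  void add(unsigned ID, std::vector<uint8_t> Bytes) {
    Sections[ID].LocalBytes = std::move(Bytes);
  }

  // Assign the execution address explicitly instead of assuming the code
  // runs where the host buffer happens to live. at() throws on unknown IDs.
  void mapSection(unsigned ID, uint64_t TargetAddr) {
    EmittedSection &S = Sections.at(ID);
    S.TargetAddr = TargetAddr;
    S.Mapped = true;
  }

  // The address to use when resolving relocations, in code and in DWARF
  // alike. Offset == size is allowed for one-past-the-end range values.
  uint64_t addressFor(unsigned ID, uint64_t Offset) const {
    const EmittedSection &S = Sections.at(ID);
    assert(S.Mapped);
    assert(Offset <= S.LocalBytes.size());
    return S.TargetAddr + Offset;
  }

  // The bytes that still have to be copied into the target once resolved.
  const std::vector<uint8_t> &localBytes(unsigned ID) const {
    return Sections.at(ID).LocalBytes;
  }
};
```

Keeping the target address separate from the buffer pointer is also what lets the emitted debug info describe the target process rather than the compiler's own memory.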

> AFAIU one aspect where the two approaches are somewhat different is with relocation of debug information (since DWARF seems to need relocation). I.e. when generating the shared lib, run-time relocation will be handled by the dynamic loader. To generate DWARF into memory with MC JIT one has to essentially perform this relocation manually, since the generated DWARF has to point at the final addresses where code & data are located. What are your thoughts on this?

Right. The RuntimeDyld will need to be expanded to know how to handle that.

-Jim

Jim, thanks for your answers. I will start looking in the directions you suggested, and will come back to the list if technical questions arise.

Eli