Connecting JITted code to gdb

Hi all. I'm working on the recently-announced unladen-swallow project,
and I'm having a bit of trouble getting gdb to step into functions
I've compiled with LLVM's JIT compiler. The attached a_module.ll is
the module I produce from compiling

def foo(r):
  for i in r:
    pass

I'm JIT-compiling and running foo() with:

  typedef PyObject *(*NativeFunction)(PyFrameObject *);
  llvm::ExecutionEngine *engine = ...->getExecutionEngine();
  NativeFunction native =
    (NativeFunction)engine->getPointerToFunction(function);
  return native(frame);

However, when I try to step into the call with gdb, I get:

Breakpoint 1, eval_llvm_function (function_obj=0x142f6e0,
frame=0x1350b98) at ../src/Python/ceval.cc:2549
2549 (NativeFunction)engine->getPointerToFunction(function);
(gdb) n
Current language: auto; currently c++
2550 return native(frame);
(gdb) p native
$1 = (NativeFunction) 0x2080010
(gdb) b *0x2080010
Breakpoint 2 at 0x2080010
(gdb) s
Breakpoint 2, 0x02080010 in ?? ()

If I don't set the second breakpoint, that last step totally skips the
call. To see if I'm just emitting totally wrong debugging information,
I compiled the module into a binary with a stub main, and gdb'ed into
that. Trying to set a breakpoint on "foo" from there crashed Apple's
gdb, which isn't ideal but at least indicates that something's
happening with the dwarf information.

Do I need to do anything extra to get the debug information the JIT
produces hooked into gdb?

Thanks,
Jeffrey

a_module.ll (16.6 KB)

Hi, Jeffrey

Do I need to do anything extra to get the debug information the JIT
produces hooked into gdb?

I'm not sure whether debug information is ever emitted for JITted code.
Most probably only EH info is honored.
Even if it is emitted, you need to "register" it with gdb somehow.
Unfortunately, I don't remember offhand how to do that.

Run with -debug-only=jit.

Break on line 1148 of JITEmitter.cpp. The debugging message will tell you the address and size of the function that was jitted. You can then tell gdb to disassemble the code.

Do I need to do anything extra to get the debug information the JIT
produces hooked into gdb?

That isn't available. Currently the JIT does not produce dwarf information.

Evan

Run with -debug-only=jit.

OT: I take it the recommended model for tools that embed LLVM is for
them to accept all of LLVM's command line arguments on their own
command lines? For Python, it'd be much nicer to make this stuff
tweakable through a module at runtime, or even, for thread-safety
reasons, as a parameter to each call that cares about it. The command
line route will work for our development, but I don't think we'll be
able to release without a better story. (Luckily, we don't have
anything scheduled for 3ish months, and we're happy to send you
patches and pull them into our tree before an official LLVM release if
you don't have this done before we need it.)

Break on line 1148 of JITEmitter.cpp. The debugging message will tell
you the address and size of the function that was jitted. You can then
tell gdb to disassemble the code.

I also want to step through the code and print variables. I can find
the address to read for each variable by reading the assembly, or
maybe by watching the JIT's debug output, but that's significantly
more painful than the standard gdb interface. This will block our
release too since we can't ask most people to read assembly. We'll
need to get work done for this (or do it ourselves) on both your end
and gdb's end, since it doesn't yet have hooks to register debug info
like the exception system does.

I've started a page on the wiki to track the state of the art for
this: http://wiki.llvm.org/HowTo:_Tell_GDB_about_JITted_code

Thanks!

Run with -debug-only=jit.

OT: I take it the recommended model for tools that embed LLVM is for
them to accept all of LLVM's command line arguments on their own
command lines?

Well, sort of... LLVM considers whatever you pass to
ParseCommandLineOptions to be the command line, so you can tweak it
depending on your needs.

For Python, it'd be much nicer to make this stuff
tweakable through a module at runtime, or even, for thread-safety
reasons, as a parameter to each call that cares about it.

Command-line options are used for convenience in a variety of
places... if there's some specific option that you need to modify at
runtime that can't be changed in any other way, patches to change that
are welcome.

Also, IIRC, LLVM isn't threadsafe at the moment...

-Eli

Run with -debug-only=jit.

OT: I take it the recommended model for tools that embed LLVM is for
them to accept all of LLVM's command line arguments on their own
command lines?

Well, sort of... LLVM considers whatever you pass to
ParseCommandLineOptions to be the command line, so you can tweak it
depending on your needs.

Yes, so I'll probably add an environment variable to set this for my
use, since I keep getting told to pass --time-passes or
--debug-only=jit or --regalloc=local, and it'll be much easier to be
able to pass those parameters literally than to have to find the C++
interface to them each time. I might also try to expose
ParseCommandLineOptions as a Python function so I can add options
within the interpreter ... if that works.
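
A minimal sketch of the environment-variable idea, in Python for illustration. The variable name PYTHONLLVMFLAGS and the helper name are hypothetical, and the real hook would live in the interpreter's C++ startup code. All it shows is turning one string into an argv-style list, with a program name in slot 0, of the kind that would then be handed to ParseCommandLineOptions.

```python
import os
import shlex


def llvm_args_from_env(environ=None, var="PYTHONLLVMFLAGS", argv0="python"):
    """Build an argv-style list from an environment variable.

    `var` and `argv0` are illustrative names, not part of any real interface.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(var, "")
    # shlex.split honors shell-style quoting, so quoted values stay together.
    return [argv0] + shlex.split(raw)


if __name__ == "__main__":
    env = {"PYTHONLLVMFLAGS": "--time-passes --debug-only=jit --regalloc=local"}
    print(llvm_args_from_env(env))
```

If the variable is unset, the list contains only the program name, so passing it along would leave every option at its default.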

For Python, it'd be much nicer to make this stuff
tweakable through a module at runtime, or even, for thread-safety
reasons, as a parameter to each call that cares about it.

Command-line options are used for convenience in a variety of
places... if there's some specific option that you need to modify at
runtime that can't be changed in any other way, patches to change that
are welcome.

I'll send you patches for these as I find them. I just wanted to point
out that this convenience for you guys is likely to be inconvenient
for people trying to embed LLVM.

Also, IIRC, LLVM isn't threadsafe at the moment...

Right, but in the long term (2009Q3-4) we'll need to make Python's
runtime threadsafe, and before that I believe we'll want to optimize
in a background thread while a foreground thread is executing code
(which may itself try to compile more Python and call into LLVM) so
I'm assuming that I'll be working on making LLVM thread-safe at some
point. We'll use a global LLVM lock until then and just take the
latency hit.
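
For illustration, the global-lock stopgap could look roughly like the following sketch. The decorator name and the fake_compile stand-in are hypothetical. A re-entrant lock is assumed so that a thread already inside LLVM can trigger another compile without deadlocking on itself.

```python
import functools
import threading

# One process-wide lock guarding every call into the (assumed non-thread-safe)
# compiler library. RLock lets the owning thread re-enter.
_llvm_lock = threading.RLock()


def with_llvm_lock(func):
    """Serialize calls to `func` across threads."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with _llvm_lock:
            return func(*args, **kwargs)
    return wrapper


@with_llvm_lock
def fake_compile(source):
    # Stand-in for a call into the JIT; the nested call shows re-entrancy.
    if source == "outer":
        return "compiled(" + fake_compile("inner") + ")"
    return "compiled:" + source


if __name__ == "__main__":
    print(fake_compile("outer"))
```

The cost is the one already mentioned: a background optimizer holding the lock stalls any foreground thread that needs to compile.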

Thanks,
Jeffrey

I'm adding the gdb list because it appears there's currently no way to
tell gdb about newly-JITted code. That is, it's not an LLVM-specific
problem.

There appear to be two techniques in common use to debug
dynamically-generated code despite this. First, as Evan suggests
below, we can have the JIT print the address range that it's written a
function into, have gdb disassemble that, set breakpoints at
particular addresses, and print variables by knowing what register
they live in. Second, as described at
http://www.mono-project.com/Debugging#Debugging_with_GDB_in_XDEBUG_mode,
we can write out a full object file with dwarf tables, and use
add-symbol-file to get gdb to load that on demand.
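
To make the two techniques concrete, here is a small sketch that builds the gdb commands one would type once the JIT has reported a function's address and size. The helper name and the object file name are hypothetical. The comma form of disassemble is the syntax accepted by recent gdb releases, and the add-symbol-file line assumes the object file's text section was loaded at exactly the reported address.

```python
def gdb_commands_for_jitted_function(start, size, symbol_file=None):
    """Return gdb command strings for a JITted function at [start, start+size)."""
    commands = []
    if symbol_file is not None:
        # Second technique: load dwarf tables from an object file the JIT wrote.
        commands.append("add-symbol-file %s 0x%x" % (symbol_file, start))
    # First technique: raw-address breakpoint plus disassembly of the range.
    commands.append("break *0x%x" % start)
    commands.append("disassemble 0x%x,+%d" % (start, size))
    return commands


if __name__ == "__main__":
    for line in gdb_commands_for_jitted_function(0x2080010, 64, "foo.o"):
        print(line)
```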

Neither of these is ideal. add-symbol-file is better, but it doesn't
allow us to set breakpoints inside the JITted code until it's
generated, and it doesn't let those breakpoints follow the code as
it's re-optimized and re-translated. It currently also requires user
interaction, but it's possible that we could write a -gdb.py file to
reload the debug information every time the user gets to the gdb
prompt. There may be other problems I haven't thought of.

It would be better to have an interface through which a JITting
library could tell gdb about newly-generated code. This could resemble
the overlay interface
(http://sourceware.org/gdb/current/onlinedocs/gdb_13.html#SEC108) or
the interface through which dynamic loaders tell gdb about
newly-loaded code. There are a couple of considerations specific to
JITting, of course:
1. A JIT compiler generates new code frequently, and having to do
lots of extra work, especially while the debugger isn't attached, may
hurt performance.
2. Translated code gets duplicated, replaced, and freed, and gdb
needs to modify its breakpoints to keep up.
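
As a rough illustration of those two considerations, and not a proposal for the actual gdb or LLVM interface, the sketch below models a registry that the JIT updates as code is emitted and freed. Every name in it is hypothetical. With no debugger attached, each update is a dictionary operation and nothing more. A debugger that attaches late gets the existing regions replayed, then sees each later add and free so it can move its breakpoints.

```python
class JitCodeRegistry:
    """Hypothetical record of live JITted code regions."""

    def __init__(self):
        self._regions = {}    # start address -> (size, function name)
        self._listeners = []  # callbacks installed by an attached debugger

    def attach(self, listener):
        """Register a debugger callback and replay the regions already live."""
        self._listeners.append(listener)
        for start, (size, name) in sorted(self._regions.items()):
            listener("add", name, start, size)

    def add(self, name, start, size):
        """Called by the JIT after emitting code for `name`."""
        self._regions[start] = (size, name)
        self._notify("add", name, start, size)

    def free(self, start):
        """Called by the JIT when the code at `start` is discarded."""
        size, name = self._regions.pop(start)
        self._notify("free", name, start, size)

    def _notify(self, event, name, start, size):
        for listener in self._listeners:
            listener(event, name, start, size)


if __name__ == "__main__":
    registry = JitCodeRegistry()
    registry.add("foo", 0x2080010, 64)
    registry.attach(lambda *event: print(event))
    registry.add("foo", 0x2090000, 48)  # re-optimized copy of foo
    registry.free(0x2080010)            # old copy discarded
```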

I don't really know enough about the internals of LLVM or gdb to make
any recommendations, but I think it would be useful to find some way
for them (and other debuggers and JITs) to talk to each other.

Jeffrey

I'm adding the gdb list because it appears there's currently
no way to tell gdb about newly-JITted code.

It would be better to have an interface through which a
JITting library could tell gdb about newly-generated code.

I don't really know enough about the internals of LLVM or gdb to make
any recommendations, but I think it would be useful to find some way
for them (and other debuggers and JITs) to talk to each other.

This sounds like an excellent project.

I think the only real barrier is finding someone dedicated to making
it work.

Tom

Hi Jeffrey,

Adding support for this would be a great thing, and is somewhere on my long-term todo list (which is probably farther out than you'd like it ;-).

Note that the LLVM JIT does have hooks to build a runtime symbol table for generated code, see the ENABLE_JIT_SYMBOL_TABLE ifdef in JITEmitter.cpp. This is used for performance tools that want to dynamically sample code that may or may not be generated by the JIT and need a way to attribute it back to the LLVM function name. My intention is for the simple data structure that is built by this (which hangs off the public symbol "__jitSymbolTable") to eventually include a pointer to GDB data. It would be really nice someday if GDB could walk this structure to find its debug symbols.
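
The kind of lookup such a table enables can be sketched as follows. This is an illustrative Python model only: it does not reflect the actual layout of the structure behind "__jitSymbolTable", and the class and method names are assumptions. It keeps regions sorted by start address and maps a sampled address back to a function name with a binary search.

```python
import bisect


class JitSymbolTable:
    """Illustrative address-to-name table for JIT-generated functions."""

    def __init__(self):
        self._starts = []   # sorted start addresses
        self._entries = []  # parallel list of (start, size, name)

    def add(self, name, start, size):
        index = bisect.bisect_left(self._starts, start)
        self._starts.insert(index, start)
        self._entries.insert(index, (start, size, name))

    def lookup(self, address):
        """Return the name of the function containing `address`, or None."""
        index = bisect.bisect_right(self._starts, address) - 1
        if index < 0:
            return None
        start, size, name = self._entries[index]
        return name if address < start + size else None


if __name__ == "__main__":
    table = JitSymbolTable()
    table.add("foo", 0x2080010, 0x40)
    print(table.lookup(0x2080020))
```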

-Chris