Debugging LLVM IR with GDB

Hi all,

Has anybody debugged LLVM IR with GDB? I'm using dragonegg to transform C into IR, then applying my optimizations. Passing "-g" to dragonegg doesn't seem to work since it generates debug info for the C code, not the IR. I really need GDB (lli doesn't solve my problems) in order to debug multi-threaded and multi-process MPI code.

More clearly, if I have a file hello.ll and I execute:

$> llc hello.ll -march=x86-64 -o hello.s
$> mpicc -O0 hello.s -o hello

Can I debug "hello" with GDB at the IR level? If so, how? If not, would it be possible to add debug info generation at the IR level, and how much effort would that require? I read something about DwarfDebug but it's not clear to me how to use it or even if it does what I need. I hope I didn't miss any obvious solution.

Thanks ahead,

Hi Pablo,

It would be great to have a utility that, given an IR file, adds location debug
info to each line of it, the location being the line number within the IR file.
Then in the debugger you would get to see yourself moving around in the IR file
as you step through the program built from it. Unfortunately I have no idea how
to implement this in a reasonable way.

Ciao, Duncan.
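
A minimal sketch of the first half of such a utility: working out which line of the .ll file each instruction sits on. It assumes a simple textual layout (one instruction per line, function bodies opened by a `define ... {` line and closed by a lone `}`), and the name `instruction_lines` is hypothetical. Multi-line instructions such as a `switch` with a bracketed case list are not handled.

```python
import re

# Opening line of a function body, e.g. "define i32 @main() {".
DEFINE_RE = re.compile(r"^define\b.*\{\s*$")
# Basic-block label such as "entry:" or "for.body:   ; preds = %entry".
LABEL_RE = re.compile(r"^[-\w.$]+:\s*(;.*)?$")


def instruction_lines(ll_text):
    """Return (line_number, text) pairs for lines that look like instructions.

    Line numbers are 1-based positions in the .ll file, i.e. the
    location a debugger would be asked to display for that instruction.
    """
    result = []
    in_function = False
    for number, raw in enumerate(ll_text.splitlines(), start=1):
        line = raw.strip()
        if not in_function:
            # Outside a function only a "define" line is interesting.
            in_function = bool(DEFINE_RE.match(line))
            continue
        if line == "}":
            in_function = False
            continue
        # Skip blank lines, comment-only lines and basic-block labels.
        if not line or line.startswith(";") or LABEL_RE.match(line):
            continue
        result.append((number, line))
    return result
```

The resulting pairs are exactly the line information that would then have to be attached to each instruction as debug metadata.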

There's always what the integrated assembler does if you say "-g" and
there aren't any debug-info directives in the assembler source. It
does something based on input line number, not that I find it very
useful, but it might be the most appropriate tactic for LLC.
--paulr

It should be possible to precompute the layout of the .ll file from the IR itself and then add self-annotations. It may require a special form of dumping the IR into a .ll file to make sure that empty lines, etc. are properly synchronized with the annotations. Wouldn't this solve the problem (reasonably)?

-Krzysztof
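
One way to keep the layout synchronized, sketched under several assumptions: annotations are only appended to the end of existing lines, and all new metadata goes at the end of the file, so no instruction ever changes its line number. The metadata spelling below follows the textual syntax of recent LLVM releases as far as I know it (older releases, including the ones current when this thread was written, used a different encoding), so treat it as illustrative. The helper name `annotate` is hypothetical, the input is assumed to carry no metadata of its own, and lines with trailing comments or multi-line instructions are not handled.

```python
import re

DEFINE_RE = re.compile(r"^define\b.*\{\s*$")
LABEL_RE = re.compile(r"^[-\w.$]+:\s*(;.*)?$")


def annotate(ll_text, filename, directory):
    """Append !dbg attachments that point at the .ll file's own lines."""
    lines = ll_text.splitlines()
    # Fixed module-level nodes; the syntax is an assumption about recent LLVM.
    meta = [
        "!llvm.dbg.cu = !{!0}",
        "!llvm.module.flags = !{!2}",
        "!0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, "
        'producer: "ir-annotate sketch", isOptimized: false, '
        "runtimeVersion: 0, emissionKind: LineTablesOnly)",
        f'!1 = !DIFile(filename: "{filename}", directory: "{directory}")',
        '!2 = !{i32 2, !"Debug Info Version", i32 3}',
        "!3 = !DISubroutineType(types: !4)",
        "!4 = !{}",
    ]
    next_id = 5
    scope = None  # metadata id of the enclosing function, if any
    for index, raw in enumerate(lines):
        line = raw.strip()
        if scope is None:
            if DEFINE_RE.match(line):
                scope, next_id = next_id, next_id + 1
                # Assumes an unquoted function name such as @main.
                name = re.search(r"@([-\w.$]+)", line).group(1)
                meta.append(
                    f'!{scope} = distinct !DISubprogram(name: "{name}", '
                    f"scope: !1, file: !1, line: {index + 1}, type: !3, "
                    "unit: !0, spFlags: DISPFlagDefinition)"
                )
                # Move the "{" after the attachment; the line count is unchanged.
                lines[index] = raw.rstrip()[:-1].rstrip() + f" !dbg !{scope} {{"
            continue
        if line == "}":
            scope = None
            continue
        if not line or line.startswith(";") or LABEL_RE.match(line):
            continue
        meta.append(
            f"!{next_id} = !DILocation(line: {index + 1}, column: 1, "
            f"scope: !{scope})"
        )
        lines[index] = f"{raw.rstrip()}, !dbg !{next_id}"
        next_id += 1
    return "\n".join(lines + [""] + meta) + "\n"
```

Because nothing is inserted between existing lines, the line numbers recorded in the metadata are valid for both the original and the annotated file, which sidesteps the need for a special dumping mode in this simple case.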

Hi Krzysztof,

With the recent MCJIT work, C code compiled with LLVM can be debugged in GDB (http://llvm.org/docs/DebuggingJITedCode.html). Alas, this doesn’t mean IR can be debugged as easily :)

The problem is the following: the debug information attached to each IR instruction by Clang refers to the original C code. When this debug info passes through the various layers of optimizations, MC, JITting, it remains pointing to C code. To debug IR, one would have to add debug info to each instruction, actually pointing to the IR itself. Then I guess GDB would know how to debug it, at least with a few tweaks, because GDB generally doesn’t terribly care which language it debugs, as long as it can follow the debug info and the commands make sense (and this can be handled by adding appropriate Python plugins).

Eli

Hi all,

For my own purposes, I wrote a pass that does exactly what you all are
describing: add debug metadata to LLVM IR.

As a pass, it had to tackle the problem of "This file needs to exist
on disk somewhere so gdb can find it", which I solved by dumping it
onto /tmp/ somewhere. Not a great solution (who deletes these?) but
worked well enough.

Another interesting issue is how to coexist with any existing debug
metadata, which can be useful for simultaneously debugging an IR
transform inline with the C source for instrumentation-style passes
like SAFECode, ASan/TSan.

Quick Example:

(gdb) break main
Breakpoint 1 at 0x4010b1: file /home/wdietz2/magic/test/unit/test_loop.c, line 9.
(gdb) r
Starting program: /home/wdietz2/llvm/32-obj-make/projects/magic/test/Output/test_loop

Breakpoint 1, main (argc=<value optimized out>, argv=<value optimized out>) at /home/wdietz2/magic/test/unit/test_loop.c:9

9 unsigned k = 0;
Missing separate debuginfos, use: debuginfo-install
glibc-2.12-1.80.el6_3.5.x86_64 libgcc-4.4.6-4.el6.x86_64
libstdc++-4.4.6-4.el6.x86_64
(gdb) n
10 source(argc != 0, &k);
(gdb) n
14 %and.i.i.i.i104 = and i64 %4, 70368744177660
(gdb) n
15 %5 = load i8** @global, align 8
(gdb) n
18 store i32 16843009, i32* %6, align 1
(gdb) n
19 store i8 1, i8* getelementptr inbounds ([1 x i8]* @array, i64 0, i64 0), align 1
(gdb) n
20 call coldcc void @runtime_func() nounwind
(gdb) n
11 while(i-- > argc)
(gdb) n
23 %and.i.i.i.i85 = and i64 %7, 70368744177660
(gdb) n
14 while(j++ < i) k += j;
(gdb) n
11 while(i-- > argc)
(gdb) n
14 while(j++ < i) k += j;
(gdb) n
102 %77 = load i8** @global, align 8
(gdb) n
105 %79 = load i32* %78, align 4
(gdb) n
106 %cmp7.i.i.i = icmp ne i32 %79, 0
(gdb) n
108 call void @llvm.memset.p0i8.i64(i8* %add.ptr.i.i.i.i86, i8 %conv8.i.i.i, i64 4, i32 1, i1 false) nounwind
(gdb) n
14 while(j++ < i) k += j;
(gdb) n
15 while(j-- > 0) k *= k + j;
(gdb) n
95 %69 = load i8** @global, align 8
(gdb) n
98 %71 = load i32* %70, align 4
(gdb)

The pass itself is rather simple--the hard problem it solves is
emitting the IR to disk and reasoning about what Instruction* is on
what line, which really shouldn't be a problem if done properly in
LLVM. If desired I can certainly make the code available on request.

In short, it seemed to work well for me and having it done properly in
LLVM itself would be great!

~Will
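
A small sketch of one way to tame the "who deletes these?" problem. This is not what the pass described above does; it is an assumed alternative in which the dumped file is named after a hash of its contents, so rebuilding the same module reuses one file instead of piling up new ones. The function name `dump_ir_for_debugger` and the `ir-debug` subdirectory are hypothetical.

```python
import hashlib
import os
import tempfile


def dump_ir_for_debugger(ll_text, base_dir=None):
    """Write the IR text to disk and return (directory, filename).

    The pair is what a debug-info file entry would need to record so
    that the debugger can find the source later.
    """
    base_dir = base_dir or tempfile.gettempdir()
    # Content-derived name: identical IR always maps to the same file.
    digest = hashlib.sha256(ll_text.encode("utf-8")).hexdigest()[:16]
    directory = os.path.join(base_dir, "ir-debug")
    os.makedirs(directory, exist_ok=True)
    filename = f"module-{digest}.ll"
    path = os.path.join(directory, filename)
    if not os.path.exists(path):
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(ll_text)
    return directory, filename
```

Stale files still need cleaning up eventually, but keeping them in a single known directory makes that a one-line job.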

I am definitely interested in such a pass. I think a pass that simply annotates an existing LLVM-IR file is already a big step forward. The stuff on top that you describe is also great, but if it complicates integration into LLVM, you could start with a plain LLVM-IR debug info annotation pass and add additional stuff later on.

Cheers
Tobi

Do you mean passing "-g" to llc or to llvm-as? Both return with an "unknown command line option: -g" error.

Thanks,
Pablo

Hi Will,

> For my own purposes, I wrote a pass that does exactly what you all are
> describing: add debug metadata to LLVM IR.

Your pass seems to be exactly what I'm looking for. Could you make it available to me (us)?

> Another interesting issue is how to coexist with any existing debug
> metadata, which can be useful for simultaneously debugging an IR
> transform inline with the C source for instrumentation-style passes
> like SAFECode, ASan/TSan.

I'm not really an expert on debug info, so correct me if I'm wrong. The idea would require some level of support from GDB itself (displaying multiple source formats at the same time, i.e. C and IR), which I'm not aware of, or perhaps generating a mixed-format file just for the purposes of visualization. The idea is interesting though.

> The pass itself is rather simple--the hard problem it solves is
> emitting the IR to disk and reasoning about what Instruction* is on
> what line, which really shouldn't be a problem if done properly in
> LLVM. If desired I can certainly make the code available on request.
>
> In short, it seemed to work well for me and having it done properly in
> LLVM itself would be great!

It might be already in the right place. As far as I understand, we should be able to annotate the code after all the optimizations have been made, which can be done inside opt. On the other hand, I can see the benefits of doing it inside llc or llvm-as, as we have more information about the correspondence of IR and assembler code.

Anyway, as Tobias said, your pass is a step forward on its own. I'm willing to help here.

Thanks ahead,
Pablo

Another thing, which would be really great, but also really hard, is
to map the SSA registers to their values.

So, in GDB, you could say:

(gdb) print %117
(int) %117 = 4

Though you'd have to generate all possible locations of the value in
the generated code, which can be a pain in the Dwarf back-end.

But just the line information is already great, as it interlaces with
C source lines, so it's easy to see what C line mapped to what group
of IR lines.

+1 for a patch to implement that! ;)

Sorry, I wasn't clear. My thought would be along the lines of teaching
llc to understand "-g". This option would mean, if there is no debug
metadata, then pretend that each IR instruction in the input source file
had metadata, which pointed to the source line for that instruction in
the .ll file.

My inspiration was the case where you write a target-machine assembler
file with no debug-info directives (.loc etcetera) and then do
  clang -g foo.s
This will put debug info into the foo.o file, and that info will point
to the foo.s source lines for each machine instruction.
--paulr
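
To make the analogy concrete, here is a sketch of what the explicit-directive equivalent might look like: a script that puts a `.loc` directive in front of every instruction line of an assembly file, each pointing back at that instruction's line in the original file. The integrated assembler does not literally insert directives like this; the script only approximates the resulting line table. The name `add_loc_directives` is hypothetical, and the instruction test (not blank, not a directive, not a `#` comment, not a label) is a deliberate simplification.

```python
def add_loc_directives(asm_text, filename):
    """Prefix each instruction line with a .loc pointing at its own line."""
    # File number 1 is declared once and referenced by every .loc below.
    out = [f'\t.file 1 "{filename}"']
    for number, raw in enumerate(asm_text.splitlines(), start=1):
        line = raw.strip()
        is_instruction = bool(
            line
            and not line.startswith((".", "#"))  # directive or comment
            and not line.endswith(":")           # label
        )
        if is_instruction:
            # The line number refers to the ORIGINAL file, not the output.
            out.append(f"\t.loc 1 {number}")
        out.append(raw)
    return "\n".join(out) + "\n"
```

An llc "-g" along the lines described above would do the same thing one level up, using .ll line numbers instead of .s line numbers.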

Could anyone list the reasons why we need an LLVM IR-level debugger?
I can only think of one reason, and that is to debug IR portability issues (?)

I wouldn't go as far as saying that we "need" an IR-level debugger, but it could help people who use Clang/LLVM to debug their code. At higher optimization levels in particular, source-level debugging may be obscured by various optimizations, and assembly-level debugging may be too tedious and cryptic.

-Krzysztof