DWARF debugging strangeness, continued...

Another chapter in the long saga of trying to get source-level debugging working :)

I’ve switched from generating assembly to direct object-file generation, in hopes of getting around the “Fatal error: duplicate .debug_line sections” binutils bug.

I now have 4 different tools for dumping the DWARF info for an object module: dwarfdump, objdump, readelf, and pydevtools. Curiously, they don’t all agree on what debugging information is present in an LLVM-generated object file.

First let me explain what’s in the executable. There are several parts:

  • First, there are a bunch of .bc files that are generated by my frontend, and which contain debugging metadata (I checked using llvm-dis and the metadata appears to be OK). These are then combined by my linker, tartln, which combines the functions of ‘opt’ and ‘llc’, as well as having some custom passes for reflection and garbage collection. (In fact, about half of the source code for tartln was lifted directly from opt and llc, although this cut & paste operation occurred somewhere around the 2.5 timeframe, and I’ve tried to keep the code up to date since then.) The output of this compilation phase is a .o file.

  • Second, there are some .cpp files that were compiled with gcc. These contain a few runtime support routines for the language, such as stderr output and the code to walk the stack frame (which requires inline assembly). No, I’m not using clang, as I am trying to minimize build dependencies. The output of this stage is a static library.

  • Finally, the .o and the static libraries are linked together by passing them to gcc. (I wanted to use ld directly, but when I do I get undefined symbol errors for all of the libunwind functions; more detail on that in another thread if anyone is interested. I know that the unwind functions are in some default-linked library, but unfortunately which library is not obvious: there’s no file named libunwind anywhere in my library path.) The result of this stage is an executable.

Now, when I attempt to dump the debugging info, the following happens:

  • With dwarfdump -a, I only seem to get a small number of subprogram definitions in the output: 25 total. However, with dwarfdump -i, I get 6141 subprogram definitions. This is strange given that the docs for dwarfdump claim that -a is a superset of -i.

  • With “readelf --debug-dump” and “objdump --dwarf” I get 6170 subprogram definitions.

  • pydevtools simply crashes when fed my executable. This may be a bug in pydevtools. It’s hard to tell what’s going on here. I’ve debugged it a bit, and it seems like pydevtools is blowing up because the list of ELF sections that it reads is empty.

  • It would be great if I could get pydevtools working, because it has a GUI browser view for the debugging information - that would be a big help tracking down these problems. Oh well.
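To compare the tools on equal terms, it may help to count DIE tags mechanically rather than by eye. The following is a minimal sketch, assuming the line layout that binutils prints for `readelf --debug-dump=info` and `objdump --dwarf=info` (a DIE header ending in `(DW_TAG_...)`); dwarfdump uses a different layout and would need its own pattern. The function names `count_tags` and `dump_info` are made up for illustration.

```python
import re
import subprocess
from collections import Counter

# Matches a DIE header line in binutils output, for example:
#   " <1><2d>: Abbrev Number: 2 (DW_TAG_subprogram)"
_DIE_TAG = re.compile(r"\((DW_TAG_\w+)\)\s*$")


def count_tags(dump_text):
    """Count how many DIEs of each DW_TAG_* kind appear in a textual dump."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = _DIE_TAG.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def dump_info(path, tool=("readelf", "--debug-dump=info")):
    """Run the dump tool on `path` and return its stdout as text.

    Assumes the tool is installed and on PATH; pass
    tool=("objdump", "--dwarf=info") to use objdump instead.
    """
    result = subprocess.run([*tool, path], check=True,
                            capture_output=True, text=True)
    return result.stdout
```

Calling `count_tags(dump_info("a.out"))["DW_TAG_subprogram"]` once per tool would show whether readelf and objdump really agree on the subprogram count, where `a.out` stands in for the actual executable.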

When I attempt to debug this in gdb, it acts as if it can only ‘see’ the debug definitions coming from gcc, and not any of the ones from my frontend. Doing a stack dump during execution shows symbolic information for the gcc-created functions, and bare machine addresses for the other stuff.
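One way to check whether gdb's view matches what is actually in the file is to list which producers wrote the compile units in the executable. Below is a rough sketch that pulls the `DW_AT_producer` strings out of `readelf --debug-dump=info` text; the regular expression assumes the binutils line layout, and the helper name `producers` is hypothetical. If only gcc producers show up, the frontend's compile units never made it into the final link.

```python
import re
from collections import Counter

# Matches attribute lines such as:
#   "    <c>   DW_AT_producer    : (indirect string, offset: 0x0): GNU C++ 4.4.5"
#   "    <c>   DW_AT_producer    : some-compiler 1.0"
_PRODUCER = re.compile(r"DW_AT_producer\s*:\s*(?:\([^)]*\):\s*)?(.+?)\s*$")


def producers(dump_text):
    """Return a Counter mapping each producer string to its number of
    occurrences (roughly one per compile unit) in the dump text."""
    found = Counter()
    for line in dump_text.splitlines():
        match = _PRODUCER.search(line)
        if match:
            found[match.group(1)] += 1
    return found
```

The input is the same kind of text that `readelf --debug-dump=info` prints, so the output of that command can be captured to a file and passed in as a string.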

Note that all of the above is on Linux - on OS X I get a completely different set of errors.

At this point I’d pay money to get this solved…it stopped being fun, oh, about a year ago :)

If you can send me a couple of small .bc files and the linked .o file (Mach-O) created using tartln, then I’ll try to see if I can find anything suspicious here.

OK I’ll work on that when I get home. It may take a while to come up with a “small” .bc file, since even a “hello world” type program pulls in a substantial amount of library code (i/o libraries, container classes, root-level exception handling, argv handling, garbage collection runtime code, and so on.)

It may be something as simple as me doing something stupid in my build script (although I did remember to use -fno-omit-frame-pointer).

I suppose I should mention one odd thing about my current build script: the current optimization level is -O2. I can’t use -O0, because this causes an assertion failure in the lowering pass for LLVM intrinsics. The problem has to do with inlining and llvm.gcroot(). My frontend ensures that all calls to llvm.gcroot() are in the first block of a function, but the inlining pass does not preserve this constraint, which causes the assertion failure. For some reason optimization makes this problem go away.
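The first-block constraint on llvm.gcroot() can be checked from the outside on llvm-dis output. What follows is only a text-based heuristic, not LLVM's own verifier: it assumes one instruction per line, a closing `}` on its own line, and block labels written either as `name:` or as the older `; <label>:N` comment form. The function name `gcroot_calls_outside_entry` is invented for this sketch.

```python
import re

# A basic-block label line such as "entry:" or "\"my block\":".
_LABEL = re.compile(r'^\s*(?:[-\w.$]+|"[^"]*"):')
# The function name on a "define" line.
_DEFINE_NAME = re.compile(r'@([-\w.$]+|"[^"]*")\s*\(')


def gcroot_calls_outside_entry(ir_text):
    """Return (function, line number) pairs for llvm.gcroot calls found
    after the first basic block of a function definition."""
    violations = []
    func = None          # name of the function being scanned, if any
    block = 0            # 0 means we are still in the entry block
    seen_instr = False   # has this function had an instruction yet?
    for lineno, raw in enumerate(ir_text.splitlines(), 1):
        # Older llvm-dis output marks unnamed blocks with a comment.
        is_label = raw.lstrip().startswith("; <label>:")
        line = raw.split(";", 1)[0].rstrip()  # drop trailing comments
        if func is None:
            if line.lstrip().startswith("define "):
                match = _DEFINE_NAME.search(line)
                func = match.group(1) if match else "<unknown>"
                block, seen_instr = 0, False
            continue
        if line.strip() == "}":
            func = None
            continue
        if is_label or _LABEL.match(line):
            # A label before any instruction just names the entry block.
            if seen_instr:
                block += 1
            continue
        if not line.strip():
            continue
        seen_instr = True
        if block > 0 and "@llvm.gcroot(" in line:
            violations.append((func, lineno))
    return violations
```

Running this over the disassembled module before and after the inlining step would show which functions pick up a misplaced llvm.gcroot() call, which narrows down where the constraint is lost.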

I suppose I should mention one odd thing about my current build script: the current optimization level is -O2.

That’s big. Debugging optimized code is a completely new chapter.

I can’t use -O0, because this causes an assertion failure in the lowering pass for LLVM intrinsics. The problem has to do with inlining and llvm.gcroot(). My frontend ensures that all calls to llvm.gcroot() are in the first block of a function, but the inlining pass does not preserve this constraint, which causes the assertion failure. For some reason optimization makes this problem go away.

You may want to press on this and find a solution.

One thing you can try, to rule out any link-stage bug: take your .bc files and generate a separate .o file for each one using clang, then see whether dwarfdump finds any debug info in those .o files. If it does, then tartln is dropping debug info on the floor somewhere.
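The per-file check could be scripted along these lines. This is a sketch under assumptions: that `clang -c` accepts a .bc file as input and that running `dwarfdump` on the resulting object prints its debug info; the output directory `dbgcheck` and the function names are hypothetical. The commands are built as plain lists first so they can be inspected before anything is executed.

```python
import subprocess
from pathlib import Path


def per_module_commands(bc_files, out_dir="dbgcheck"):
    """For each .bc file, return a pair of commands: compile it to its
    own .o file, then dump that object's DWARF info."""
    plans = []
    for bc in bc_files:
        obj = (Path(out_dir) / (Path(bc).stem + ".o")).as_posix()
        plans.append([
            ["clang", "-c", str(bc), "-o", obj],  # bitcode -> object file
            ["dwarfdump", obj],                   # inspect debug info
        ])
    return plans


def run_plans(plans, out_dir="dbgcheck"):
    """Execute the planned commands in order, stopping on the first failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for compile_cmd, dump_cmd in plans:
        subprocess.run(compile_cmd, check=True)
        subprocess.run(dump_cmd, check=True)
```

If the individual objects show the frontend's subprograms but the tartln-produced .o does not, that points at tartln rather than at the frontend's metadata.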