Implementing minimal debug info (-g1?) for Clang

Hi!

Currently, Clang’s “-g” flag emits full debug info, which is fine for debugging but increases the binary size significantly.
It may be useful to produce less debug info that is still enough for collecting nice stack traces with file names and line numbers
but introduces less overhead. Cary Coutant made a patch that does this for GCC (it hasn’t hit trunk yet): setting the “-gmlt” (“minimum line table”) or “-g1” flag
reduces debug info to only descriptions of functions, extern variables, line number tables and inlined subroutine info.
(See this patch at http://old.nabble.com/-patch–Add-new–gmlt-option-for-min.-debug-info-with-line-tables-(issue4440072)-td31482851.html
or http://codereview.appspot.com/4440072). This patch has been in use at Google for about 2 years already.

I get the following binary sizes for the 483.xalancbmk benchmark from SPEC 2006 (clang from trunk vs. gcc 4.6.x with Google patches):
11026073 Xalan_base.clang_O0
45882529 Xalan_base.clang_O0_g
11079688 Xalan_base.gcc_O0
16437776 Xalan_base.gcc_O0_gmlt
54221056 Xalan_base.gcc_O0_g
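
As a rough reading of these numbers, the overhead can be expressed as the ratio of each debug build to its no-debug baseline. The sketch below does that arithmetic; the helper `sizeRatio` and the constant names are made up for illustration, and only the sizes come from the table. It works out to roughly 1.48x for gcc with -gmlt, 4.89x for gcc with -g, and 4.16x for clang with -g.

```cpp
#include <cstdint>

// Ratio of a binary built with debug info to the same build without it.
// (Hypothetical helper, used only to illustrate the arithmetic.)
constexpr double sizeRatio(std::uint64_t withDebug, std::uint64_t baseline) {
    return static_cast<double>(withDebug) / static_cast<double>(baseline);
}

// Sizes copied from the table above.
constexpr std::uint64_t kClangO0   = 11026073;
constexpr std::uint64_t kClangO0G  = 45882529;
constexpr std::uint64_t kGccO0     = 11079688;
constexpr std::uint64_t kGccO0Gmlt = 16437776;
constexpr std::uint64_t kGccO0G    = 54221056;

// Growth factors relative to the matching -O0 build without debug info.
constexpr double kClangFullRatio = sizeRatio(kClangO0G, kClangO0);  // about 4.16
constexpr double kGccGmltRatio   = sizeRatio(kGccO0Gmlt, kGccO0);   // about 1.48
constexpr double kGccFullRatio   = sizeRatio(kGccO0G, kGccO0);      // about 4.89
```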

WDYT of implementing a similar option in Clang? There are two obvious approaches:

  1. Don’t modify Clang codegen, but instead erase all the extra debug info in LLVM (pro: there is already a StripDebugInfo pass; con: it’s very short and simple,
    and would have to be patched a lot).
  2. Emit less information in Clang. We’re also quite interested in whether this would reduce compilation time as well.

Which approach looks better in your opinion? I’d like to start working on this enhancement, but would certainly be happy to hear some advice beforehand.
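
To make the second approach a bit more concrete, here is a minimal sketch of the idea: the front end decides up front which kinds of debug info to emit, so nothing has to be stripped afterwards. All names here (`DebugInfoLevel`, `DebugInfoItem`, `shouldEmit`) are hypothetical and are not Clang's actual code; the set of items kept in the reduced mode is taken from the description of the GCC patch above.

```cpp
// Hypothetical names throughout; this is an illustration, not Clang's API.
enum class DebugInfoLevel { None, LineTablesOnly, Full };

enum class DebugInfoItem {
    FunctionDescription,
    ExternVariable,
    LineTable,
    InlinedSubroutine,
    LocalVariable,
    TypeDescription,
};

// Decide at emission time whether an item is produced at all.
bool shouldEmit(DebugInfoLevel level, DebugInfoItem item) {
    switch (level) {
    case DebugInfoLevel::None:
        return false;
    case DebugInfoLevel::Full:
        return true;
    case DebugInfoLevel::LineTablesOnly:
        // Keep only what the reduced mode is described as retaining.
        return item == DebugInfoItem::FunctionDescription ||
               item == DebugInfoItem::ExternVariable ||
               item == DebugInfoItem::LineTable ||
               item == DebugInfoItem::InlinedSubroutine;
    }
    return false;  // not reached for valid enum values
}
```

With a single predicate like this, each place that emits debug info could ask whether its item is wanted instead of checking the flag ad hoc.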

Hi!

Currently, Clang’s “-g” flag emits full debug info, which is fine for debugging but increases the binary size significantly.
It may be useful to produce less debug info that is still enough for collecting nice stack traces with file names and line numbers
but introduces less overhead.

Sounds great!

Cary Coutant made a patch that does this for GCC (it hasn’t hit trunk yet): setting the “-gmlt” (“minimum line table”) or “-g1” flag
reduces debug info to only descriptions of functions, extern variables, line number tables and inlined subroutine info.

“-gmlt” is a really unfortunate option name. Why not -gline-tables-only or something like that?

(See this patch at http://old.nabble.com/-patch–Add-new–gmlt-option-for-min.-debug-info-with-line-tables-(issue4440072)-td31482851.html
or http://codereview.appspot.com/4440072). This patch has been in use at Google for about 2 years already.

I get the following binary sizes for the 483.xalancbmk benchmark from SPEC 2006 (clang from trunk vs. gcc 4.6.x with Google patches):
11026073 Xalan_base.clang_O0
45882529 Xalan_base.clang_O0_g
11079688 Xalan_base.gcc_O0
16437776 Xalan_base.gcc_O0_gmlt
54221056 Xalan_base.gcc_O0_g

WDYT of implementing a similar option in Clang? There are two obvious approaches:

  1. Don’t modify Clang codegen, but instead erase all the extra debug info in LLVM (pro: there is already a StripDebugInfo pass; con: it’s very short and simple,
    and would have to be patched a lot).
  2. Emit less information in Clang. We’re also quite interested in whether this would reduce compilation time as well.

Which approach looks better in your opinion? I’d like to start working on this enhancement, but would certainly be happy to hear some advice beforehand.

#2 is definitely the right way to go, thanks!

-Chris

I was holding off while trying to come up with advice, and I've only come up with a couple of things. Unfortunately, a lot of how things work is tied to the recursive route that we take. I think limiting debug information may require a bit of an overhaul, but I'm very interested in seeing it. If there's anything I can do to help, let me know.

-eric

Eric,

I’ve uploaded an early (and barely tested) draft at http://codereview.appspot.com/6015050/; could you PTAL?
Debug info is emitted all over the lib/CodeGen sources, so the patch looks quite messy: I had to cut out the pieces of code that emit
different kinds of debug info about variables and types here and there. I’ve left a bunch of asserts in the patch inside
functions that presumably should not be called (yeah, all this is quite fragile). The patch description also contains some comments
about the goal.

Do you think that this direction is fine in general?
With the -gline-tables-only flag enabled, I get about a 1.5x binary size increase on a large test.
I can also see stack traces with line numbers in gdb on a microtest.
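
For anyone who wants to try the same kind of check, a microtest along these lines should be enough. This is only an assumed shape, not the test actually used; the function names are invented. It nests three calls so that a backtrace taken in `leaf` has several frames, each of which should carry a file name and line number.

```cpp
// Hypothetical microtest: three nested calls give a multi-frame backtrace.
static inline int leaf(int x) {
    return x * 2;  // a convenient place for a breakpoint
}

static inline int middle(int x) {
    return leaf(x) + 1;
}

int outer(int x) {
    // Two calls on one line; both reach leaf() through middle().
    return middle(x) + middle(x + 1);
}
```

To use it, call `outer(3)` from a `main`, build with `clang++ -O0 -gline-tables-only`, then in gdb run `break leaf`, `run` and `bt`.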