Getting clang to give dwarf info for macro expansions?

Does anyone know if/how I can get clang to generate dwarf information
that lets me trace / debug *into* macro-generated code?

It seems that when a C program invokes a macro, all of the source code
produced by that macro invocation is attributed to the line at which
the macro is called. I've seen this in both dwarfdump and gdb, as
described below. (The exception seems to be when a macro expansion
causes the definition of a new function. That new function shows up
in the dwarf info as one might hope.)
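To make that exception concrete, here is a minimal sketch of a macro whose expansion defines a whole function. The names DEFINE_IS_VALUE and is_three are made up for illustration. Going by the observation above, is_three should show up as its own function in the dwarf info, although the statements in its body would presumably still be attributed to the single line where the macro is invoked.

```c
/* Hypothetical macro: each invocation defines a complete function. */
#define DEFINE_IS_VALUE(name, value) \
   int name(int x) { \
      if (x == (value)) { \
         return 1; \
      } \
      return 0; \
   }

/* This one line expands to the full definition of is_three(). */
DEFINE_IS_VALUE(is_three, 3)

int bar(int x) {
   return is_three(x);
}
```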

What I'd ideally like is for tracing tools to clearly describe which
branches within macro-generated code were taken. When all of the
source code generated by a macro invocation is attributed to the macro
*call* site by the dwarf information, that level of precision seems
impossible.

For example, suppose I have this code:

foo.c

#define FOO \
   if (x == 3) { \
   return 1; \
   }

int bar(int x) {
   FOO
   return 0;
}

And I compile it with this command:
clang -c -g -Xclang -dwarf-column-info foo.c

And I look at the dwarf information using the command:
objdump -dgls foo.o

I get this:
...
Disassembly of section .text:

0000000000000000 <bar>:
bar():
/tmp/foo.c:6
   0: 55 push %rbp
   1: 48 89 e5 mov %rsp,%rbp
   4: 89 7d f8 mov %edi,-0x8(%rbp)
/tmp/foo.c:7
   7: 81 7d f8 03 00 00 00 cmpl $0x3,-0x8(%rbp)
   e: 0f 85 0c 00 00 00 jne 20 <bar+0x20>
  14: c7 45 fc 01 00 00 00 movl $0x1,-0x4(%rbp)
  1b: e9 07 00 00 00 jmpq 27 <bar+0x27>
/tmp/foo.c:8
  20: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
/tmp/foo.c:9
  27: 8b 45 fc mov -0x4(%rbp),%eax
  2a: 5d pop %rbp
  2b: c3 retq
...

Note that all of the reported source locations are in lines 6-9;
nothing is attributed to lines 1-4, where the FOO macro is defined.

To double-check that I really can't get detailed trace information
from macro-generated code, I added a main() function and stepped
through the code with gdb. Here's what I got:

(gdb) list
4 }
5
6 int bar(int x) {
7 FOO
8 return 0;
9 }
10
11 int main(int argc, const char* argv )
12 {
13 return bar( argc );
(gdb) break main
Breakpoint 1 at 0x4004d6: file foo.c, line 13.
(gdb) run
Starting program: /tmp/a.out

Breakpoint 1, main (argc=1, argv=0x7fffffffdf98) at foo.c:13
13 return bar( argc );
(gdb) step
bar (x=1) at foo.c:7
7 FOO
(gdb) step
8 return 0;
(gdb) step
9 }
(gdb) step
__libc_start_main (main=0x4004c0 <main>, argc=1, argv=0x7fffffffdf98,
init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>,
stack_end=0x7fffffffdf88) at libc-start.c:321
321 libc-start.c: No such file or directory.
(gdb)

DWARF doesn’t really have a way to describe multiple source locations for a single instruction, so far as I know.

I imagine we could add something like DW_TAG_inlined_subroutine (though it wouldn’t be quite as simple, since macro expansions are not necessarily strictly nested), and/or use the new two-level line tables that are being proposed/worked on.

DWARF does have a macro section, but I believe this is just to describe the original macro templates (so you can use them in debugger expressions, etc) not to provide source fidelity for macro uses.

- David

Thanks David. I'm certainly no dwarf expert, but I was wondering if
we could just have something like this:

foo.c:6:1
(various machine instructions)
bar.h:20
(some machine instruction resulting from a macro call at, for example,
line 7 of foo.c)
foo.c:8:1
(more machine instructions)

Anyway, I'm not proposing a modification. I was just curious if there
was some existing way to get better insight into the object code
associated with macro-generated source.

> Thanks David. I'm certainly no dwarf expert, but I was wondering if
> we could just have something like this:
>
> foo.c:6:1
> (various machine instructions)
> bar.h:20
> (some machine instruction resulting from a macro call at, for example,
> line 7 of foo.c)
> foo.c:8:1
> (more machine instructions)

Yeah, I haven't experimented with that - it's something that could be tried
(not high enough on my list) but I suspect would be confusing to users,
because they'd have no context as to where in the actual function they were
(if the macro was used twice, which macro instantiation were they in, etc).

> Anyway, I'm not proposing a modification. I was just curious if there
> was some existing way to get better insight into the object code
> associated with macro-generated source.

I believe the short answer is: no, there is no existing way to get better
insight into code from macros. (but I could be wrong)

- David

From: cfe-dev-bounces@cs.uiuc.edu [mailto:cfe-dev-bounces@cs.uiuc.edu] On Behalf Of David Blaikie
> Thanks David. I'm certainly no dwarf expert, but I was wondering if
> we could just have something like this:
>
> foo.c:6:1
> (various machine instructions)
> bar.h:20
> (some machine instruction resulting from a macro call at, for example,
> line 7 of foo.c)
> foo.c:8:1
> (more machine instructions)

Yeah, I haven't experimented with that - it's something that could be tried
(not high enough on my list) but I suspect would be confusing to users,
because they'd have no context as to where in the actual function they were
(if the macro was used twice, which macro instantiation were they in, etc).

If your debugger is showing the equivalent of disassembly with interspersed
source, that's not a problem. The macro definition shows up twice, each case
attached to the instructions for that use of the macro. It's not intrinsically
much different from multiple inlined calls to the same function.
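As a rough illustration of the "used twice" case, here is a sketch that puts the same small check into one function twice, once through a macro and once through a static inline function. All names here (CLAMP_LOW, clamp_low, sum_with_macro, sum_with_inline) are invented for the example. With the macro, each use would be attributed to the line that invokes it, as described earlier in the thread. With the inline function, a compiler that actually inlines the calls can describe each one as a separate inlined instance tied to its own call site. Whether that happens depends on the compiler and the optimization level.

```c
/* Hypothetical macro: clamp a variable to be non-negative, in place. */
#define CLAMP_LOW(v) \
   do { \
      if ((v) < 0) { \
         (v) = 0; \
      } \
   } while (0)

/* The same logic written as a function the compiler may inline. */
static inline int clamp_low(int v) {
   if (v < 0) {
      v = 0;
   }
   return v;
}

int sum_with_macro(int a, int b) {
   CLAMP_LOW(a);   /* first use of the macro */
   CLAMP_LOW(b);   /* second use of the macro */
   return a + b;
}

int sum_with_inline(int a, int b) {
   a = clamp_low(a);   /* first call site */
   b = clamp_low(b);   /* second call site */
   return a + b;
}
```

Both functions compute the same result, so any difference you see in a debugger comes only from how the two forms are described in the debug info.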

> Anyway, I'm not proposing a modification. I was just curious if there
> was some existing way to get better insight into the object code
> associated with macro-generated source.

I believe the short answer is: no, there is no existing way to get better
insight into code from macros. (but I could be wrong)

I suppose you could preprocess the source, filter out the #line directives,
and then build/debug the preprocessed source, but that's probably more work
for less gain than you really want.
--paulr
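A sketch of what debugging the expanded source amounts to, using the foo.c example from the start of the thread. The expansion below is done by hand and reflowed, because preprocessor output typically keeps a whole macro expansion on the line that invoked it, so filtering out the line directives alone may still leave the comparison and the `return 1` on one line. The name bar_expanded is made up so both versions can sit in one file.

```c
/* Original form: the whole expansion of FOO is attributed to the
   line that invokes it. */
#define FOO \
   if (x == 3) { \
   return 1; \
   }

int bar(int x) {
   FOO
   return 0;
}

/* Hand-expanded and reflowed: each statement now sits on its own
   source line, so a line table can tell the comparison apart from
   the `return 1`. */
int bar_expanded(int x) {
   if (x == 3) {
      return 1;
   }
   return 0;
}
```

Stepping through bar_expanded should stop on the `if` and then on whichever return is taken, which is the branch-level detail the original question was after.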

> From: cfe-dev-bounces@cs.uiuc.edu [mailto:cfe-dev-bounces@cs.uiuc.edu] On Behalf Of David Blaikie
> > Thanks David. I'm certainly no dwarf expert, but I was wondering if
> > we could just have something like this:
> >
> > foo.c:6:1
> > (various machine instructions)
> > bar.h:20
> > (some machine instruction resulting from a macro call at, for example,
> > line 7 of foo.c)
> > foo.c:8:1
> > (more machine instructions)
>
> Yeah, I haven't experimented with that - it's something that could be tried
> (not high enough on my list) but I suspect would be confusing to users,
> because they'd have no context as to where in the actual function they were
> (if the macro was used twice, which macro instantiation were they in, etc).

> If your debugger is showing the equivalent of disassembly with interspersed
> source, that's not a problem. The macro definition shows up twice, each case
> attached to the instructions for that use of the macro. It's not intrinsically
> much different from multiple inlined calls to the same function.

Well the major difference is that with inlined call information you can use
"bt" to see the context (which call site the inlined subroutine came from).

Sure. And if you step into a macro, you can still see what your current frame is, even if it won’t tell you what line within the function—big deal.

#include of function fragments would be a similar case, although that’s not a style common in C/C++ (it is in COBOL, which doesn’t have a macro facility, and I can assure you that in practice it works out okay).

--paulr