[RFC][DebugInfo][DWARF] .debug_line entry for a callsite of an inlined function

Debugging a function that can be inlined is problematic because its callsite is eliminated and there is no longer an instruction that corresponds to the source location of the call.

GCC (starting from v8.1) generates at least two .loc directives for the first instruction of an inlined function (meaning multiple records in .debug_line for the same address). The first .loc corresponds to the source location of the callsite while the second one refers to the source location of the instruction itself. Thus the debugger is able to stop on the first instruction of the inlined function if it’s asked to stop on the call.

Consider the following example:

$ cat test.cpp

01 void bar();
03 inline __attribute__((always_inline))
04 void foo() {
05   bar();
06 }
08 void test() {
09   foo();
10 }
$ gcc-11.2 -g -O3 -S -o -

        .file 1 "test.cpp"
        .loc 1 8 13 view -0
        .loc 1 9 3 view .LVU1 <- callsite
        .loc 1 4 6 view .LVU2
        .loc 1 5 3 view .LVU3
        .loc 1 5 6 is_stmt 0 view .LVU4
        jmp     bar()
$ bin/llvm-dwarfdump -debug-line test.o

Address            Line   Column File   ISA Discriminator Flags
------------------ ------ ------ ------ --- ------------- -------------
0x0000000000000000      8     13      1   0             0  is_stmt
0x0000000000000004      9      3      1   0             0  is_stmt
0x0000000000000004      4      6      1   0             0  is_stmt
0x0000000000000004      5      3      1   0             0  is_stmt
0x0000000000000004      5      6      1   0             0
0x0000000000000009      5      6      1   0             0  end_sequence

As you can see, the bar() call, which was part of the inlined function foo(), corresponds to multiple .loc directives. One of them points to line 9, which is the callsite of foo().
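To make the effect of the extra rows concrete, here is a minimal sketch of how a consumer could resolve a source line to breakpoint addresses when several rows share one address. The `LineRow` struct and the `breakpointAddresses` helper are hypothetical names made up for illustration (this is not gdb's or lldb's code); the row values are copied from the llvm-dwarfdump output above.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// One decoded .debug_line row (only the fields used in this sketch).
struct LineRow {
  std::uint64_t address;
  unsigned line;
  unsigned column;
  bool is_stmt;
  bool end_sequence;
};

// The rows GCC produced for test() in the dump above.
inline std::vector<LineRow> gccRowsForTest() {
  return {
      {0x0, 8, 13, true, false},
      {0x4, 9, 3, true, false},   // callsite of foo()
      {0x4, 4, 6, true, false},
      {0x4, 5, 3, true, false},
      {0x4, 5, 6, false, false},
      {0x9, 5, 6, false, true},   // end_sequence
  };
}

// Hypothetical helper: every address at which a breakpoint for `line` could
// be placed, looking at all is_stmt rows, including rows that share an
// address with other rows.
inline std::set<std::uint64_t> breakpointAddresses(
    const std::vector<LineRow>& rows, unsigned line) {
  std::set<std::uint64_t> result;
  for (const LineRow& row : rows) {
    if (row.end_sequence)
      continue;  // end_sequence rows only terminate the sequence
    if (row.is_stmt && row.line == line)
      result.insert(row.address);
  }
  return result;
}
```

With these rows, both line 9 (the callsite) and line 5 (the inlined body) resolve to address 0x4, which matches the gdb behavior described in this post.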

gdb handles this as expected: one can set a breakpoint on line 9 and run step-in/step-out/finish for foo(). A breakpoint on line 5 also works as it should.
The only issue that can be observed is that in some cases gdb shows the inlined function instead of its callsite:

(gdb) b test.cpp:9
Breakpoint 1 at 0x401116: file test.cpp, line 9.
(gdb) r
Starting program: a.out

Breakpoint 1, foo () at test.cpp:5
5      bar();

The Clang-based toolchain ignores the issue entirely: the compiler does not produce .loc directives for an inlined callsite, and lldb cannot handle multiple .loc directives for a single instruction. lldb ignores all of them but the last one.

$ clang-tot -g -O3 -S -o -

test():                               # @test()
       .file   1 "test.cpp"
       .loc    1 8 0                           # test.cpp:8:0
       .loc    1 5 3 prologue_end              # test.cpp:5:3
       jmp     bar()                         # TAILCALL
$ bin/llvm-dwarfdump -debug-line test.o

Address            Line   Column File   ISA Discriminator Flags
------------------ ------ ------ ------ --- ------------- -------------
0x0000000000000000      8      0      0   0             0  is_stmt
0x0000000000000000      5      3      0   0             0  is_stmt prologue_end
0x0000000000000005      5      3      0   0             0  is_stmt end_sequence

Basically, having multiple .debug_line entries for the same address seems to be a bit off the standard:

6.2.5 The Line Number Program
As stated before, the goal of a line number program is to build a matrix
representing one compilation unit, which may have produced multiple
sequences of target machine instructions. Within a sequence, addresses and
operation pointers may only increase. (Line numbers may decrease in cases of
pipeline scheduling or other optimization.)

This is why lldb gives up if it encounters multiple .debug_line entries for the same address (see the comment in lldb/source/Symbol/LineTable.cpp):

// Replace the last entry if the address is the same, otherwise append it. If
// we have multiple line entries at the same address, this indicates illegal
// DWARF so this “fixes” the line table to be correct. If not fixed this can
// cause a line entry’s address that when resolved back to a symbol context,
// could resolve to a different line entry. We really want a 1 to 1 mapping
// here to avoid these kinds of inconsistencies. We will need or revisit
// this if the DWARF line tables are updated to allow multiple entries at the
// same address legally.
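The policy that comment describes can be sketched as follows. This is an illustration only, not lldb's actual implementation: `LineEntry`, `appendOrReplace`, and `collapseGccRows` are hypothetical names, and the input rows are the non-terminating rows GCC produced in the earlier dump.

```cpp
#include <cstdint>
#include <vector>

// Simplified line table entry (hypothetical; not lldb's real type).
struct LineEntry {
  std::uint64_t address;
  unsigned line;
  unsigned column;
};

// Append `entry`, but replace the last entry if it has the same address.
// This mirrors the policy stated in the quoted comment.
inline void appendOrReplace(std::vector<LineEntry>& table,
                            const LineEntry& entry) {
  if (!table.empty() && table.back().address == entry.address)
    table.back() = entry;
  else
    table.push_back(entry);
}

// Feed the GCC rows for test() through the policy above.
inline std::vector<LineEntry> collapseGccRows() {
  const LineEntry rows[] = {
      {0x0, 8, 13}, {0x4, 9, 3}, {0x4, 4, 6}, {0x4, 5, 3}, {0x4, 5, 6}};
  std::vector<LineEntry> table;
  for (const LineEntry& row : rows)
    appendOrReplace(table, row);
  return table;
}
```

Under this policy only two entries survive, line 8 at 0x0 and line 5 at 0x4, so the row for line 9 (the callsite) is lost before any breakpoint lookup happens.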

On the other hand, clang may still produce multiple .loc directives (and thus multiple line info records) for the same address in other circumstances that are unrelated to the issue under consideration.

My personal opinion is that the feature GCC/GDB provides is useful in engineering practice even though it’s a bit off the standard, but I’d like to ask the community for their opinion on the subject. Does it make sense to implement the same behavior for llvm/lldb?

I think you have described the problem very thoroughly. The fundamental problem with extending the line table in this way is that it violates one of the relatively few “don’t do this” commandments in the DWARF spec. That means that emitting a line table that does have multiple “rows” per machine instruction is open to interpretation (or mis-interpretation, or outright rejection) by different consumers, as there is no standardized interpretation.

You might find this DWARF committee issue to be relevant: DWARF Issue
It introduces a “two-level” line table, specifically to address the inlining issue. If you think it doesn’t solve the problem, your feedback would be very welcome! I believe this extension has been implemented only in a fork of gcc, but it might be a direction to pursue.

Thank you, Paul, for the answer!

That means that emitting a line table that does have multiple “rows” per machine instruction is open to interpretation (or mis-interpretation, or outright rejection) by different consumers, as there is no standardized interpretation.

Agreed, but I was going to put this under an option that is turned off by default, or enabled only for particular consumers known to support it. That would resolve the concern about possible misinterpretation. But having a standardized approach is better for sure.

I’ve looked at the ‘Two-Level Line Tables’ feature (thank you so much for pointing to this!) and have a couple of questions about it (in case you have a chance to answer them, or maybe you can point me to a better place to ask them).

(1) Is there any defined release date for DWARFv6?
(2) Does ‘Deferred to DWARF Version 6’ mean that this feature is accepted for DWARFv6, or that it will be considered later, when the new standard is scheduled?
(3) Does the ‘Logicals’ table allow multiple entries that map to the same address?
On the one hand, the proposal says that

In a two-level line number table, the logicals table would provide a mapping from each logical statement in a program to a recommended breakpoint location…

which seems to focus on ‘logical statements’, not ‘addresses’, and doesn’t explicitly disallow multiple logical statements sharing the same address. But on the other hand:

The actuals table would be optional, and when omitted, the logicals table would correspond to the DWARF v4 line number table.

…which doesn’t allow multiple entries to have the same address. Does the ‘Two-Level Line Tables’ feature make this an exception?

I was able to find the initial implementation (for gcc/gdb/gnu-binutils) and try the example above. Here are some details.

The generated assembly for test() is:

  .file 1 "test.cpp"
  .subprog 1 "" 1 8
  .lloc 1 1 8 subprog 1
  pushq %rbp
  movq  %rsp, %rbp
  .lloc 2 1 9 subprog 1
  .subprog 2 "_Z3foov" 1 4
  .lloc 3 1 5 subprog 2 context 2
  call  _Z3barv
  .lloc 4 1 10 subprog 1
  popq  %rbp

For the inlined foo() there are two .lloc directives preceding foo()’s body:

  .lloc 2 1 9 subprog 1           // call of foo()
  .lloc 3 1 5 subprog 2 context 2 // inlined body of foo()

The Logicals table looks as follows:

 Logicals Statements:
  [0x00000061]  Set context to 0 and subprogram to 1
  [0x00000064]  Extended opcode 2: set Address to 0x0
  [0x0000006f]  Advance Line by 7 to 8
  [0x00000071]  Copy
        Logical 1: 0x0[0] file 1 line 8 discrim 0 context 0 subprog 1 is_stmt 1
  [0x00000072]  Special opcode 44: advance Address by 4 to 0x4 and Line by 1 to 9
        Logical 2: 0x4[0] file 1 line 9 discrim 0 context 0 subprog 1 is_stmt 1
  [0x00000073]  Set context to 2 and subprogram to 2
  [0x00000076]  Advance Line by -4 to 5
  [0x00000078]  Copy
        Logical 3: 0x4[0] file 1 line 5 discrim 0 context 2 subprog 2 is_stmt 1
  [0x00000079]  Pop context to logical 2
  [0x0000007a]  Special opcode 54: advance Address by 5 to 0x9 and Line by 1 to 10
        Logical 4: 0x9[0] file 1 line 10 discrim 0 context 0 subprog 1 is_stmt 1
  [0x0000007b]  Set context to 0 and subprogram to 3
  [0x0000007e]  Special opcode 25: advance Address by 2 to 0xb and Line by 2 to 12
        Logical 5: 0xb[0] file 1 line 12 discrim 0 context 0 subprog 3 is_stmt 1
  [0x0000007f]  Set context to 0 and subprogram to 4
  [0x00000082]  Special opcode 74: advance Address by 7 to 0x12 and Line by 1 to 13
        Logical 6: 0x12[0] file 1 line 13 discrim 0 context 0 subprog 4 is_stmt 1

Having Logical 2, which describes line 9 (the call of foo()), makes it possible for gdb to stop on that line (it wasn’t able to show the body of foo() though, but I’m pretty sure there is enough information to make that work):

(gdb) b 9
Breakpoint 1 at 0x40057a: file test.cpp, line 9.
(gdb) r
Starting program: a.out

Breakpoint 1, test () at test.cpp:9
9      foo();
(gdb) s
foo () at test.cpp:9
9      foo();
(gdb) s
bar () at test.cpp:12
12    void bar() {return;}

As Logical 2 and Logical 3 share the same address, it seems the implementation answers ‘Yes’ to the third question.
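As an illustration of why there should be enough information here, the sketch below walks the context column to recover the inline call chain for a logical row. It rests on an assumption drawn only from the dump above: that `context N` refers to the 1-based number of the calling logical and that 0 means "no caller". The `Logical` struct and `inlineStackLines` helper are hypothetical and do not come from the proposal or from the gcc/gdb prototype.

```cpp
#include <cstdint>
#include <vector>

// One row of the Logicals table, reduced to the fields used here.
struct Logical {
  std::uint64_t address;
  unsigned line;
  unsigned context;  // assumed: 1-based number of the calling logical, 0 = none
  unsigned subprog;
};

// Logicals 1-4 from the dump above (the rows belonging to test()).
inline std::vector<Logical> logicalsForTest() {
  return {
      {0x0, 8, 0, 1},   // Logical 1: test() entry
      {0x4, 9, 0, 1},   // Logical 2: call of foo()
      {0x4, 5, 2, 2},   // Logical 3: inlined body of foo(), called from Logical 2
      {0x9, 10, 0, 1},  // Logical 4
  };
}

// Returns source lines from the innermost frame outwards for the logical
// with the given 1-based number.
inline std::vector<unsigned> inlineStackLines(
    const std::vector<Logical>& logicals, unsigned number) {
  std::vector<unsigned> lines;
  while (number != 0 && number <= logicals.size()) {
    const Logical& row = logicals[number - 1];
    lines.push_back(row.line);
    // Assumption: a context always refers to an earlier logical; stop
    // otherwise so that malformed input cannot loop forever.
    number = (row.context < number) ? row.context : 0;
  }
  return lines;
}
```

For Logical 3 this yields lines 5 and then 9, that is, the body of foo() inlined at the call on line 9, without consulting the DIE tree.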

It seems I should have made the previous comment under Paul’s answer to make notifications work properly, so just tag @pogo59 here. Sorry for the noise.

Apologies for not noticing your second comment sooner, I’ve been looking into this a bit myself.

(3) Does the ‘Logicals’ table allow multiple entries that map to the same address?

I actually think the answer is a “No” - as you’ve noted, the only implementation of the Two-Level Line Tables proposal allows for a single address to be used by multiple Logicals. However, this is because gcc already allows that with the normal line table, which is against the DWARF spec. Outside of that, the requirement that the Logicals table degrade to the normal line table requires that multiple rows per address be disallowed, and nothing in the proposal suggests otherwise. However, in the Wiki article linked from the DWARF proposal, we also have the line:

When a single machine instruction corresponds to more than one source statement (e.g., due to optimizations such as common subexpression elimination), a separate row for the same address is added to the actuals table for each statement. These consecutive rows are then treated as a single row designating a set of logical statements that are associated with the instruction at that address.

This confuses things further; if you allow multiple logicals per address in the Actuals table, then you must also allow it in the Logicals table, because the latter essentially duplicates the former with its address column! As far as I can tell, it seems that the proposal is simply written with the incorrect assumption that multiple rows for the same address is not in conflict with the DWARF spec, or is otherwise treating it as normal behaviour (as in gcc/gdb).

In summary:

  • DWARF requires an address not be mapped to multiple source locations.
  • gcc, gdb, and the two-level line tables proposal all ignore this.

I agree with your overall opinion that mapping a single address to multiple source locations would be a useful feature, and I support adding it to LLVM behind an optional flag.

A small extra note on two-level line tables: the feature is not necessary to enable multiple source locations per address, but it does speed things up for the debugger by encoding inlining information in the line table, avoiding the need to parse the DIE tree to determine whether a DW_TAG_inlined_subroutine covers a given address. I think a modified version of the proposal would be useful. As I mentioned above, the address column in the Logicals table duplicates some of the information in the Actuals table, so it would be easy to drop the address column from the Logicals table altogether. That gives up the ability to trivially decay to the ordinary line table, but removes the unnecessary duplication while keeping the other benefits of the proposal. Notably, the Checkpoint-Based Debugging proposal could make stepping in the multiple-locations-per-address cases more intuitive.

Sorry for a late reply, I’ve been meaning to get back to this.

(1) No. The committee is actively reviewing proposals, and that process usually goes on for a couple of years. Eventually we declare an endpoint and go into a final document review, but there’s no set schedule for that.

(2) The latter; it’s on the list for consideration as part of v6, but we haven’t gotten to it yet.

(3) @StephenTozer and I had a side conversation about this. IMO it would have to, and you came to the same conclusion. The formal proposal to the DWARF committee doesn’t explicitly state one way or the other. I’ve been meaning to write up something for the committee list about this, and see what the author of the proposal thinks. But of course I haven’t done that yet :disappointed:

@StephenTozer, @pogo59 thank you for your replies and sorry for the delay in response.

If I understand correctly, multiple .debug_line entries with the same address are something that needs to be explicitly allowed and supported either way. Moreover, as this is already out in the wild (thanks to the GNU toolchain), it seems to make sense to take advantage of it.

So, I would like to decouple things a bit.

On the one hand we have the standardization problem, and solving that seems to be a long journey.

On the other hand, there is the feature I started with. It aims to allow setting breakpoints on inlined callsites, which is thought to be important for debugging optimized code. It relies on non-standardized functionality (multiple .debug_line entries per address), though there is a debugger that already supports this.

From LLVM’s perspective, this feature seems to be

  • small and isolated (it can be based on the existing information in debug metadata and requires only changes to ASMPrinter),
  • safe (surely, it should be guarded by an option and/or enabled for GDB tuning only until more consumers start to support it),
  • not something completely new to LLVM, since LLVM may already produce multiple locations for a single address (the example in my first message proves this).

It could be further extended by introducing more .debug_line entries for optimized locations, not just for inlined callsites. This kind of change will be more invasive, but still can be done nicely.

In my opinion, it can be done before or in parallel with the work needed to standardize the feature. Moreover, I believe that whichever way the feature is standardized, it will require similar functionality from the compiler, modulo the code emission part (which depends on assembly directives and DWARF syntax).

Does it make sense?

As for adding a feature to the standard, the ‘Two-Level Line Tables’ proposal looks promising. But as has already been mentioned, there are some places worth clarifying. @pogo59, you mentioned that you were thinking about writing up something for the committee list about this to see what the author of the proposal thinks. Would you mind doing so, please? Is there anything I can do to help?

Looking at the ‘Support for Checkpoint-Based Debugging’ section on the wiki page (thank you, @StephenTozer, for pointing to this!) I started thinking that implicitly disallowing multiple ’Logicals’ entries per address was intentional.

It says:

With the current line number tables, single-stepping or moving from one breakpoint to the next requires that each logical statement have its own address, so that the debugger can execute at least one machine instruction per statement.

Having multiple line entries per address introduces an ambiguity about which location should be displayed for a given address. I guess GDB currently uses some trivial logic, like displaying the last location in a sequence of line entries with the same address (I haven’t looked closely at the code, but a couple of experiments suggested this). ‘Checkpoint debugging’ seems to be an alternative option.

This makes me think that the problem of mapping multiple locations to a single machine address is a bit out of scope of this proposal. Maybe it is worth considering separately?

I think there’s certainly a lot that goes beyond the basic proposal of supporting breakpoints on calls to inlined functions. I think either way, we need to support multiple locations for a single machine address to support that feature - the problem being that unless we pick some address for the function call (which will need to be shared with another source location in most cases), it can’t appear in the line table at all. I think we could technically get around that by using the DIE tree, but that would be very poor for performance, especially when debugging large programs.

As you’ve said, we already have a compiler and a debugger that produce this kind of output, it shouldn’t be too difficult to mimic what it does. We can start off only doing so for inlined calls, and later move on to other optimized code cases, although I think it’s a pretty small jump to make - all the hard parts will be in getting it working for any cases to begin with. But the RFC doesn’t need to be bogged down by all the things that we could do later on, so I’m in support of the basic implementation coming in first, especially since imo the inability to break on inlined calls is one of the more significant issues with stepping right now.

FWIW, I think it’s more than having duplicate entries in the line table - llvm-dwarfdump probably isn’t printing some other features/extensions to the line table that GCC/GDB are using. The line directives having “view” in them probably communicate some extra flag to emit into the line table.

All that said, I don’t necessarily have a problem with implementing this in LLVM, enabled when tuning for gdb (whether or not lldb would want patches that add support for it there too, I don’t know), though the API might be weird.

Though the existing instances of LLVM producing multiple line entries for the same instruction are bugs and should be fixed at some point (the assembler should ignore the previous line change if the line changes again before the next instruction is emitted - for normal (non-view) line table entries). I don’t think the existence of those duplicates should be an argument for/against implementing GCC/GDB’s view support in LLVM.

Thank you, @dblaikie for mentioning this!
Actually, I wasn’t aware that there is a proposal for the ‘Location Views’ feature. Here it is DWARF Issue.

I’m not insisting this is a good or primary argument for adding the feature. I just wanted to mention that this is already the case, and given this patch, [MC/DWARF] Generate multiple .debug_line entries for adjacent .loc di… · llvm-mirror/llvm@4df4bcc · GitHub, it seems this behavior was (if not still is) intentional.

huh, fascinating - I didn’t know that behavior had been added intentionally, I’d assumed it was a bug.

[MC/DWARF] Generate multiple .debug_line entries for adjacent .loc directives - it was from back in the day when I was fairly new to working on DWARF, though I was around for the conversation.

I guess not entirely surprising given GCC/GDB’s line view work, that this had an effect, but imho, it’s still GDB and the non-integrated assembler that were buggy, rather than LLVM’s integrated assembler - duplicate line entries, if they mean anything, describe a zero-length region with one line, then a non-zero length region with the next address. The first one carries no information, since it describes no addresses - it doesn’t seem reasonable to apply its location description over to the addresses described by the following line entry. (it’s a half open range, and we’d generally treat [x, x) as the empty range, not as equivalent to [x, x+1))
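The half-open-range reading in the last paragraph can be sketched like this. The names `Row`, `CoveredRange`, and `coveredRanges` are hypothetical and only illustrate the argument; the input is the GCC line table from the first post, with 0x9 (the end_sequence address) as the end of the sequence.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// A line table row reduced to address and line.
struct Row {
  std::uint64_t address;
  unsigned line;
};

// The half-open address range [begin, end) described by one row.
struct CoveredRange {
  std::uint64_t begin;
  std::uint64_t end;
  unsigned line;
  bool empty() const { return begin == end; }
};

// The non-terminating GCC rows for test() from the first post.
inline std::vector<Row> gccRows() {
  return {{0x0, 8}, {0x4, 9}, {0x4, 4}, {0x4, 5}, {0x4, 5}};
}

// Each row covers the addresses from its own address up to (but excluding)
// the address of the next row; the last row ends at `sequenceEnd`.
// Assumes rows are ordered by non-decreasing address, as within a sequence.
inline std::vector<CoveredRange> coveredRanges(const std::vector<Row>& rows,
                                               std::uint64_t sequenceEnd) {
  std::vector<CoveredRange> ranges;
  for (std::size_t i = 0; i < rows.size(); ++i) {
    const std::uint64_t end =
        (i + 1 < rows.size()) ? rows[i + 1].address : sequenceEnd;
    ranges.push_back({rows[i].address, end, rows[i].line});
  }
  return ranges;
}
```

Read this way, the row for line 9 covers [0x4, 0x4), an empty range, and only the final row at 0x4 covers [0x4, 0x9). A consumer that wants the callsite breakpoint has to give the zero-length rows a meaning beyond this range interpretation.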