DebugInfo: Purpose of call site tags

Hey folks,

I’m trying to wrap my head around the implementation, purpose, and costs involved in both the GCC-extension v4 and standard v5 DW_TAG_call_site, call site parameters, addresses, etc.

So picking up from some of the design discussion in https://reviews.llvm.org/D72489:

Me (Blaikie): I’m not sure why AT_call_return_pc would be needed at a tail call site as the debugger must ignore it. As for emitting DW_AT_low_pc under gdb tuning, I think this might be an artifact from the original GNU implementation.

Djordje: Yes, that is the GNU implementation’s heritage (I cannot remember why GCC generated the low_pc info in the case of the tail calls), but GNU GDB needs the low_pc (as an address) in order to handle the call_site and call_site_parameters debug info for non-tail calls. To avoiding the pc address info in the case of tail calls makes sense to me, since debuggers should avoid that info.

OK, so a few questions on that:

  1. Why would low_pc not be required for tail calls?
  2. Why is the v4 low_pc predicated on GDB tuning too? If we’re producing the call_site tag, what’s the point of that without an address?
  3. What features do these call_site tags enable (in the absence of call_site_parameters)?
  4. What’s the end goal in terms of what calls should be described in the DWARF? (describing literally every call sounds /super/ expensive) - they currently seem quite different between GCC and Clang on a few test cases I’ve tried, so it’s hard to tell the logic

(& if I understand correctly, the call_site_parameters are intended to work collaboratively between callees and callers, so if, say, a parameter value is caller saved & then clobbered in the callee - you could still print the value of that parameter by looking at the saved copy in the caller?)

Hey folks,

I’m trying to wrap my head around the implementation, purpose, and costs involved in both the GCC-extension v4 and standard v5 DW_TAG_call_site, call site parameters, addresses, etc.

So picking up from some of the design discussion in https://reviews.llvm.org/D72489:

Me (Blaikie): I’m not sure why AT_call_return_pc would be needed at a tail call site as the debugger must ignore it. As for emitting DW_AT_low_pc under gdb tuning, I think this might be an artifact from the original GNU implementation.

Djordje: Yes, that is the GNU implementation’s heritage (I cannot remember why GCC generated the low_pc info in the case of the tail calls), but GNU GDB needs the low_pc (as an address) in order to handle the call_site and call_site_parameters debug info for non-tail calls. To avoiding the pc address info in the case of tail calls makes sense to me, since debuggers should avoid that info.

OK, so a few questions on that:

  1. Why would low_pc not be required for tail calls?

I don’t think a meaningful return PC can be encoded at a tail call site. Control doesn’t transfer to PC+4 past the jump instruction when the callee returns (the PC is set to whatever the last saved return address is instead).

My understanding is that the point of AT_call_return_pc is to allow the debugger to present better backtraces, i.e. to implement a solver to figure out where to insert artificial tail call frames in the backtrace.

  1. Why is the v4 low_pc predicated on GDB tuning too? If we’re producing the call_site tag, what’s the point of that without an address?

I’m fuzzy on this but IIUC the low_pc attribute in a call site tag is the GNU predecessor to AT_call_return_pc. And a tag without return PC information just gives a hint to the debugger that the function contains a tail call.

  1. What features do these call_site tags enable (in the absence of call_site_parameters)?

At the moment, just artificial tail call frames, but there are some interesting potential future applications. E.g.: disambiguating backtraces in the presence of function merging (a bigger deal for swift than it is for clang - the call site tag for a thunk-call could record the “original”/unmerged/source-level callee), and surfacing rich(er) information about CFI failures at call sites.

  1. What’s the end goal in terms of what calls should be described in the DWARF? (describing literally every call sounds /super/ expensive) - they currently seem quite different between GCC and Clang on a few test cases I’ve tried, so it’s hard to tell the logic

The goal is to describe all calls that aren’t optimized out. At least, I’m not sure that there’s a leaner subset that would really be sufficient for Apple’s use cases, and the size overhead hasn’t caused issues internally. We could certainly add a mode to clang to elide some of this call site info, though.

(& if I understand correctly, the call_site_parameters are intended to work collaboratively between callees and callers, so if, say, a parameter value is caller saved & then clobbered in the callee - you could still print the value of that parameter by looking at the saved copy in the caller?)

Yep!

vedant

Hey folks,

I'm trying to wrap my head around the implementation, purpose, and costs involved in both the GCC-extension v4 and standard v5 DW_TAG_call_site, call site parameters, addresses, etc.

So picking up from some of the design discussion in Login

        Me (Blaikie): I'm not sure why AT_call_return_pc would be needed at a tail call site as the debugger must ignore it. As for emitting DW_AT_low_pc under gdb tuning, I think this might be an artifact from the original GNU implementation.

    Djordje: Yes, that is the GNU implementation's heritage (I cannot remember why GCC generated the low_pc info in the case of the tail calls), but GNU GDB needs the low_pc (as an address) in order to handle the call_site and call_site_parameters debug info for non-tail calls. To avoiding the pc address info in the case of tail calls makes sense to me, since debuggers should avoid that info.

OK, so a few questions on that:
1) Why would low_pc not be required for tail calls?

I don’t think a meaningful return PC can be encoded at a tail call site. Control doesn’t transfer to `PC+4` past the jump instruction when the callee returns (the PC is set to whatever the last saved return address is instead).

My understanding is that the point of AT_call_return_pc is to allow the debugger to present better backtraces, i.e. to implement a solver to figure out where to insert artificial tail call frames in the backtrace.

+1, but the GCC still generates the low_pc (the GNU ext. v4) even for the tail calls.

2) Why is the v4 low_pc predicated on GDB tuning too? If we're producing the call_site tag, what's the point of that without an address?

I’m fuzzy on this but IIUC the low_pc attribute in a call site tag is the GNU predecessor to AT_call_return_pc. And a tag without return PC information just gives a hint to the debugger that the function contains a tail call.

Yes, there were no such attribute at that moment representing something like that, and they picked the low_pc as a solution. In addition, if a call_site tag corresponds to a tail call, it should have a flag (DW_AT_call_tail_call/DW_AT_GNU_tail_call) indicating it is a tail call.

3) What features do these call_site tags enable (in the absence of call_site_parameters)?

At the moment, just artificial tail call frames, but there are some interesting potential future applications. E.g.: disambiguating backtraces in the presence of function merging (a bigger deal for swift than it is for clang - the call site tag for a thunk-call could record the “original”/unmerged/source-level callee), and surfacing rich(er) information about CFI failures at call sites.

4) What's the end goal in terms of what calls should be described in the DWARF? (describing literally every call sounds /super/ expensive) - they currently seem quite different between GCC and Clang on a few test cases I've tried, so it's hard to tell the logic

The goal is to describe all calls that aren’t optimized out. At least, I’m not sure that there’s a leaner subset that would really be sufficient for Apple’s use cases, and the size overhead hasn’t caused issues internally. We could certainly add a mode to clang to elide some of this call site info, though.

(& if I understand correctly, the call_site_parameters are intended to work collaboratively between callees and callers, so if, say, a parameter value is caller saved & then clobbered in the callee - you could still print the value of that parameter by looking at the saved copy in the caller?)

Yep!

vedant

Thanks,
Djordje