Adding short backtrace debuginfo

jyn514 · January 21, 2025, 9:28pm

I want to change the Rust standard library’s short backtrace printing to be extensible. I opened Generate `DW_AT_RUST_short_backtrace` attributes for `DISubprogram` nodes by jyn514 · Pull Request #123683 · llvm/llvm-project · GitHub and @dblaikie suggested that I talk about the design and goals before we get bogged down in the implementation. So! Here goes.

The Rust standard library has two styles for printing backtraces at runtime:

Full backtraces. These work in the obvious way.
Short backtraces. These filter out “unimportant” frames that are likely not related to the developer’s bug. For example, frames like __libc_start_main, _Unwind_Resume, and rust runtime internals like std::rt::lang_start are filtered out.

Currently, the Rust runtime determines “unimportant” frames by looking directly at un-mangled symbol names of generated functions. This is not extensible, and involves a state machine that requires the frames to be present at runtime; in particular the frames must be marked noinline, impeding optimizations.
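To make the current approach concrete, here is a heavily simplified sketch of symbol-name-based filtering. It is not the code in `library/std`: the function `short_by_name` and the toy frame list are hypothetical, and only the two marker names are taken from the full backtrace shown later in this post.

```rust
/// Hypothetical, simplified model of name-based filtering.
/// `frames` is ordered from the most recent call to the oldest.
fn short_by_name<'a>(frames: &[&'a str]) -> Vec<&'a str> {
    let mut printing = false;
    let mut shown = Vec::new();
    for &name in frames {
        // The "begin" marker is an ancestor of the user's entry point;
        // it and everything older than it is runtime startup.
        if name.contains("__rust_begin_short_backtrace") {
            printing = false;
        }
        if printing {
            shown.push(name);
        }
        // The "end" marker sits in the panic machinery; frames older
        // than it are the ones worth showing.
        if name.contains("__rust_end_short_backtrace") {
            printing = true;
        }
    }
    shown
}

fn main() {
    let frames = [
        "std::panicking::default_hook",
        "std::sys::backtrace::__rust_end_short_backtrace",
        "rust_begin_unwind",
        "example::main",
        "std::sys::backtrace::__rust_begin_short_backtrace",
        "std::rt::lang_start",
        "_start",
    ];
    for name in short_by_name(&frames) {
        println!("{name}");
    }
}
```

If either marker frame is inlined away, its name never appears in the walk and the filter stops working, which is why the markers have to be noinline.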

I want to allow individual frames to be marked as “unimportant” for the purpose of backtraces, using a new DWARF vendor extension attribute. PDB doesn’t appear to be extensible and so I haven’t tried to implement this there; @wesleywiser suggested I use llvm.codeview.annotation for PDB, but I’m leaving that for future work.

Ideally, this would be extensible to other languages and codegen backends; I would love to see llvm-symbolizer have a mode for printing short backtraces instead of the full backtrace.

I added the following enum API:

enum class ShortBacktraceAttr {
  SkipFrame = 0,
  StartShortBacktrace = 1,
  EndShortBacktrace = 2,
};

SkipFrame indicates only the current frame should be skipped. StartShortBacktrace and EndShortBacktrace control this state machine in the rust runtime: rust/library/std/src/sys/backtrace.rs at master · rust-lang/rust · GitHub. I don’t think they can be replicated only with SkipFrame; in particular, StartShortBacktrace is necessary so that we can hide frames before main, for which we don’t control the debuginfo. If this is to be extensible to other languages, we also want EndShortBacktrace so that we can have a start/end pair that lets this work across shared object libraries, or if part of the code was written in a different language.

@dblaikie you suggested that this could instead be an enum { Skip, Print, Inherit }; presumably you intended for Inherit to be the default if there’s no debuginfo present at runtime. But I don’t think this is a general enough mechanism. Consider a program like this one:

// lib.rs
#[rustc_skip_short_backtrace] // Skip
pub fn foo() { panic!(); }

// library/std/src/panic.rs
pub fn catch_unwind(f: fn()) { f(); }

// main.rs
fn main() {
  std::panic::catch_unwind(foo);
}

First, we generate foo with a Skip backtrace annotation. To keep the same behavior as my proposal, we need some way to print the frame for catch_unwind without printing the frame for foo. So we can’t use Inherit. But we also need to not print catch_unwind when it’s used before main in the runtime startup (see the worked example below). So we can’t use Print. So I don’t think your idea works.
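A small model may make the argument easier to follow. The sketch below is hypothetical (the `Attr` type, the `visible` function, and the toy stack are illustrative and not part of the PR) and assumes that a start marker hides its own frame and every older frame. It shows that `catch_unwind`, which carries no annotation of its own, has to be shown in one position and hidden in another, so no fixed per-function value can express it.

```rust
#[derive(Clone, Copy, PartialEq)]
enum Attr {
    Unmarked,
    SkipFrame,
    StartShortBacktrace,
}

/// `frames` is ordered from the most recent call to the oldest.
/// Returns the names a short backtrace would show under the assumed rule.
fn visible(frames: &[(&'static str, Attr)]) -> Vec<&'static str> {
    let mut past_start = false;
    let mut shown = Vec::new();
    for &(name, attr) in frames {
        if attr == Attr::StartShortBacktrace {
            // Assumption: the marked frame and all older frames are hidden.
            past_start = true;
        }
        if !past_start && attr != Attr::SkipFrame {
            shown.push(name);
        }
    }
    shown
}

fn main() {
    let stack = [
        ("foo", Attr::SkipFrame),
        ("catch_unwind", Attr::Unmarked), // called from main: shown
        ("main", Attr::Unmarked),
        ("lang_start::{{closure}}", Attr::StartShortBacktrace),
        ("catch_unwind", Attr::Unmarked), // called from the runtime: hidden
        ("lang_start_internal", Attr::Unmarked),
        ("_start", Attr::Unmarked),
    ];
    println!("{:?}", visible(&stack));
}
```

The two `catch_unwind` entries have identical debuginfo; only their position relative to the start marker differs.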

Here are some worked examples of the new attribute, taken from the rustc test suite:

A trivial program which immediately panics:

fn main() { panic!() }

With short backtraces:

thread 'main' panicked at src/main.rs:11:5:
explicit panic
stack backtrace:
        [... omitted 17 frames ...]
  18:     0x573969c95d9d - example::main::h0cbc0be966554fbd
                               at /home/jyn/src/example/src/main.rs:11:5
        [... omitted 18 frames ...]
  note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

With full backtraces:

thread 'main' panicked at src/main.rs:11:5:
explicit panic
stack backtrace:
   0:     0x573969cb557a - std::backtrace_rs::backtrace::libunwind::trace::h5248f59125b65dcb
                               at /rustc/049355708383ab1b9a1046559b9d4230bdb3a5bc/library/std/src/../../backtrace/src/backtrace/libunwind.rs:116:5
   1:     0x573969cb557a - std::backtrace_rs::backtrace::trace_unsynchronized::h51f8c2f0c1f665a8
                               at /rustc/049355708383ab1b9a1046559b9d4230bdb3a5bc/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x573969cb557a - std::sys::backtrace::_print_fmt::h394536ef105dc1ee
                               at /rustc/049355708383ab1b9a1046559b9d4230bdb3a5bc/library/std/src/sys/backtrace.rs:66:9
   3:     0x573969cb557a - <std::sys::backtrace::BacktraceLock::print::DisplayBacktrace as core::fmt::Display>::fmt::hdad3ec861e1bc3c2
                               at /rustc/049355708383ab1b9a1046559b9d4230bdb3a5bc/library/std/src/sys/backtrace.rs:39:26
   4:     0x573969cd2553 - core::fmt::rt::Argument::fmt::h1fff0e041375e022
                               at /rustc/049355708383ab1b9a1046559b9d4230bdb3a5bc/library/core/src/fmt/rt.rs:177:76
   5:     0x573969cd2553 - core::fmt::write::hd3d2ae2bd7022d6c
                               at /rustc/049355708383ab1b9a1046559b9d4230bdb3a5bc/library/core/src/fmt/mod.rs:1440:21
   6:     0x573969cb2e43 - std::io::Write::write_fmt::h6f807cd45fe0ec3f
                               at /rustc/049355708383ab1b9a1046559b9d4230bdb3a5bc/library/std/src/io/mod.rs:1888:15
   7:     0x573969cb53c2 - std::sys::backtrace::BacktraceLock::print::hd89057abd6064d03
                               at /rustc/049355708383ab1b9a1046559b9d4230bdb3a5bc/library/std/src/sys/backtrace.rs:42:9
   8:     0x573969cb630f - std::panicking::default_hook::{{closure}}::ha5006fae9f6b3890
                               at /rustc/049355708383ab1b9a1046559b9d4230bdb3a5bc/library/std/src/panicking.rs:298:22
   9:     0x573969cb617a - std::panicking::default_hook::hb2297b08dc8057bb
                               at /rustc/049355708383ab1b9a1046559b9d4230bdb3a5bc/library/std/src/panicking.rs:325:9
  10:     0x573969cb6be2 - std::panicking::rust_panic_with_hook::hc90599a27179187c
                               at /rustc/049355708383ab1b9a1046559b9d4230bdb3a5bc/library/std/src/panicking.rs:831:13
  11:     0x573969cb6a7a - std::panicking::begin_panic_handler::{{closure}}::hef9dccd4fc0fa6b6
                               at /rustc/049355708383ab1b9a1046559b9d4230bdb3a5bc/library/std/src/panicking.rs:704:13
  12:     0x573969cb5a79 - std::sys::backtrace::__rust_end_short_backtrace::h34e3b56edd49a65f
                               at /rustc/049355708383ab1b9a1046559b9d4230bdb3a5bc/library/std/src/sys/backtrace.rs:168:18
  13:     0x573969cb670d - rust_begin_unwind
                               at /rustc/049355708383ab1b9a1046559b9d4230bdb3a5bc/library/std/src/panicking.rs:695:5
  14:     0x573969cd1960 - core::panicking::panic_fmt::hab3db7cb7603f25e
                               at /rustc/049355708383ab1b9a1046559b9d4230bdb3a5bc/library/core/src/panicking.rs:75:14
  15:     0x573969cd1ae6 - core::panicking::panic_display::h06ed683e585343e8
                               at /rustc/049355708383ab1b9a1046559b9d4230bdb3a5bc/library/core/src/panicking.rs:261:5
  16:     0x573969cd1ae6 - core::panicking::panic_explicit::hfe0b9a0df85a8a12
                               at /rustc/049355708383ab1b9a1046559b9d4230bdb3a5bc/library/core/src/panicking.rs:234:5
  17:     0x573969c95daa - example::main::panic_cold_explicit::h59bb81719ac049b2
                               at /home/jyn/.local/lib/rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panic.rs:88:13
  18:     0x573969c95d9d - example::main::h0cbc0be966554fbd
                               at /home/jyn/src/example/src/main.rs:11:5
  19:     0x573969c95d4b - core::ops::function::FnOnce::call_once::h7cdd469612e13c58
                               at /home/jyn/.local/lib/rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5
  20:     0x573969c95d0e - std::sys::backtrace::__rust_begin_short_backtrace::hd598eb18a7a773b1
                               at /home/jyn/.local/lib/rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/backtrace.rs:152:18
  21:     0x573969c95ce1 - std::rt::lang_start::{{closure}}::he0cb2971e4611cae
                               at /home/jyn/.local/lib/rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:194:18
  22:     0x573969cb0c70 - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::h1662fe2ef888d630
                               at /rustc/049355708383ab1b9a1046559b9d4230bdb3a5bc/library/core/src/ops/function.rs:284:13
  23:     0x573969cb0c70 - std::panicking::try::do_call::h3d18bcf005343ff3
                               at /rustc/049355708383ab1b9a1046559b9d4230bdb3a5bc/library/std/src/panicking.rs:587:40
  24:     0x573969cb0c70 - std::panicking::try::h9d6374bf9286b4b5
                               at /rustc/049355708383ab1b9a1046559b9d4230bdb3a5bc/library/std/src/panicking.rs:550:19
  25:     0x573969cb0c70 - std::panic::catch_unwind::h8bed5993af4f99ba
                               at /rustc/049355708383ab1b9a1046559b9d4230bdb3a5bc/library/std/src/panic.rs:358:14
  26:     0x573969cb0c70 - std::rt::lang_start_internal::{{closure}}::hb5f804e6afeba7e4
                               at /rustc/049355708383ab1b9a1046559b9d4230bdb3a5bc/library/std/src/rt.rs:163:24
  27:     0x573969cb0c70 - std::panicking::try::do_call::h8a13ba7f8d7b0ead
                               at /rustc/049355708383ab1b9a1046559b9d4230bdb3a5bc/library/std/src/panicking.rs:587:40
  28:     0x573969cb0c70 - std::panicking::try::h370a7fcea6779d1d
                               at /rustc/049355708383ab1b9a1046559b9d4230bdb3a5bc/library/std/src/panicking.rs:550:19
  29:     0x573969cb0c70 - std::panic::catch_unwind::h3e6c1755441ed33e
                               at /rustc/049355708383ab1b9a1046559b9d4230bdb3a5bc/library/std/src/panic.rs:358:14
  30:     0x573969cb0c70 - std::rt::lang_start_internal::h481232870f10ba8e
                               at /rustc/049355708383ab1b9a1046559b9d4230bdb3a5bc/library/std/src/rt.rs:159:5
  31:     0x573969c95cc7 - std::rt::lang_start::he943d34afc9204bb
                               at /home/jyn/.local/lib/rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:193:5
  32:     0x573969c95dce - main
  33:     0x71ddee229d90 - __libc_start_call_main
                               at ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
  34:     0x71ddee229e40 - __libc_start_main_impl
                               at ./csu/../csu/libc-start.c:392:3
  35:     0x573969c95bc5 - _start
  36:                0x0 - <unknown>

A program that uses these attributes to control the printing of its own backtrace

  // Has no effect, since we already have an inner function with #[rustc_end_short_backtrace]
  #[rustc_end_short_backtrace]
  fn first() {
      second();
  }

  #[rustc_end_short_backtrace]
  fn second() {
      third(); // won't show up in backtrace
  }

  fn third() {
      fourth(); // won't show up in backtrace
  }

  fn fourth() {
      fifth(); // won't show up in backtrace
  }

  #[rustc_start_short_backtrace]
  fn fifth() {
      sixth();
  }

  fn sixth() {
      seven();
  }

  fn seven() {
      panic!("debug!!!");
  }

  fn main() {
      first();
  }

With short printing:

  stack backtrace:
        [... omitted 14 frames ...]
     0: short_ice_remove_middle_frames::seven
               at $DIR/short-ice-remove-middle-frames.rs:44:5
     1: short_ice_remove_middle_frames::sixth
               at $DIR/short-ice-remove-middle-frames.rs:40:5
        [... omitted 3 frames ...]
     2: second
               at $DIR/short-ice-remove-middle-frames.rs:23:5
     3: first
               at $DIR/short-ice-remove-middle-frames.rs:18:5
     4: short_ice_remove_middle_frames::main
               at $DIR/short-ice-remove-middle-frames.rs:48:5
        [... omitted 16 frames ...]

Without short printing:

[ ... ]
   1: short_ice_remove_middle_frames::seven
             at ./tests/ui/panics/short-ice-remove-middle-frames.rs:44:5
   2: short_ice_remove_middle_frames::sixth
             at ./tests/ui/panics/short-ice-remove-middle-frames.rs:40:5
   3: short_ice_remove_middle_frames::fifth
             at ./tests/ui/panics/short-ice-remove-middle-frames.rs:36:5
   4: short_ice_remove_middle_frames::fourth
             at ./tests/ui/panics/short-ice-remove-middle-frames.rs:31:5
   5: short_ice_remove_middle_frames::third
             at ./tests/ui/panics/short-ice-remove-middle-frames.rs:27:5
   6: short_ice_remove_middle_frames::second
             at ./tests/ui/panics/short-ice-remove-middle-frames.rs:23:5
   7: short_ice_remove_middle_frames::first
             at ./tests/ui/panics/short-ice-remove-middle-frames.rs:18:5
   8: short_ice_remove_middle_frames::main
             at ./tests/ui/panics/short-ice-remove-middle-frames.rs:48:5
[ ... ]

In this particular case, the program could use SkipFrame three times instead of a Start/EndShortBacktrace pair, or the compiler could do such a transformation internally; but this is not true in the general case.

adrian.prantl · January 22, 2025, 9:55pm

I suppose you also want this to work in other debuggers, but are you aware of LLDB’s capability to have a plugin determine which frames should be displayed in backtraces?

github.com/llvm/llvm-project

[lldb] Extend frame recognizers to hide frames from backtraces

llvm:main ← adrian-prantl:126629381

opened 11:38PM - 15 Aug 24 UTC

adrian-prantl

+424 -75

Compilers and language runtimes often use helper functions that are fundamentally uninteresting when debugging anything but the compiler/runtime itself. This patch introduces a user-extensible mechanism that allows for these frames to be hidden from backtraces and automatically skipped over when navigating the stack with `up` and `down`. This does not affect the numbering of frames, so `f <N>` will still provide access to the hidden frames. The `bt` output will also print a hint that frames have been hidden.

My primary motivation for this feature is to hide thunks in the Swift programming language, but I'm including an example recognizer for `std::function::operator()` that I wished for myself many times while debugging LLDB.

rdar://126629381

Example output. (Yes, my proof-of-concept recognizer could hide even more frames if we had a method that returned the function name without the return type or I used something that isn't based off regex, but it's really only meant as an example).

before:

```
(lldb) thread backtrace --filtered=false
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100001f04 a.out`foo(x=1, y=1) at main.cpp:4:10
    frame #1: 0x0000000100003a00 a.out`decltype(std::declval<int (*&)(int, int)>()(std::declval<int>(), std::declval<int>())) std::__1::__invoke[abi:se200000]<int (*&)(int, int), int, int>(__f=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:149:25
    frame #2: 0x000000010000399c a.out`int std::__1::__invoke_void_return_wrapper<int, false>::__call[abi:se200000]<int (*&)(int, int), int, int>(__args=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:216:12
    frame #3: 0x0000000100003968 a.out`std::__1::__function::__alloc_func<int (*)(int, int), std::__1::allocator<int (*)(int, int)>, int (int, int)>::operator()[abi:se200000](this=0x000000016fdff280, __arg=0x000000016fdff224, __arg=0x000000016fdff220) at function.h:171:12
    frame #4: 0x00000001000026bc a.out`std::__1::__function::__func<int (*)(int, int), std::__1::allocator<int (*)(int, int)>, int (int, int)>::operator()(this=0x000000016fdff278, __arg=0x000000016fdff224, __arg=0x000000016fdff220) at function.h:313:10
    frame #5: 0x0000000100003c38 a.out`std::__1::__function::__value_func<int (int, int)>::operator()[abi:se200000](this=0x000000016fdff278, __args=0x000000016fdff224, __args=0x000000016fdff220) const at function.h:430:12
    frame #6: 0x0000000100002038 a.out`std::__1::function<int (int, int)>::operator()(this= Function = foo(int, int) , __arg=1, __arg=1) const at function.h:989:10
    frame #7: 0x0000000100001f64 a.out`main(argc=1, argv=0x000000016fdff4f8) at main.cpp:9:10
    frame #8: 0x0000000183cdf154 dyld`start + 2476
(lldb)
```

after

```
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100001f04 a.out`foo(x=1, y=1) at main.cpp:4:10
    frame #1: 0x0000000100003a00 a.out`decltype(std::declval<int (*&)(int, int)>()(std::declval<int>(), std::declval<int>())) std::__1::__invoke[abi:se200000]<int (*&)(int, int), int, int>(__f=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:149:25
    frame #2: 0x000000010000399c a.out`int std::__1::__invoke_void_return_wrapper<int, false>::__call[abi:se200000]<int (*&)(int, int), int, int>(__args=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:216:12
    frame #6: 0x0000000100002038 a.out`std::__1::function<int (int, int)>::operator()(this= Function = foo(int, int) , __arg=1, __arg=1) const at function.h:989:10
    frame #7: 0x0000000100001f64 a.out`main(argc=1, argv=0x000000016fdff4f8) at main.cpp:9:10
    frame #8: 0x0000000183cdf154 dyld`start + 2476
Note: Some frames were hidden by frame recognizers
```

dblaikie · January 22, 2025, 11:12pm

Yep, nevermind my skip/inherit/shenanigans - I understand the original Rust behavior/etc better now. I hadn’t realized the begin/end rust helpers were dispatchers that became part of the stack trace - had assumed something else. Nevermind.

Could this generalize to being possibly useful for an interactive debugger too? Like often people want to step into, say, a std::function call, but not bother with all the implementation details of std::function’s type erasure and dispatch. So std::function’s op() could say “end short backtrace” and then just before it dispatches to the user code again (or, even better, some way to say the /next/ frame is the one to include - perhaps being able to have the attribute on a DW_TAG_call_site) “begin short backtrace” again.

So I guess “skip” could be implemented by putting “end” on this function and “start” on every function it calls? Though that’d fail to preserve the behavior of some outer usage that might’ve already wanted to skip this function and several others… so if we wanted that we’d need some “restore” behavior. Then we’d need to know how to pair these attributes so as to say “this undoes the effect of /that/” which is probably difficult/too complicated.

Would be interesting to know how this compares to/complements the lldb recognizer thingy.

I have certainly seen problems with some of lldb’s frame behavior - I think stepping into std::function is one example at least at some point (perhaps it’s been fixed? Not sure) - you can’t step into std::* code by default, it steps over. Even when user code is called indirectly by that std::* code. It’d be great to get the best of both worlds - skipping all the implementation goo, but still being able to step back into user code.

I guess that might motivate the push/pop/paired behavior - if the standard library uses a std::function to call into itself, you wouldn’t want the call into the functor that is itself more standard library code, to suddenly enable showing stack frames. Nor would you want to have to annotate every entry into the standard library from some point that could be going out of the standard library (ie: have to annotate every functor you pass to std::function to say “but this is going back into the standard library, so should be ‘skip’ again”).

Not sure we’d get the standard library to actually add these annotations even if we made them cheaper - putting an attribute on every public function in the standard library seems a bit expensive.

jyn514 · January 23, 2025, 12:42am

I suppose you also want this to work in other debuggers, but are you aware of LLDB’s capability to have a plugin determine which frames should be displayed in backtraces?

The Rust runtime is not running in a debugger. These backtraces (by default) get printed whenever a program panics, so even a simple hello world can trigger a backtrace if e.g. it gets EPIPE when printing.

The extensible LLDB plugin is cool! I think if we were to combine them, we would add a LLDB plugin that reads this debuginfo. I don’t want to implement this first as an LLDB plugin because it wouldn’t help with the primary use case, which is the backtrace that gets printed by the runtime.

Could this generalize to being possibly useful for an interactive debugger too?

That seems very reasonable to me; I would welcome an LLDB plugin that reads this debuginfo in order to determine which frames are shown. I don’t have time to write such a plugin myself, though.

often people want to step into, say, a std::function call, but not bother with all the implementation details of std::function’s type erasure and dispatch . So std::function’s op() could say “end short backtrace” and then just before it dispatches to the user code again

that seems like a good use case. how many frames are we talking about here? if it’s less than, say, 5, I think it would be simpler to just annotate each frame with SkipFrame rather than try and mess with Start/EndShortBacktrace; the latter seems like a very heavy hammer.

That said, I like the idea of making Start/EndShortBacktrace nestable. I don’t think that needs any compile-time support from LLVM; it just needs the runtime that prints these backtraces (rust’s std::rt, LLDB, or llvm-symbolizer) to track how many levels of StartShortBacktrace it’s seen so far.

(or, even better, some way to say the /next/ frame is the one to include - perhaps being able to have the attribute on a DW_TAG_call_site) “begin short backtrace” again.

Hmm, it makes sense to me that you’d want that behavior. Do you know if DWARF defines the semantics if you include multiple attributes on the same DW_TAG_subprogram? If it allowed multiple, we could include both EndShortBacktrace and SkipFrame on the same function, and that would allow the library author to choose whether they want to include the last frame or not (rather than the runtime hard-coding it).

putting an attribute on every public function in the standard library seems a bit expensive.

I do not think we should annotate every public function in the standard library with this attribute, regardless of whether we’re talking about the Rust standard library or the C++ standard library. The heuristic I’ve been using for rust is:

Does this function panic unconditionally, or is it part of the panic runtime? If so, skip it. (core::option::unwrap_failed, rust_begin_unwind).
Is this function an implementation detail of the language? If so, skip it. (core::ops::function::FnOnce::call_once; I think this corresponds to std::function in c++)

But I would be hesitant to put it on something like the Vec indexing operator, and I would be strongly against putting it on something like u32::isqrt, because in that case knowing exactly which function is being called is important to know what went wrong in your program. Information hiding is useful to the extent it makes debugging easier instead of harder, and sometimes you do really need to know what the standard library is doing.

jyn514 · January 23, 2025, 12:45am

alternatively, we have lots of bitwidth to play with - we could use a flags kind of approach, where you can set SkipFrame | EndShortBacktrace on a single DWARF attribute, and then the runtime does bit-twiddling to parse it back out.
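As a rough illustration of the flags idea, the sketch below packs the three markers into one integer and decodes them again. The bit assignments, constant names, and the `Decoded` struct are all hypothetical; the enum in the PR uses the values 0, 1, and 2, which cannot be combined with `|`.

```rust
// Hypothetical bit assignments, one bit per marker.
const SKIP_FRAME: u8 = 1 << 0;
const START_SHORT_BACKTRACE: u8 = 1 << 1;
const END_SHORT_BACKTRACE: u8 = 1 << 2;

/// Decoded view of a single attribute value.
#[derive(Debug, PartialEq)]
struct Decoded {
    skip: bool,
    start: bool,
    end: bool,
}

/// Split the raw attribute value back into its individual markers.
fn decode(raw: u8) -> Decoded {
    Decoded {
        skip: (raw & SKIP_FRAME) != 0,
        start: (raw & START_SHORT_BACKTRACE) != 0,
        end: (raw & END_SHORT_BACKTRACE) != 0,
    }
}

fn main() {
    // What a producer would emit for "end the short backtrace here and
    // also hide this frame".
    let raw = SKIP_FRAME | END_SHORT_BACKTRACE;
    println!("{:?}", decode(raw));
}
```

A consumer that does not know a given bit can simply ignore it, which leaves room for later additions.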

pogo59 · January 23, 2025, 6:46pm

DWARF does not define the semantics if you include multiple copies of the same attribute on the same DW_TAG_subprogram (or any other DIE). If you want both EndBacktrace and SkipFrame on the same function, you’d want to define a separate attribute value that meant both of those things.

adrian.prantl · January 23, 2025, 8:38pm

StartShortBacktrace is necessary so that we can hide frames before main, for which we don’t control the debuginfo.

If that’s the only reason, wouldn’t it be a better solution to just special-case the main symbol in the backtracer, instead of introducing a DWARF construct that is only going to be used once per program and always on the same function?

jyn514 · January 23, 2025, 11:14pm

I don’t think so, no, because the function is different depending on which language is calling LLVM. For instance rust wants to put this attribute on a closure in std::rt::lang_start, this one here: rust/library/std/src/rt.rs at backtrace-debuginfo · jyn514/rust · GitHub

You could say that the backtracer is responsible for hiding frames before main and the rust compiler is responsible for injecting SkipFrame debuginfo on all frames between main and the lang_start closure; but that’s a lot of manual work for each language that wants this feature. And it doesn’t address the catch_unwind use case from the start of the post, where the function should sometimes be hidden and sometimes shown depending on its position in the callgraph.

dblaikie · January 28, 2025, 7:20pm

Mostly assuming the libc++ maintainers may not want to fuss around with updating the attributes when they change the implementation details, or those implementation details may be used in other places, etc.

Hmm, maybe that’s all it needs. Not sure - the ability for the attribute to say where to “start” (for the main-like function example) even though no one’s “ended” it seems like it might be unclear whether something is nested or not…

If you’re walking from the outer-most/earliest function call (bottom of the stack) - I guess any repeated action (start or end) could be treated as a level to count… imagine if libc said “start+skip” on whatever function calls main (so main doesn’t have to say it itself) but then you layer Rust or something else on top of this so main is in that language’s runtime, does some stuff and then says its own “start+skip” before reaching its entry point. I guess we’d then say the earlier start is nested inside the later one, ignoring all the frames from both.

Then later on you could visit an end+start pair for std::function-like things, similarly that could be nested (some language has an abstraction built on top of std::function) - makes it a bit hard for the parser since there’s no known value for nesting level at the top or bottom of the stack…
I guess once you find all the start/end actions, you can figure out the minimum value of nesting and call that the level at which frames are rendered for short back traces…
Maybe it’s simpler than I’m picturing, but it does feel a bit awkward, but maybe necessary.

As someone else mentioned - DWARF doesn’t really support multiple attributes of the same name on a single entity (DWARF uses child entities when that’s necessary, but they use more bits to encode, unfortunately). A bit mask/pattern seems plausible - though I’m not immediately thinking of an existing example of that in DWARF, so maybe that’s frowned upon for some reason.

putting an attribute on every public function in the standard library seems a bit expensive.

I do not think we should annotate every public function in the standard library with this attribute, regardless of whether we’re talking about the Rust standard library or the C++ standard library. The heuristic I’ve been using for rust is:

Does this function panic unconditionally, or is it part of the panic runtime? If so, skip it. (core::option::unwrap_failed, rust_begin_unwind).

Is this function an implementation detail of the language? If so, skip it. (core::ops::function::FnOnce::call_once; I think this corresponds to std::function in c++)

But I would be hesitant to put it on something like the Vec indexing operator, and I would be strongly against putting it on something like u32::isqrt, because in that case knowing exactly which function is being called is important to know what went wrong in your program. Information hiding is useful to the extent it makes debugging easier instead of harder, and sometimes you do really need to know what the standard library is doing.

“implementation detail of the language” might be a trickier one for C++, since so much of it is in the standard library (which I guess you’re considering not necessarily to be “implementation details of the language”) - like std::function has no tight coupling with the language, it can be written entirely in C++ code without any special blessings.

I imagine to get lldb’s current skipping behavior, it’d boil down to annotating every standard library function - I think it skips them all right now. Is that right @adrian.prantl ?

How would Apple feel about this? If the LLDB frame skipper stuff was baked into DWARF, used when rendering back traces? Would that be a good thing/something Apple was interested in, or do you consider the LLDB frame skipping to be too optimized for the debugging scenario and at odds with what you’d want elsewhere/unable to be known at compile-time?

adrian.prantl · January 28, 2025, 8:32pm

It probably wouldn’t be sufficient to replace the existing mechanism, because the LLDB frame recognizers can also cover use-cases where you want to hide uninteresting frames in system libraries that you don’t have any debug info for.

This proposal is a good fit for use-cases where you build everything from source and can rely on having debug info for every frame.

dblaikie · January 28, 2025, 9:25pm

Could it at least take over for frames with debug info?

adrian.prantl · January 28, 2025, 9:32pm

In Swift we mostly use the plugin to hide thunks and runtime functions for which we typically don’t emit debug info. @Michael137, do you think this feature would be useful for functions defined in the libcxx headers?

jyn514 · February 6, 2025, 1:38am

dblaikie:

If you’re walking from the outer-most/earliest function call (bottom of the stack) - I guess any repeated action (start or end) could be treated as a level to count… imagine if libc said “start+skip” on whatever function calls main (so main doesn’t have to say it itself) but then you layer Rust or something else on top of this so main is in that language’s runtime, does some stuff and then says its own “start+skip” before reaching its entry point. I guess we’d then say the earlier start is nested inside the later one, ignoring all the frames from both.

Then later on you could visit an end+start pair for std::function-like things, similarly that could be nested (some language has an abstraction built on top of std::function) - makes it a bit hard for the parser since there’s no known value for nesting level at the top or bottom of the stack…
I guess once you find all the start/end actions, you can figure out the minimum value of nesting and call that the level at which frames are rendered for short back traces…
Maybe it’s simpler than I’m picturing, but it does feel a bit awkward, but maybe necessary.

i think this seems complicated because you are imagining walking the stack from top (_start) to bottom (the frame where the program is paused). that is not how backtrace printing normally works. normally it goes from bottom to top, because you have the stack pointer for the current frame and you follow that back up to the top. this is why sometimes gdb says “previous frame inner to this frame (corrupt stack?)”, because it doesn’t know the top frame a priori.

if you walk from bottom to top, i think this is much simpler. every time we see a start attribute, we increment the level of nesting; every time we see an end attribute, we decrement it. for any frame, if the level of nesting is non-zero, or if it has “skip” debuginfo, we don’t print it. this already handles the “repeated start” case where both __libc_start_main and std::rt::lang_start increment the level; if we end with a non-zero level of nesting that’s no problem.
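here is a minimal sketch of that walk, run on the middle-frames test from the first post. the `Attr` type and `short_backtrace` function are hypothetical names, not the PR's API. one detail is an assumption of the sketch rather than something stated above: the counter is clamped at zero, so an end marker with no matching start is a no-op, which is how i read the "Has no effect" comment in that test.

```rust
#[derive(Clone, Copy, PartialEq)]
enum Attr {
    Unmarked,
    Skip,
    Start,
    End,
}

/// `frames` is ordered from the most recent call to the oldest, i.e. the
/// order in which an unwinder visits them.
fn short_backtrace(frames: &[(&'static str, Attr)]) -> Vec<&'static str> {
    let mut depth: u32 = 0;
    let mut shown = Vec::new();
    for &(name, attr) in frames {
        match attr {
            Attr::Start => depth += 1,
            // Assumption: clamp at zero so an unmatched end does nothing.
            Attr::End => depth = depth.saturating_sub(1),
            Attr::Unmarked | Attr::Skip => {}
        }
        // Print only frames outside every start/end region that are not
        // individually marked as skipped.
        if depth == 0 && attr != Attr::Skip {
            shown.push(name);
        }
    }
    shown
}

fn main() {
    use Attr::*;
    let stack = [
        ("seven", Unmarked),
        ("sixth", Unmarked),
        ("fifth", Start),
        ("fourth", Unmarked),
        ("third", Unmarked),
        ("second", End),
        ("first", End),
        ("main", Unmarked),
    ];
    // Shows seven, sixth, second, first, main; fifth, fourth, and third
    // are the three omitted frames.
    println!("{:?}", short_backtrace(&stack));
}
```

the repeated-start case falls out of the same loop: two start markers leave the depth at 2 for the remaining frames, and nothing older is printed.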

A bit mask/pattern seems plausible - though I’m not immediately thinking of an existing example of that in DWARF, so maybe that’s frowned upon for some reason.

cool, i can update the implementation to make this a bit mask instead of an enum. if that causes problems for any reason, i think it would also be reasonable to say that “start” always hides the frame, because you can replicate the “start without skipping this frame” functionality by annotating every caller with “start” (this is not ideal, but also i expect “start without skipping this frame” to be rare).

dblaikie · February 10, 2025, 6:33pm

Hmm, OK - I guess that’s one thing I hadn’t registered properly - the “start” is the most recent stack frame, and “end” is the oldest stack frame. Which makes sense when stack walking recent to oldest, but given the necessary directionality of that - maybe there are some less ambiguous terms we could use?

But with that framing in mind, I’m trying to follow this, walking from the most recent frame back to the oldest. When we start we increment and when we end we decrement - but we don’t know the initial depth when we begin the walk. So I’m not sure how we know which frames are non-zero, at least not initially. We’d have to track the lowest level of nesting seen in the total walk - then any frame at /that/ level is rendered?

So if we had a stack (libc(start), main, foo) we start at zero for foo, main is zero too, and libc increments nesting and gets assigned the nesting level 1, main and foo 0. So we render main and foo.

If we had the stack (libc(start), main, stdlib_entry(end), stdlib_impl) we’d start at zero at stdlib_impl, then stdlib_entry “ends” so we decrement and assign -1 to main, then libc increments and gets assigned 0 - and we render the frames at -1?
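The "lowest level seen" idea can be sketched as two passes: assign each frame a relative level, then render only the frames at the minimum. This is one reading of the two examples above, not a settled design. In particular, it assumes a start marker applies to its own frame while an end marker only affects the frames after it, which is what the numbers in the examples imply. `Marker` and `render_at_min_level` are hypothetical names.

```rust
/// Hypothetical marker attached to a frame (illustrative only).
#[derive(Clone, Copy, PartialEq)]
enum Marker {
    Plain,
    Start,
    End,
}

/// Frames are given from the most recent to the oldest.
/// Returns the names of the frames at the lowest nesting level seen.
fn render_at_min_level<'a>(frames: &[(&'a str, Marker)]) -> Vec<&'a str> {
    let mut level = 0i32; // relative level; the true starting depth is unknown
    let mut levels = Vec::with_capacity(frames.len());
    for &(_, marker) in frames {
        if marker == Marker::Start {
            level += 1; // the start frame itself gets the incremented level
        }
        levels.push(level);
        if marker == Marker::End {
            level -= 1; // only frames older than the end frame are affected
        }
    }
    let min = levels.iter().copied().min().unwrap_or(0);
    let mut rendered = Vec::new();
    for (i, &(name, _)) in frames.iter().enumerate() {
        if levels[i] == min {
            rendered.push(name);
        }
    }
    rendered
}
```

For the first stack this renders `foo` and `main`; for the second it renders only `main`, the single frame at level -1. The cost is that nothing can be printed until the whole stack has been walked.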

jyn514 · February 15, 2025, 1:59pm

@dblaikie before we discuss this a bunch more, i do want to say - if this is with the goal of finding a better runtime representation, or a way to make this extensible to other languages, i’m all for that. but if you’re just trying to find out if this is possible at all, i can tell you it’s certainly possible, because the rust runtime has done it for 10 years. the code for that is here: rust/library/std/src/sys/backtrace.rs at 8c07d140e00dfa5b0988754051d07d8a91ff01f7 · rust-lang/rust · GitHub

When we start we increment and when we end we decrement - but we don’t know the initial depth when we begin the walk.

The initial depth is an implied 1. This works for rust because it assumes there will always be an end_short_backtrace frame somewhere in the stack which will decrement it to 0; i suppose in the general case you can’t assume that and you would have to do a full stack walk like you’re describing to see if this debuginfo is present anywhere.
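A rough sketch of the implied-depth approach, not the actual code in `backtrace.rs`. `Mark` and `short_with_implied_depth` are hypothetical names, and two behaviours are assumptions made only for this example: marker frames themselves are never printed, and a stack with no end marker falls back to printing every frame.

```rust
/// Hypothetical marker attached to a frame (illustrative only).
#[derive(Clone, Copy, PartialEq)]
enum Mark {
    Plain,
    Start,
    End,
}

/// Frames are given from the most recent to the oldest.
fn short_with_implied_depth<'a>(frames: &[(&'a str, Mark)]) -> Vec<&'a str> {
    // General case: check first whether an end marker exists anywhere.
    // Assumption: if there is none, print the full backtrace.
    if !frames.iter().any(|&(_, m)| m == Mark::End) {
        return frames.iter().map(|&(name, _)| name).collect();
    }
    let mut level = 1i32; // implied initial depth of 1
    let mut printed = Vec::new();
    for &(name, mark) in frames {
        match mark {
            // Marker frames only adjust the level and are not printed.
            Mark::End => level -= 1,
            Mark::Start => level += 1,
            Mark::Plain => {
                if level == 0 {
                    printed.push(name);
                }
            }
        }
    }
    printed
}
```

Given `panic_machinery`, an end marker, `foo`, `main`, a start marker and `runtime_entry` in that order, only `foo` and `main` come back. The frames before the end marker sit at the implied level 1 and stay hidden.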

jyn514 · February 15, 2025, 2:06pm

actually, @dblaikie, what about this: for an initial draft, we forget about all this start/end stuff and only support a way to say “the current frame should not be printed in the backtrace”. that’s nice and simple and doesn’t require global reasoning, and it doesn’t have any effects on inlining. rust will keep using __rust_{start,end}_short_backtrace symbol names for this more complicated thing. and if we ever want to go back and make that mechanism general purpose we can, but in the meantime we have the “skip current frame” mechanism that’s uncontroversial and works everywhere.

dblaikie · February 21, 2025, 8:25pm

Sure, if other folks are good with that - and it’s an extension attribute we can change the meaning of later anyway, let’s go with that.

Which consumers do you plan to implement support for this in? Will they need backwards/forwards compatibility if this is changed in the future? Or are they version locked with the compiler (that’d make it easier to make breaking changes in the future, so we don’t have to consume more encoding space by using another extension attribute, or make the value more verbose than it needs to be so as not to overlap with the current/past semantics, etc.)?

jyn514 · February 21, 2025, 9:32pm

I plan to implement this in the Rust standard library, backtrace-rs, and possibly addr2line depending on how hard it is to add a general-purpose mechanism for looking up DWARF info to gimli. The standard library is locked to the version of the compiler, but the others aren’t. However, the compiler isn’t locked to a version of LLVM; we support compiling rustc with old versions of system LLVM (i think currently we support back to 18).

That said, if this is a boolean 0/1 flag to start, it should be backwards compatible to change that to the bitmask I mentioned above: we would put “Skip” in the lowest bit and when I implement this in backtrace-rs I would only look at the lowest bit. Then if LLVM adds data to other places in the bitmask in the future, old consumers would keep working.
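As a sketch of the consumer side, assuming the attribute value has already been read out of the DWARF as an optional integer: `SKIP_BIT` and `should_skip_frame` are hypothetical names, and how the value is actually looked up (for example through gimli) is left out.

```rust
/// "Skip" lives in the lowest bit; all other bits are reserved for later use.
const SKIP_BIT: u64 = 1;

/// `attr` is the value of the short-backtrace attribute on a frame's
/// subprogram, or `None` if the attribute is absent.
fn should_skip_frame(attr: Option<u64>) -> bool {
    match attr {
        // Look only at the lowest bit and ignore everything else.
        Some(value) => (value & SKIP_BIT) != 0,
        // No attribute means the frame is printed as usual.
        None => false,
    }
}
```

A consumer written this way treats 0 and 1 exactly like a boolean today, and keeps working if a later producer sets other bits: `Some(0b11)` still skips, while `Some(0b10)` does not, even though it is non-zero.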

dblaikie · February 21, 2025, 9:59pm

Sounds good. Let’s make sure we document that it’s 0/1 that needs to be checked for, not zero/non-zero.
