[RFC][ItaniumDemangler] New option to print compact C++ names

Demangled C++ names can quickly become hard to read due to their length, particularly when templates are involved. Users of tools that display demangled names, like LLDB (for backtraces), may not care about all elements of the original C++ name; in a debugger backtrace a user most likely just wants to quickly see the function name for each frame. We’ve had similar requests in the past from users looking at backtraces in crash reports or profiling tools, where shorter demangled names would improve the user experience and could also reduce crash report sizes.

The proposal is to introduce a new set of options (akin to a “printing policy”) to the LLVM ItaniumDemangler to control aspects of how the demangled name should be printed.

We’re primarily looking for input on the appetite for introducing such an option in the demangler and what kinds of options people could see themselves using.

Given the following demangled name (taken from an actual LLDB frame):

$ llvm-cxxfilt _ZN4llvm2cl4listINSt3__112basic_stringIcNS2_11char_traitsIcEENS2_9allocatorIcEEEEbNS0_6parserIS8_EEEC1IJA43_cNS0_4descENS0_9MiscFlagsENS0_12OptionHiddenENS0_3catENS0_2cbIvRKS8_EEEEEDpRKT_

llvm::cl::list<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, bool, llvm::cl::parser<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>> >::list<char [43], llvm::cl::desc, llvm::cl::MiscFlags, llvm::cl::OptionHidden, llvm::cl::cat, llvm::cl::cb<void, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&>>( char const (&) [43], llvm::cl::desc const&, llvm::cl::MiscFlags const&, llvm::cl::OptionHidden const&, llvm::cl::cat const&, llvm::cl::cb<void, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&> const&)

Note:

  1. std::basic_string isn’t reduced to std::string (technically because the Ss substitution was inhibited by the inline namespace)
  2. Default template parameters are expanded (the mangling doesn’t tell us anything about whether a parameter is defaulted, so nothing the demangler could’ve known)
  3. It’s really hard to identify what the function name here actually is

A user might really just want to see the following:

llvm::cl::list<..>::list<..>(char const (&)[43], llvm::cl::desc const&, llvm::cl::MiscFlags const&, llvm::cl::OptionHidden const&, llvm::cl::cat const&, llvm::cl::cb<..> const&)

Now it’s much more apparent that we’re dealing with a constructor call.

Related discussion: Llvm-cxxfilt alternate renderings for particularly large demanglings

Prior Art

The Swift demangler is configurable via a swift::Demangle::DemangleOptions structure.

There are cases where we might want to configure the level of detail in the demangled names. But currently those decisions have to be made unilaterally for all users. E.g., showing constructor variants, adding space between template closers, etc.

The MSVC demangle (undecorate) API has flags that do some form of name simplification.

This C++ demangler https://pkg.go.dev/github.com/ianlancetaylor/demangle#NoTemplateParams is used at Google for some symbolication purposes.

Implementation Considerations

This section describes some details of a possible implementation of this RFC (based on our experience developing a prototype).

Should this be part of the demangler?

One alternative is to do the simplification of these names in a pre- or post-processing step. E.g., llvm-cxxmap is a tool that allows one to specify equivalences between elements of a mangled name. But it doesn’t currently do any sort of remangling. Its primary (and only?) use is for a tool to determine whether two mangled names represent the same C++ name. One could try to re-use that infrastructure to write some sort of “mangled name simplification” tool. So we would then feed the simplified mangled name into the demangler. This seems difficult to get right for more complex options such as “hide all template parameters after certain depth”.

Post-processing seems undesirable, since at that point we’re implementing a C++ parser.
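
To illustrate why string post-processing is fragile, here is a minimal sketch of a naive bracket-counting filter. The function name naiveHideTemplateArgs is hypothetical and is not part of LLVM; it only shows the failure mode, assuming the filter treats every '<' as the start of a template argument list.

```cpp
#include <string>

// Hypothetical post-processing filter (not an LLVM API): collapses every
// top-level <...> group in an already demangled name to "<..>" by counting
// angle brackets. It has no notion of operators or expressions.
std::string naiveHideTemplateArgs(const std::string &Name) {
  std::string Out;
  int Depth = 0;
  for (char C : Name) {
    if (C == '<') {
      if (Depth == 0)
        Out += "<..>"; // placeholder for the whole group
      ++Depth;
    } else if (C == '>') {
      if (Depth > 0)
        --Depth;
    } else if (Depth == 0) {
      Out += C; // only text outside of any <...> group is kept
    }
  }
  return Out;
}
```

For "ns::list<int, char>::list<bool>(int const&)" this yields "ns::list<..>::list<..>(int const&)", which is what we want. For a name such as "bool operator< <int>(A<int> const&)" it yields "bool operator<..>", silently dropping the parameter list, because the '<' of the operator name is mistaken for an opening bracket. Handling such cases correctly requires knowing the structure of the name, which the demangle tree already has.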

How to pass options to ItaniumDemangle

The main entry point to printing a demangled name lives on the demangle tree itself (on Node::print). This makes it tricky to keep around state across the entire printing process because there’s simply no way to store it apart from passing it as a function parameter. The Node class doesn’t link back to the Demangler, so storing it there is currently not possible.

This has been previously worked around by storing printing related state inside the OutputBuffer class. That’s because it is the only structure we do pass around while printing.

We considered the following ways to provide a PrintingPolicy structure to the demangler:

  1. Pass PrintingPolicy to all the Node::printLeft/Node::printRight overrides
  2. Add a link from Node to the owning Demangler and store options on the Demangler
  3. Add PrintingPolicy as a member to OutputBuffer
  4. Decouple printing from the Node class using a new PrintVisitor that holds state used during printing

We ended up implementing option (4) because it seemed like the most architecturally sane approach. It does introduce the most churn, though, since we need to move all the printLeft/printRight overrides into a new class, so we’d be happy to consider the other options.

PrintVisitor (option 4)

Possible implementation

The idea here is to rip out the Node::printLeft/Node::printRight APIs into a new itanium_demangle::PrintVisitor class. This class would look something like:

struct PrintVisitor {
  OutputBuffer &OB;
  PrintingPolicy PP;
  // ... other printing related state

  void printLeft(const Node *Node) {
    Node->visit([this](auto *N) {
        this->operator()(N, PrintKind::Left);
    });
  }

  void printRight(const Node *Node) {
    Node->visit([this](auto *N) {
        this->operator()(N, PrintKind::Right);
    });
  }

  // Example operator() overload
  void operator()(const NameType *Node, PrintKind::Side S) {
    switch (S) {
    case PrintKind::Left:
      OB += Node->Name;
      return;
    case PrintKind::Right:
      // A plain name has nothing to print on the right-hand side.
      return;
    }
  }
};

This re-uses the Node::visit API to walk over the demangle tree and print as we did before. Only now the printLeft/printRight are part of the same operator() overload for each Node type and we have access to printing state without passing it around.

This has the added benefit of being able to move all the printing state out of OutputBuffer.
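
As a self-contained illustration of the left/right split with visitor-held state, here is a toy sketch. ToyNode, ToyPrintVisitor and the HideArrayBounds flag are hypothetical stand-ins invented for this example; they are not the real ItaniumDemangle classes, and the flag is not one of the proposed options.

```cpp
#include <string>

// Toy demangle-tree node (hypothetical, not LLVM's Node hierarchy).
struct ToyNode {
  enum Kind { Name, Array, Reference } K;
  std::string Text;     // Name: identifier; Array: bound; Reference: unused
  const ToyNode *Child; // element/pointee type, nullptr for Name
};

enum class Side { Left, Right };

// All printing state lives on the visitor, not on the nodes or the buffer.
struct ToyPrintVisitor {
  std::string &Out;
  bool HideArrayBounds; // hypothetical policy bit, for illustration only

  void print(const ToyNode &N) {
    visit(N, Side::Left);
    visit(N, Side::Right);
  }

  void visit(const ToyNode &N, Side S) {
    switch (N.K) {
    case ToyNode::Name:
      if (S == Side::Left)
        Out += N.Text;
      return;
    case ToyNode::Array:
      if (S == Side::Left) {
        visit(*N.Child, Side::Left);
      } else {
        // The bound is printed after whatever declarator sits in between.
        Out += " [" + (HideArrayBounds ? std::string() : N.Text) + "]";
        visit(*N.Child, Side::Right);
      }
      return;
    case ToyNode::Reference: {
      // References to arrays need parentheses around the '&'.
      bool Paren = N.Child->K == ToyNode::Array;
      if (S == Side::Left) {
        visit(*N.Child, Side::Left);
        Out += Paren ? " (&" : "&";
      } else {
        if (Paren)
          Out += ")";
        visit(*N.Child, Side::Right);
      }
      return;
    }
    }
  }
};
```

Printing a Reference to an Array of "char" with bound "43" produces "char (&) [43]"; with HideArrayBounds set it produces "char (&) []". The point is only that the policy is consulted inside the visitor, so no node or OutputBuffer needs to carry it.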

Users that previously called Node::print would now call something as follows:

Node->visit(PrintVisitor(Buffer, Policy));

(which could further be encapsulated)

How to expose options beyond the demangler library

This proposal is only concerned with adding these options to the demangler itself. Tools that get the demangled name via llvm::demangle/llvm::itaniumDemangle (such as llvm-cxxfilt) wouldn’t immediately benefit from this. We would need to agree on what the command-line options look like and how they would be passed into ItaniumDemangle.

We primarily looked at the LLDB use-case when prototyping this, which invokes the ItaniumPartialDemangler directly. Meaning it can construct the PrintingPolicy at the call-site.

PrintingPolicy Options

Some options that could be useful in a “compact” demangled name:

  • Hide template parameters
    • Could have configurable depth
    • Can even have HideAllTemplateParams/HideClassTemplateParams/etc.
  • Recognize std::__1:: as std:: (though this may not be the appropriate level to fix this particular issue at)
  • Omit function return types
  • Omit abi-tags
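
A rough sketch of how a configurable template depth could behave, on a toy name tree. ToyPolicy, ToyName, MaxTemplateDepth and render are hypothetical names chosen for this example; the actual PrintingPolicy fields are still up for discussion.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical policy: template argument lists at nesting depth
// >= MaxTemplateDepth are printed as "<..>". 0 hides every list.
struct ToyPolicy {
  unsigned MaxTemplateDepth = 0;
};

// Toy stand-in for a (possibly templated) name in the demangle tree.
// std::vector of an incomplete element type is permitted since C++17.
struct ToyName {
  std::string Base;
  std::vector<ToyName> TemplateArgs; // empty means "not a template"
};

std::string render(const ToyName &N, const ToyPolicy &P, unsigned Depth = 0) {
  std::string Out = N.Base;
  if (N.TemplateArgs.empty())
    return Out;
  if (Depth >= P.MaxTemplateDepth)
    return Out + "<..>"; // too deep: collapse the whole argument list
  Out += "<";
  for (std::size_t I = 0; I < N.TemplateArgs.size(); ++I) {
    if (I != 0)
      Out += ", ";
    Out += render(N.TemplateArgs[I], P, Depth + 1);
  }
  return Out + ">";
}
```

With a tree for llvm::cl::list<std::basic_string<char, std::char_traits<char>>, bool>, a depth of 0 renders "llvm::cl::list<..>" and a depth of 1 renders "llvm::cl::list<std::basic_string<..>, bool>". Separate flags such as HideClassTemplateParams would be further fields on the same structure.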

Other suggestions are welcome.


Tagging some people I’ve talked about this in the past with:

@labath @dblaikie @adrian.prantl @ldionne @jingham

I think this would be really useful, and the proposed implementation strategy makes sense to me.

Something that’s related to this (but likely can’t use the same mechanism) is the set of attempts to improve the readability of compiler diagnostics by simplifying type names printed by the compiler. One example of that is vittorioromeo/camomilla on GitHub, which is a post-processing script written in Python that simplifies C++ compiler errors.

It would be really nice if e.g. Clang could use the same underlying “type name simplification” machinery used by the demangler out of the box. That way, we could customize its type printing like we customize that of the demangler. However, I understand they are operating on vastly different trees so this may not be feasible or even desirable – I wanted to point it out so we at least consider the possibility.

Yeah, unfortunately a lot of the things the compiler should be doing aren’t possible as post-processing. The compiler knows about default template parameters, the type as written, etc. A post-processing tool can’t recover any of that, certainly not one that only consumes the mangled name (as opposed to one that might also use clang tooling to parse the source to rediscover some things the compiler knows). So I’m not sure there’s much overlap in terms of non-lossy simplifications/shortenings…

We could shorten some things if the demangled form doesn’t have to be syntactically valid C++, e.g. by introducing intermediate type aliases the same way the mangled name deduplicates things (e.g.: void f<T1 = std::basic_string...>(T1, T1)).

I believe our crash analysis tools internally omit a bunch of stuff from the demangled names - I’m not sure which things they omit, or how it’s implemented, but I’ll look into it now and get back to you - I think they’re probably dropping all the template parameters and maybe the function parameters too. Those might already be configurable with the demangler, or at least llvm-symbolizer.

We could omit default types in std templates. This would depend on the C++ standard used though.

Oh - also, I think lldb prints a fair bit more verbosely on backtraces than gdb does - might be worth looking at those differences to reduce some of the noise there as well as the name itself. (I suspect the things gdb does differently probably help at the command line but might not have any impact on GUI/IDE users)

Hmm just gave this a quick try, but GDB doesn’t seem to be much better with the particular llvm::cl::list example at least:

(gdb) bt
#0  llvm::cl::list<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, llvm::cl::parser<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::list<char [26], llvm::cl::desc, llvm::cl::MiscFlags, llvm::cl::OptionHidden> (
    this=0x11fd68c0 <ReservedRegsForRA>)

Though I’ll do some more comparing

That’d be very useful thank you! The demangler does have an option to skip parsing (i.e., demangling) of the function parameters, which was introduced not too long ago. The main difference there is that it affects the parsing itself, not just the printing of the name (might be a technicality but thought I’d point out that the ParseParams option isn’t something that would live in the PrintingPolicy proposed here). I’m not aware of any other options to configure the printing (or even demangling) though. Also, LLDB currently doesn’t use the demangled function arguments anyway. When displaying a frame it applies type-summaries to each of the arguments and displays those, instead of the demangled names.

These are the docs for the MSVC demangle (undecorate) function. Some of the flags it has can be thought of as “compaction” of the name (UNDNAME_NO_ARGUMENTS, for example), but a lot of them are concerned with removing things that are not a part of the Itanium mangled name in the first place.

The reason I brought this up is to show that there is a precedent for configuring the demangler to omit some parts of the mangled name – not because I thought we should match what it does.

Ahh nice, thanks for the pointer! Will update the RFC with the link

The reason I brought this up is to show that there is a precedent for configuring the demangler to omit some parts of the mangled name – not because I thought we should match what it does.

Yup didn’t mean to imply that this was the suggestion. Just that prior art exists in other demanglers

This isn’t related to the demangler, but I think the main difference comes from the handling of inlined functions. Compare this lldb output:

  * frame #0: 0x0000555555555134 a.out`foo(x=5) at a.cc:1:59
    frame #1: 0x0000555555555169 a.out`barfuz(int) [inlined] bar(x=<unavailable>) at a.cc:3:25
    frame #2: 0x0000555555555161 a.out`barfuz(int) [inlined] baz(x=<unavailable>) at a.cc:5:25
    frame #3: 0x0000555555555161 a.out`barfuz(x=<unavailable>) at a.cc:7:54

with this:

#0  foo (x=5) at a.cc:1
#1  0x0000555555555169 in bar (x=<optimized out>) at a.cc:3
#2  baz (x=<optimized out>) at a.cc:5
#3  barfuz (x=<optimized out>) at a.cc:7

LLDB prints the name of the physical function in each frame. If that function is a (long) template, then it’s tricky to even find the place where the name of the inlined function starts.

Would it maybe make sense to add some debug information for stuff users typically don’t care about? e.g. add a DW_AT_inline_namespace, so inline namespaces are omitted by default. There is also [[clang::preferred_name]], which seems like a good choice to put into the debug information as well.

Yea good point. gdb’s output seems like what I’d prefer to see in a backtrace for inlined functions.

How exactly to add an option in LLDB to not display those is an interesting question. A dedicated flag to thread backtrace is probably not what we want (though there is already a flag for showing “extended” vs “non-extended” backtraces, which this could fall under; we’d still have to plumb it down into FormatEntity, which is a long way down). Expressing it as a frame format string is also tricky without duplicating the code for the FunctionNameWithArgs, etc. variables.

It’s definitely worth having the discussion (maybe separate from this RFC), since we need to find a solution for adding more options to the frame display settings anyway if we want to present compact demangled names. In my prototype I just plumbed a single new parameter from the thread backtrace command down into FormatEntity, but that didn’t feel ideal.

I assume you’re talking about the LLDB case here? The function names displayed in the backtrace are just the demangled name. We don’t do any sort of augmenting of that string (apart from decorating it a bit for inline functions, function arguments, etc.). So the only thing we could do with extra knowledge from debug-info is to post-process the demangled name. Which I think we’d like to avoid doing, because it essentially becomes a C++ parsing problem.

Side note, the DWARF spec already allows representing inline namespaces in debug-info by attaching a DW_AT_export_symbols to a namespace. We actually do also have special support for [[clang::preferred_name]]; we replace references to types that have a preferred name with references to the preferred name. But again, we don’t use Clang’s type-printer to display function names (and we probably don’t want to because the AST that LLDB produces from DWARF isn’t sufficient to represent function names accurately. E.g., abi-tags, template parameters, etc. aren’t properly described).

Ah, sorry, should’ve clarified - I meant overall gdb prints less, not that it prints less of the function name.

Here’s an example:
gdb:

#0  f1 (s="foobar") at test.cpp:3
#1  0x0000555555555263 in main () at test.cpp:5

lldb:

* thread #1, name = 'a.out', stop reason = breakpoint 1.1
  * frame #0: 0x0000555555555228 a.out`f1(s=error: summary string parsing error) at test.cpp:3:1
    frame #1: 0x0000555555555263 a.out`main at test.cpp:5:3
    frame #2: 0x00007ffff7a43b8a libc.so.6`__libc_start_call_main(main=(a.out`main at test.cpp:4), argc=1, argv=0x00007fffffffd878) at libc_start_call_main.h:58:16
    frame #3: 0x00007ffff7a43c45 libc.so.6`__libc_start_main_impl(main=(a.out`main at test.cpp:4), argc=1, argv=0x00007fffffffd878, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007fffffffd868) at libc-start.c:360:3
    frame #4: 0x0000555555555151 a.out`_start + 33

Perhaps not showing threads if there’s only one, skipping things above main, and skipping which program it’s from, maybe if they’re all from the same program? (not sure if gdb puts them in if there is ambiguity).

And I think maybe gdb has some limit on printing all the parameters… hmm, nope, not on a minimal test at least.

oh, and the word "frame " at the start of every line, and the indent before it, seems to hurt a bit too.

GCC ignores return types, FWIW - which come up in demangled names from C++ templates (they have to mangle return types, and mangle /how/ the return type is written, which is even worse), so you get stuff like this from lldb:

  * frame #0: 0x0000555555555313 a.out`std::conditional<sizeof (int) != 0, void, void>::type f1<int>(s="foobar") at test.cpp:5:

compared to gdb’s:

#0  f1<int> (s="foobar") at test.cpp:5

Oh, fascinating… gdb’s printing the name from the DWARF (not using demangling) when it’s present - or at least it’s doing something different when the code is built without debug info:
with -g:

#0  f1<int> (s="foobar") at test.cpp:5

Without -g:

#0  0x00005555555552c4 in std::conditional<(sizeof (int))!=(0), void, void>::type f1<int>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) ()

Even includes the parameters… (oh, and didn’t set a breakpoint unless I provided the exact mangled name - b f1 said “Function “f1” not defined.”)

Hmm, but is lldb doing something similar?
with -g:

  * frame #0: 0x00005555555552c8 a.out`std::conditional<sizeof (int) != 0, void, void>::type f1<int>(s=error: summary string parsing error) at test.cpp:5:1

Without -g:

  * frame #0: 0x00005555555552c0 a.out`std::conditional<sizeof (int) != 0, void, void>::type f1<int>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>)

Oh, I guess in this case because of a lack of debug info for parameters, it includes the type information - but perhaps in both cases lldb is demangling? Then choosing whether to include parameters?

But perhaps this is all a bit aside - yeah, the ability to demangle a narrower subset of content. (I’m still following up internally, but did find a comment that described “removing parameters” - so sounds like demangling then doing some after-the-fact tidying up, which probably doesn’t help us, except to suggest what functionality would be good to have built-in)

Eventually found out our internal use is using a different demangler (implemented in Go: the github.com/ianlancetaylor/demangle package). It does have options for omitting template parameters and omitting function parameters for nested entities inside a function (but not for the function itself? huh).

And that’s implemented essentially by a printing policy object that gets passed into the print function, if that’s of any use/relevance/inspiration for implementing these sorts of things in LLVM…

Are you sure about that? This thread basically started because the demangled names contain “too much” information. And some of the simplifications flying around (default template arguments, inline namespace) could only be achieved by having more information – and that information is present in DWARF.

There are obviously some UX considerations that we’d need to figure out (like, we still want to let the user see the full demangled name /somehow/), but I don’t think it would be unreasonable to generate the name displayed in the backtrace from DWARF (if DWARF is present).

None of this means that the proposed changes to the demangler aren’t good or useful. I just wouldn’t take it as a given that the demangler output is the best thing to put into the backtrace string (apparently, it works for gdb).