llvm-cxxfilt alternate renderings for particularly large demanglings

I’ve recently been investigating some very large demangled names (multiple GB… good times) that are a bit unwieldy. I locally modified llvm-cxxfilt to produce something semi-usable for complex names, which looks like this:

_1 = _0<>
_2 = _0<_1, _1>
_0 = t1
f1(_0<_0<_2, _2>, _2>)

It’s not the most elegant thing and would certainly need some cleaning up (and maybe it should only be usable when llvm-cxxfilt is passed a whole mangled name on the command line, rather than when it is filtering arbitrary input and demangling whatever names appear in it).

It could be improved by not numbering (i.e., by inlining) any entity that’s short enough or not interesting enough to be worth sharing. For direct names like t1, it’d probably be fine if they were duplicated everywhere.
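To make the idea concrete, here is a minimal Python sketch of that rendering. It is not the llvm-cxxfilt patch and does not use LLVM's demangler; the `Node` class, the `render` function and the `min_len` threshold are all hypothetical names made up for illustration. A node gets a `_N` label only if it is referenced more than once and its flattened text would be at least `min_len` characters long; everything else is inlined.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for a demangler AST node; sharing is by identity."""
    kind: str             # "name", "template" (head<args>) or "function" (head(args))
    text: str = ""        # only used for kind == "name"
    children: tuple = ()  # (head, arg0, arg1, ...) for non-leaf nodes


def render(root, min_len=4):
    refs, size = {}, {}

    def visit(n):
        # Count incoming edges; expand each node only on its first visit.
        refs[id(n)] = refs.get(id(n), 0) + 1
        if refs[id(n)] > 1:
            return
        for c in n.children:
            visit(c)
        # Flattened length, tracked as an integer so nothing huge is built.
        if n.kind == "name":
            size[id(n)] = len(n.text)
        else:
            head, *args = n.children
            size[id(n)] = (size[id(head)] + 2 + sum(size[id(a)] for a in args)
                           + 2 * max(len(args) - 1, 0))

    visit(root)
    labels, lines = {}, []

    def emit(n):
        if id(n) in labels:
            return labels[id(n)]
        if n.kind == "name":
            body = n.text
        else:
            head, *args = n.children
            lo, hi = ("<", ">") if n.kind == "template" else ("(", ")")
            body = emit(head) + lo + ", ".join(emit(a) for a in args) + hi
        if refs[id(n)] > 1 and size[id(n)] >= min_len:
            # Shared and long enough: define it once, refer to it by label.
            labels[id(n)] = f"_{len(labels)}"
            lines.append(f"{labels[id(n)]} = {body}")
            return labels[id(n)]
        return body  # inlined

    lines.append(emit(root))
    return "\n".join(lines)


# The example from the post: f1(t1<t1<B, B>, B>) with B = t1<t1<>, t1<>>.
t1 = Node("name", "t1")
a = Node("template", children=(t1,))
b = Node("template", children=(t1, a, a))
c = Node("template", children=(t1, b, b))
sig = Node("function", children=(Node("name", "f1"), Node("template", children=(t1, c, b))))

print(render(sig, min_len=0))
print(render(sig))
```

With `min_len=0` this prints the same four lines as the output above, just with the definitions in dependency order (`_0 = t1` first). With the default threshold, `t1` is inlined and the result is `_0 = t1<>`, `_1 = t1<_0, _0>`, `f1(t1<t1<_1, _1>, _1>)`. The recursion is fine for a toy like this, but a real implementation would presumably want to avoid deep recursion on multi-GB names.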

I was also rather looking forward to a dot/graphviz output mode where each multiply-referenced node gets a node in the graph. (Note how, in the above, f1 doesn’t get numbered separately from the function signature it appears in: the name f1 is never referenced by a substitution number, since it only appears once.) The text in each graph node would look similar to the above, but would perhaps use a node-local numbering, with edges labeled by that node-local numbering?
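A rough sketch of what that dot mode might emit, again in Python and again with made-up names (`Node`, `to_dot`); this is one possible reading of "node-local numbering", not an existing llvm-cxxfilt feature. The root and every multiply-referenced entity become graph nodes, each label numbers the shared entities it mentions from `_0` upward, and each outgoing edge carries the matching local number.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for a demangler AST node; sharing is by identity."""
    kind: str             # "name", "template" (head<args>) or "function" (head(args))
    text: str = ""        # only used for kind == "name"
    children: tuple = ()  # (head, arg0, arg1, ...) for non-leaf nodes


def to_dot(root):
    refs = {}

    def visit(n):
        # Count incoming edges; expand each node only on its first visit.
        refs[id(n)] = refs.get(id(n), 0) + 1
        if refs[id(n)] == 1:
            for c in n.children:
                visit(c)

    visit(root)
    names = {id(root): "n0"}  # graph node ids, assigned on first discovery
    queue, lines = [root], ["digraph demangled {"]
    while queue:
        n = queue.pop(0)
        local = []  # shared entities used by n, in first-use order

        def emit(m):
            if m is not n and refs[id(m)] > 1:
                if not any(m is x for x in local):
                    local.append(m)
                return f"_{local.index(m)}"  # node-local number
            if m.kind == "name":
                return m.text
            head, *args = m.children
            lo, hi = ("<", ">") if m.kind == "template" else ("(", ")")
            return emit(head) + lo + ", ".join(emit(a) for a in args) + hi

        label = emit(n).replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'  {names[id(n)]} [label="{label}"];')
        for i, m in enumerate(local):
            if id(m) not in names:
                names[id(m)] = f"n{len(names)}"
                queue.append(m)
            lines.append(f'  {names[id(n)]} -> {names[id(m)]} [label="_{i}"];')
    lines.append("}")
    return "\n".join(lines)


# Same example as in the post: f1(t1<t1<B, B>, B>) with B = t1<t1<>, t1<>>.
t1 = Node("name", "t1")
a = Node("template", children=(t1,))
b = Node("template", children=(t1, a, a))
c = Node("template", children=(t1, b, b))
sig = Node("function", children=(Node("name", "f1"), Node("template", children=(t1, c, b))))

print(to_dot(sig))
```

For the example this produces four graph nodes (the signature, `t1`, `_0<_1, _1>` and `_0<>`), and the result can be piped through `dot -Tsvg`. Unlike the text sketch, nothing is inlined here, so even `t1` gets its own node; a length threshold could be added in the same way.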

@jyknight mentioned he’d like a format that’s maybe something like this (perhaps you had some other ideas, James?)
@zygoloid mentioned he’d like a format that just doesn’t expand template parameters. So, e.g., for template<typename T> void f1(T); ... f1(3), it would produce f1<_1 = int>(_1) instead of f1<int>(int), giving a bit more fidelity about where the template parameters appear. That would remove some of the duplication, but probably still not enough in the case I was investigating. I wouldn’t mind adding that while I’m here, though.
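As a toy illustration of the unexpanded-template-parameter format, assuming the demangler still knows which function parameters were spelled as template parameters: the sketch below takes that information as plain Python data. The function names and the "int index means template parameter" encoding are invented for this example and are not part of any demangler API.

```python
def render_expanded(name, template_args, params):
    """Usual rendering: each template parameter use is replaced by its argument."""
    sig = ", ".join(template_args[p] if isinstance(p, int) else p for p in params)
    return f"{name}<{', '.join(template_args)}>({sig})"


def render_unexpanded(name, template_args, params):
    """Bind each template argument to _N once, then refer to it by that label.

    params holds either an int (index of a template parameter) or a str
    (a type that was written out literally in the declaration).
    """
    binds = ", ".join(f"_{i + 1} = {arg}" for i, arg in enumerate(template_args))
    sig = ", ".join(f"_{p + 1}" if isinstance(p, int) else p for p in params)
    return f"{name}<{binds}>({sig})"


# template<typename T> void f1(T); ... f1(3)
print(render_expanded("f1", ["int"], [0]))
print(render_unexpanded("f1", ["int"], [0]))
```

The two calls print `f1<int>(int)` and `f1<_1 = int>(_1)` respectively, so a large template argument is written out once rather than once per use in the signature.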

Does anyone have opinions on these output formats, or other ideas they’ve been thinking about here?

It looks like the outliners in LLVM: you put all the interesting(?) substrings into a suffix tree and replace the long/frequent substrings with placeholders.

FWIW I never understood why the original gnu demangler chose to substitute template parms and not produce something like ‘f<T=int> (T)’

It’s a bit easier and more syntactically aware than that: the mangling already has a numbering for shared/multiply-referenced entities, so any entity that does end up multiply-referenced can be printed in a way that preserves that sharing, rather than duplicating it and flattening the DAG into a tree.