RFC: Cleaning up the Itanium demangler

Hello all,
The itanium demangler in libcxxabi (and also, llvm/lib/Demangle) is really slow. This is largely because the textual representation of the symbol that is being demangled is held in a std::string, and manipulations done during parsing are done on that string. The demangler is always concatenating strings and inserting into the middle of strings, which is terrible. The fact that the parsing logic and the string manipulation/formatting logic is interleaved also makes the demangler pretty ugly. Another problem was that the demangler used a lot stack space, and has a bunch of stack overflows filed against it.

I've been working on fixing this by parsing first into an AST structure, and then traversing that AST to produce a demangled string. This provides a significant performance improvement and also make the demangler somewhat more clean. Attached you should find a patch to this effect. This patch is still very much a work in progress, but currently passes the libcxxabi test suite and demangles all the symbols in LLVM identically to the current demangler. It also provides a significant performance improvement: it demangles the symbols in LLVM about 3.7 times faster than the current demangler. Also, separating the formatting code from the parser reduces stack usage (the activation frame for parse_type reduced from 416 to 144 bytes on my machine). The stack usage is still pretty bad, but this helps with some of it.

Does anyone have any early feedback on the patch? Does this seem like a good direction for the demangler?

As far as future plans for this file, I have a few more refactorings and performance improvements that I'd like to get through. After that, it might be interesting to try to replace the FastDemangle.cpp demangler in LLDB with this, to restore the one true demangler in the source tree. The FastDemangler.cpp is only partially completed, and calls out to ItaniumDemangle.cpp in llvm (which is a copy of cxa_demangle.cpp) if it fails to parse the symbol.

Any thoughts here would be appreciated!
Thanks,
Erik

demangler-tree.diff (300 KB)

Hi Erik,

Thanks for working on this!

I’m very interested in your work because I’ve just started writing a demangler for the Microsoft mangling scheme. What I found in the current Itanium demangler is the same as you – it looks like it allocates too much memory during parsing and concatenates std::strings too often. I could see there’s a (probably big) room to improve. Demangler’s performance is sometimes important for LLD, which is my main project, as linkers often have to print out a lot of symbols if a verbose output is requested. For example, if you link Chrome with the -map option, the linker has to demangle 300 MiB strings in total, which currently takes more than 20 seconds on my machine if single-threaded.

The way I’m trying to implement a MS demangler is the same as you, too. I’m trying to create an AST to describe type and then convert it to string. I guess that we can use the same AST type between Itanium and MS so that we can use the same code for converting ASTs to strings.

It’s unfortunate that my work is overlapping with yours. Looks like you are ahead of me, so I’ll take a look at your code to see if there’s something I can do for you.

Using the same AST is an interesting idea. The AST that I wrote isn’t that complicated, and is pretty closely tied to the libcxxabi demangler, so I bet it would be easier to have separate representations, especially if your intending on mimicking the output of MS’s demangler. I’m also not at all familiar with how MS mangles their C++, which might imply a slightly different representation.

I don't have any concrete feedback, but:

- +1 for removing the "FastDemagler"

- If you already construct an AST as a part of your demangling
process, would it be possible to export that AST for external
consumption somehow? Right now in lldb we sometimes need to parse the
demangled name (to get the "basename" of a function for example), and
the code for doing that is quite ugly. It would be much nicer if we
could just query the parsed representation of the name somehow, and
the AST would enable us to do that.

I don’t have any concrete feedback, but:

  • +1 for removing the “FastDemagler”

  • If you already construct an AST as a part of your demangling
    process, would it be possible to export that AST for external
    consumption somehow? Right now in lldb we sometimes need to parse the
    demangled name (to get the “basename” of a function for example), and
    the code for doing that is quite ugly. It would be much nicer if we
    could just query the parsed representation of the name somehow, and
    the AST would enable us to do that.

I was thinking about this use case a little, actually. I think it makes more sense to provide a function, say getItaniumDemangledBasename(), which could just parse and query the AST for the base name (the AST already has an way of doing this). This would allow the demangler to bail out if it knows that the rest of the input string isn’t relevant, i.e., we could bail out after parsing the ‘foo’ in _Z3fooiiiiiii. That, and not having to print out the AST should make parsing the base name significantly faster on top of this.

Do you have any other use case for the AST outside of base names? It still would be possible to export it from ItaniumDemangle.

How does it far in terms of code size? One of the big problems with the
libcxxabi demangler is its size. It's at least twice as large as the one
in elftoolchain, even ignoring some possible feature differences.

Joerg

Well.. the current parser chops the name into "basename", "context",
"arguments", and "qualifiers" part. All of them seem to be used right
now, but I don't know e.g. how unavoidable that is. I know about this
because I was fixing some bugs there, but I am actually not that
familiar with this part of LLDB. I am cc-ing lldb-dev if they have any
thoughts on this. We also have the ability to set breakpoints by
providing just a part of the context (e.g. "breakpoint set -n
foo::bar" even though the full function name is baz::booze::foo::bar),
but this seems to be implemented in some different way.

I don't think having the ability to short-circuit the demangling would
bring as any speed benefit, at least not without a major refactor, as
we demangle all the names anyway. Even the AST solution will probably
require a fair deal of plumbing on our part to make it useful.

Also, any custom-tailored solution will probably make it hard to
retrieve any additional info, should we later need it, so I'd be in
favor of the AST solution. (I don't know how much it would complicate
the implementation though).

I've been working on fixing this by parsing first into an AST structure, and
then traversing that AST to produce a demangled string. This provides a
significant performance improvement and also make the demangler somewhat
more clean.

How does it far in terms of code size? One of the big problems with the
libcxxabi demangler is its size. It's at least twice as large as the one
in elftoolchain, even ignoring some possible feature differences.

Looks pretty good, I'm guessing this is because all the string operations aren't getting inlined into the parser anymore.
Before this patch:
__TEXT __DATA __OBJC others dec hex
247293 0 0 1248 248541 3cadd libc++abi.a(cxa_demangle.cpp.o)

After this patch:
__TEXT __DATA __OBJC others dec hex
137723 4216 0 6016 147955 241f3 libc++abi.a(cxa_demangle.cpp.o)

Better, but still huge compared to:

   text data bss dec hex filename
  44560 0 0 44560 ae10 rt_libelftc_dem_gnu3.pico

Joerg

Ah, I see. In that case I agree that exposing the AST is the only way that
this could be done. I don't think it would be that hard to implement, it
would cause a bit of a divergence between cxa_demangle and ItaniumDemangle,
where the former would want to keep the AST private and the latter public,
but thats not the end of the world. I'd be curious to see if the LLDB folks
are interested in such an API.

I'm very interested in your work because I've just started writing a
demangler for the Microsoft mangling scheme. What I found in the current
Itanium demangler is the same as you -- it looks like it allocates too much
memory during parsing and concatenates std::strings too often. I could see
there's a (probably big) room to improve. Demangler's performance is
sometimes important for LLD, which is my main project, as linkers often
have to print out a lot of symbols if a verbose output is requested. For
example, if you link Chrome with the -map option, the linker has to
demangle 300 MiB strings in total, which currently takes more than 20
seconds on my machine if single-threaded.

The way I'm trying to implement a MS demangler is the same as you, too.
I'm trying to create an AST to describe type and then convert it to string.
I guess that we can use the same AST type between Itanium and MS so that we
can use the same code for converting ASTs to strings.

Using the same AST is an interesting idea. The AST that I wrote isn't that
complicated, and is pretty closely tied to the libcxxabi demangler, so I
bet it would be easier to have separate representations, especially if your
intending on mimicking the output of MS's demangler. I'm also not at all
familiar with how MS mangles their C++, which might imply a slightly
different representation.

I'm not going to immediately try to do it, but I think sharing the same AST
data structure seems to makes sense. I'm not too crazy about mimicking all
the details of the Microsoft's demangler, so a slight deviation is OK as
long as the difference is minor and reasonable. Mangled symbols are very
different between Itanium and Microsoft, but after all the demangled form
is a plain C++ which should be the (almost) same between the two.

One thing I’d noticed in the past is that the existing libcxxabi cxa_demangle.o code is MUCH smaller when compiled by GCC, vs the same code compiled by clang.

On ARM, it was approximately 40K (gcc -Os), 50K (gcc -O2), 70K (clang -Os), 115K (clang -O2) of TEXT.

This is Greg's area, he'll be able to answer in detail how the name chopper gets used. IIRC it chops demangled names, so it is indirectly a client of the demangler, but it doesn't use the demangler to do this directly. Name lookup is done by finding all the base name matches, then comparing the context. We don't do a very good job of doing fuzzy full name matches - for instance when trying to break on one overload you have to get the arguments exactly as the demangler would produce them. We could do some more heuristics here (remove all the spaces you can before comparison, etc.) though it would be even easier if we had something that could tokenize names - both mangled & natural.

The Swift demangler produces a node tree for the demangled elements of a name which is very handy on the Swift side. A long time ago Greg experimented with such a thing for the C++ demangler, but it ended up being too slow.

On that note, the demangler is a performance bottleneck for lldb. Going to the fast demangler over the system one was a big performance win. Maybe the system demangler is fast enough nowadays, but if it isn't then we can't get rid of the FastDemangler.

Jim

When I looked at demangler performance, I was able to make significant improvements to the llvm demangler. At that point removing lldb’s fast demangler didn’t hurt performance very much, but the fast demangler was still faster. I forget (and apparently didn’t write down) how much it mattered, but post this change I think was single digit %.

https://reviews.llvm.org/D32500

Another important criterium for the demangler in the debugger is that it 100% cannot crash no matter what it gets fed. lldb used to have it's own copy of the system demangler library because it had bugs, and we needed to be able to fix them faster than the system version. We feed it all the symbols we ingest (we actually sniff them a little bit, but we really shouldn't have to do that, the demangler should be fast enough rejecting symbols) so if there's one in some system library that triggers a demangler crash, you're pretty much dead in the water on that system...

Jim

FYI, I uploaded my work-in-progress patch for the Microsoft mangling scheme to https://reviews.llvm.org/D34667.

I put up some patches for this here:
https://reviews.llvm.org/D35159
https://reviews.llvm.org/D35158

Thanks,
Erik