VC C++ demangler

Hi,

We have a demangler for the Itanium ABI, but it looks like we don’t have one for MSVC-style symbols. Is there a good demangler we can import into LLVM?

If there’s no suitable demangler, I’d like to write one. Currently we use the UnDecorateSymbolName function, but it is available only on Windows (which is a problem when you are doing a cross-build) and it is not thread-safe. These two seem like reason enough to have our own demangler.

Rui

I'm not aware of a suitable one, currently. I agree it would be very
useful to have.

A long time ago, when I devised the grammar and structure of the Microsoft C++ name mangling scheme (decorated names), the documents describing the object model and the name decoration scheme were made publicly available. Perhaps they are still publicly available, or perhaps Microsoft might be willing to share an up-to-date definition of the name-decoration grammar, especially in light of the integration of CodeView debugging information into LLVM, which somewhat ties in with this.

This was expressed as a regular BNF grammar, so it should be possible to create a clean-room implementation of both the “mangler” and “de-mangler” from that BNF definition if it still exists in that form. Does the recently added CodeView debug information not provide this description (I admit I haven’t looked)?

Certainly tools like ‘c++filt’ do not know about the Microsoft name decoration scheme, but LLVM does know how to mangle names using the VC++ ABI, and since the mangling follows a regular grammar, the de-mangling should be relatively straightforward to implement.
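
To give a feel for how the grammar drives the parser, here is a minimal sketch that handles only the simplest case: a global variable of a builtin type, such as "?x@ns@@3HA". The function name demangleSimpleVariable and the output spelling ("int const x") are my own assumptions for illustration, not an existing LLVM API or the exact format of any existing tool, and only a handful of type codes are covered.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not an LLVM API). Handles only global variables of a
// builtin type, e.g. "?x@ns@@3HA". Returns an empty string on anything else.
std::string demangleSimpleVariable(const std::string &Mangled) {
  std::size_t Pos = 0;
  // Every decorated C++ name starts with '?'.
  if (Mangled.empty() || Mangled[Pos++] != '?')
    return "";

  // Qualified name: '@'-terminated components, innermost first; an empty
  // component (the second '@' of "@@") ends the list.
  std::vector<std::string> Parts;
  for (;;) {
    std::size_t At = Mangled.find('@', Pos);
    if (At == std::string::npos)
      return "";
    if (At == Pos) {
      Pos = At + 1;
      break;
    }
    std::string Part = Mangled.substr(Pos, At - Pos);
    // Reject special names ("?0"), templates ("?$") and anything starting
    // with a digit; this sketch only knows plain identifiers.
    if (Part[0] == '?' || std::isdigit(static_cast<unsigned char>(Part[0])))
      return "";
    Parts.push_back(Part);
    Pos = At + 1;
  }
  if (Parts.empty())
    return "";

  // What remains must be '3' (global variable), a type code, a cv code.
  if (Mangled.size() - Pos != 3 || Mangled[Pos] != '3')
    return "";
  std::string Type;
  switch (Mangled[Pos + 1]) {
  case 'D': Type = "char"; break;
  case 'H': Type = "int"; break;
  case 'I': Type = "unsigned int"; break;
  case 'M': Type = "float"; break;
  case 'N': Type = "double"; break;
  default: return "";
  }
  switch (Mangled[Pos + 2]) {
  case 'A': break;                    // no qualifier
  case 'B': Type += " const"; break;  // const
  default: return "";
  }

  // Print the outermost scope first.
  std::string Name;
  for (std::size_t I = Parts.size(); I-- > 0;) {
    Name += Parts[I];
    if (I != 0)
      Name += "::";
  }
  return Type + " " + Name;
}
```

With this sketch, "?x@ns@@3HA" comes out as "int ns::x". A real implementation would also need functions, pointers, templates, special names, and back-references, which is where most of the work lies.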

All the best,

MartinO

We have clang/lib/AST/MicrosoftMangle.cpp, so it looks like what I should do is write code that does the reverse of it. One thing I should be careful about is producing exactly the same output as Microsoft’s UnDecorateSymbolName function, so that the behavior doesn’t change between Windows and non-Windows platforms, but that probably shouldn’t be hard.

Something you may want to keep in mind while prototyping is that
writing a demangler in C++ (or any other unsafe language) is
particularly annoying, so you may want to keep fuzzers and sanitizers
always in your toolbox. The Itanium demangler currently in tree
suffers from all kinds of bugs/vulnerabilities, FWIW.
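
For reference, a libFuzzer harness for a demangler can be very small. The sketch below assumes a stand-in function, demangleUnderTest, which is hypothetical and only checks the '?' prefix; the real demangler entry point would be substituted for it. LLVMFuzzerTestOneInput is the standard libFuzzer entry point.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the demangler under test. It only checks the
// '?' prefix and returns the remainder; a real demangler goes here.
static std::string demangleUnderTest(const std::string &Mangled) {
  if (Mangled.empty() || Mangled[0] != '?')
    return "";
  return Mangled.substr(1);
}

// libFuzzer calls this repeatedly with mutated inputs. Any crash or
// sanitizer report inside the demangler is recorded as a finding.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  // The input is arbitrary bytes with no terminating NUL, so copy it into
  // a std::string with an explicit length.
  std::string Input(reinterpret_cast<const char *>(Data), Size);
  (void)demangleUnderTest(Input);
  return 0;
}
```

Built with something like clang++ -g -fsanitize=fuzzer,address on the harness file, this runs the demangler under AddressSanitizer, which catches out-of-bounds reads on truncated or malformed names.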

I thought these had been fixed after being fuzzed by Kostya? Are there
new ones that have popped up since and haven't been fixed?

Just to be clear - once LLVM has its own demangler, it should probably use it on all platforms, so there’d be no worry about different behavior between LLVM on Windows and LLVM elsewhere.

But that said, it’s probably still important/worthwhile to make sure it’s consistent with the platform demangler.

Personally I would be all for a unit test program that verified against the Windows API when run on Windows, and against canned output on non-Windows.

–paulr

That was my preference too, but it looks like getting exactly the same results
as the Windows API is neither easy nor worthwhile when it comes to
arbitrary formatting rules. For example, IIRC, UnDecorateSymbolName
generates not "int const* const* x" nor "int const * const * x" but "int
const* const * x". This is simply odd, and I'd guess we don't want to mimic
all these corner cases. So mixing our own demangler and the Windows
demangler can cause unnecessary churn.

Yeah, may well be the case - I don’t /think/ LLVM quite matches the exact syntax of the GCC demangler either (I seem to recall constants as non-type template parameters were a bit different).

If it’s only whitespace differences, that’s easy to accommodate. If there are other cases that don’t work, maybe don’t use this tactic for those, if we have a good reason for being different. As they say, don’t throw the baby out with the bathwater.
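
One way to accommodate whitespace-only differences in a test is to compare strings after normalizing the spacing. This is a minimal sketch; normalizeSpacing and sameModuloSpacing are hypothetical helper names, and the rule used (keep a single space only where it separates two identifier characters) is an assumption about what counts as insignificant.

```cpp
#include <cctype>
#include <string>

static bool isIdentChar(char C) {
  return std::isalnum(static_cast<unsigned char>(C)) || C == '_';
}

// Hypothetical helper: drops all whitespace except a single space where it
// separates two identifier characters ("int const" stays two words).
std::string normalizeSpacing(const std::string &S) {
  std::string Out;
  bool PendingSpace = false;
  for (char C : S) {
    if (std::isspace(static_cast<unsigned char>(C))) {
      PendingSpace = true;
      continue;
    }
    if (PendingSpace && !Out.empty() && isIdentChar(Out.back()) &&
        isIdentChar(C))
      Out += ' ';
    PendingSpace = false;
    Out += C;
  }
  return Out;
}

// True if the two demangled strings differ at most in insignificant spacing.
bool sameModuloSpacing(const std::string &A, const std::string &B) {
  return normalizeSpacing(A) == normalizeSpacing(B);
}
```

With this, "int const* const * x" and "int const * const * x" compare equal, while "int x" and "intx" do not.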

–paulr

It has been a while since I have fought this stuff but, as I recall, there is some relationship between the display name of a function in the debug info and the result of demangling a symbol.
I think a good criterion is to have Clang’s display name generation for CodeView and our implementation of the demangler agree. This way we have an explainable system, and any discrepancy we want to correct would result in us fixing both Clang and our demangler.

Even better idea: use the demangler to generate the display names in the debug info. I am pretty sure this is what MSVC does.

I'll try to keep the difference only in whitespace.

FYI, I started writing a demangler. I think I can send an initial patch to review in a few days.

Please add me on reviews. BTW, even differing in whitespace might cause problems; I know their tools have some built-in assumptions about whitespace in type names. How deeply ingrained this is isn't clear, though.

I uploaded an FYI patch (not intended for submission) as https://reviews.llvm.org/D34667. If you want to take a look and comment on its design, please do so. Thanks!

Sorry for not posting earlier, but has anyone checked http://mingw-w64.sourceforge.net/libmangle/ ?

Also the Wine project has some code which might be useful as reference / test case:

https://github.com/wine-mirror/wine/blob/master/dlls/msvcrt/undname.c
https://github.com/wine-mirror/wine/blob/master/dlls/msvcrt/tests/cpp.c#L1098

They are LGPL, so I think we cannot use them.