RFC: Adding an Itanium C++ demangler to lib/Support

We are trying out lld ELF on Windows. It works great, but a big
difference from running it on Linux is that it cannot demangle C++
names, since there is no Itanium demangler available on Windows.

We have an implementation in libcxxabi/src/cxa_demangle.cpp. I see
that there was some discussion about having a version of that in
lib/Support, but I don't think a patch was ever posted.

So, some questions:

* Is having an Itanium demangler in lib/Support something people find
desirable or at least acceptable?
* The libcxxabi code is dual-licensed; would the copy in lib/Support be as well?
* How LLVM-like should we try to make it? Should it take a
StringRef, return an Error and print to a raw_ostream? Or should it
look more like __cxa_demangle to try to make it easier to move code
in?
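For reference when weighing the two options, here is roughly what calling the existing `__cxa_demangle` interface looks like: a C string goes in, a `malloc`'d C string comes out, and errors are reported through an `int*`. The wrapper name `demangleWithCxxabi` is hypothetical, and the sketch assumes a platform that ships `<cxxabi.h>` (libstdc++ or libc++abi), which is exactly what is missing on Windows.

```cpp
#include <cxxabi.h>  // Itanium C++ ABI header; not shipped with MSVC
#include <cstdlib>
#include <memory>
#include <string>

// Releases the malloc'd buffer that __cxa_demangle returns.
struct FreeDeleter {
  void operator()(char *p) const { std::free(p); }
};

// Hypothetical wrapper around abi::__cxa_demangle. The status out-parameter
// is 0 on success, -1 on allocation failure, -2 if the input is not a valid
// mangled name, and -3 if an argument is invalid.
std::string demangleWithCxxabi(const char *mangled) {
  int status = 0;
  std::unique_ptr<char, FreeDeleter> result(
      abi::__cxa_demangle(mangled, /*output_buffer=*/nullptr,
                          /*length=*/nullptr, &status));
  if (status != 0 || !result)
    return mangled;  // not demanglable: keep the raw symbol
  return result.get();
}
```

For example, `demangleWithCxxabi("_Z3fooi")` yields `foo(int)`. An LLVM-style version would instead take a `StringRef` and report failure through an `Error`, without the manual buffer ownership shown here.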

My current preference is probably to make it as LLVM-like as possible,
since I don't expect we will need to add new mangling features too
often.

Cheers,
Rafael

> * Is having an Itanium demangler in lib/Support something people find
> desirable or at least acceptable?

Yes.

> * The libcxxabi code is dual-licensed; would the copy in lib/Support be as well?

Please don’t use the one from libcxxabi. Howard wrote one that was initially in libcxxabi but was replaced because it had memory requirements that were incompatible with one of the use cases in libcxxabi (on the out-of-memory exception path). It is far more flexible and allows things to be hooked in at various points in the parse. I believe that this one was written entirely by Howard during his time as an Apple employee, so it can likely be relicensed with Chris’s permission if required.

> * How LLVM-like should we try to make it? Should it take a
> StringRef, return an Error and print to a raw_ostream? Or should it
> look more like __cxa_demangle to try to make it easier to move code
> in?

I believe that it should be a generally useful demangler. __cxa_demangle has a very poorly designed interface and is really only useful for turning mangled names into strings. Howard's earlier one makes it easy, for example, to extract the demangled name of each argument type for a function call. This is something that I can imagine being useful in JIT FFI contexts.
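As a rough illustration of the difference, a structured interface could hand back the name and each parameter type separately instead of one flat string. This sketch is hypothetical (`DemangledFunction` and `parseSimpleFunction` are invented names, not Howard's API) and only understands unqualified functions whose parameters are builtin types, such as `_Z3fooic` for `foo(int, char)`.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical structured result: the name and each parameter type are
// available separately instead of being flattened into one string.
struct DemangledFunction {
  std::string name;
  std::vector<std::string> params;
};

// A subset of the Itanium ABI's one-letter builtin type codes ('v' is
// handled by the caller because it means "no parameters").
static std::optional<std::string> builtinType(char c) {
  switch (c) {
  case 'b': return "bool";
  case 'c': return "char";
  case 'a': return "signed char";
  case 'h': return "unsigned char";
  case 's': return "short";
  case 't': return "unsigned short";
  case 'i': return "int";
  case 'j': return "unsigned int";
  case 'l': return "long";
  case 'm': return "unsigned long";
  case 'x': return "long long";
  case 'y': return "unsigned long long";
  case 'f': return "float";
  case 'd': return "double";
  case 'e': return "long double";
  default: return std::nullopt;
  }
}

// Accepts only "_Z" <length><identifier> <builtin-type>+ (e.g. "_Z3fooic").
// Namespaces, templates, pointers, etc. make it return std::nullopt.
std::optional<DemangledFunction> parseSimpleFunction(std::string_view s) {
  if (s.substr(0, 2) != "_Z") return std::nullopt;
  s.remove_prefix(2);
  std::size_t len = 0, i = 0;
  while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]))) {
    len = len * 10 + static_cast<std::size_t>(s[i++] - '0');
    if (len > s.size()) return std::nullopt;
  }
  // Require a name and at least one parameter type code after it.
  if (i == 0 || len == 0 || i + len >= s.size()) return std::nullopt;
  DemangledFunction f;
  f.name = std::string(s.substr(i, len));
  std::string_view types = s.substr(i + len);
  for (char c : types) {
    // 'v' is only valid as the sole parameter code: "_Z3barv" is bar().
    if (c == 'v') {
      if (types.size() != 1) return std::nullopt;
      return f;
    }
    std::optional<std::string> t = builtinType(c);
    if (!t) return std::nullopt;
    f.params.push_back(*t);
  }
  return f;
}
```

A JIT FFI layer could then iterate over `params` directly rather than re-parsing a demangled string.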

David

I really want to start simple. So if we add a demangler, the first
objective is to add one that lets us drop the HAVE_CXXABI_H dependency.

After that it can be expanded.
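The status quo being replaced looks roughly like this: demangling only works when the build found `<cxxabi.h>`, so on Windows the symbol stays mangled. `demangleIfPossible` is a hypothetical name, and the sketch assumes `HAVE_CXXABI_H` is defined by the build's configuration header when the header exists.

```cpp
#include <string>

#if defined(HAVE_CXXABI_H)  // assumed to come from the build's config header
#include <cxxabi.h>
#include <cstdlib>
#endif

// Hypothetical helper: without <cxxabi.h> (e.g. on Windows) there is no
// demangler to call, so the mangled name is returned unchanged.
std::string demangleIfPossible(const std::string &name) {
#if defined(HAVE_CXXABI_H)
  int status = 0;
  char *buf = abi::__cxa_demangle(name.c_str(), nullptr, nullptr, &status);
  if (status == 0 && buf) {
    std::string result(buf);
    std::free(buf);
    return result;
  }
  std::free(buf);  // free(nullptr) is a no-op
#endif
  return name;
}
```

With a demangler in lib/Support, the `#if` branch would simply call it unconditionally.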

Cheers,
Rafael

+1 (desirable)

In my opinion, yes. Also, if or when we do this, we should remove the
additional copy of the demangler in LLDB.

+Kate

We already have two demangler implementations (LLDB and libcxxabi). I'd rather not have three. Have you looked at the LLDB one? I think Kate has some patches she hasn't had a chance to commit yet that add functionality. I heard something like 10x faster, and way less stack usage (although not quite fully functional yet). Seems like a good starting point.

I don't have a problem with "the one true demangler" living in lib/Support, but ideally we'd find a way to reuse it in libc++abi so that we have one, well-tested, implementation.

Having a single one for LLVM and LLDB should be "easy"; I am OK with
starting by just moving the LLDB one to LLVM.

I guess it should be possible to have libc++abi link with lib/Support
and fetch a single object file, but that means not using other parts
of LLVM in the implementation (no StringRef, Error or raw_ostream).

Is the implementation in LLDB dual-licensed?

Cheers,
Rafael

> +Kate
>
> We already have two demangler implementations (LLDB and libcxxabi). I'd
> rather not have three. Have you looked at the LLDB one? I think Kate has
> some patches she hasn't had a chance to commit yet that add functionality.
> I heard something like 10x faster, and way less stack usage (although not
> quite fully functional yet). Seems like a good starting point.
>
> I don't have a problem with "the one true demangler" living in
> lib/Support, but ideally we'd find a way to reuse it in libc++abi so that
> we have one, well-tested, implementation.

IIRC, LLDB has two demanglers: one is a copy of the libc++ demangler and
the other is a "fast-path" demangler. There are some cases that the
fast-path demangler cannot handle which leads it to fall back to the libc++
clone.

My professional opinion, having worked a lot with mangling technology,
would be for us to write a new demangler that had incredibly few dependencies
on anything. This would make it easy for us to copy the source or an
object file generated by the source.

> +Kate
>
> We already have two demangler implementations (LLDB and libcxxabi). I'd rather not have three. Have you looked at the LLDB one? I think Kate has some patches she hasn't had a chance to commit yet that add functionality. I heard something like 10x faster, and way less stack usage (although not quite fully functional yet). Seems like a good starting point.
>
> I don't have a problem with "the one true demangler" living in lib/Support, but ideally we'd find a way to reuse it in libc++abi so that we have one, well-tested, implementation.
>
> IIRC, LLDB has two demanglers: one is a copy of the libc++ demangler and the other is a "fast-path" demangler. There are some cases that the fast-path demangler cannot handle which leads it to fall back to the libc++ clone.

I think the goal of the fast-path LLDB demangler was to eventually
be fully-functional, it just isn't there yet.

> My professional opinion, having worked a lot with mangling technology, would be for us to write a new demangler that had incredibly few dependencies on anything. This would make it easy for us to copy the source or an object file generated by the source.

This lines up with what I'm thinking; I just imagine that the LLDB
"fast-path" demangler could be a starting point.

> My professional opinion, having worked a lot with mangling technology, would be for us to write a new demangler that had incredibly few dependencies on anything. This would make it easy for us to copy the source or an object file generated by the source.
>
> This lines up with what I'm thinking; I just imagine that the LLDB
> "fast-path" demangler could be a starting point.

So, I would really prefer that whatever we put in lib/Support be fully
generic. How about:

* Move the libc++abi one to lib/Support and make sure it doesn't
depend on anything else in LLVM and exports just the demangler. We
keep it dual-licensed so that the libc++abi build can include it. We
would probably just have an ifdef in the code to select the function
name to export, and share identical copies of the file in LLVM and
libc++abi.

* Change lldb's fallback demangler to be llvm's one.

* Once the fastpath demangler can handle all cases, we move its
implementation to be the one in lib/Support.
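The first step could look roughly like this: one dependency-free source file, shared verbatim between LLVM and libc++abi, where a preprocessor switch picks the exported name. `BUILDING_LIBCXXABI` and `llvm_itanium_demangle` are hypothetical names, and the body is a placeholder that only handles `_Z<len><name>v`; the point is the packaging. The signature mirrors `__cxa_demangle` so the libc++abi build could export the same code under the standard name.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical switch: the libc++abi build would define BUILDING_LIBCXXABI
// so this file exports the standard ABI entry point; LLVM uses its own name.
#ifdef BUILDING_LIBCXXABI
#define DEMANGLE_ENTRY_POINT __cxa_demangle
#else
#define DEMANGLE_ENTRY_POINT llvm_itanium_demangle
#endif

// Uses only the C library (no StringRef, Error or raw_ostream), so the
// object file can be linked without pulling in the rest of LLVM.
// Placeholder parser: accepts only "_Z<len><name>v" and returns "name()".
extern "C" char *DEMANGLE_ENTRY_POINT(const char *mangled, char *buf,
                                      std::size_t *n, int *status) {
  auto fail = [status](int code) -> char * {
    if (status) *status = code;
    return nullptr;
  };
  if (!mangled || (buf && !n)) return fail(-3);  // invalid arguments
  if (std::strncmp(mangled, "_Z", 2) != 0) return fail(-2);

  const std::size_t total = std::strlen(mangled);
  const char *p = mangled + 2;
  std::size_t len = 0;
  while (*p >= '0' && *p <= '9') {
    len = len * 10 + static_cast<std::size_t>(*p++ - '0');
    if (len > total) return fail(-2);  // length prefix cannot be right
  }
  // The rest must be exactly <len> name characters followed by 'v'.
  if (len == 0 || std::strlen(p) != len + 1 || p[len] != 'v') return fail(-2);

  // Like __cxa_demangle, grow a caller-supplied malloc'd buffer if needed.
  const std::size_t needed = len + 3;  // name + "()" + NUL
  if (!buf || *n < needed) {
    char *grown = static_cast<char *>(std::realloc(buf, needed));
    if (!grown) return fail(-1);
    buf = grown;
    if (n) *n = needed;
  }
  std::memcpy(buf, p, len);
  std::memcpy(buf + len, "()", 3);  // copies the terminating NUL too
  if (status) *status = 0;
  return buf;
}
```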

Cheers,
Rafael

> +Kate
>
> We already have two demangler implementations (LLDB and libcxxabi). I’d rather not have three. Have you looked at the LLDB one? I think Kate has some patches she hasn’t had a chance to commit yet that add functionality. I heard something like 10x faster, and way less stack usage (although not quite fully functional yet). Seems like a good starting point.

I’d definitely support this and would happily answer inquiries about the existing design. I don’t think we quite manage a ten-fold improvement, but 6-8 times faster than the libcxxabi implementation is typical at -O3. The primary design constraint from LLDB’s perspective is raw throughput, since we generally need to demangle every symbol associated with a process in order to resolve typical requests (e.g., break on a function whose base name is “main”).

The existing design is intended to be 100% accurate for cases it can handle and to fail gracefully when it doesn’t support a particular mangling to enable fallback to the libcxxabi implementation. Sadly, we also have a copy of the latter as we needed to work around a few crashes as they cropped up late in various product cycles.
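The fast-path-plus-fallback arrangement can be sketched like this: the fast path either produces an exact answer or declines, and only declined names pay for the full demangler. `fastDemangle` and `demangle` are hypothetical names, the fast path here understands only `_Z<len><name>v`, and `abi::__cxa_demangle` from `<cxxabi.h>` stands in for the full fallback.

```cpp
#include <cxxabi.h>
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical fast path: exact for "_Z<len><name>v", declines everything
// else instead of guessing.
std::optional<std::string> fastDemangle(std::string_view s) {
  if (s.substr(0, 2) != "_Z") return std::nullopt;
  s.remove_prefix(2);
  std::size_t len = 0, i = 0;
  while (i < s.size() && s[i] >= '0' && s[i] <= '9') {
    len = len * 10 + static_cast<std::size_t>(s[i++] - '0');
    if (len > s.size()) return std::nullopt;
  }
  if (i == 0 || len == 0 || s.size() != i + len + 1 || s.back() != 'v')
    return std::nullopt;
  return std::string(s.substr(i, len)) + "()";
}

// The full demangler runs only when the fast path declines.
std::string demangle(const char *mangled) {
  if (std::optional<std::string> fast = fastDemangle(mangled)) return *fast;
  int status = 0;
  char *full = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  std::string result = (status == 0 && full) ? full : mangled;
  std::free(full);
  return result;
}
```

Because the fast path never returns a wrong answer, correctness rests on the fallback, and throughput improves in proportion to how many symbols the fast path accepts.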

> I don’t have a problem with “the one true demangler” living in lib/Support, but ideally we’d find a way to reuse it in libc++abi so that we have one, well-tested, implementation.
>
> IIRC, LLDB has two demanglers: one is a copy of the libc++ demangler and the other is a “fast-path” demangler. There are some cases that the fast-path demangler cannot handle which leads it to fall back to the libc++ clone.
>
> I think the goal of the fast-path LLDB demangler was to eventually
> be fully-functional, it just isn’t there yet.

Absolutely. We’d be happy to rely on a shared, fully functional implementation that meets our throughput needs and I believe this could be a reasonable starting point.

> My professional opinion, having worked a lot with mangling technology, would be for us to write a new demangler that had incredibly few dependencies on anything. This would make it easy for us to copy the source or an object file generated by the source.
>
> This lines up with what I’m thinking; I just imagine that the LLDB
> “fast-path” demangler could be a starting point.

It’s entirely self-contained and largely stock C with a very few modern C++ conveniences.

I’m wondering if we also want to have a demangler for the MSVC name-mangling scheme for those who want to cross-build Windows executables on Unix. (Or do we already have one?)

> I'm wondering if we also want to have a demangler for the MSVC name-mangling
> scheme for those who want to cross-build Windows executables on Unix. (Or
> do we already have one?)

+1 on this. This sounds extremely useful (and I do often cross-compile for
Windows on Unix).
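Supporting both schemes would need some dispatch step. A minimal sketch, assuming prefix-based detection is good enough for symbol-table names: Itanium names start with `_Z` (`__Z` on Mach-O, which prefixes C symbols with an underscore) and MSVC-mangled C++ names start with `?`. For example, `void foo(int)` is `_Z3fooi` under Itanium and `?foo@@YAXH@Z` under MSVC. `ManglingScheme` and `detectScheme` are hypothetical names.

```cpp
#include <string_view>

// Hypothetical classification used to pick a demangler for a symbol.
enum class ManglingScheme { Itanium, MicrosoftVC, Unknown };

ManglingScheme detectScheme(std::string_view sym) {
  // Itanium: "_Z...", or "__Z..." after Mach-O's extra leading underscore.
  if (sym.substr(0, 2) == "_Z" || sym.substr(0, 3) == "__Z")
    return ManglingScheme::Itanium;
  // MSVC-mangled C++ names begin with '?'.
  if (!sym.empty() && sym.front() == '?')
    return ManglingScheme::MicrosoftVC;
  return ManglingScheme::Unknown;  // e.g. plain C symbols such as "main"
}
```

A tool like lld could then route each symbol to the matching demangler regardless of the host platform.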