[lld] adding demangler for symbol resolution

Hi Nick, Bigcheese,

When lld links C++ code, it needs to demangle symbol names, either by default or through a user-driven option.

The GNU linker has the following options:

--demangle=[style]
--no-demangle

I found that clang/llvm-symbolizer use the __cxa_demangle function.

I think lld needs to call the same function as well, and that the demangling function should live in LinkingContext, since different flavors may choose different APIs to demangle symbol names.

The APIs in LinkingContext would be:

         * virtual bool canDemangle() = 0; // Does the flavor provide a way to demangle symbol names?
         * virtual std::string demangle(StringRef symbolName) = 0; // demangle the symbol name

Thoughts/suggestions?

Thanks

Shankar Easwaran

I agree that LinkingContext is the right place for a demangling function. Such a feature is needed; it would be useful in many situations, such as printing an error message or logging. But I'd like to keep the number of APIs each LinkingContext needs to support to a minimum, at least for now, since we can add more as we need them.

Wouldn't it be simpler to have one demangle() method that does nothing (returns the input string) if demangling is not available, the string is not a mangled symbol, or demangling was turned off (--no-demangle)? Then you just wrap a demangle() call around every use.

The __cxa_demangle function has an odd interface that requires a malloc-allocated block. Having demangle() return a std::string means yet another allocation. We might not care if this is only used in diagnostic output, but a more efficient way would be to pass the stream object to demangle and have it write directly to the stream instead of creating a std::string.

A demangling utility seems like something to add at the LLVM level, either directly to raw_ostream or as a wrapper like format().
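A minimal sketch of the "write directly to the stream" idea, in the style of a format()-like wrapper. Everything here is an assumption for illustration: DemangledName, demangled() and exampleDiagnostic() are hypothetical names, std::ostream stands in for raw_ostream so the snippet stays standalone, and it assumes a host that provides <cxxabi.h>.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <cxxabi.h>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical wrapper: it only stores the mangled name; the demangling
// happens when the object is streamed.
struct DemangledName {
  const char *name;
};

inline DemangledName demangled(const char *name) {
  assert(name && "symbol name must not be null");
  return DemangledName{name};
}

// std::ostream stands in for llvm::raw_ostream in this sketch.
inline std::ostream &operator<<(std::ostream &os, const DemangledName &d) {
  char *buf = nullptr;
  // Only hand Itanium-looking names ("_Z...") to __cxa_demangle.
  if (std::strncmp(d.name, "_Z", 2) == 0) {
    int status = 0;
    buf = abi::__cxa_demangle(d.name, nullptr, nullptr, &status);
  }
  // On failure buf stays null and the original name is printed.
  os << (buf ? buf : d.name);
  std::free(buf); // __cxa_demangle returns a malloc-allocated block
  return os;
}

// Usage example: the demangled text goes straight into the stream; no
// std::string is created for the symbol itself.
inline std::string exampleDiagnostic(const char *symbol) {
  std::ostringstream os;
  os << "undefined symbol: " << demangled(symbol);
  return os.str();
}
```

The malloc done by __cxa_demangle is still there; what this avoids is the extra copy into a std::string.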

-Nick

Wouldn't it be simpler to have one demangle() method that does nothing (returns the input string) if demangling is not available, the string is not a mangled symbol, or demangling was turned off (--no-demangle)? Then you just wrap a demangle() call around every use.

Are you referring to a single demangle function in LinkingContext?

One demangle method wouldn't work, as the ItaniumABI uses one method to demangle, ARMCXXABI uses a different method, and MSVC uses yet another. I am not sure about Mach-O here.

The __cxa_demangle function has an odd interface that requires a malloc-allocated block. Having demangle() return a std::string means yet another allocation. We might not care if this is only used in diagnostic output, but a more efficient way would be to pass the stream object to demangle and have it write directly to the stream instead of creating a std::string.

I don't know whether diagnostics in clang already write directly to a stream.

Maybe for now, as an initial implementation, we can have a single demangle function that returns a std::string.

As part of this, I was thinking of cleaning up the way errors are displayed to the user from the Resolver. We could have functions in SymbolTable such as:

raiseError(SymbolErrorKind, filename, symbolname)
raiseError(SymbolErrorKind, filename, symbolname, filename, symbolname)

SymbolErrorKind :=

MultipleDefinition
Undefined
GroupError
Note (for tracing)
...
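As a rough sketch of what this proposal could look like in C++: the enum lists only the four kinds named above (the "..." is left open), the message wording and the kindText() helper are invented for illustration, and the functions return the message instead of printing it so the snippet stays standalone.

```cpp
#include <cassert>
#include <string>

// The four kinds named in the proposal; further kinds are left open.
enum class SymbolErrorKind { MultipleDefinition, Undefined, GroupError, Note };

// Hypothetical message text for each kind; the real wording is not specified.
inline const char *kindText(SymbolErrorKind kind) {
  switch (kind) {
  case SymbolErrorKind::MultipleDefinition: return "multiple definition of";
  case SymbolErrorKind::Undefined:          return "undefined symbol";
  case SymbolErrorKind::GroupError:         return "group error for";
  case SymbolErrorKind::Note:               return "note:";
  }
  return "unknown error for";
}

// One-location form: raiseError(SymbolErrorKind, filename, symbolname).
inline std::string raiseError(SymbolErrorKind kind, const std::string &file,
                              const std::string &symbol) {
  assert(!symbol.empty() && "an error needs a symbol to talk about");
  return file + ": " + kindText(kind) + " " + symbol;
}

// Two-location form, e.g. for a definition that clashes with another one.
inline std::string raiseError(SymbolErrorKind kind, const std::string &file,
                              const std::string &symbol,
                              const std::string &otherFile,
                              const std::string &otherSymbol) {
  return raiseError(kind, file, symbol) + " (see " + otherFile + ": " +
         otherSymbol + ")";
}
```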

Seems like a demangling utility might be something to add at the LLVM level. Either directly to raw_ostream or a wrapper like format().

I have browsed the LLVM discussions about moving the demangler function, which is housed in libcxx, and I don't think there is a plan to move it.

I think a format()-style specifier would be useful, but I am not sure how the different linking contexts in lld could route their calls through a central format specifier.

Can you share more info on this?

Thanks

Shankar Easwaran

I agree that error message output is a bit scattered in SymbolTable.cpp, but
defining enum values for it is too much. Let's not over-design. I'd
define a function for each error that is printed from multiple
locations. Also, this is a separate issue from demangling, so we
shouldn't mix them.

One demangle method wouldn't work, as the ItaniumABI uses one method to
demangle, ARMCXXABI uses a different method, and MSVC uses yet another.
I am not sure about Mach-O here.

First, it's really easy to detect which ABI is being used based on the
prefix:
_Z -> standard Itanium demangler (__cxa_demangle)
__Z -> Itanium with a leading _
? -> MSVC

We don't need a virtual method. MinGW people might be linking Itanium
symbols on Windows, and that should demangle just fine if __cxa_demangle is
available.
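The prefix check described above can be sketched as a plain function with no virtual dispatch. The Mangling enum and the classify() and checkExamples() names are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical classification of a symbol name by its prefix.
enum class Mangling { None, Itanium, ItaniumLeadingUnderscore, MSVC };

inline Mangling classify(const std::string &name) {
  // Check "__Z" before "_Z": the first is the second plus one more '_'.
  if (name.compare(0, 3, "__Z") == 0)
    return Mangling::ItaniumLeadingUnderscore;
  if (name.compare(0, 2, "_Z") == 0)
    return Mangling::Itanium;
  if (name.compare(0, 1, "?") == 0)
    return Mangling::MSVC;
  return Mangling::None;
}

// Usage example covering the three prefixes and a plain C name.
inline void checkExamples() {
  assert(classify("_Z3foov") == Mangling::Itanium);
  assert(classify("__Z3foov") == Mangling::ItaniumLeadingUnderscore);
  assert(classify("?foo@@YAXXZ") == Mangling::MSVC);
  assert(classify("printf") == Mangling::None);
}
```

Because the decision depends only on the symbol text, the same code gives the same answer for every flavor and host.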

Second, __cxa_demangle is not available on all platforms, so lld should
just test for its availability and use it for Itanium symbols if available.

I think the LLVM project has a demangler floating around (libc++?). It
might be nice to find a way to reuse that across projects like this so the
output of LLVM tools doesn't change based on the capabilities of the host.
In other words, it'd be nice if we had a good story for demangled
diagnostics while cross-linking.

Thanks for the info, Reid. We will have a single demangler then; cross-linking is a very good point that you raised.

The demangler will:

         * check whether the first character is a _ and, if __cxa_demangle is available, call __cxa_demangle
         * if the first character is a ? and MSVC is defined, call UnDecorateSymbolName

The above should suffice for now, I think, and if there is a need we could add more to it.

- Shankar Easwaran
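A sketch of that plan as one non-virtual function. The name demangleSketch and the SKETCH_* macros are made up for this example, the availability test uses __has_include as one possible way to detect <cxxabi.h>, and the UnDecorateSymbolName branch is an untested assumption about how the Windows side might look (it needs dbghelp.lib at link time).

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

#if defined(_MSC_VER)
#  include <windows.h>
#  include <dbghelp.h> // UnDecorateSymbolName
#  define SKETCH_HAVE_UNDECORATE 1
#elif defined(__has_include)
#  if __has_include(<cxxabi.h>)
#    include <cxxabi.h> // abi::__cxa_demangle
#    define SKETCH_HAVE_CXA_DEMANGLE 1
#  endif
#endif

// Hypothetical free function; returns the input when nothing applies.
inline std::string demangleSketch(const std::string &name) {
#if defined(SKETCH_HAVE_CXA_DEMANGLE)
  // "_Z..." is Itanium; "__Z..." is Itanium with one extra leading '_'.
  const char *mangled = nullptr;
  if (name.compare(0, 3, "__Z") == 0)
    mangled = name.c_str() + 1;
  else if (name.compare(0, 2, "_Z") == 0)
    mangled = name.c_str();
  if (mangled) {
    int status = 0;
    char *buf = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status == 0) {
      assert(buf && "status 0 implies a valid buffer");
      std::string result(buf);
      std::free(buf); // the block was allocated with malloc
      return result;
    }
  }
#elif defined(SKETCH_HAVE_UNDECORATE)
  if (!name.empty() && name[0] == '?') {
    char buf[1024];
    // Returns the number of characters written, or 0 on failure.
    if (UnDecorateSymbolName(name.c_str(), buf, sizeof(buf), UNDNAME_COMPLETE))
      return buf;
  }
#endif
  return name; // no suitable demangler on this host, or not a mangled name
}
```

Note that this version still depends on the host: an MSVC symbol stays mangled when the linker runs on Linux, which is the cross-linking gap raised earlier in the thread.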

Wouldn't it be simpler to have one demangle() method that does nothing (returns the input string) if demangling is not available, the string is not a mangled symbol, or demangling was turned off (--no-demangle)? Then you just wrap a demangle() call around every use.

Are you referring to a single demangle function in LinkingContext?

Yes. How do you expect clients to use your proposed canDemangle()/demangle() interface? Seems like it would always be:
  str = sym;
  if (ctx.canDemangle())
     str = ctx.demangle(sym);

My suggestion is to move the canDemangle functionality into demangle, so clients just always use:
    str = ctx.demangle(sym);
and it returns the input string if a demangler is not available or is disabled.
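A minimal sketch of that single-method interface. LinkingContextSketch is a hypothetical stand-in for the real LinkingContext, the constructor flag models --demangle/--no-demangle, and std::string is used instead of StringRef so the snippet compiles on its own.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

#if defined(__has_include)
#  if __has_include(<cxxabi.h>)
#    include <cxxabi.h>
#    define SKETCH_HAVE_CXA_DEMANGLE 1
#  endif
#endif

// Hypothetical stand-in for the demangling part of LinkingContext.
class LinkingContextSketch {
public:
  explicit LinkingContextSketch(bool demangleSymbols)
      : _demangleSymbols(demangleSymbols) {}

  // Always safe to call: returns the input unchanged if demangling is
  // turned off, the name is not mangled, or no demangler is available.
  std::string demangle(const std::string &symbolName) const {
    if (!_demangleSymbols)
      return symbolName; // --no-demangle
    if (symbolName.compare(0, 2, "_Z") != 0)
      return symbolName; // not an Itanium-mangled name
#ifdef SKETCH_HAVE_CXA_DEMANGLE
    int status = 0;
    char *buf =
        abi::__cxa_demangle(symbolName.c_str(), nullptr, nullptr, &status);
    if (status == 0) {
      assert(buf && "status 0 implies a valid buffer");
      std::string result(buf);
      std::free(buf);
      return result;
    }
#endif
    return symbolName; // demangler missing or it rejected the name
  }

private:
  bool _demangleSymbols;
};
```

With this shape, callers never need a canDemangle() check; they wrap every symbol they print in ctx.demangle().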

One demangle method wouldn't work, as the ItaniumABI uses one method to demangle, ARMCXXABI uses a different method, and MSVC uses yet another. I am not sure about Mach-O here.

Given that, how can we make an lld that behaves the same when cross-linking as it does on the native system? Are you thinking of writing your own demangler? Or of using whatever one is natively available, and falling back to not demangling if the native demangler cannot handle the given symbol name (e.g. an MSVC symbol when running on Linux)?

Let's look at an example. lld currently has:
            llvm::errs() << "lld warning: shared library symbol "
                              << curShLib->name()
                              << " has different load path in " …

My ideal change would be to something like:

            llvm::errs() << "lld warning: shared library symbol "
                              << ctx.demangle(curShLib->name())
                              << " has different load path in " …

-Nick

My suggestion is to move the canDemangle functionality into demangle, so clients just always use:
     str = ctx.demangle(sym);
and it returns the input string if a demangler is not available or is disabled.

Yes. This would be much preferred.

Given that, how can we make an lld that behaves the same when cross-linking as it does on the native system? Are you thinking of writing your own demangler? Or of using whatever one is natively available, and falling back to not demangling if the native demangler cannot handle the given symbol name (e.g. an MSVC symbol when running on Linux)?

a) The function would be non-virtual.
b) I am not planning to write a demangler. I was planning to use abi::__cxa_demangle if it is available and the first character of the symbol is a _.
     If MSVC is defined, we would use the UnDecorateSymbolName API.

Does this look good?

I think we are on the same page. I assume ctx.demangle() would return a string in your case as well.

Thanks

Shankar Easwaran

The demangler that Howard wrote for libc++abi was intended to be general and reusable. It was rewritten eventually because it wasn't a good fit for the C++ runtime library, but it would make sense to import it into one of the LLVM libraries, as a good, general demangler is something that a lot of things (including lldb) would benefit from.

David

+ Chandler/Rafael

I agree with this. I will try to move the function out of libc++abi into lib/Support and see what the reviewers say.

That makes it much easier to call the functionality, instead of each tool/component having a separate implementation.

Thanks

Shankar Easwaran

Maybe you should base your work on the "old" libc++abi implementation that was more flexible, but slightly overkill for libc++abi and so was replaced by the current one.

You should be able to find it by browsing the libc++abi source history.

Jean-Daniel

Hi Shankar,

What's the progress on moving the demangler into LLVM? On Windows, I have to use glog's demangler for now, but I just bumped into a bug in that code that makes it fail to demangle a function name.

Sorry, I am not working on that at present.

The simplest long-term solution would probably be to just add demangling in a portable way to libSupport. Then we don’t need to conditionalize or anything. We just call llvm::demangle(). This is the only viable solution for proper cross-linking anyway.

Given that there should be existing demanglers we can reuse, I’m not sure that this long term solution is significantly more difficult than the short term solution you are proposing.

– Sean Silva

One source of confusion is that libstdc++ owns the __cxa_demangle API. When the demangling code is moved from libc++ to LLVM, does libc++ link with libSupport?

libc++ needs to continue to own the demangle API too, IMO.

Shankar Easwaran

Actually, it would be libsupc++, which just happens to be included in
the dynamic libstdc++. The equivalent would be libc++abi.

Joerg

One source of confusion is that libstdc++ owns the __cxa_demangle API. When
the demangling code is moved from libc++ to LLVM, does libc++ link with
libSupport?

libSupport would have nothing to do with the __cxa_demangle symbol. It
would be llvm::demangle (and probably have a different signature, e.g.
StringRef).

-- Sean Silva