LLDB Expression Parser Name Mangling

* I sent this earlier before I registered, sorry in advance if this shows up as a duplicate.

I have been familiarizing myself with the expression parsing code in LLDB with the intention of finding and fixing several expression parser related bugs.

The first issue I have been investigating in detail is calling c_str() on a standard string. The expression fails because LLDB cannot match the mangled function name against any name in the symbol table. IRForTarget::GetFunctionAddress has special handling for standard strings that supports two variants of the mangled name prefix: _ZNKSbIc and _ZNKSs. _ZNKSbIc represents basic_string<char>, whereas _ZNKSs represents string, which is a typedef of basic_string<char>. In this case the full name in the g++-compiled DWARF symbols is _ZNKSs5c_strEv, and Clang generates the same symbol. The call to m_decl_map->GetFunctionAddress fails because the mangled name generated by the JIT-compiled expression is actually the fully specified name, _ZNKSbIcSt17char_traits<char>St15allocator<char>E5c_strEv, which is equivalent to basic_string<char,char_traits<char>,allocator<char>>.
I have been walking through the expression parsing code but have not been able to locate where this name is actually generated. My guess is that it is generated during ParseAST, but I have not tracked it down yet. Any help would be appreciated.
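
To make the failure concrete, here is a minimal sketch of the kind of prefix fallback described above, written in Python for brevity. It is not LLDB's code: SYMTAB, get_function_address, the address value, and the exact prefix strings (including the trailing E that closes the template argument list on the long form) are assumptions made for illustration. The point is only that swapping one prefix for the other cannot rescue the fully specified name.

```python
# Hypothetical symbol table: one exported name with a made-up address.
SYMTAB = {"_ZNKSs5c_strEv": 0x1000}

# Assumed prefix spellings; the real check in LLDB may differ.
LONG_PREFIX = "_ZNKSbIcE"   # const member of basic_string<char>
SHORT_PREFIX = "_ZNKSs"     # const member of std::string (abbreviated)


def get_function_address(name, symtab=SYMTAB):
    """Exact-match lookup, then one retry with the other prefix."""
    if name in symtab:
        return symtab[name]
    for old, new in ((LONG_PREFIX, SHORT_PREFIX), (SHORT_PREFIX, LONG_PREFIX)):
        if name.startswith(old):
            alternate = new + name[len(old):]
            if alternate in symtab:
                return symtab[alternate]
    # Neither the name nor its alternate spelling is known.
    return None
```

With this table, _ZNKSs5c_strEv resolves directly and _ZNKSbIcE5c_strEv resolves through the retry, but the name coming out of the JIT starts with neither prefix and so the lookup returns None.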

- Alex

The compiler will generate this in the debug info. When we go looking for a symbol or function, the compiler will ask us where the function is, and we usually find this in the C++ standard library shared library. So you should do a:

(lldb) image dump symtab

This dumps all symbols from all shared libraries. Look for std::basic_string somewhere in the mix and see which symbols are present. Your _ZNKSbIcSt17char_traits<char>St15allocator<char>E5c_strEv demangles to:

std::basic_string<char, std::char_traits<char>, std::allocator<char> >::c_str() const

on our system. I would check the symbols in your libstdc++ shared library, see which ones are there for basic_string, and work out why things aren't matching up.
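
If the symbol dump is long, a small filter can help with that check. The sketch below is Python and assumes the symbols were saved as text in the usual nm layout (address, type letter, name), for example from nm -D on the library. The function name symbols_matching and the sample lines are hypothetical, and the addresses in them are made up.

```python
def symbols_matching(nm_lines, needle):
    """Return symbol names from nm-style lines whose name contains `needle`."""
    names = []
    for line in nm_lines:
        fields = line.split()
        # The symbol name is the last field; undefined symbols have no address.
        if fields and needle in fields[-1]:
            names.append(fields[-1])
    return names


# Hypothetical excerpt of a symbol listing (addresses are invented).
SAMPLE = [
    "00000000000c0bc0 T _ZNKSs5c_strEv",
    "00000000000c5a10 T _ZNKSbIwSt11char_traitsIwESaIwEE5c_strEv",
    "                 U memcpy",
]

for name in symbols_matching(SAMPLE, "5c_str"):
    print(name)
```

Searching for the length-prefixed method name (5c_str) rather than a whole mangled name finds every spelling of c_str() that the library exports, which makes a mismatch in the class prefix easy to spot.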

Greg

Questions from a random observer: Why are these two manglings of std::string hardcoded into the code? Or better yet, why is std::string even special cased to begin with? Is the debug visualization support not sufficient to be able to deal with stl strings in a useful manner?

Debug visualizations (which we tend to call “data formatters”) and expression evaluator do not interact a lot, and that’s by design.

So, while I don’t know what the answer to your question is, in general the expectation is that data formatters can present an entirely made-up world, and the expression evaluator won’t be the least affected.

The canonical example is std::map. Data formatters present you with logical children, in the form of key/value pairs named [0], [1], …, but there is no expectation that you will be able to type expr myMap[0].value in the general case, even though that is what you see on screen.

Questions from a random observer: Why are these two manglings of std::string hardcoded into the code?

Because the compiler will emit one or both and the current libstdc++ only has one mangling in the symbol table. If we don't pull this trick we can end up not finding functions that are required in JITed code when the lookups for these symbols happen in the MCJIT.

Or better yet, why is std::string even special cased to begin with?

Because this is one of the few things that have special aliased mangled names in the mangling scheme. I am not sure why the compiler isn't always forced to use the shorter built-in mangling, but it currently isn't. This might only be an issue on Mac OS X; I am not sure how often these issues would present themselves on other systems.

Is the debug visualization support not sufficient to be able to deal with stl strings in a useful manner?

Debug visualization is just fine. The problem arises only when the MCJIT says "I must get an address for this mangled name": if we aren't able to find it, the expression evaluation fails.

Greg

See this:

http://mentorembedded.github.io/cxx-abi/abi.html#mangling-compression

The relevant section is:

   <substitution> ::= St # ::std::
   <substitution> ::= Sa # ::std::allocator
   <substitution> ::= Sb # ::std::basic_string
   <substitution> ::= Ss # ::std::basic_string < char,
             ::std::char_traits<char>,
             ::std::allocator<char> >
   <substitution> ::= Si # ::std::basic_istream<char, std::char_traits<char> >
   <substitution> ::= So # ::std::basic_ostream<char, std::char_traits<char> >
   <substitution> ::= Sd # ::std::basic_iostream<char, std::char_traits<char> >

The abbreviation St is always an initial qualifier, i.e. appearing as the first element of a compound name. It does not require N...E delimiters unless either followed by more than one additional composite name component, or preceded by CV-qualifiers or a ref-qualifier for a member function. This adds the case:

   <name> ::= St <unqualified-name> # ::std::

So if the compiler applies compression and the system libraries were built without it, or the other way around, you can get name lookup failures.
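
A rough way to see the mismatch is to treat the abbreviations as a lookup table, sketched here in Python. The table comes from the ABI excerpt above; the helper names, and the idea of rewriting one spelling into the other with a plain string replace, are assumptions for illustration only. The replace is safe for a name like c_str() const that has no later substitution references, and is not a general re-mangler.

```python
# Standard abbreviations from the ABI's <substitution> list.
STD_SUBSTITUTIONS = {
    "St": "::std::",
    "Sa": "::std::allocator",
    "Sb": "::std::basic_string",
    "Ss": "::std::basic_string<char, ::std::char_traits<char>, "
          "::std::allocator<char> >",
    "Si": "::std::basic_istream<char, std::char_traits<char> >",
    "So": "::std::basic_ostream<char, std::char_traits<char> >",
    "Sd": "::std::basic_iostream<char, std::char_traits<char> >",
}

# The same type as "Ss", spelled with Sb, St and Sa but without Ss.
LONG_STRING = "SbIcSt11char_traitsIcESaIcEE"


def leading_abbreviation(mangled):
    """Return the standard abbreviation that starts the name, or None."""
    body = mangled[2:] if mangled.startswith("_Z") else mangled
    if body.startswith("N"):
        # Skip the nested-name marker and any CV-qualifiers (r, V, K).
        body = body[1:].lstrip("rVK")
    abbr = body[:2]
    return abbr if abbr in STD_SUBSTITUTIONS else None


def compress_std_string(mangled):
    """Rewrite the long spelling of std::string to the Ss abbreviation."""
    return mangled.replace(LONG_STRING, "Ss")
```

Here compress_std_string("_ZNKSbIcSt11char_traitsIcESaIcEE5c_strEv") gives _ZNKSs5c_strEv, so both spellings can be made to hit the same symbol table entry. The name reported earlier in this thread is spelled differently again, so this rewrite would leave it unchanged.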

In lldb I dumped the unmangled symbol for std::string::c_str() const. The address is: libstdc++.so.6. This address corresponds to the exported symbol _ZNKSs5c_strEv, which I dumped from libstdc++.so.6 at the same file address/value. Both the lldb I am debugging and the inferior a.out I am attached to from lldb show dependencies on the same version and location of libstdc++.so.6.

I am still not clear whether the mangled name, _ZNKSbIcSt17char_traitsSt15allocatorE5c_strEv, is being generated by MCJIT within LLDB, or where that happens in the source. As you described in your follow-up message, according to the name mangling compression rules I would expect this to be mangled as _ZNKSs5c_strEv. On my system (Ubuntu 14.04, gcc 4.8.2, clang 3.5-1), the standalone compile of the test program mangles it as _ZNKSs5c_strEv in the DWARF symbols, but in LLDB the name extracted from the function passed to GetFunctionAddress() is mangled as _ZNKSbIcSt17char_traitsSt15allocatorE5c_strEv, which is not compressed as I would have expected.

- Alex

Exactly. So the bug here is that we are getting an uncompressed name from the JIT. We need to fix this. I looked at the name mangling code and there didn't seem to be an option to turn compression off, so it will be interesting to see how this uncompressed name is getting created.

Greg