Adding D language demangling support

Hi all,
I recently looked into adding demangling support for D in LLDB, but got lost in the code.
(right now, basic D support is there with: https://reviews.llvm.org/D24794)

I’d like some pointers to where demangling is done for the other languages, and to where I should add D support for it.

Thanks a lot,
Johan

C++ demangler is in libcxxabi (and a copy is kept in LLVM itself).

LLDB includes a “fast” demangler and falls back to the libcxxabi one when the “fast” one fails.

David Majnemer mentioned he was interested in rewriting the demangler functionality in LLVM. I don’t know the scope, though (could it have a common infrastructure for multiple languages/schemes?).

It might be nice to add demangling support to LLVM and then use it by modifying "Mangled::GetDemangledName()" in Mangled.cpp. This is where all demangling happens. Hopefully you have a distinctive prefix that won't conflict with other languages ("_Z" for C++, "_T" for Swift), because the code in Mangled::GetDemangledName() looks at the prefix and attempts to demangle the name based on which prefix it starts with.
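To illustrate the prefix check described above, here is a minimal C++ sketch. It is not the actual code in Mangled.cpp: the `Scheme` enum and `classify` function are hypothetical names, and the "_D" prefix for D is an assumption taken from later in this thread.

```cpp
#include <string>

// Hypothetical scheme tags for illustration; this is not LLDB's own enum.
enum class Scheme { None, Itanium, Swift, D };

// True when `name` begins with `prefix`.
static bool has_prefix(const std::string &name, const std::string &prefix) {
  return name.compare(0, prefix.size(), prefix) == 0;
}

// Pick a demangling scheme from the leading characters of the symbol.
Scheme classify(const std::string &name) {
  if (has_prefix(name, "_Z")) return Scheme::Itanium; // C++
  if (has_prefix(name, "_T")) return Scheme::Swift;
  if (has_prefix(name, "_D")) return Scheme::D;       // assumed D prefix
  return Scheme::None; // unknown prefix: leave the name unchanged
}
```

A caller would switch on the returned scheme and hand the name to the matching demangler, which is why a prefix that collides with another language would be a problem.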

Is there a way to provide a hook (e.g., via an extern(C) function, or using a dynamically loaded shared library) to do this, so as to simply reuse D’s https://dlang.org/phobos/std_demangle.html and make sure it’s always in sync with D’s demangling instead of duplicating code?

There is no external demangling plug-in infrastructure at the moment, but you could add functionality that would allow it. No one is going to have D installed by default. Where do you expect your demangler dylib to live? Would you just add code that tries to locate the dylib in N places on the current system and try to dlopen it? Avoiding duplication at the cost of not having the functionality at all unless something else is installed might not make it that useful. Is D stable? Is the mangling changing at all? Will you require a demangler to be vended with each new version of the tool? Are all previous demanglings still valid in newer versions? Can you figure out the version of D from a compiled executable so that you might be able to locate one of 5 different installs of D and select the right one? Let me know what your use case is.

Greg

one simple flexible backward compatible option would be to have a generic
environment variable:

export LLDB_DEMANGLER_EXE="/usr/bin/ddemangle"
lldb myprog

inside lldb (D-like pseudo code):

bool demangle(string symbol, string* output){
  auto path=env["LLDB_DEMANGLER_EXE"];
  if(!path.empty) {
     auto demangleCustom=cast(proper_type) dlopen(path);
     if(demangleCustom(symbol, output)) return true;
     // falls back to the default code if the custom code didn't handle the symbol
  }
  return run_default_lldb_demangle(symbol, output);
}

user defined demangler (eg D's demangler)

// return true if can demangle symbol (ie it's a D symbol in our case)
bool demangleCustom(string symbol, string* output);

Is the mangling changing at all?

yes, there's some ongoing work on making the mangling scheme produce much
shorter symbols. The logic is complex, and it'd be a lot of work to
reproduce this.

Bottom line: this scheme is very flexible, and it'd be no less useful than the
current situation, where lldb just returns the symbol unchanged if it can't
demangle.

Sounds like you could then make a setting that is a dictionary where you say what the prefix is (like maybe "_D") and the value is the path to the tool to use? This would be easy to implement. Demangling does tend to be one of the most expensive parts of symbol file and debug info parsing, so if you do this, you will want to make sure the shell tool can be spawned and kept running maybe?

Greg

where in the lldb code would such an entry point be?

instead of a binary it can just be a library dynamically loaded via dlopen
(as i wrote, though I should've called it LLDB_DEMANGLER_LIB instead
of LLDB_DEMANGLER_EXE),
and the dynamically loaded symbol be cached to make sure it's dlopen'd at
most once per process.
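A rough C++ sketch of that idea follows. Everything in it is an assumption made for illustration: the environment variable name LLDB_DEMANGLER_LIB, the exported symbol name `demangle_custom`, and its C signature are hypothetical and not an existing LLDB interface. `demangle_default` is a stand-in for lldb's built-in demanglers.

```cpp
#include <cstddef>
#include <cstdlib>
#include <dlfcn.h>
#include <string>

// Assumed plugin ABI: returns nonzero and fills `out` when it handles the symbol.
using DemangleFn = int (*)(const char *mangled, char *out, std::size_t out_size);

// Resolve the plugin once per process; later calls reuse the cached pointer.
static DemangleFn load_custom_demangler() {
  static DemangleFn cached = []() -> DemangleFn {
    const char *path = std::getenv("LLDB_DEMANGLER_LIB"); // hypothetical name
    if (!path || !*path) return nullptr;
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (!handle) return nullptr;
    // "demangle_custom" is an assumed export name.
    return reinterpret_cast<DemangleFn>(dlsym(handle, "demangle_custom"));
  }();
  return cached;
}

// Stand-in for the built-in demanglers: leaves the name unchanged.
static bool demangle_default(const std::string &symbol, std::string *output) {
  *output = symbol;
  return false;
}

bool demangle(const std::string &symbol, std::string *output) {
  if (DemangleFn fn = load_custom_demangler()) {
    char buf[4096];
    if (fn(symbol.c_str(), buf, sizeof buf)) {
      *output = buf;
      return true;
    }
    // The plugin did not recognize the symbol; use the default path.
  }
  return demangle_default(symbol, output);
}
```

The function-local static keeps the dlopen/dlsym cost to a single attempt per process, including the case where the library is missing, so a misconfigured path does not slow down every lookup.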

Then it's easy enough for us to write a demangleCustom that is fast on the
D side of things. It can also work with a binary instead of a shared library,
but that would be a bit slower (it could use a client-server model, but that's
more complex than the simple shared-library solution I was proposing).

yes, we could use a prefix for that as well.

You could have a setting that allows you to specify prefix as the key with a dylib path as a value. Would you expect a function with certain name or would you need to specify the function name (probably mangled) as well? Let me know what you are thinking?

Greg
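As a sketch of what such a setting could look like once loaded, the following C++ maps each prefix to a library path and picks the longest matching prefix for a symbol. The type and function names are hypothetical, and the longest-match rule is an assumption of this sketch rather than something settled in the thread.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical setting: mangling prefix -> path of the demangler library.
using DemanglerMap = std::map<std::string, std::string>;

// Return the library registered for the longest prefix of `symbol`,
// or an empty string when no prefix matches.
std::string find_demangler(const DemanglerMap &settings,
                           const std::string &symbol) {
  std::string best_path;
  std::size_t best_len = 0;
  for (const auto &entry : settings) {
    const std::string &prefix = entry.first;
    if (prefix.empty()) continue; // an empty key would match every symbol
    if (symbol.compare(0, prefix.size(), prefix) == 0 &&
        prefix.size() > best_len) {
      best_len = prefix.size();
      best_path = entry.second;
    }
  }
  return best_path;
}
```

With a map containing the key "_D", a symbol starting with "_D" resolves to that entry's path, while a C++ symbol starting with "_Z" yields an empty string and would go through the built-in demanglers.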

It’d be great if an external lib could be used for the demangling.

Zooming out a little:
Mangled::GetDemangledName(lldb::LanguageType language) takes a language as parameter (unused), which to me looks like the plan was to move this into the language plugins?
Then D could have its own language plugin (possibly dynamically loaded). The D plugin could then do the dynamic loading of ddemangleCustom.

-Johan

It'd be great if an external lib could be used for the demangling.

Zooming out a little:
`Mangled::GetDemangledName(lldb::LanguageType language)` takes a language as parameter (unused), which to me looks like the plan was to move this into the language plugins?

That was added only because pascal uses the Itanium mangling scheme and the same mangled name needs to be demangled in different ways. We can move the demangling to the language plug-ins at some point, but it has to be a static thing. Let me know if you are interested in doing this work and I can guide you. Or maybe we can start with D and see if we like the solution we come up with and if so move to that.

Then D could have its own language plugin (possibly dynamically loaded). The D plugin could then do the dynamic loading of ddemangleCustom.

We currently don't expose the lldb_private layer in LLDB to any plugins, and that is the layer that would be needed. The lldb_private layer isn't really an API that we can expose because it is free to change at any time. Our public API, anything in the "lldb" namespace, is an API and we don't break it. We could try to make these plug-ins at the public layer, but we haven't done that with anything except command-line command plugins.

Greg

You could have a setting that allows you to specify prefix as the key with
a dylib path as a value. Would you expect a function with certain name or
would you need to specify the function name (probably mangled) as well? Let
me know what you are thinking?

whatever works, it doesn't really matter so long as there's something to get
started. I was going for something simple to start with, but if you want
this level of flexibility, how about using a JSON config file:

export LLDB_DEMANGLE_CONFIG_FILE="~/.lldbl.demangle.conf"

cat ~/.lldbl.demangle.conf

{"demangle": {
  "D": {"prefix": "_D", "shared_library_file": "/path/libdemangled.dylib",
        "mangled_name": "_demangle_custom_D"},
  "nim": { /* same for nim language */ }
}}

Greg

I like the JSON approach. We might need to include the mangled name for the function or specify where arguments go if we aren't going to expect a canned function to be in each dylib. That is a bit harder, but something we should think about.

If we look at __cxa_demangle:

char* abi::__cxa_demangle(const char *mangled_name, char *output_buffer, size_t *length, int *status);

I am not sure how we would logically specify this in the JSON... Where to put the name to demangle, how to call it etc...
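One way around describing each calling convention in the config would be to require every dylib to export a small adapter with a fixed signature. As an illustration only, this C++ sketch wraps `abi::__cxa_demangle` behind a simple bool-returning interface; the wrapper name `demangle_itanium` is hypothetical.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Adapter from the __cxa_demangle convention to a bool/string interface.
bool demangle_itanium(const std::string &mangled, std::string *output) {
  int status = 0;
  // With a null output buffer, __cxa_demangle allocates the result with malloc.
  char *result =
      abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
  if (status != 0 || result == nullptr) {
    std::free(result); // free(nullptr) is a no-op
    return false;
  }
  *output = result;
  std::free(result); // the caller owns the malloc'd buffer
  return true;
}
```

With an adapter like this in each library, the config would only need the prefix, the library path, and possibly the exported name, not a description of where the arguments go.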

Timothee, do you intend to work on this?
What can I do to help?

In the meantime, I’d appreciate it if someone could take a look at https://reviews.llvm.org/D24794 (currently, debugging D code is very much broken without that change).

-Johan

Just did, and it looks good.

update:

* D now correctly prefixes its symbols with an extra underscore on OSX
(cf Change Log: 2.079.0 - D Programming Language) and gdb
correctly demangles D symbols
* in https://github.com/dlang/druntime/pull/2083 I had a PR to support
demangling C++ symbols along with D symbols for D programs via runtime
loading (dlopen) of libc++ ; could we use a similar technique for lldb
by allowing user to dlopen a shared library that would customize
demangling?

I made it work:
https://github.com/llvm-mirror/lldb/pull/3
(note: also requires the D plugin on D side which I can submit to
another repo separately, and which is small)

not sure if lldb accepts GitHub PRs, but that's the simplest I could do

No, llvm/lldb is still on svn so we don't really accept pull requests
yet. You can submit a new review on Phabricator though.
That said, thank you for your contribution.
For new languages, we want to have a high quality bar for entry. I
really appreciate the fact that you took the time to split it into
multiple patches.
Every change that needs to be committed to lldb needs to have a test
associated.
You may consider taking a look at the tests in `lit/` or the ones in
`test/` and add tests for your changes.
Don't hesitate to ask if you get stuck/have other questions.

Thank you,

moved to: https://reviews.llvm.org/D44321