Resolving dynamic type based on RTTI fails in case of type names inequality in DWARF and mangled symbols

Hi,

I am working on an issue where, in C++ programs with some complex uses of templates, showing the dynamic type based on RTTI in lldb doesn’t work properly. Consider the following example:

enum class TagType : bool {
        Tag1
};

struct I {
        virtual ~I() = default;
};

template <TagType Tag>
struct Impl : public I {
    private:
        int v = 123;    
};

int main(int argc, const char * argv[]) {
        Impl<TagType::Tag1> impl;
        I& i = impl;
        return 0;
}

For this example clang generates the type name “Impl<TagType::Tag1>” in DWARF and “__ZTS4ImplIL7TagType0EE” when mangling symbols (which lldb demangles to Impl<(TagType)0>). Thus, when lldb tries to resolve the type in ItaniumABILanguageRuntime::GetTypeInfoFromVTableAddress(), it is unable to find it. More cases, and a detailed description of why lldb fails here, can be found in this clang review, which tries to fix this in clang [1].
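To make the mismatch concrete, here is a minimal sketch that demangles the symbol from the example and compares the result with the DWARF name. It assumes a GCC or Clang toolchain that provides abi::__cxa_demangle in <cxxabi.h>. The helpers Demangle and NamesMatch are hypothetical names for illustration, not lldb functions. The extra leading underscore that Darwin adds to symbols is dropped before demangling.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>  // abi::__cxa_demangle, provided by GCC and Clang runtimes
#include <string>

// Demangles an Itanium C++ ABI symbol. Returns an empty string on failure.
// Hypothetical helper for illustration; lldb has its own demangling code.
std::string Demangle(const char *mangled) {
  assert(mangled != nullptr);
  int status = 0;
  char *buffer = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  if (status != 0 || buffer == nullptr)
    return std::string();
  std::string result(buffer);
  std::free(buffer);  // __cxa_demangle allocates the result with malloc
  return result;
}

// The failing lookup boils down to an exact string comparison between the
// name recovered from the symbol and the name stored in DWARF.
bool NamesMatch(const std::string &from_symbol, const std::string &from_dwarf) {
  return from_symbol == from_dwarf;
}
```

Calling Demangle("_ZTS4ImplIL7TagType0EE") yields a string that contains Impl<(TagType)0>, which NamesMatch will never consider equal to the DWARF spelling Impl<TagType::Tag1>.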

However, during the discussion around this review [2], it was pointed out that DWARF names are expected to stay close to the source, which clang does perfectly, whereas the mangling algorithm is strictly defined. Thus matching them on equality can sometimes fail. The idea suggested in [2] was to implement more semantically aware matching. There is enough information in the DWARF to semantically match “Impl<(TagType)0>” with “Impl<TagType::Tag1>”, as the enum TagType is in the DWARF, and the enumerator Tag1 is present with its value 0. I have some concerns about the performance of such a solution, but I’d like to know your opinion about this idea in general. If it is approved, I’m going to work on implementing it.
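As a rough illustration of what such semantic matching could look like for the enum case only, the sketch below rewrites enumerator spellings in a DWARF name into the "(EnumType)value" form that the demangler prints. The Enumerator struct and CanonicalizeEnumArgs are hypothetical. In a real implementation the enumerator list would have to come from the DW_TAG_enumerator DIEs, which is exactly the part that raises the performance concern.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// One enumerator as it could be read from DWARF (hypothetical record).
struct Enumerator {
  std::string enum_name;  // fully qualified enum type, e.g. "TagType"
  std::string name;       // enumerator, e.g. "Tag1"
  long long value;        // constant value, e.g. 0
};

static bool IsIdentChar(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Rewrites every "Enum::Name" in dwarf_name into "(Enum)value".
// Occurrences that are part of a longer identifier or of a longer
// qualified name are left untouched.
std::string CanonicalizeEnumArgs(std::string dwarf_name,
                                 const std::vector<Enumerator> &enumerators) {
  for (const Enumerator &e : enumerators) {
    assert(!e.enum_name.empty() && !e.name.empty());
    const std::string spelled = e.enum_name + "::" + e.name;
    const std::string canonical =
        "(" + e.enum_name + ")" + std::to_string(e.value);
    std::string::size_type pos = 0;
    while ((pos = dwarf_name.find(spelled, pos)) != std::string::npos) {
      const std::string::size_type end = pos + spelled.size();
      const bool blocked_before =
          pos > 0 &&
          (IsIdentChar(dwarf_name[pos - 1]) || dwarf_name[pos - 1] == ':');
      const bool blocked_after =
          end < dwarf_name.size() && IsIdentChar(dwarf_name[end]);
      if (blocked_before || blocked_after) {
        pos = end;  // skip this occurrence, it belongs to another name
        continue;
      }
      dwarf_name.replace(pos, spelled.size(), canonical);
      pos += canonical.size();
    }
  }
  return dwarf_name;
}
```

With the single enumerator {TagType, Tag1, 0}, the DWARF name Impl<TagType::Tag1> becomes Impl<(TagType)0>, which can then be compared with the demangled name. Cases such as Impl<1+1+1> would need expression evaluation and are not covered by this sketch.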

So what do you think about type names inequality and the suggested solution?

[1] - https://reviews.llvm.org/D39622
[2] - http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20171211/212859.html

Thank you,
Anton.

Sorry, I probably shouldn't have used HTML for that message. Converted to plain text.

First off, just a technical point. lldb doesn't use RTTI to find dynamic types, and in fact works for projects like lldb & clang that turn off RTTI. It just uses the fact that the vtable symbol for an object demangles to:

vtable for CLASSNAME

That's not terribly important, but I just wanted to make sure people didn't think lldb was doing something fancy with RTTI... Note, gdb does (or at least used to do) dynamic detection the same way.
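The vtable-based detection can be sketched in a few lines: take the demangled symbol that the object's vtable pointer resolves to and strip the "vtable for " prefix. ClassNameFromVTableSymbol is a hypothetical helper written for this illustration, not the actual lldb code.

```cpp
#include <cassert>
#include <string>

// If `demangled` has the form "vtable for CLASSNAME", stores CLASSNAME in
// *class_name and returns true. Otherwise returns false and leaves
// *class_name untouched.
bool ClassNameFromVTableSymbol(const std::string &demangled,
                               std::string *class_name) {
  assert(class_name != nullptr);
  static const std::string kPrefix = "vtable for ";
  if (demangled.size() <= kPrefix.size() ||
      demangled.compare(0, kPrefix.size(), kPrefix) != 0)
    return false;
  *class_name = demangled.substr(kPrefix.size());
  return true;
}
```

For "vtable for Impl<(TagType)0>" this produces Impl<(TagType)0>, which is the name that is then looked up among the known types.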

If the compiler can't be fixed, then it seems like your solution [2] is what we'll have to try.

As it works now, we get the CLASSNAME from the vtable symbol and look it up in the list of types. That is pretty quick because the type names are indexed, so we can find it with a quick search in the index. Changing this over to a method where we do some additional string matching rather than just using the table's hashing is going to be a fair bit slower because you have to run over EVERY type name. But this might not be that bad. You would first look it up by exact CLASSNAME and only fall back on your fuzzy match if this fails, so most dynamic type lookups won't see any slowdown. And if you know the cases where you get into this problem you can probably further restrict when you need to do this work so you don't suffer this penalty for every lookup where we don't have debug info for the dynamic type. And you could keep a side-table of mangled-name -> DWARF name, and maybe a black-list for unfound names, so you only have to do this once.
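The lookup order described above (exact match first, fuzzy scan only as a fallback, with a side-table and a list of unfound names) could be sketched as follows. The class DynamicTypeNameResolver and its members are hypothetical. The fuzzy matcher is passed in as a callback because how it should work is the open question of this thread.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

class DynamicTypeNameResolver {
public:
  // Returns true if the name from the symbol and the DWARF name denote
  // the same type.
  using FuzzyMatch =
      std::function<bool(const std::string &, const std::string &)>;

  DynamicTypeNameResolver(const std::vector<std::string> &dwarf_names,
                          FuzzyMatch fuzzy)
      : index_(dwarf_names.begin(), dwarf_names.end()),
        fuzzy_(std::move(fuzzy)) {
    assert(static_cast<bool>(fuzzy_));
  }

  // Returns the matching DWARF name, or an empty string if none is found.
  std::string Resolve(const std::string &symbol_name) {
    if (index_.count(symbol_name))  // fast path: hashed exact lookup
      return symbol_name;
    auto cached = resolved_.find(symbol_name);
    if (cached != resolved_.end())  // side-table: already matched once
      return cached->second;
    if (unfound_.count(symbol_name))  // known miss: do not scan again
      return std::string();
    ++slow_scans_;
    for (const std::string &candidate : index_) {  // slow path: every name
      if (fuzzy_(symbol_name, candidate)) {
        resolved_[symbol_name] = candidate;
        return candidate;
      }
    }
    unfound_.insert(symbol_name);
    return std::string();
  }

  // Number of full scans performed so far, exposed to observe the caching.
  int slow_scans() const { return slow_scans_; }

private:
  std::unordered_set<std::string> index_;
  FuzzyMatch fuzzy_;
  std::unordered_map<std::string, std::string> resolved_;
  std::unordered_set<std::string> unfound_;
  int slow_scans_ = 0;
};
```

Each problematic name costs one full scan at most. Repeated lookups of the same name, whether it was matched or not, are served from the two caches.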

This estimation is based on the assumption that you can do your work just on the type names, without having to get more type information out of the DWARF for each candidate match. A solution that relies on realizing every class in lldb so you can get more information out of the type information to help with the match will defeat all our attempts at lazy DWARF reading. This can cause quite long delays in big programs. So I would be much more worried about a solution that requires this kind of work. Again, if you can reject most potential candidates by looking at the name, and only have to realize a few likely types, the approach might not be that slow.

Jim

Thank you for the clarification, Jim. You are right, I misunderstood a little bit what lldb actually does.

It is not that the compiler can't be fixed; it's that relying on the correspondence between mangled and demangled forms is not reliable enough, so we are looking for more robust alternatives. Moreover, I am not sure that such fuzzy matching could be done based on the class name alone, so it will require reading more DIEs. Taking into account that, for instance, our project has quite a lot of such types, it could noticeably slow down the debugger.

Thus, I'd like to mention one more alternative and get your feedback, if possible. What is actually necessary is the correspondence between the mangled and demangled vtable symbol. Perhaps it is worth preparing a separate section during compilation (like e.g. apple_types) that would store this correspondence? It would work fast and be more reliable than the current approach, but it would certainly increase debug info size (though I cannot estimate the exact increase, e.g. in percent).

What do you think? Which solution is preferable?

Thanks,
Anton.

Hi Anton and Jim,

What do you think about storing the mangled type name or the mangled vtable symbol name somewhere in DWARF in the DW_AT_MIPS_linkage_name attribute? We are already doing it for the mangled names of functions so extending it to types shouldn’t be too controversial.

Tamas

The linkage-name attribute was really intended for definitions of objects that have static memory addresses (static/global variables, and functions), but adding it to a class description would have an obvious meaning and seems completely in line with how DWARF works.

Given the size of mangled names, you probably want to do this only for definitions of classes that have vtables. With that caveat, I’d have no problem doing this.

--paulr

Hi Tamas,

First, why DW_AT_MIPS_linkage_name, but not just DW_AT_linkage_name? The latter is standardized and is currently generated by clang, at least on x64.

Second, this doesn’t help to solve the issue, because it would require parsing all the DWARF types during startup to build a map, which breaks the lazy DWARF loading performed by lldb. Or am I missing something?

Thanks,
Anton.

Hi,

I thought most compilers still emit DW_AT_MIPS_linkage_name instead of the standard DW_AT_linkage_name, but I agree that if we can, we should use the standard one.

Regarding performance, we have 2 different scenarios. On Apple platforms we have the apple accelerator tables to improve load time (this might work on FreeBSD as well), while on other platforms we index the DWARF data (DWARFCompileUnit::Index) to effectively generate accelerator tables in memory, which is a faster process than fully parsing the DWARF (currently we only parse function DIEs and we don’t build the clang types). I think an ideal solution would be to have the vtable name stored in DWARF so the DWARF data is standalone, and then have some accelerator tables to be able to do a fast lookup from mangled symbol name to DIE offset. I am not too familiar with the apple accelerator tables, but if we have anything that maps from mangled name to DIE offset, then we can add a few entries to it to map from mangled vtable name to type DIE or vtable DIE.

Tamas

I agree with Tamas. The right way to do this is to add the DW_AT_linkage_name to the class. Apple accelerator tables have many different forms, but one is a mapping of type name to exact DIE offset (in the __DWARF segment in the __apple_types section). If the mangled name was added to the class, then the apple accelerator tables would have it. So when a lookup happens with these tables around, we do a very quick hash lookup, and we find the exact DIE (or DIEs) we need. Entries for classes in the Apple accelerator tables have both the mangled and raw class name as entries pointing to the same DIE since lookups don’t usually happen via mangled names. LLDB also knows how to pull names apart and search correctly, so if someone tries to look up a type with “a::b::MyClass”, we will chop that up into “MyClass” and do a lookup on that. We might get many many different “MyClass” results back (a::c::MyClass, ::MyClass, b::MyClass), but then we cull those down by making sure any matches have a matching decl context of “a::b::”. For mangled names, it is easy and just a direct lookup.
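The "chop and cull" lookup described above can be sketched like this. TypeEntry, TypeIndex, SplitQualifiedName and FindTypes are hypothetical names, and the multimap only stands in for the hashed table. The splitting is deliberately naive: it does not handle names whose template arguments contain "::" and it treats a leading "::" as a context of its own.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One index entry: the enclosing context and the DIE the entry points to.
struct TypeEntry {
  std::string decl_context;  // e.g. "a::b::", empty for the global namespace
  std::uint64_t die_offset;
};

// Base name -> all types with that base name.
using TypeIndex = std::multimap<std::string, TypeEntry>;

// Splits "a::b::MyClass" into the context "a::b::" and base name "MyClass".
void SplitQualifiedName(const std::string &qualified, std::string *context,
                        std::string *basename) {
  assert(context != nullptr && basename != nullptr);
  const std::string::size_type pos = qualified.rfind("::");
  if (pos == std::string::npos) {
    context->clear();
    *basename = qualified;
  } else {
    *context = qualified.substr(0, pos + 2);
    *basename = qualified.substr(pos + 2);
  }
}

// Looks up by base name, then keeps only entries whose decl context matches.
std::vector<std::uint64_t> FindTypes(const TypeIndex &index,
                                     const std::string &qualified) {
  std::string context, basename;
  SplitQualifiedName(qualified, &context, &basename);
  std::vector<std::uint64_t> matches;
  const auto range = index.equal_range(basename);
  for (auto it = range.first; it != range.second; ++it) {
    if (it->second.decl_context == context)
      matches.push_back(it->second.die_offset);
  }
  return matches;
}
```

If the index holds MyClass under the contexts "a::b::", "a::c::", "b::" and the global namespace, a lookup of a::b::MyClass visits all four entries and keeps exactly one.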

The apple accelerator tables are only enabled for Darwin targets, but there is nothing to say we couldn’t enable these for other targets in ELF files. It would be a quick way to gauge the performance improvement that these accelerator tables provide for linux. Currently linux will completely index the DWARF, but it will load the DWARF, index it, and unload the DWARF so we don’t hog memory for things we don’t need loaded yet. We must manually index the DWARF because the DWARF accelerator tables are really not accelerator tables, they are random indexes of related data (names in no particular order, addresses in no particular order). These tables are also not complete so no debugger can rely on them. For example “.debug_pubtypes” is for “public” types only. “.debug_pubnames” is a random name table with only public functions (no static functions or functions in anonymous namespaces). So the DWARF accelerator tables can’t be used by debuggers.

There is now a modified version of the Apple accelerator tables in the DWARF standard that can provide the same data as the Apple versions, but I don’t believe anyone has added this support to any compilers yet. So for simplicity, we can try things out with the Apple accelerator tables and see how things go.

Another solution involves using llvm-dsymutil, a DWARF linker that is used on Apple platforms. It is a tool that is normally run on executables where the DWARF is left in the .o files and linked later into final DWARF files. This tool also has a “--update” option that takes a linked dSYM file and updates the accelerator tables in case they change over time, or in case an older version of llvm-dsymutil didn’t add everything that was needed to the tables due to a bug. So another way we can try this out is to modify llvm-dsymutil to work with ELF files and have it generate and add the Apple accelerator tables to the ELF files. This is nice because it allows us to use DWARF that is generated by any compiler (no need for the compiler to support making the accelerator tables). This would be a great way to try out the accelerator tables without requiring compiler changes.

The short term solution is to validate that the Apple accelerator tables work and do speed debugging up by a large amount. The long term solution is to have clang start emitting the new DWARF accelerator tables and modify LLDB to support and use those tables.

Let me know if there are any questions on any of this.

Greg Clayton

Tamas, Greg, thank you. I got the idea of how it should work without accelerator tables, but I still cannot figure out how to use/update the existing accelerator tables. So let me walk through it once again:
1. It is necessary to perform lookup by mangled name (as all we initially have is mangled "vtable for ClassName"-symbol).
2. All the existing apple accelerator tables (e.g. apple_types) have demangled and unqualified names as a key.
3. It is not always possible to get the original demangled type name from the mangled one (e.g. for templates parametrized with enums the demangled one is Impl<(TagType)0> vs the original Impl<TagType::Tag1>, but there are more complex cases).

Thus, I don't see how adding DW_AT_linkage_name to vtable member of class (or even to class itself) could help, as it still won't be possible to resolve DIE by the mangled type name. However possible solutions are:
1. To generate a separate accelerator table: mangled name for vtable member of a class => DIE;
2. Build index on startup iterating through the apple_types and gather the map mangled name => DIE;

Greg, did you mean some of these or something else?

Thanks,
Anton.

19.12.2017 19:39, Greg Clayton wrote:

I didn’t realize that the mangled name differs in certain cases and that it wouldn’t suffice for a lookup. Can you give an example of the name we try looking up versus what is actually in the symbol table?

IIUC right now we look up the address of the first pointer within a class if it is virtual and find the symbol name that this corresponds to, and in the failing cases you have we don’t find anything in the DWARF that matches. Is that right?

19.12.2017 23:12, Greg Clayton wrote:

I didn’t realize that the mangled name differs in certain cases and that it wouldn’t suffice for a lookup. Can you give an example of the name we try looking up versus what is actually in the symbol table?

Case 1:

enum class TagType : bool {
        Tag1
};

struct I {
        virtual ~I() = default;
};

template <TagType Tag>
struct Impl : public I {
    private:
        int v = 123;
};

int main(int argc, const char * argv[]) {
        Impl<TagType::Tag1> impl;
        I& i = impl;
        return 0;
}
lldb demangles the name to Impl<(TagType)0> and it's "Impl<TagType::Tag1>" in DWARF generated by clang.

Case 2:
struct I 
{
  virtual ~I(){}
};

template <int Tag>
struct Impl : public I
{
        int v = 123;
};

template <>
struct Impl<1+1+1> : public I  // Note the expression used for this specialization
{
        int v = 124;
};

template <class T>
struct TT {
  I* i = new T();
};

int main(int argc, const char * argv[]) {
    TT<Impl<3>> tt;
    return 0;  // [*]
}
lldb demangles name to "Impl<3>", whereas clang generates "Impl<1+1+1>" in DWARF.

IIUC right now we look up the address of the first pointer within a class if it is virtual and find the symbol name that this corresponds to, and in the failing cases you have we don’t find anything in the DWARF that matches. Is that right?

Exactly, for the cases above and some others.

I was actually experimenting with this last month. Unfortunately, I've
learned that the situation is not as simple as flipping a switch in
the compiler. In fact, there is no switch to flip as clang will
already emit the apple tables if you pass -glldb. However, the
resulting tables will be unusable due to the differences in how dwarf
is linked on elf vs mach-o. In elf, we have the linker concatenate the
debug info into the final executable/shared library, which it will
also happily do for the .apple_*** sections.

The problem is that we are then not able to pry these sections
apart in the debugger, as they are not self-terminating and the linker
will not insert any metadata to help us with that. So all we can do in
the debugger is see that the .apple_names section is present, and
index the first table in the section (which is quite useless). To get
around this we would need to teach the linker(s) (elf targets use many
linkers) to merge these accelerator tables in the same way that
dsymutil does, or modify the build process to insert an additional
dsymutil step.

The second, more subtle problem I see is that these tables are an
all-or-nothing event. If we see an accelerator table, we assume it is
an index of the entire module, but that's not likely to be the case,
especially in the early days of this feature's uptake. You will have
people feeding the linkers with output from different compilers, some
of which will produce these tables, and some not. Then the users will
be surprised that the debugger is ignoring some of their symbols.

The easiest way to make the apple tables work might be to use them in
the split-dwarf scenario, as this is kinda similar to the mac
non-dsym-bundle scenario, where the debug info remains in the original
.o file and is not touched by the linker. Unfortunately, currently the
combination of -glldb and -gsplit-dwarf makes the compilation fail.

However, I would actually advocate for pushing for the dwarf-5
accelerator tables in the ELF case. The reason is that these solve
both problems I mention above:
- they don't require any special support from the rest of the
toolchain -- if a linker just concatenates the tables, the debugger
will still be able to use them as indexes of the individual
compilation units. Of course, if the linker knows about these tables,
it can merge them and provide a single uber-table for indexing the
full module, but this is optional. This makes incremental deployment
much smoother, which leads me to the next item,
- they support mixing indexed and non-indexed .o files -- the tables
contain a list of compilation units they cover, so the debugger can
match this against the actual list of CUs, and if it sees that one
unit is not covered by any index, it can index that one manually.
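The second point can be sketched as a simple set difference: collect the compilation units that the index tables claim to cover, and whatever is left has to be indexed by the debugger itself. UncoveredUnits is a hypothetical helper and CUs are identified here by their offsets purely for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Returns the CU offsets that none of the index tables claims to cover.
// These are the units the debugger would still have to index manually.
std::vector<std::uint64_t>
UncoveredUnits(const std::vector<std::uint64_t> &all_cu_offsets,
               const std::vector<std::vector<std::uint64_t>> &cus_per_table) {
  std::set<std::uint64_t> covered;
  for (const std::vector<std::uint64_t> &table : cus_per_table)
    covered.insert(table.begin(), table.end());

  std::vector<std::uint64_t> uncovered;
  for (std::uint64_t cu : all_cu_offsets) {
    if (covered.count(cu) == 0)
      uncovered.push_back(cu);
  }
  // Sanity check: we can never report more units than the module has.
  assert(uncovered.size() <= all_cu_offsets.size());
  return uncovered;
}
```

With three units at offsets 0x0, 0x100 and 0x200 and two concatenated tables covering 0x0 and 0x200, only 0x100 is left for manual indexing.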

This is probably a bit more work than just "flipping a switch", but I
hope it will not be too much work. The layout and contents of the
tables are generally the same, so I am hoping most of the compiler
code for the apple tables can be reused for the dwarf5 tables. If
things turn out the way I want them to, I'll be able to work on
getting this done next year.

That ruins the whole idea of the accelerator tables if they are concatenated...

The problem is that we are then not able to pry these sections
apart in the debugger, as they are not self-terminating and the linker
will not insert any metadata to help us with that. So all we can do in
the debugger is see that the .apple_names section is present, and
index the first table in the section (which is quite useless). To get
around this we would need to teach the linker(s) (elf targets use many
linkers) to merge these accelerator tables in the same way that
dsymutil does, or modify the build process to insert an additional
dsymutil step.

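Agreed, teaching the linkers to merge the tables the way dsymutil does, or adding a dsymutil step to the build, is the only way to make this work.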

I think it is best to auto generate the tables from the DWARF directly after it has all been linked. Skip teaching the linker about merging it, just teach it to generate it.

Modifying llvm-dsymutil to handle ELF so we can use "llvm-dsymutil --update foo.elf" is the quickest way that doesn't involve modifying anything but llvm-dsymutil. It will generate the accelerator tables manually and add/modify the existing accelerator tables and write out the new elf file that is all fixed up. I would suggest going this route at first to see what performance improvements we will see with linux so that can drive how quickly we need to adopt this.

I like this. The question is - has everything we need in llvm-dsymutil been upstreamed by Apple?

Unfortunately not. Accelerator tables are one big missing piece (not because we don’t want to, it was blocked on some reviews). We plan to work on this in the coming months though.

Fred

That ruins the whole idea of the accelerator tables if they are concatenated...

I'm not sure I'm convinced by that. I mean, obviously it's better if
you have just a single table to look up, but even if you have multiple
tables, looking up into each one may be faster than indexing the full
debug info yourself. Take liblldb for example. It has ~3000 compile
units and nearly 2GB of debug info. I don't have any solid data on
this (and it would certainly be interesting to make this experiment),
but I expect that doing 3000 hash lookups (which are basically just
array accesses) would be faster than indexing 2GB of dwarf (where you
have to deal with variable-sized fields and uleb encodings...). And
there is always the possibility to do the lookups in parallel or merge
the individual tables inside the debugger.
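A lookup over concatenated per-unit tables could look like the sketch below: one hash probe per table, with the hits merged. NameTable and LookupInAllTables are hypothetical, and std::unordered_map only stands in for the on-disk hash tables. Whether this really beats manual indexing is the open question here and would need measuring.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in for one per-compile-unit table: name -> DIE offsets.
using NameTable = std::unordered_map<std::string, std::vector<std::uint64_t>>;

// Probes every table once and concatenates all DIE offsets found for `name`.
// The cost is one hash lookup per table instead of a scan of the debug info.
std::vector<std::uint64_t>
LookupInAllTables(const std::vector<NameTable> &tables,
                  const std::string &name) {
  assert(!name.empty());  // looking up an empty name is a caller error
  std::vector<std::uint64_t> dies;
  for (const NameTable &table : tables) {
    const auto it = table.find(name);
    if (it != table.end())
      dies.insert(dies.end(), it->second.begin(), it->second.end());
  }
  return dies;
}
```

For liblldb this would mean roughly 3000 probes per name. The probes are independent, so they could also be run in parallel.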

I think it is best to auto generate the tables from the DWARF directly after it has all been linked. Skip teaching the linker about merging it, just teach it to generate it.

If the linker does the full generation, then how is that any better
than doing the indexing in the debugger? Somebody still has to parse
the entire dwarf, so it might as well be the debugger. I think the
main advantage of doing it in the compiler is that the compiler
already has all the data about what should go into the index ready, so
it can just build it as it goes about writing out the object file.
Then, the merging should be a relatively simple and fast operation
(and the linker does not even have to know how to parse dwarf). Isn't
this how the darwin workflow works already?

I'm not sure now whether you're suggesting to use the dsymutil
approach just to gauge the potential speedup we can obtain and get
people interested, or as a productized solution. If it's the first one
then I fully agree with you. Although I think I can see an even
simpler way to estimate the speedup: build lldb for mac with apple
indexes disabled and compare its performance to a vanilla one. I'm
going to see if I can get some numbers on this today.

If the linker does the full generation, then how is that any better
than doing the indexing in the debugger? Somebody still has to parse
the entire dwarf, so it might as well be the debugger.

I suppose the difference is that the linker does it once, while the debugger has to do it on every startup, as the results are not saved anywhere (or are they?). So perhaps, instead of having the compiler build accelerator tables for the debugger, the debugger should save its own indexes somewhere (e.g. in a cache file next to the binary)? Or is there already such a mechanism and I just don't know about it?

Currently the indexes aren't saved, but that is exactly where I was
going with this. We *could* save this index (we already cache
downloaded remote object files in ~/.lldb/module_cache, we could just
put this next to it) and reuse it for the subsequent debug sessions.
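
One thing a saved index would need is a way to tell whether it still matches the binary. A minimal sketch of such a check follows, assuming the cache file starts with a small header recording what it was built from. ModuleStamp, CachedIndexHeader, kIndexFormatVersion and CanReuseIndex are all hypothetical names; nothing like this exists in lldb today, and a build ID would likely be a more robust key than size and modification time.

```cpp
#include <cstdint>
#include <string>

// Hypothetical identity of a module on disk.
struct ModuleStamp {
  std::string path;
  std::uint64_t size;   // file size in bytes
  std::int64_t mtime;   // modification time, seconds since the epoch
};

// Hypothetical header stored at the start of a saved index file.
struct CachedIndexHeader {
  std::uint32_t format_version;
  ModuleStamp built_from;
};

// Bumped whenever the (hypothetical) on-disk index layout changes.
constexpr std::uint32_t kIndexFormatVersion = 1;

// A saved index is only reused if it was written in the current format
// and the module has not changed since the index was built.
bool CanReuseIndex(const CachedIndexHeader &header,
                   const ModuleStamp &current) {
  return header.format_version == kIndexFormatVersion &&
         header.built_from.path == current.path &&
         header.built_from.size == current.size &&
         header.built_from.mtime == current.mtime;
}
```

If the check fails, the debugger would fall back to indexing the DWARF as it does now and overwrite the stale file.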

The apple accelerator tables are only enabled for Darwin targets, but there
is nothing to say we couldn’t enable these for other targets in ELF files.
It would be a quick way to gauge the performance improvement that these
accelerator tables provide for Linux.

I was actually experimenting with this last month. Unfortunately, I’ve
learned that the situation is not as simple as flipping a switch in
the compiler. In fact, there is no switch to flip as clang will
already emit the apple tables if you pass -glldb. However, the
resulting tables will be unusable due to the differences in how dwarf
is linked on elf vs mach-o. In elf, we have the linker concatenate the
debug info into the final executable/shared library, which it will
also happily do for the .apple_*** sections.

That ruins the whole idea of the accelerator tables if they are concatenated…

I’m not sure I’m convinced by that. I mean, obviously it’s better if
you have just a single table to look up, but even if you have multiple
tables, looking up into each one may be faster than indexing the full
debug info yourself. Take liblldb for example. It has ~3000 compile
units and nearly 2GB of debug info. I don’t have any solid data on
this (and it would certainly be interesting to make this experiment),
but I expect that doing 3000 hash lookups (which are basically just
array accesses) would be faster than indexing 2GB of dwarf (where you
have to deal with variable-sized fields and uleb encodings…). And
there is always the possibility to do the lookups in parallel or merge
the individual tables inside the debugger.
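
For a sense of why walking raw DWARF costs more than a hash probe: many DWARF fields are ULEB128-encoded, so the parser cannot skip over them without decoding byte by byte. Below is a minimal sketch of a decoder. DecodeULEB128 is a hypothetical helper written for this illustration (LLVM has its own routines), and it does no error reporting for truncated input.

```cpp
#include <cstddef>
#include <cstdint>

// Decode one unsigned LEB128 value starting at data[*offset] and advance
// *offset past it. Each byte carries 7 payload bits, least significant
// group first; the high bit says whether another byte follows.
std::uint64_t DecodeULEB128(const std::uint8_t *data, std::size_t size,
                            std::size_t *offset) {
  std::uint64_t value = 0;
  unsigned shift = 0;
  while (*offset < size) {
    std::uint8_t byte = data[(*offset)++];
    if (shift < 64)  // ignore payload bits that do not fit in 64 bits
      value |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
    shift += 7;
    if ((byte & 0x80) == 0)
      break;  // last byte of this value
  }
  return value;
}
```

For example, the three bytes 0xE5 0x8E 0x26 decode to 624485. The length of a field is only known after reading it, which is what rules out plain array indexing.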

The main idea is to touch as few pages as possible when doing searches. We effectively have this scenario right now with Apple DWARF in .o file debugging. So much time is spent paging in each accelerator table that we have very long delays starting up large apps. This would be more localized, but there would be a similar issue. Concatenation would be fine for now if we make it work, but for long term archival, the real solution is to merge the tables.

The second, more subtle problem I see is that these tables are an
all-or-nothing event. If we see an accelerator table, we assume it is
an index of the entire module, but that’s not likely to be the case,
especially in the early days of this feature’s uptake. You will have
people feeding the linkers with output from different compilers, some
of which will produce these tables, and some not. Then the users will
be surprised that the debugger is ignoring some of their symbols.
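
One way out of the all-or-nothing assumption would be for the debugger to work out which compile units the tables actually cover and index only the rest by hand. A rough sketch follows, assuming each table can report the compile unit offsets it covers (the apple tables do not record this as far as I know, so treat that as an assumption). UnindexedUnits is a hypothetical helper, not an existing lldb function.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

// Returns the compile unit offsets that no accelerator table covers.
// Both inputs must be sorted in ascending order.
std::vector<std::uint64_t>
UnindexedUnits(const std::vector<std::uint64_t> &all_units,
               const std::vector<std::uint64_t> &indexed_units) {
  std::vector<std::uint64_t> missing;
  std::set_difference(all_units.begin(), all_units.end(),
                      indexed_units.begin(), indexed_units.end(),
                      std::back_inserter(missing));
  return missing;
}
```

An empty result means the tables can be trusted as a full index; anything else is the list of units the debugger would still have to parse itself.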

I think it is best to auto generate the tables from the DWARF directly after it has all been linked. Skip teaching the linker about merging it, just teach it to generate it.

If the linker does the full generation, then how is that any better
than doing the indexing in the debugger?

It would be better in that debugging the same thing twice would be super quick.

Somebody still has to parse
the entire dwarf, so it might as well be the debugger. I think the
main advantage of doing it in the compiler is that the compiler
already has all the data about what should go into the index ready, so
it can just build it as it goes about writing out the object file.

This is kind of why I would really like to see “llvm-dsymutil --update” work, in case the compiler has bugs where it doesn’t generate things correctly. The question is how much time it costs the compiler to generate the tables versus generating them in the linker or after linking.

Then, the merging should be a relatively simple and fast operation
(and the linker does not even have to know how to parse dwarf). Isn’t
this how the darwin workflow works already?

Sure is easier on the linker. But as I stated above, paging in many tables is really slow for thousands of object files with the MacOS DWARF in .o files with a link map in the main executable.

This is probably a bit more work than just “flipping a switch”, but I
hope it will not be too much work. The layout and contents of the
tables are generally the same, so I am hoping most of the compiler
code for the apple tables can be reused for the dwarf5 tables. If
things turn out the way I want them to, I’ll be able to work on
getting this done next year.

Modifying llvm-dsymutil to handle ELF so we can use “llvm-dsymutil --update foo.elf” is the quickest way that doesn’t involve modifying anything but llvm-dsymutil. It will generate the accelerator tables manually and add/modify the existing accelerator tables and write out the new elf file that is all fixed up. I would suggest going this route at first to see what performance improvements we will see with linux so that can drive how quickly we need to adopt this.

I’m not sure now whether you’re suggesting to use the dsymutil
approach just to gauge the potential speedup we can obtain and get
people interested, or as a productized solution. If it’s the first one
then I fully agree with you. Although I think I can see an even
simpler way to estimate the speedup: build lldb for mac with apple
indexes disabled and compare its performance to a vanilla one. I’m
going to see if I can get some numbers on this today.

You are correct in that I want to gauge the potential speedup with llvm-dsymutil so we know how much effort we should put into this on Linux and other platforms. The nice thing about the llvm-dsymutil approach is that it allows anyone to try it out on their system. You are also correct that we can disable the accelerator tables on Darwin, do a full build of clang with debug info, and run a few startup tests. The results were huge for us when we did that: 2 minutes without accelerator tables, under 5 seconds with them.

Greg