[RFC][LLDB] Handling abi_tagged Constructors/Destructors in Expression Evaluator

Michael137 · October 29, 2024, 11:49am

This RFC seeks to discuss how LLDB’s expression evaluator should handle calls to C++ constructors/destructors (I refer to these as “structors” throughout the document) that are annotated with the [[gnu::abi_tag]] attribute.

Problem

Given this example:

struct Tagged {
  [[gnu::abi_tag("CtorTag")]] Tagged() {}
};

int main() {
	Tagged t;
	return 0; // break here
}

Calling the Tagged() constructor will fail in LLDB with following error:

(lldb) target create "a.out"                                                                       
(lldb) r                                                                                           
Process 961323 launched: 'a.out' (x86_64)  
Process 961323 stopped                                                                             
* thread #1, name = 'a.out', stop reason = breakpoint 1.1                                          
    frame #0: 0x0000000000401114 a.out`main at tags.cpp:7:2                                        
   4                                                                                               
   5    int main() {                                                                               
   6            Tagged t;                                                                          
-> 7            return 0; // break here                                                             
   8    }                                                                                          
(lldb) expr Tagged()                                                                               
error: Couldn't lookup symbols:                                                                    
  Tagged::Tagged()

If we look at the IR that Clang lowered the LLDB (DWARF) AST into,
the reason for the failed symbol lookup becomes more apparent:

lldb             Module as passed in to IRForTarget: 
"; ModuleID = '$__lldb_module'
source_filename = "$__lldb_module"

<-- snipped -->

define dso_local void @"_Z12$__lldb_exprPv"(ptr %"$__lldb_arg") #0 {
entry:
  <-- snipped -->

init.check:                                       ; preds = %entry
  call void @_ZN6TaggedC1Ev(ptr nonnull align 1 dereferenceable(1) @"_ZZ12$__lldb_exprPvE19$__lldb_expr_result") #2

We’re trying to call a function symbol with mangled name _ZN6TaggedC1Ev.
However, no such symbol exists in the binary because the actual mangled
name for the constructors is _ZN6TaggedC2B7CtorTagEv:

$ llvm-dwarfdump a.out --name=Tagged
a.out:  file format elf64-x86-64

...

0x0000006e: DW_TAG_subprogram
              DW_AT_low_pc      (0x0000000000401130)
              DW_AT_high_pc     (0x000000000040113a)
              DW_AT_frame_base  (DW_OP_reg6 RBP)
              DW_AT_object_pointer      (0x00000089)
              DW_AT_linkage_name        ("_ZN6TaggedC2B7CtorTagEv")
              DW_AT_specification       (0x00000033 "Tagged")

I.e., the ABI tags are missing from the round-tripped mangled name.

When LLDB creates the Clang AST from DWARF, it creates clang::CXXConstructorDecls
from the DW_TAG_subprogram of a constructor’s declaration. DWARF doesn’t
tell us anything about the existence of abi-tags on the declaration, so we
never attach an clang::AbiTagAttr to said decl. Hence when Clang mangles the
CXXConstructorDecl, it doesn’t know to include the tags in the mangling.

This may seem like a contrived example, but this manifests quite frequently. E.g.,
when dereferncing the result of a function returing a std::shared_ptr (whose structors
are abi-tagged). And there are no great workarounds to recommend to users for this currently.

Example of Different Structor Variants in Expression Evaluator

This example is contrived, but the following would cause a call to the C2 constructor variant from within expr.

struct A {                                   
  A(int) {}     
};                                           
                                             
struct B : virtual A {                       
  B() : A(5){};
};                                           
                        
struct C : B {                               
  C() : A(5) {}                              
};                                           
                                             
int main() {                                 
    A a(6);                                  
    B b;                                     
    C c(6);                              
    return 0;                                
}

Then in LLDB:

(lldb) expr
Enter expressions, then terminate with an empty line to evaluate:
  1: struct F : B { 
  2:   F() : A(5) {} 
  3: }; 
  4: F()

(F) $0 = {}

The call to F() here generates a call to the C1 variant of F which calls the C2 variants of B and A.

What’s special about structors?

The declarations for structors don’t have a DW_AT_linkage_name. That’s because Clang (when using the Itanium ABI) will generate multiple variants for a constructor, each mangled differently: Itanium C++ ABI

So we can end up with following (simplified) DWARF:

0x00000057:   DW_TAG_structure_type
                DW_AT_containing_type   (0x00000057 "X")
                DW_AT_calling_convention        (DW_CC_pass_by_reference)
                DW_AT_name      ("X")

0x0000007c:     DW_TAG_subprogram
                  DW_AT_name    ("X")
                  DW_AT_declaration     (true)
                  DW_AT_external        (true)

0x000000a9:   DW_TAG_subprogram
                DW_AT_low_pc    (0x0000000000000000)
                DW_AT_high_pc   (0x000000000000001c)
                DW_AT_linkage_name      ("_ZN1XC2Ev")
                DW_AT_specification     (0x0000007c "X")

0x000000de:   DW_TAG_subprogram
                DW_AT_low_pc    (0x0000000000000020)
                DW_AT_high_pc   (0x0000000000000054)
                DW_AT_linkage_name      ("_ZN1XC1Ev")
                DW_AT_specification     (0x0000007c "X")

Note how the constructor is just a declaration with a name, and we have two
constructor definitions (with different DW_AT_linkage_names) pointing
back to the same declaration.

So we can’t really pick a linkage name to attach to a CXXConstructorDecl (like we did for methods) up-front.

This RFC proposes solutions to exactly this problem.

MS-ABI and GCC

Structor variant emission is both ABI and compiler dependent. The MS-ABI doesn’t encode the structor variants in the mangled name. Instead it adds indicates the variant using an implicit parameter to the constructor.

On the other hand, GCC on Itanium will emit following DWARF:

0x0000002e:   DW_TAG_structure_type                                                                       
                DW_AT_name      ("Tagged")           
                                                     
0x0000003b:     DW_TAG_subprogram                 
                  DW_AT_external        (true)                                                            
                  DW_AT_name    ("Tagged")          
                  DW_AT_linkage_name    ("_ZN6TaggedC4B7CtorTagEv")
                  DW_AT_declaration     (true)                                                            

0x000000ae:   DW_TAG_subprogram
                DW_AT_abstract_origin   (0x00000094 "_ZN6TaggedC4B7CtorTagEv")
                DW_AT_linkage_name      ("_ZN6TaggedC2B7CtorTagEv")
                DW_AT_low_pc    (0x0000000000401106)
                DW_AT_high_pc   (0x0000000000401111)

0x000000cd:     DW_TAG_formal_parameter
                  DW_AT_abstract_origin (0x000000a4 "this")
                  DW_AT_location        (DW_OP_fbreg -24)

0x000000d5:     NULL

Note how it uses the “unified” C4 constructor variant for the declaration. But the definition has the resolved C2 constructor linkage name (and the C4 linkage name is not a symbol in the binary). The C4 mangling is a GCC extension.

Side-note, just noticed that when debugging GCC binaries we never manage to resolve a constructor call. Even:

struct S {
 S {}                                                                                                                                                                                                         
};                                                                                                                                                                                                               
                                                                                                                                                                                                                 
(lldb) expr S()                                                                                                                                                                                                     error: Couldn't lookup symbols:                                                                                                                                                                                  
   S::S()

This happens because the ctor declaration has the C4 DW_AT_linkage_name. So LLDB does create an AsmLabelAttr for that constructor, which it can’t find in the binary. This proposal would help with that, though we’d have to specifically account for cases where the constructor declaration does have a linkage name.

Structor Variant Aliases

Clang (and GCC) will only emit separate definitions for each structor variant if the definitions actually differ.

In the common case (where a class doesn’t have virtual bases), the complete object constructor (C1) is aliased to the base object constructor (C2). E.g., this is valid IR that Clang generates:

@_ZN6TaggedC1B7CtorTagEv = dso_local unnamed_addr alias void (ptr), ptr @_ZN6TaggedC2B7CtorTagEv

define dso_local void @_ZN6TaggedC2B7CtorTagEv(ptr noundef nonnull align 1 dereferenceable(1) %0) unnamed_addr #0 align 2 !dbg !16 {
 <-- snipped definition -->
}

define dso_local noundef i32 @main() #2 !dbg !22 {
 %1 = alloca i32, align 4                                                                                                                                                                                       
 %2 = alloca %struct.Tagged, align 1                                                                                                                                                                            
 store i32 0, ptr %1, align 4                                                                                                                                                                                   
 call void @llvm.dbg.declare(metadata ptr %2, metadata !26, metadata !DIExpression()), !dbg !27                                                                                                                 
 call void @_ZN6TaggedC1B7CtorTagEv(ptr noundef nonnull align 1 dereferenceable(1) %2), !dbg !27                                                                                                                
 ret i32 0, !dbg !28                                                                                                                                                                                            
}

Notice how _ZN6TaggedC1B7CtorTagEv is actually just an alias for _ZN6TaggedC2B7CtorTagEv. But importantly the call is to the C1 variant. This could present complications depending on which solution we implement because DWARF will only contain a definition DIE for the C2 variant, whereas the expression evaluator will generate a call to C1 (which is also what Clang would do, and is valid in practice because those constructor variants are the same in this case).

There are also cases where Clang may “replace all uses with (RAUW)” between constructor variants (determined by this logic in ItaniumCXXABI.cpp. So this is all something we’ll need to consider.

Inheriting Constuctor Variant

The CI1 and CI2 constructor variants in the Itanium ABI will need special handling. These occur with code like:

struct Foo {
  Foo(int) {}
};

struct Bar : public Foo {
  using Foo::Foo;
};

int main() {
  Bar b(5);
}

This would produce DWARF like:

0x0000002a:   DW_TAG_structure_type
                DW_AT_calling_convention        (DW_CC_pass_by_value)
                DW_AT_name      ("Bar")
                DW_AT_byte_size (0x01)
                DW_AT_decl_file ("inherit.cpp")
                DW_AT_decl_line (5)

0x00000033:     DW_TAG_inheritance
                  DW_AT_type    (0x0000004a "Foo")
                  DW_AT_data_member_location    (0x00)

0x00000039:     DW_TAG_subprogram
                  DW_AT_name    ("Foo")
                  DW_AT_declaration     (true)
                  DW_AT_artificial      (true)
                  DW_AT_external        (true)

0x0000003e:       DW_TAG_formal_parameter
                    DW_AT_type  (0x0000009a "Bar *")
                    DW_AT_artificial    (true)

Note how the name of the constructor is Foo, not Bar (which might be a Clang bug…GCC doesn’t do this). This breaks LLDB’s assumption when creating CXXConstructorDecls. Given the support for inheriting constructors is already somewhat broken, this RFC doesn’t look to fix it. We should however try not to make it harder to fix in the future.

Using the `std` module

We have a target.import-std-module setting that would work around this problem
because we can load an accurate AST into LLDB without going via DWARF. Unfortunately
it has its own issues at the moment (doesn’t support all STL types and is not stable
enough to be enabled by default). Also, it wouldn’t help with users debugging with
libstdc++.

Potential Solutions

1. Encode ABI tags in DWARF

The idea here would be to introduce a new attribute DW_AT_abi_tag whose value would be a
string (presumably a DW_FORM_strp) holding the contents of a single abi_tag of a structure/function/namespace declaration (there can be multiple tags on a declaration). We can’t get away with only attaching them to constructors/destructors because abi-tags are part of a type’s mangling. I.e., they could also appear in a structor’s mangled name via function parameters (or template arguments/return types in the case of templated constructors).

This approach was attempted in D144181 but stalled because of the downsides described below.

Downsides:

A lot of functions/types in the STL are abi-tagged. So we would need to be careful to mitigate size impact. When developing the prototype in D144181, attaching them to all abi-tagged entities (not just structors) showed a non-trivial increase in the .debug_info section (albeit we used DW_TAG_llvm_annotations for this, so it wasn’t the most space-efficient representation). I have yet to measure the size impact of encoding it with a dedicated attribute.
libc++ may in the future decide to abi-tag the namespace (instead of individual functions/types). Which means LLDB needs to be aware of implicitly abi-tagged types/functions. One solution to that would be to check the containing DeclContext chain for an abi-tag for every record/function decl LLDB creates.
This deviates from how we handle this for other types of function calls. Which might be fine, but does raise the question: do we want to rely on the mangled name roundtripping given LLDB’s reconstructed AST isn’t/can’t be fully accurate). Using the linkage name seems more robust (though in an offline discussion @labath did point out that even linkage names aren’t the most robust here, since they need not uniquely identify a function. A more complete/robust solution would encode, in the mangled name or elsewhere, enough info to point LLDB directly to the function).
It’s unclear how useful this attribute would be for any other DWARF consumer.

2. Attach all mangled names to structor AST node

(this came out of the discussion in ⚙ D144181 [clang][DebugInfo] Add abi-tags on constructors/destructors as LLVM annotations)

In this approach we would tell Clang about all the mangled names for a given CXXConstructorDecl. Then the Clang mangler would pick the correct mangled name to use for a given structor kind.

The current proposal would be to introduce a new internal-only Clang attribute that LLDB would attach to structor decls (attribute name pending). E.g.,:

|-CXXDestructorDecl 0x3d15efd8 parent 0x3d15e418 prev 0x3d15eb10 <line:19:17, line:21:1> line:19:25 used ~Tagged 'void () noexcept'
| |-CompoundStmt 0x3d182858 <col:35, line:21:1>
| |-StructorMangledNamesAttr 0x3d15f108 <line:16:3, line:18:22> deleting:_ZN6TaggedD0Ev complete:_ZN6TaggedD1Ev base:_ZN6TaggedD2Ev
| |-AbiTagAttr 0x3d15f1e8 <col:28, col:58> DtorTag Test
| `-AbiTagAttr 0x3d182790 <col:64, line:19:13> v1

This is only something LLDB would ever set by collecting the mangled names for a given constructor declaration. All the infrastructure in the Clang mangler is already available (and there’s precendence for setting the mangled name via attributes using the AsmLabelAttr).

LLDB will also need a way to tell which DW_TAG_subprogram definition corresponds to what structor kind so we can provide that information to Clang. There’s a couple of ways we could do this:

A DWARF attribute on the DW_TAG_subprogram (e.g., DW_AT_structor_kind) whose value would be a constant such as DW_STRUCTOR_cxx_complete_ctor (name pending).
Use the ItaniumPartialDemangler to walk the demangle tree of the DW_AT_linkage_name until we find the structor kind.

Upsides:

This aligns with how we handle other kinds of function calls in LLDB
Doesn’t rely on mangled names round-tripping (which other C++ constructs other than abi-tags might inhibit. E.g., I suspect we don’t represent templated constructors accurately in all cases for the roundtripping to work in the general case. Though this is only a hunch at the moment)

Downsides:

Requires Clang attribute that only LLDB would use (though from a brief discussion a while back with @AaronBallman it sounded like there’s already precedent for internal-only attributes like this).
Need to do extra work to determine which structor kind a DW_AT_linkage_name corresponds to. If we require a DWARF attribute, it’s unclear whether other consumers (or non-Itanium platforms) would benefit from encoding a structor-kind

Michael137 · October 29, 2024, 11:51am

Tagging some people who have been involved in this discussion

CC @labath @dblaikie @adrian.prantl

AaronBallman · October 29, 2024, 12:00pm

Internal-only attributes include: AlignMac68kAttr, AlignNaturalAttr, MaxFieldAlignmentAttr, etc. We have let Documentation = [InternalOnly]; in Attr.td for attributes which should be explicitly undocumented because they’re internal-only implementation details.

dblaikie · October 29, 2024, 10:05pm

So GDB handles the following more aggressive/challenging example with GCC’s debug info somehow, but can’t handle it with Clang’s - might be worth better understanding how it handles that.
test.h:

extern int my_global;
struct Tagged {
  [[gnu::abi_tag("CtorTag")]] Tagged() { my_global = 42; }
  int f1();
};

Tagged f1();

test.cpp:

#include "test.h"
int my_global = 3;
int main() {
  Tagged t = f1();
  return my_global; // break here
}

test2.cpp:

#include "test.h"
Tagged f1() {
  return Tagged();
}

then:

h$ g++ test2.cpp -c && g++ test.cpp -g test2.o && gdb ./a.out -batch -ex "b 4" -ex "r" -ex "p my_global" -ex "p t.Tagged()" -ex "p my_global"
Breakpoint 1 at 0x1131: file test.cpp, line 4.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, main () at test.cpp:4
4         Tagged t = f1();
$1 = 3
$2 = void
$3 = 42

So it’s interesting to see what debug info GCC generates:

0x00000041:   DW_TAG_structure_type
                DW_AT_name      ("Tagged")
                DW_AT_byte_size (0x01)
                DW_AT_decl_file ("/usr/local/google/home/blaikie/dev/scratch/test.h")
                DW_AT_decl_line (2)
                DW_AT_decl_column       (8)
                DW_AT_sibling   (0x00000082)

0x0000004e:     DW_TAG_subprogram
                  DW_AT_external        (true)
                  DW_AT_name    ("Tagged")
                  DW_AT_decl_file       ("/usr/local/google/home/blaikie/dev/scratch/test.h")
                  DW_AT_decl_line       (3)
                  DW_AT_decl_column     (31)
                  DW_AT_linkage_name    ("_ZN6TaggedC4B7CtorTagEv")
                  DW_AT_declaration     (true)
                  DW_AT_object_pointer  (0x00000062)
                  DW_AT_sibling (0x00000068)

0x00000062:       DW_TAG_formal_parameter
                    DW_AT_type  (0x00000082 "Tagged *")
                    DW_AT_artificial    (true)

0x00000067:       NULL

C4?
Though C4 isn’t defined at all in the program, the CN symbols we get from test2.cpp are:

0000000000001156 W _ZN6TaggedC1B7CtorTagEv
0000000000001156 W _ZN6TaggedC2B7CtorTagEv

So, somehow gdb’s takking the C4 mangling and using that to find the C1/C2 as needed?
Clang doesn’t produce any mangled name, and in that case (presuming that’s the important difference):

$ g++ test2.cpp -c && clang++ test.cpp -g test2.o && gdb ./a.out -batch -ex "b 4" -ex "r" -ex "p my_global" -ex "p t.Tagged()" -ex "p my_global"
Breakpoint 1 at 0x113f: file test.cpp, line 4.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, main () at test.cpp:4
4         Tagged t = f1();
$1 = 3
Couldn't find method Tagged::Tagged
$2 = 3

dblaikie · October 29, 2024, 10:16pm

So from GCC/GDB’s behavior, Clang would be better if it put /a/ mangled name on the ctor. (not sure anyone has a solution if the ctor is implicit and not instantiated in one side of the example above - then there’d be no hint of debug info for the ctor, so probably nothing much to be done there (short of putting the abi tag wherever it appeared in that case (if the ctor’s implicit, it wouldn’t be written on the ctor - so it’d be on the class or namespace, etc))

If clang put a ctor mangled name on the ctor declaration - could that be enough for LLDB to be able to demangle enough to get an abi tag out of it, or enough to substitute in the CN number?

Michael137 · October 29, 2024, 10:42pm

Have yet to read/reply to the full comment. But just to clarify the C4 constructor variant. The following is an excerpt from ItaniumDemangle.h:

extension      ::= C4  # gcc old-style "[unified]" constructor

Interesting that this is referred to as “old-style”…

And some more detail here: D4/D5 destructor and other manglings not specified · Issue #73 · itanium-cxx-abi/cxx-abi · GitHub

labath · October 30, 2024, 11:10am

Correct. The problem is with anonymous/CU-local entities. If two compile units define a class with the same name, then its constructors will have the same mangled name. LLDB will not be able to tell them apart in the clang output without extra information.

This is why I think it would be better to move away from using mangled names as function identifiers (instead of doubling down on it). For regular functions, this is already sort of doable (although not very ergonomic): we can take the existing AsmLabelAttr and – instead of the linkage name – encode any information we need to look up the function into the asm string. If we put some identifier of the module (shared library) and the identifier of the DWARF DIE within that module, then we can get back to the exact same data that we’re using to construct the clang declaration. From that point, we can do anything we want (including using the linkage name to look up the function address).

Of course, this doesn’t help with the constructors, as they don’t have a single linkage name due to the variants. However, that’s the part I like about the new attribute idea. If we are able to attach different metadata (whether it is in string form or otherwise) to each variant, then we can also get back to the DWARF DIE for the constructor and know which constructor variant we are looking for.

From there we still need to look up the correct DWARF definition (so that we know the constructor address and everything), but since this only needs to be done for constructors that are actually called, I think it would be acceptable to use a relatively expensive approach like using the accelerator table to find all constructors with that constructor name, and seeing which one points back to the declaration DIE.

To determine which constructor definition DIE corresponds to which variant, I believe that a new attribute would be the cleanest solution, but I suppose we could get away with parsing the mangled linkage name as well. (The problem with that is that it’s ABI specific, but then again, this whole variant business is ABI specific (MSVC uses an constructor argument for disambiguation), so I don’t think we’ll be able to completely ignore ABI here).

Michael137 · October 30, 2024, 11:16am

Thanks for pointing this out. I spent some time playing around with gdb and it doesn’t seem like it really works in the way we want it to. E.g., if we expand the example a bit:

struct Tagged {                               
  [[gnu::abi_tag("CtorTag")]] Tagged(int a) {}
};                                            
                                              
int main() {                                  
    Tagged t(5);                           
    return 0; // break here                   
}

Then in gdb:

(gdb) p t.Tagged(5)
$1 = void

(gdb) compile print Tagged(5)
gdb command line: In function ‘void _gdb_expr(__gdb_regs*, void*)’:
gdb command line:1:31: error: no matching function for call to ‘Tagged::Tagged(int)’
gdb command line:1:23: note: candidate: ‘constexpr Tagged::Tagged()’
gdb command line:1:23: note:   candidate expects 0 arguments, 1 provided
gdb command line:1:23: note: candidate: ‘constexpr Tagged::Tagged(const Tagged&)’
gdb command line:1:23: note:   no known conversion for argument 1 from ‘int’ to ‘const Tagged&’
gdb command line:1:23: note: candidate: ‘constexpr Tagged::Tagged(Tagged&&)’
gdb command line:1:23: note:   no known conversion for argument 1 from ‘int’ to ‘Tagged&&’
gdb command line:2:59: error: no matching function for call to ‘Tagged::Tagged(int)’
gdb command line:1:23: note: candidate: ‘constexpr Tagged::Tagged()’
gdb command line:1:23: note:   candidate expects 0 arguments, 1 provided
gdb command line:1:23: note: candidate: ‘constexpr Tagged::Tagged(const Tagged&)’
gdb command line:1:23: note:   no known conversion for argument 1 from ‘int’ to ‘const Tagged&’
gdb command line:1:23: note: candidate: ‘constexpr Tagged::Tagged(Tagged&&)’
gdb command line:1:23: note:   no known conversion for argument 1 from ‘int’ to ‘Tagged&&’
gdb command line:2:59: error: no matching function for call to ‘Tagged::Tagged(int)’
gdb command line:1:23: note: candidate: ‘constexpr Tagged::Tagged()’
gdb command line:1:23: note:   candidate expects 0 arguments, 1 provided
gdb command line:1:23: note: candidate: ‘constexpr Tagged::Tagged(const Tagged&)’
gdb command line:1:23: note:   no known conversion for argument 1 from ‘int’ to ‘const Tagged&’
gdb command line:1:23: note: candidate: ‘constexpr Tagged::Tagged(Tagged&&)’
gdb command line:1:23: note:   no known conversion for argument 1 from ‘int’ to ‘Tagged&&’
gdb command line:2:61: error: template argument 1 is invalid
gdb command line:2:68: error: template argument 1 is invalid
gdb command line:2:77: error: expected initializer before ‘__gdb_expr_ptr’
gdb command line:3:1: error: ‘__gdb_expr_ptr’ was not declared in this scope; did you mean ‘_gdb_expr’?
gdb command line:5:11: error: ‘__gdb_expr_ptr_type’ was not declared in this scope; did you mean ‘__vtbl_ptr_type’?
Compilation failed.

So the gdb’s expression evaluator equivalent falls over here (and not because of the abi-tags it seems. Generally something about the way constructor calls are handled).

I suspect that the way gdb handles this is:

Demangle the DW_TAG_linkage_name of the constructor declaration
Do a symbol lookup for the demangled name
Pick whatever constructor we find

Why i suspect this is the case is if I generate the assembly for this and change the DW_AT_linkage_name from _ZN6TaggedC4B7CtorTagEv to even just _ZN6Tagged6TaggedEv, the lookup still works.

Or maybe it has some sort of “fuzzy lookup via demangled name” logic? I’m not allowed to look into the gdb sources so I can’t confirm this. If someone knows a gdb maintainer that they could tag that’d be great. Maybe @jingham knows how this is done.

If clang put a ctor mangled name on the ctor declaration - could that be enough for LLDB to be able to demangle enough to get an abi tag out of it, or enough to substitute in the CN number?

Technically we could walk the demangle tree like the ItaniumPartialDemangler does and generate a new mangling for each variant. I think fishing out the abi-tags is not enough/not feasible because the mangled constructor name might contain references to abi-tagged types, in which case the AST nodes for those records also need to have AbiTagAttrs on them. But those aren’t encoded in DWARF. Generating a new mangling could work, in which case we wouldn’t need a dedicated DWARF attribute. Whether that’s cheaper/cleaner than a dedicated attribute I’m not sure?

labath · October 30, 2024, 11:17am

Also, the part I like most about this (not using mangled names as function identifiers) is that I think/hope that this will allow us to get rid of the GenerateAlternateFunctionManglings machinery, which is an institutionalized hack to work around the fact that sometimes we don’t roundtrip the mangled names completely correctly.

Michael137 · October 30, 2024, 11:23am

though in an offline discussion @labath didn’t point out

whoops that was meant to say “did point out”
(corrected it now)

Michael137 · October 30, 2024, 12:36pm

Great point. I’m happy to try this out. Especially if that means we can get rid of/reduce reliance on the whole GenerateAlternateFunctionManglings stuff.

Just to clarify. Your suggestion would pretty much just change the contents of whatever we put into AsmLabelAttr/StructorNamesAttr (name pending)? E.g., for a regular function call LLDB would create the following attribute:

AsmLabelAttr("libfoo.dylib:<die_addr>")

And then we’d make our symbol resolver aware of this? And make sure we pick the correct function? And for constructors it’d work almost the same, but we just have a mapping from variant -> special function identifier?

labath · October 30, 2024, 1:03pm

Something like that (“details are left as an exercise to the reader” :P), except that I don’t think that the module (file) name is a good identifier (its not really unique). I don’t think we have a really good unique identifier right now. For the purposes of the prototype at least, the lldb_private::Module pointer (in hex?) should be fine. If we do that for real, we can think about whether we’re fine with using a raw pointer in this way, or if we want to create some sort of a surrogate key. For identifying a DIE, I think its ID (GetID()) is a natural choice, as we use that in a number of places already.

We also need a way to mark that this name uses our fancy “mangling scheme”. For some internal things, we already use $__lldb prefix, but we may be able to go even wilder here, as the result does not have to be a legal c++ name.

dblaikie · October 30, 2024, 6:21pm

Yeah, sorry, that’s why in my example I called the ctor like a normal member function, since gdb doesn’t handle actually calling the ctor with C++ syntax so far as I know.

But sounds like the direction that moves away from requiring this info in a mangled name anyway is one you’re looking into and perhaps dodges all the stuff I was talking about, so I’ll leave that alone.

Michael137 · November 1, 2024, 11:11am

Thanks for the feedback so far. I have a naive prototype of this here: [WIP][lldb][Expression] More reliable function call resolution by Michael137 · Pull Request #114529 · llvm/llvm-project · GitHub without requiring a new DWARF attribute for the constructor variant and using a Module pointer+DIE ID for identifying the definition DIEs and things seem to work fine.

labath · November 4, 2024, 10:03am

This looks very interesting. Thanks for trying this out. A couple of things come to mind when looking at the prototype:

we need to figure out how will this work with the MSVC abi, which doesn’t have constructor variants (well, it sort of does, but it passes that information as a function argument).
somewhat related to the previous item, I realized that we don’t actually need to pass a set of names, if we make the constructor variant part of the encoding implicit – clang could just append something :<variant> to the name we provide, and we’d parse the variant out at the other end. The way this could (might?) help with the MSVC ABI is that clang could just not emit the variant part when working with an ABI that does not use variants. I don’t know whether this would be easy or desirable from clang’s POV…
Will this cause any problems with the AST import/merge for ASTs from different modules (shared libraries). Right now, the ASTs parsed from two modules will be mostly (thanks to a lot of finger-crossing) the same, but if we do this, they will never be identical. That is kind of the point of all this (to be able to go back to the module/die and retrieve more info), but I wonder/fear if this won’t cause problems when importing the AST. This is something that did not occur to me when I proposed this, but I remembered it now because this issue came up in another bug.

I think the last part is the most problematic (though maybe that’s the part I know the least about).

Michael137 · November 4, 2024, 12:10pm

we need to figure out how will this work with the MSVC abi, which doesn’t have constructor variants (well, it sort of does, but it passes that information as a function argument).

Good question, I kind of side-stepped MSVC so far. I assume we’re talking about following configurations:
(a) MS-ABI with DWARF (is that even a supported/recommended configuration?)
(b) MS-ABI with PDB

I can’t really comment on (b) because I’m not very familiar with PDB. Interestingly, from looking at the PDB AST builder it looks like we’re not doing the “attach mangled name to function decls” trick currently? Should we?

For (a), as you alluded in your second point, I think we’d have to support a “no variant specified” encoding. E.g., :<module>:<die>, and the mangler would know this empty variant prefix is only a thing for MS-ABI. Seems doable, but it does depend on the below.

somewhat related to the previous item, I realized that we don’t actually need to pass a set of names, if we make the constructor variant part of the encoding implicit – clang could just append something :<variant> to the name we provide, and we’d parse the variant out at the other end. The way this could (might?) help with the MSVC ABI is that clang could just not emit the variant part when working with an ABI that does not use variants. I don’t know whether this would be easy or desirable from clang’s POV…

Interesting point. You’re suggesting something like:

[[clang::structor_name("$__lldb_func_<module>:<declaration die>")]]

which would mangle to:

$__lldb_func_<module>:<declaration die>:<variant>

(and for MS-ABI this would effectively function like AsmLabel (i.e., it wouldn’t attach a variant))? Then during symbol resolution we would find the appropriate definition DIE for the given variant? I like this idea since it simplifies things.

I think doing something special for MS-ABI in the mangler isn’t outrageous. The mangler already checks for MS-ABI in other places.

Will this cause any problems with the AST import/merge for ASTs from different modules (shared libraries). Right now, the ASTs parsed from two modules will be mostly (thanks to a lot of finger-crossing) the same, but if we do this, they will never be identical. That is kind of the point of all this (to be able to go back to the module/die and retrieve more info), but I wonder/fear if this won’t cause problems when importing the AST. This is something that did not occur to me when I proposed this, but I remembered it now because this issue came up in another bug.

Great (and concerning) question. I’ll do some investigating

Michael137 · November 4, 2024, 6:57pm

Ok so the behaviour of importing types with the same name is quite interesting. I looked at two cases.

Case 1: two dylibs hard linked

Take this example of two class definitions in two different dylibs. I added an asm label to see which definition LLDB ends up importing into the scratch AST (and it mimicks the situation in the proposal).

// lib1.cpp
namespace {                         
struct Foo {                        
  int func() __asm("_Z8funcLib1v") {               
    return 5;                       
  }                      
};                                  
}      

// lib2.cpp
namespace {                         
struct Foo {                        
  int func() __asm("_Z8funcLib2v") {               
    return 20;                       
  }                      
};                                  
}

Then in LLDB:

(lldb) expr Foo{}
(lldb) image dump ast
Dumping clang ast for 47 modules.
TranslationUnitDecl 0x122379a08 <<invalid sloc>> <invalid sloc> <undeserialized declarations>
`-NamespaceDecl 0x12237a8e0 <<invalid sloc>> <invalid sloc>
  `-CXXRecordDecl 0x12237a970 <<invalid sloc>> <invalid sloc> struct Foo definition
    |-DefinitionData pass_in_registers empty aggregate standard_layout trivially_copyable pod trivial literal has_constexpr_non_copy_move_ctor can_const_default_init
    ...
    `-CXXMethodDecl 0x12237c6e0 <<invalid sloc>> <invalid sloc> func 'int ()'
      `-AsmLabelAttr 0x12237c788 <<invalid sloc>> Implicit "_Z8funcLib1v"

Regardless of whether we stop in lib1 or lib2, LLDB will pick the first type Foo that it finds in the target and stick with it. Subsequent evaluations don’t seem to change this fact. Which Foo it picks seems non-deterministics based on my experimentation.

Case 2: load/unload multiple dylibs

(lldb) process load lib1.dylib
Loading "lib1.dylib"...ok
Image 0 loaded.
(lldb) expr Foo{}
(Foo) $0 = {}
(lldb) process unload 0
Unloading shared library with index 0...ok
(lldb) process load lib2.dylib
Loading "lib2.dylib"...ok
Image 1 loaded.
(lldb) expr Foo{}
(Foo) $2 = {}
(lldb) target dump typesystem                                                                                                                                                                                         State of scratch Clang type system:
TranslationUnitDecl 0x148216608 <<invalid sloc>> <invalid sloc> <undeserialized declarations>
`-NamespaceDecl 0x1481d48e8 <<invalid sloc>> <invalid sloc>
  |-CXXRecordDecl 0x1481d4978 <<invalid sloc>> <invalid sloc> <undeserialized declarations> struct Foo definition
  | |-DefinitionData pass_in_registers empty aggregate standard_layout trivially_copyable pod trivial literal has_constexpr_non_copy_move_ctor can_const_default_init
  ...
  | `-CXXMethodDecl 0x1481d4b38 <<invalid sloc>> <invalid sloc> func 'int ()'
  |   `-AsmLabelAttr 0x1481d4be0 <<invalid sloc>> Implicit "_Z8funcLib1v"
  `-CXXRecordDecl 0x1481d4d20 <<invalid sloc>> <invalid sloc> <undeserialized declarations> struct Foo definition
    |-DefinitionData pass_in_registers empty aggregate standard_layout trivially_copyable pod trivial literal has_constexpr_non_copy_move_ctor can_const_default_init
    ...
    `-CXXMethodDecl 0x1481d4ed0 <<invalid sloc>> <invalid sloc> func 'int ()'
      `-AsmLabelAttr 0x1481d4f78 <<invalid sloc>> Implicit "_Z8funcLib2v"

Note how we now have two definitions in the scratch typesystem.

Moreover, the AsmLabelAttr here doesn’t matter. Even if the two definitions of Foo matched exactly, we’d still import both into the scratch AST in this case.

The ASTImporter behaviour here is interesting. When importing the second Foo into the scratch AST, the ASTImporter does detect a potentially conflicting name. So it goes on to check whether it is really a name conflict to worry about (or to create a redeclaration chain). In both the LLDB case (and if I run the -ast-merge through Clang), this hasSameVisibilityContextAndLinkage condition fires. So we never end up calling MapImported, and we end up creating a new decl and add it to the scratch AST.

BUT, even if that check didn’t fire, the structural equivalence check between two RecordDecls doesn’t actually check whether the method signatures are the same (in fact it doesn’t consider methods at all). There’s a FIXME right around where said name conflict check is. So what would happen is that we’d get a scratch AST like this:

(lldb) target dump typesystem 
State of scratch Clang type system:
TranslationUnitDecl 0x11dbd2008 <<invalid sloc>> <invalid sloc> <undeserialized declarations>
`-NamespaceDecl 0x10d895bc8 <<invalid sloc>> <invalid sloc>
  `-CXXRecordDecl 0x10d895c58 <<invalid sloc>> <invalid sloc> <undeserialized declarations> struct Foo definition
    |-DefinitionData pass_in_registers empty aggregate standard_layout trivially_copyable pod trivial literal has_constexpr_non_copy_move_ctor can_const_default_init
    |-CXXMethodDecl 0x10d895e20 <<invalid sloc>> <invalid sloc> func 'int ()'
    | `-AsmLabelAttr 0x10d895ec8 <<invalid sloc>> Implicit "_Z8funcLib1v"
    `-CXXMethodDecl 0x10d896088 <<invalid sloc>> <invalid sloc> func 'int ()'
      `-AsmLabelAttr 0x10d896140 <<invalid sloc>> Implicit "_Z8funcLib2v"

But interestingly, even if I hadn’t attached a custom asm label here, it looks like the ASTImporter would still not have merged the two decls together. Which makes sense, since their definitions could be different I suppose (?):

    |-CXXMethodDecl 0x1229bc220 <<invalid sloc>> <invalid sloc> func 'int ()'
    | `-AsmLabelAttr 0x1229bc2c8 <<invalid sloc>> Implicit "_ZN12_GLOBAL__N_13Foo4funcEv"
    `-CXXMethodDecl 0x1229bc498 <<invalid sloc>> <invalid sloc> func 'int ()'
      `-AsmLabelAttr 0x1229bc550 <<invalid sloc>> Implicit "_ZN12_GLOBAL__N_13Foo4funcEv"

So with that being said, it looks like the proposed <module>:<die> syntax wouldn’t regress the situation (or at least for the above cases). I guess the question is what the expected behaviour of these imports is and whether the proposal would make it harder for us to implement those expectations?

Michael137 · November 5, 2024, 10:10am

An important wrinkle I didn’t notice until just now is that Clang will usually alias the C1 and C2 constructors. Meaning, we might end up with a definition for C2 in the binary, but the constructor call being emitted is for C1. E.g., in IR generated by Clang:

@_ZN6TaggedC1B7CtorTagEv = dso_local unnamed_addr alias void (ptr), ptr @_ZN6TaggedC2B7CtorTagEv

define dso_local void @_ZN6TaggedC2B7CtorTagEv(ptr noundef nonnull align 1 dereferenceable(1) %0) unnamed_addr #0 align 2 !dbg !16 {
 <-- snipped definition -->
}

define dso_local noundef i32 @main() #2 !dbg !22 {
 %1 = alloca i32, align 4                                                                                                                                                                                       
 %2 = alloca %struct.Tagged, align 1                                                                                                                                                                            
 store i32 0, ptr %1, align 4                                                                                                                                                                                   
 call void @llvm.dbg.declare(metadata ptr %2, metadata !26, metadata !DIExpression()), !dbg !27                                                                                                                 
 call void @_ZN6TaggedC1B7CtorTagEv(ptr noundef nonnull align 1 dereferenceable(1) %2), !dbg !27                                                                                                                
 ret i32 0, !dbg !28                                                                                                                                                                                            
}

So we only really have a definition for C2, so that’s the DW_TAG_subprogram that DWARF will contain. But when calling the constructor from the expression evaluator we try to call C1, which is valid since they’re the same function, but that variant doesn’t have a separate DIE entry.

labath · November 5, 2024, 4:16pm

Yes, both of these. (I didn’t actually realize they were two of them). I don’t know how common (a) is, but it exists, so I think it’d be good to at least have a story for how to support that.

The PDB builder does not attach asm labels probably because noone ran into the situation where they’re necessary. Nonetheless, if we do this with DWARF, I’d like the PDB to do that as well eventually (I’m not expecting you to do that), as that’d mean we can get rid of the existing symbol lookup code path. For PDBs I think/hope that we may be able to get even more value out of this, because COFF files don’t have the equivalent of a .symtab (only .dynsym), which is why we currently have some dodgy code which synthesizes a “symbol table” using debug info data. I think the mangled name lookup code might be one user of that data. It’s probably not the only one, so we wouldn’t be able to drop the symtab synthesis part, but I still like the prospect of getting rid of one dependency on it.

Michael137:

Interesting point. You’re suggesting something like:
[[clang::structor_name("$__lldb_func_<module>:<declaration die>")]]
which would mangle to:
$__lldb_func_<module>:<declaration die>:<variant>
(and for MS-ABI this would effectively function like AsmLabel (i.e., it wouldn’t attach a variant))? Then during symbol resolution we would find the appropriate definition DIE for the given variant? I like this idea since it simplifies things.

Yes, that’s exactly what I’m suggesting (and I’m glad you like it)

Based on your second post (which is too long to quote) it sounds like we might be okay (which makes me very happy). I think it might be useful to also check the situation where two versions of the same object get pulled into the same expression ast. From the POV of the AST importer, this should be the same as the scratch AST, but it might make a difference because we run more clang code over the expression AST. I’m thinking of something like this:

// lib1
struct A { int x; } a;
// lib2
struct A { int x; };
struct B: A { int y; } b;

and then have lldb do something like a.x + b.x

That’s a tricky question. I guess that if we knew that the two type definitions are the same type, then I’d say it would be best to merge them, including merging the two method declarations (and just pick one asm label, according to the ODR rule or something). If we knew they aren’t the same type, then it would be best not the merge the class types. This proposal could make implementing that harder if it turns out to be a problem to merge the method definitions in the first case. I don’t know if that is what will happen. And I don’t know how/if we’ll ever be able to say whether we’re dealing with the first case or the second one.

labath · November 5, 2024, 4:37pm

Ah, yes, this is a tricky wrinkle indeed (one that I unfortunately have personal experience with).

The situation is kind of even worse on linux/elf, as clang (unless it is forced to by the ABI) will not even emit the C1 alias, and will instead replace all calls to C1 with the C2 version. (I tried to change this at one point, but it turns out these extra symbols regressed the object file size too much)

I believe we currently have our “alternate mangling name” generator replace C1 by C2 for exactly this reason (which, like all of the substitutions done there, isn’t always correct).

I guess that, within this new framework, we could do the inverse thing. If clang tells us that it wants to call a C1 constructor (by pointing us to the C2 die – the only one we have, with a “c1” mangling tag), we could look up the C1 “alternate mangling” version in the symtab, and call that one if we find it. That said, I think this is something where it would be reasonable to expect some help from the DWARF debug info. Even if the C1 constructor was “optimized out” by turning it into an alias (or even removed completely), I believe DWARF should retain some record of it (summoning @dblaikie back into the conversation )

Topic		Replies	Views
[Bug 16191] New: LLDB fails to evaluate expressions that dereference a struct when inferior is built with recent Clang LLDB	2	87	June 3, 2013
Detection of expression language LLDB	3	112	October 7, 2013
Ambiguous symbols LLDB	6	414	December 1, 2023
Custom expression evaluation target options LLDB	2	78	December 4, 2015
[Bug 50814] New: Several tests fail with GCC as LLDB can't properly call constructors from expressions LLDB	0	79	June 23, 2021