This RFC seeks to discuss how LLDB’s expression evaluator should handle calls to C++ constructors/destructors (I refer to these as “structors” throughout the document) that are annotated with the [[gnu::abi_tag]] attribute.
Problem
Given this example:
struct Tagged {
[[gnu::abi_tag("CtorTag")]] Tagged() {}
};
int main() {
Tagged t;
return 0; // break here
}
Calling the Tagged() constructor will fail in LLDB with the following error:
(lldb) target create "a.out"
(lldb) r
Process 961323 launched: 'a.out' (x86_64)
Process 961323 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
frame #0: 0x0000000000401114 a.out`main at tags.cpp:7:2
4
5 int main() {
6 Tagged t;
-> 7 return 0; // break here
8 }
(lldb) expr Tagged()
error: Couldn't lookup symbols:
Tagged::Tagged()
If we look at the IR that Clang lowered the LLDB (DWARF) AST into,
the reason for the failed symbol lookup becomes more apparent:
lldb Module as passed in to IRForTarget:
"; ModuleID = '$__lldb_module'
source_filename = "$__lldb_module"
<-- snipped -->
define dso_local void @"_Z12$__lldb_exprPv"(ptr %"$__lldb_arg") #0 {
entry:
<-- snipped -->
init.check: ; preds = %entry
call void @_ZN6TaggedC1Ev(ptr nonnull align 1 dereferenceable(1) @"_ZZ12$__lldb_exprPvE19$__lldb_expr_result") #2
We’re trying to call a function symbol with the mangled name _ZN6TaggedC1Ev. However, no such symbol exists in the binary because the actual mangled name for the constructor is _ZN6TaggedC2B7CtorTagEv:
$ llvm-dwarfdump a.out --name=Tagged
a.out: file format elf64-x86-64
...
0x0000006e: DW_TAG_subprogram
DW_AT_low_pc (0x0000000000401130)
DW_AT_high_pc (0x000000000040113a)
DW_AT_frame_base (DW_OP_reg6 RBP)
DW_AT_object_pointer (0x00000089)
DW_AT_linkage_name ("_ZN6TaggedC2B7CtorTagEv")
DW_AT_specification (0x00000033 "Tagged")
I.e., the ABI tags are missing from the round-tripped mangled name.
When LLDB creates the Clang AST from DWARF, it creates clang::CXXConstructorDecls from the DW_TAG_subprogram of a constructor’s declaration. DWARF doesn’t tell us anything about the existence of abi-tags on the declaration, so we never attach a clang::AbiTagAttr to said decl. Hence when Clang mangles the CXXConstructorDecl, it doesn’t know to include the tags in the mangling.
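To make the difference between the two names concrete, here is a toy sketch (not Clang's mangler) that builds the Itanium-style name for a default constructor of a top-level class. The function name mangleDefaultCtor and its parameters are made up for illustration, and it ignores everything a real mangler handles (namespaces, parameters, substitutions, tag normalization).

```cpp
#include <string>
#include <vector>

// Toy model only: builds "_ZN <len><class> C<variant> [B<len><tag>]... E v".
// Handles just a default constructor of a class at global scope.
std::string mangleDefaultCtor(const std::string &className, int variant,
                              const std::vector<std::string> &abiTags = {}) {
  std::string out = "_ZN" + std::to_string(className.size()) + className;
  out += "C" + std::to_string(variant); // C1 = complete object, C2 = base object
  for (const std::string &tag : abiTags)
    out += "B" + std::to_string(tag.size()) + tag; // one "B<len><tag>" per abi-tag
  out += "Ev"; // 'E' closes the nested name, 'v' is the empty parameter list
  return out;
}
```

Without tags, mangleDefaultCtor("Tagged", 1) yields the name the expression evaluator asks for; with the tag, mangleDefaultCtor("Tagged", 2, {"CtorTag"}) yields the symbol that actually exists in the binary.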
This may seem like a contrived example, but this manifests quite frequently. E.g., when dereferencing the result of a function returning a std::shared_ptr (whose structors are abi-tagged). And there are currently no great workarounds to recommend to users.
Example of Different Structor Variants in Expression Evaluator
This example is contrived, but the following would cause a call to the C2 constructor variant from within expr.
struct A {
A(int) {}
};
struct B : virtual A {
B() : A(5){};
};
struct C : B {
C() : A(5) {}
};
int main() {
A a(6);
B b;
C c;
return 0;
}
Then in LLDB:
(lldb) expr
Enter expressions, then terminate with an empty line to evaluate:
1: struct F : B {
2: F() : A(5) {}
3: };
4: F()
(F) $0 = {}
The call to F() here generates a call to the C1 variant of F, which calls the C2 variants of B and A.
Related Solutions
We already solved this problem for non-structor FunctionDecls in ⚙ D40283 lldb: Use the DWARF linkage name when importing C++ methods.
The idea is to use the DW_AT_linkage_name on the function definition to provide Clang with the mangled name of the function. We let Clang know about the mangling using the clang::AsmLabelAttr (which Clang will treat as the definitive mangling it should use).
We then did the same for CXXMethodDecls in ⚙ D131974 [lldb][ClangExpression] Add asm() label to all FunctionDecls we create from DWARF. This relies on the fact that a DW_AT_linkage_name is attached to method declarations (since LLDB creates the AST nodes for methods by parsing the parent DW_TAG_structure_type, it only ever sees the method declaration, not the definition).
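For readers unfamiliar with asm labels: at the source level, the GCC/Clang asm-label extension has the same effect as the attribute LLDB attaches. A minimal sketch, assuming a GCC- or Clang-compatible compiler; the function and symbol names are invented for the example.

```cpp
// GCC/Clang extension: the asm label fixes the symbol name used for this
// function, bypassing the usual C++ mangling. The names here are hypothetical.
int linkage_demo() asm("rfc_demo_custom_symbol");

int linkage_demo() { return 42; }
```

Callers still write linkage_demo() in C++; only the emitted symbol name changes, which can be confirmed by inspecting the object file's symbol table.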
What’s special about structors?
The declarations for structors don’t have a DW_AT_linkage_name. That’s because Clang (when using the Itanium ABI) will generate multiple variants for a constructor, each mangled differently (see the Itanium C++ ABI).
So we can end up with the following (simplified) DWARF:
0x00000057: DW_TAG_structure_type
DW_AT_containing_type (0x00000057 "X")
DW_AT_calling_convention (DW_CC_pass_by_reference)
DW_AT_name ("X")
0x0000007c: DW_TAG_subprogram
DW_AT_name ("X")
DW_AT_declaration (true)
DW_AT_external (true)
0x000000a9: DW_TAG_subprogram
DW_AT_low_pc (0x0000000000000000)
DW_AT_high_pc (0x000000000000001c)
DW_AT_linkage_name ("_ZN1XC2Ev")
DW_AT_specification (0x0000007c "X")
0x000000de: DW_TAG_subprogram
DW_AT_low_pc (0x0000000000000020)
DW_AT_high_pc (0x0000000000000054)
DW_AT_linkage_name ("_ZN1XC1Ev")
DW_AT_specification (0x0000007c "X")
Note how the constructor is just a declaration with a name, and we have two constructor definitions (with different DW_AT_linkage_names) pointing back to the same declaration.
So we can’t really pick a linkage name to attach to a CXXConstructorDecl (like we did for methods) up-front.
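The ambiguity can be sketched with a toy model of the DIEs above. The SubprogramDIE struct and linkageNamesByDeclaration function are hypothetical stand-ins, not LLDB's DWARF classes; they only show that grouping definitions by their DW_AT_specification leaves more than one candidate linkage name per declaration.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified view of a DW_TAG_subprogram DIE.
struct SubprogramDIE {
  std::uint64_t offset;
  std::optional<std::string> linkageName;     // DW_AT_linkage_name, if present
  std::optional<std::uint64_t> specification; // DW_AT_specification, if present
};

// Groups the linkage names of definition DIEs by the declaration they refer to.
std::map<std::uint64_t, std::vector<std::string>>
linkageNamesByDeclaration(const std::vector<SubprogramDIE> &dies) {
  std::map<std::uint64_t, std::vector<std::string>> result;
  for (const SubprogramDIE &die : dies)
    if (die.specification && die.linkageName)
      result[*die.specification].push_back(*die.linkageName);
  return result;
}
```

Fed the three DIEs from the dump, the declaration at 0x7c maps to both _ZN1XC2Ev and _ZN1XC1Ev, so there is no single name to hand to an AsmLabelAttr.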
This RFC proposes solutions to exactly this problem.
MS-ABI and GCC
Structor variant emission is both ABI and compiler dependent. The MS-ABI doesn’t encode the structor variants in the mangled name. Instead it indicates the variant using an implicit parameter to the constructor.
On the other hand, GCC on Itanium will emit the following DWARF:
0x0000002e: DW_TAG_structure_type
DW_AT_name ("Tagged")
0x0000003b: DW_TAG_subprogram
DW_AT_external (true)
DW_AT_name ("Tagged")
DW_AT_linkage_name ("_ZN6TaggedC4B7CtorTagEv")
DW_AT_declaration (true)
0x000000ae: DW_TAG_subprogram
DW_AT_abstract_origin (0x00000094 "_ZN6TaggedC4B7CtorTagEv")
DW_AT_linkage_name ("_ZN6TaggedC2B7CtorTagEv")
DW_AT_low_pc (0x0000000000401106)
DW_AT_high_pc (0x0000000000401111)
0x000000cd: DW_TAG_formal_parameter
DW_AT_abstract_origin (0x000000a4 "this")
DW_AT_location (DW_OP_fbreg -24)
0x000000d5: NULL
Note how it uses the “unified” C4 constructor variant for the declaration. But the definition has the resolved C2 constructor linkage name (and the C4 linkage name is not a symbol in the binary). The C4 mangling is a GCC extension.
Side note: I just noticed that when debugging GCC-built binaries we never manage to resolve a constructor call. Even with:
struct S {
S() {}
};
(lldb) expr S()
error: Couldn't lookup symbols:
S::S()
This happens because the ctor declaration has the C4 DW_AT_linkage_name. So LLDB does create an AsmLabelAttr for that constructor, which it can’t find in the binary. This proposal would help with that, though we’d have to specifically account for cases where the constructor declaration does have a linkage name.
Structor Variant Aliases
Clang (and GCC) will only emit separate definitions for each structor variant if the definitions actually differ.
In the common case (where a class doesn’t have virtual bases), the complete object constructor (C1) is aliased to the base object constructor (C2). E.g., this is valid IR that Clang generates:
@_ZN6TaggedC1B7CtorTagEv = dso_local unnamed_addr alias void (ptr), ptr @_ZN6TaggedC2B7CtorTagEv
define dso_local void @_ZN6TaggedC2B7CtorTagEv(ptr noundef nonnull align 1 dereferenceable(1) %0) unnamed_addr #0 align 2 !dbg !16 {
<-- snipped definition -->
}
define dso_local noundef i32 @main() #2 !dbg !22 {
%1 = alloca i32, align 4
%2 = alloca %struct.Tagged, align 1
store i32 0, ptr %1, align 4
call void @llvm.dbg.declare(metadata ptr %2, metadata !26, metadata !DIExpression()), !dbg !27
call void @_ZN6TaggedC1B7CtorTagEv(ptr noundef nonnull align 1 dereferenceable(1) %2), !dbg !27
ret i32 0, !dbg !28
}
Notice how _ZN6TaggedC1B7CtorTagEv is actually just an alias for _ZN6TaggedC2B7CtorTagEv. But importantly the call is to the C1 variant. This could present complications depending on which solution we implement, because DWARF will only contain a definition DIE for the C2 variant, whereas the expression evaluator will generate a call to C1 (which is also what Clang would do, and is valid in practice because those constructor variants are the same in this case).
There are also cases where Clang may “replace all uses with (RAUW)” between constructor variants (determined by the logic in ItaniumCXXABI.cpp). So this is all something we’ll need to consider.
Inheriting Constructor Variant
The CI1 and CI2 constructor variants in the Itanium ABI will need special handling. These occur with code like:
struct Foo {
Foo(int) {}
};
struct Bar : public Foo {
using Foo::Foo;
};
int main() {
Bar b(5);
}
This would produce DWARF like:
0x0000002a: DW_TAG_structure_type
DW_AT_calling_convention (DW_CC_pass_by_value)
DW_AT_name ("Bar")
DW_AT_byte_size (0x01)
DW_AT_decl_file ("inherit.cpp")
DW_AT_decl_line (5)
0x00000033: DW_TAG_inheritance
DW_AT_type (0x0000004a "Foo")
DW_AT_data_member_location (0x00)
0x00000039: DW_TAG_subprogram
DW_AT_name ("Foo")
DW_AT_declaration (true)
DW_AT_artificial (true)
DW_AT_external (true)
0x0000003e: DW_TAG_formal_parameter
DW_AT_type (0x0000009a "Bar *")
DW_AT_artificial (true)
Note how the name of the constructor is Foo, not Bar (which might be a Clang bug; GCC doesn’t do this). This breaks LLDB’s assumption when creating CXXConstructorDecls. Given the support for inheriting constructors is already somewhat broken, this RFC doesn’t look to fix it. We should however try not to make it harder to fix in the future.
Using the std module
We have a target.import-std-module setting that would work around this problem because we can load an accurate AST into LLDB without going via DWARF. Unfortunately it has its own issues at the moment (it doesn’t support all STL types and is not stable enough to be enabled by default). Also, it wouldn’t help users debugging with libstdc++.
Potential Solutions
1. Encode ABI tags in DWARF
The idea here would be to introduce a new attribute DW_AT_abi_tag whose value would be a string (presumably a DW_FORM_strp) holding the contents of a single abi_tag of a structure/function/namespace declaration (there can be multiple tags on a declaration). We can’t get away with only attaching them to constructors/destructors because abi-tags are part of a type’s mangling. I.e., they could also appear in a structor’s mangled name via function parameters (or template arguments/return types in the case of templated constructors).
This approach was attempted in D144181 but stalled because of the downsides described below.
Downsides:
- A lot of functions/types in the STL are abi-tagged, so we would need to be careful to mitigate the size impact. When developing the prototype in D144181, attaching them to all abi-tagged entities (not just structors) showed a non-trivial increase in the .debug_info section (albeit we used DW_TAG_llvm_annotations for this, so it wasn’t the most space-efficient representation). I have yet to measure the size impact of encoding it with a dedicated attribute.
- libc++ may in the future decide to abi-tag the namespace (instead of individual functions/types), which means LLDB needs to be aware of implicitly abi-tagged types/functions. One solution to that would be to check the containing DeclContext chain for an abi-tag for every record/function decl LLDB creates.
- This deviates from how we handle this for other types of function calls. That might be fine, but it does raise the question: do we want to rely on the mangled name round-tripping, given LLDB’s reconstructed AST isn’t/can’t be fully accurate? Using the linkage name seems more robust (though in an offline discussion @labath did point out that even linkage names aren’t the most robust here, since they need not uniquely identify a function. A more complete/robust solution would encode, in the mangled name or elsewhere, enough info to point LLDB directly to the function).
- It’s unclear how useful this attribute would be for any other DWARF consumer.
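The DeclContext-chain check mentioned in the second downside could look roughly like the sketch below. ToyDecl and collectAbiTags are invented for illustration and are not Clang types; a real implementation would walk clang::DeclContext parents and would also need to follow Clang's rules for ordering and de-duplicating tags.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a declaration and its enclosing contexts.
struct ToyDecl {
  std::string name;
  std::vector<std::string> abiTags; // tags written directly on this decl
  const ToyDecl *parent = nullptr;  // enclosing record/namespace, if any
};

// Collects the tags on a decl plus those inherited from enclosing contexts,
// innermost first.
std::vector<std::string> collectAbiTags(const ToyDecl &decl) {
  std::vector<std::string> tags;
  for (const ToyDecl *d = &decl; d != nullptr; d = d->parent)
    tags.insert(tags.end(), d->abiTags.begin(), d->abiTags.end());
  return tags;
}
```

With a tagged namespace and an untagged record inside it, the record still reports the namespace's tag, which is the "implicitly abi-tagged" case described above.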
2. Attach all mangled names to structor AST node
(this came out of the discussion in ⚙ D144181 [clang][DebugInfo] Add abi-tags on constructors/destructors as LLVM annotations)
In this approach we would tell Clang about all the mangled names for a given CXXConstructorDecl
. Then the Clang mangler would pick the correct mangled name to use for a given structor kind.
The current proposal would be to introduce a new internal-only Clang attribute that LLDB would attach to structor decls (attribute name pending). E.g.:
|-CXXDestructorDecl 0x3d15efd8 parent 0x3d15e418 prev 0x3d15eb10 <line:19:17, line:21:1> line:19:25 used ~Tagged 'void () noexcept'
| |-CompoundStmt 0x3d182858 <col:35, line:21:1>
| |-StructorMangledNamesAttr 0x3d15f108 <line:16:3, line:18:22> deleting:_ZN6TaggedD0Ev complete:_ZN6TaggedD1Ev base:_ZN6TaggedD2Ev
| |-AbiTagAttr 0x3d15f1e8 <col:28, col:58> DtorTag Test
| `-AbiTagAttr 0x3d182790 <col:64, line:19:13> v1
This is only something LLDB would ever set by collecting the mangled names for a given constructor declaration. All the infrastructure in the Clang mangler is already available (and there’s precedent for setting the mangled name via attributes using the AsmLabelAttr).
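Conceptually, the attribute is a small table from structor kind to mangled name that the mangler consults. The sketch below is a toy model under that assumption; DtorKind, StructorMangledNames and lookup are hypothetical names, not a proposed Clang API, and the entries mirror the AST dump above.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical destructor kinds, matching the labels in the AST dump.
enum class DtorKind { Deleting, Complete, Base };

// Toy model of the proposed attribute: one mangled name per structor kind.
struct StructorMangledNames {
  std::map<DtorKind, std::string> names;

  // Returns the recorded name for a kind, or nullopt if DWARF had no
  // definition for that variant.
  std::optional<std::string> lookup(DtorKind kind) const {
    auto it = names.find(kind);
    if (it == names.end())
      return std::nullopt;
    return it->second;
  }
};
```

An empty result is the interesting case: it corresponds to a variant that has no definition DIE (for example an aliased variant), which any real implementation would have to decide how to handle.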
LLDB will also need a way to tell which DW_TAG_subprogram definition corresponds to which structor kind so we can provide that information to Clang. There are a couple of ways we could do this:
- A DWARF attribute on the DW_TAG_subprogram (e.g., DW_AT_structor_kind) whose value would be a constant such as DW_STRUCTOR_cxx_complete_ctor (name pending).
- Use the ItaniumPartialDemangler to walk the demangle tree of the DW_AT_linkage_name until we find the structor kind.
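To give a feel for the second option, here is a deliberately naive scanner that extracts the variant code from simple names. It is a stand-in for a real demangle-tree walk, not a use of ItaniumPartialDemangler: the function names are made up, and it only understands _ZN followed by length-prefixed names and B tags, giving up on templates, substitutions, and the CI1/CI2 inheriting variants.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Consumes "<decimal length><that many characters>"; false on malformed input.
static bool skipSourceName(std::string_view &s) {
  std::size_t len = 0, digits = 0;
  while (digits < s.size() &&
         std::isdigit(static_cast<unsigned char>(s[digits]))) {
    len = len * 10 + static_cast<std::size_t>(s[digits] - '0');
    ++digits;
  }
  if (digits == 0 || s.size() - digits < len)
    return false;
  s.remove_prefix(digits + len);
  return true;
}

// Returns e.g. "C1", "C2", "D0" for simple nested names, nullopt otherwise.
std::optional<std::string> structorVariant(std::string_view mangled) {
  if (mangled.substr(0, 3) != "_ZN")
    return std::nullopt;
  mangled.remove_prefix(3);
  bool sawName = false;
  while (!mangled.empty()) {
    const char c = mangled.front();
    if (std::isdigit(static_cast<unsigned char>(c))) {
      if (!skipSourceName(mangled)) // a namespace or class name
        return std::nullopt;
      sawName = true;
    } else if (c == 'B') {          // abi-tag attached to the preceding name
      mangled.remove_prefix(1);
      if (!skipSourceName(mangled))
        return std::nullopt;
    } else if ((c == 'C' || c == 'D') && sawName && mangled.size() >= 2 &&
               std::isdigit(static_cast<unsigned char>(mangled[1]))) {
      return std::string(mangled.substr(0, 2)); // ctor/dtor variant code
    } else {
      return std::nullopt;          // anything else is out of scope here
    }
  }
  return std::nullopt;
}
```

For the linkage names in this document it returns C2 for _ZN6TaggedC2B7CtorTagEv and C1 for _ZN1XC1Ev, while ordinary functions and methods yield no result.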
Upsides:
- This aligns with how we handle other kinds of function calls in LLDB
- Doesn’t rely on mangled names round-tripping, which C++ constructs other than abi-tags might also inhibit. E.g., I suspect we don’t represent templated constructors accurately enough for round-tripping to work in the general case, though this is only a hunch at the moment.
Downsides:
- Requires Clang attribute that only LLDB would use (though from a brief discussion a while back with @AaronBallman it sounded like there’s already precedent for internal-only attributes like this).
- Need to do extra work to determine which structor kind a DW_AT_linkage_name corresponds to. If we require a DWARF attribute, it’s unclear whether other consumers (or non-Itanium platforms) would benefit from encoding a structor-kind