[RFC][LLDB] Handling [[no_unique_address]] in LLDB

LLDB’s expression evaluator uses Clang to compile user-provided C++ expressions. It does so by reconstructing an AST from DWARF and passing it to Clang for parsing and codegen. The codegen phase, particularly when we lower the AST into LLVM types (in CGRecordLowering ), needs to know about type layouts. Clang allows an ExternalASTSource to provide the layout for a type (LLDB implements this interface in ClangASTImporter::LayoutRecordType ). This suffices for a large number of cases but occasionally Clang still relies on the structure of the AST to derive layout information. Clang is sensitive to incorrect layout calculations, and violating any invariants in this phase manifest in assertions and crashes. Because LLDB doesn’t always accurately reconstruct ASTs from DWARF, there are currently several C++ idioms that cause LLDB to crash in Clang’s AST lowering phase. This post focuses on the known failure modes when debugging code with [[no_unique_address]] (NUA) annotations, discusses the solutions previously attempted and outlines a way forward.

This issue has been brought up multiple times over the years [1][2] but hasn’t been resolved yet. Now this topic is being revived because it’s going to be a blocker for [libc++] Replace __compressed_pair with [[no_unique_address]].

Let me know if you have any questions/suggestions/concerns!

Failure Modes

Scenario 1: basic [[no_unique_address]] usage

Details

This is a reduced example of the crashes we would see in the LLDB test-suite after applying [libc++] Replace __compressed_pair with [[no_unique_address]]. It’s the very basic use-case for NUA.

struct Empty {};

struct Foo {
int* m_begin = 0;
[[no_unique_address]] Empty m_alloc;
};

Clang produces following DWARF for this:

DW_TAG_structure_type
  DW_AT_name ("Foo")
  DW_AT_byte_size (0x08)
  DW_TAG_member
    DW_AT_name ("m_begin")
    DW_AT_type (0x0000000000000087 "int *")
    DW_AT_data_member_location (0x00)
  DW_TAG_member
    DW_AT_name ("m_alloc")
    DW_AT_type (0x0000000000000093 "Empty")
    DW_AT_data_member_location (0x00)

Observe how m_begin and m_alloc were folded into the same DW_AT_member_location .

This triggers following assertion when debugging with LLDB:

(lldb) run
Process 45607 launched: 'nua-libcxx-repro.out' (arm64)
Process 45607 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x0000000100003f4c nua-libcxx-repro.out`main at libcxx-repro.cpp:16:5
    13
    14 int main() {
    15   Foo f;
->  16   return 0;
    17 }
(lldb) expr f
Assertion failed: (Prior->FD->hasAttr<NoUniqueAddressAttr>() && "should not have reused this field's tail padding"), function clipTailPadding, file CGRecordLayoutBuilder.cpp, line 760.

Here we are at the point in clipTailPadding after we’ve placed m_begin at offset 0 and expect the next member to be placed at an offset of 8 bytes. But Clang placed it at 0, which can only happen with a NoUniqueAddressAttr on the field, which we don’t have in LLDB’s AST.

Scenario 2: [[no_unique_address]] for empty base-class optimisation

Details
struct Empty {};

struct A
{
  long c,d;
};

struct B
{
  [[no_unique_address]] Empty x;
};

struct C
{
  [[no_unique_address]] Empty x;
};

struct Foo : B, A, C {};

This generates following DWARF:

DW_TAG_structure_type
  DW_AT_name ("A")
  DW_AT_byte_size (0x10)
  DW_TAG_member
    DW_AT_name ("c")
    DW_AT_type (0x0000000000000069 "long")
    DW_AT_data_member_location (0x00)
  DW_TAG_member
    DW_AT_name ("d")
    DW_AT_type (0x0000000000000069 "long")
    DW_AT_data_member_location (0x08)
  NULL

DW_TAG_structure_type
  DW_AT_name ("B")
  DW_AT_byte_size (0x01)
  DW_TAG_member
    DW_AT_name ("x")
    DW_AT_type (0x000000000000009b "Empty")
    DW_AT_data_member_location (0x00)
  NULL

DW_TAG_structure_type
  DW_AT_name ("C")
  DW_AT_byte_size (0x01)
  DW_TAG_member
    DW_AT_name ("x")
    DW_AT_type (0x000000000000009b "Empty")
    DW_AT_data_member_location (0x00)
  NULL

DW_TAG_structure_type
  DW_AT_name ("Empty")
  DW_AT_byte_size (0x01)

DW_TAG_structure_type
  DW_AT_name ("Foo")
  DW_AT_byte_size (0x10)
  DW_TAG_inheritance
    DW_AT_type (0x0000000000000085 "B")
    DW_AT_data_member_location (0x00)
  DW_TAG_inheritance
    DW_AT_type (0x0000000000000047 "A")
    DW_AT_data_member_location (0x00)
  DW_TAG_inheritance
    DW_AT_type (0x00000000000000b9 "C")
    DW_AT_data_member_location (0x10)
  NULL

Few things to note:

  • The DW_AT_byte_size of B and C are 0x1. I.e., the fact their use may change the size to zero when used as bases is not reflected here (this “maybe zero” property of a RecordDecl is query-able as isZeroSize in the AST)
  • The DW_TAG_inheritance members for A and B of Foo share a DW_AT_data_member_location , here 0x0 , and C is placed at the 0x10 , i.e., after sizeof(A). This makes sense since [[no_unique_address]] will only allow overlap of distinct types (i.e., if we change C to contain a different empty type, all of A, B, C would fold into a single location. However, this placement depends on the order of the inheritance. If we declare Foo as:
struct Foo : B, C, A {};

…we get following DWARF instead:

DW_TAG_inheritance
  DW_AT_type (0x0000000000000085 "B")
  DW_AT_data_member_location (0x00)
DW_TAG_inheritance
  DW_AT_type (0x00000000000000b9 "C")
  DW_AT_data_member_location (0x01)
DW_TAG_inheritance
  DW_AT_type (0x0000000000000047 "A")
  DW_AT_data_member_location (0x00)

When we then try to lay out this type in LLDB, we run into following clang assertion:

(lldb) run
Process 59690 launched: 'nua-bases.out' (arm64)
Process 59690 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x0000000100003fa0 nua-bases.out`main at nua-bases.cpp:27:5
  24 Foo f;
  25
  26 int main() {
→ 27   return 0;
  28 }
  29
(lldb) expr f
Assertion failed: (Prior→Kind == MemberInfo::Field && "Only storage fields have tail padding!"), function clipTailPadding, file CGRecordLayoutBuilder.cpp, line 754.

This crash originates all the way in Decl::addedMember when called on the field x of B and C , where we set isZeroSize to false (because NoUniqueAdress isn’t attached to x). This means we set the Empty bit on the fields definition to false too. Then in CGRecordLowering::accumulateBases , we incorrectly add empty base classes to the list of Members we iterateduring the clipTailPadding phase. This confuses the layout builder when it sees that multiple bases have been folded into the same offset, resulting in the assertion above.

Scenario 3: tail padding re-use

Details

The [[no_unique_address]] on non-empty fields can also affect codegen. Here’s what cppreference has to say:

If the member is not empty, any tail padding in it may be also reused to store other data members.

Clang doesn’t currently perform such optimization (see GH#50766), but GCC does (see Godbolt). Because LLDB gets used to debug non-Clang binaries, and because Clang may implement this in the future, LLDB should be able to handle this case too.

Take this example:

struct A {
  [[no_unique_address]] double x;
  char y;
};

struct B {
  [[no_unique_address]] A a;
  int z;
};

struct C {
  A a1;
  B b1;
};

A a;
B b;
C c;

static_assert(sizeof(a) == 16);
static_assert(sizeof(b) == 16);

GCC will generate following DWARF:

DW_TAG_structure_type
  DW_AT_name ("A")
  DW_AT_byte_size (0x10)
  DW_TAG_member
    DW_AT_name ("x")
    DW_AT_type (0x0000000000000051 "double")
    DW_AT_data_member_location (0x00)
  DW_TAG_member
    DW_AT_name ("y")
    DW_AT_type (0x0000000000000058 "char")
    DW_AT_data_member_location (0x08)
  NULL

DW_TAG_structure_type
  DW_AT_name ("B")
  DW_AT_byte_size (0x10)
  DW_TAG_member
    DW_AT_name ("a")
    DW_AT_type (0x000000000000002d "A")
    DW_AT_data_member_location (0x00)
  DW_TAG_member
    DW_AT_name ("z")
    DW_AT_type (0x0000000000000083 "int")
    DW_AT_data_member_location (0x0c)
  NULL

DW_TAG_structure_type
  DW_AT_name ("C")
  DW_AT_byte_size (0x20)
  DW_TAG_member
    DW_AT_name ("a1")
    DW_AT_type (0x000000000000002d "A")
    DW_AT_data_member_location (0x00)
  DW_TAG_member
    DW_AT_name ("b1")
    DW_AT_type (0x000000000000005f "B")
    DW_AT_data_member_location (0x10)
  NULL

Observe the following:

  • The DW_AT_byte_size of A and B are both 0x10. This is because GCC was able to fold z into 4 bytes of A ‘s tail padding, placing the z at 0x0c.

This crashes LLDB as follows:

(lldb) run
Process 78192 launched: 'nua-tail-padding-gcc.out' (arm64)
Process 78192 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x0000000100003fb4 nua-tail-padding-gcc.out`main at nua-tail-padding-gcc.cpp:26:12
  23
  24 int main() {
  25   C c;
→ 26   return 0;
  27 }
  28
(lldb) expr c
Assertion failed: (Prior→FD→hasAttr<NoUniqueAddressAttr>() && "should not have reused this field's tail padding"), function clipTailPadding, file CGRecordLayoutBuilder.cpp, line 760.

We are at the point in clipTailPadding where we see that z was placed at a 12 byte offset, within Tail , and Clang knows that the only way this can happen is with a NoUniqueAddressAttr on the field, which we don’t have in LLDB’s AST (this is pretty much the same issue as in Scenario 1 just based on different kinds of offset problems).

Previous Solutions

In my opinion any solution to this should make sure we attach the NoUniqueAddressAttr to decls that LLDB creates, since we assert on its existence during CGRecordLowering and it’s currently checked in various places around Clang (and can possible decide more things in the future).

Solution 1: Emit 0 DW_AT_byte_size for empty [[no_unique_address]] fields

This was originally raised as a possible solution in D101237 and further discussed on the DWARF mailing list. The idea here is that we can signal to LLDB whether a field is marked NUA by setting its DW_AT_byte_size to 0. This is allowed by the DWARF spec Section 5.7.6 Data Member Entries :

If the size of a data member is not the same as the size of the type given for the data member, the data member has either a DW_AT_byte_size or a DW_AT_bit_size attribute whose integer constant value is the amount of storage needed to hold a value of the type.

This would also give us an already standard compliant way to support GCC-produced binaries (assuming GCC implements this).

Implementation

The prototype in D101237 added an isZeroSize flag to the DIDerivedType , which is set to the value of FieldDecl::isZeroSize API. Then during DWARF emission we would emit a DW_AT_byte_size 0x0 if the flag was set.

Drawbacks

This only works for NUA on fields with empty types. If a field’s type is not empty, isZeroSize doesn’t apply. Meaning this would fix scenarios (1) and (2) but doesn’t handle (3).

Solution 2: Derive attribute during LLDB DWARF parsing

Initially attempted in ⚙ D143347 [lldb][DWARF] Infer no_unique_address attribute. The implementation quickly gets unwieldy. Take the example in Scenario 3 :

struct A {
  [[no_unique_address]] double x;
  char y;
};

struct B {
  [[no_unique_address]] A a;
  int z;
};

struct C {
  A a1;
  B b1;
};

C c;

The order in which LLDB parses c is:

  • cAxyb1Baz

Here, we only realise that there has been any overlap by the time we parse z . We could deduce that since z ‘s offset is within sizeof(a) from a ’s offset, that a has a [[no_unique_address]] attribute. But it’s unclear to me how we would backtrack and deduce the existence of [[no_unique_address]] on x . But maybe that is enough for LLDB to avoid any critical invariant violations? Are there cases where these types of optimisations kick in without [[no_unique_address]] ?

Solution 3: Introduce a new DWARF attribute

Discussed as a viable alternative in ⚙ D101237 [lldb] Fix [[no_unique_address]] and libstdc++ 11's std::unique_ptr. This would be the most straightforward solution at the cost of increasing debug-info and having to standardise a potentially overly specific attribute. However, there could be room for a more general type of attribute that could aid with following related layout-related LLDB crash: deriving tail-padding re-use from DWARF.

Solution 4: Mark ALL decls LLDB creates as [[no_unique_address]]

Discussed as alternatives in ⚙ D101237 [lldb] Fix [[no_unique_address]] and libstdc++ 11's std::unique_ptr and ⚙ D143347 [lldb][DWARF] Infer no_unique_address attribute, but both times it seemed like a rather drastic solution. Given the attribute is check throughout various parts of Clang and CodeGen this has the potential of introducing hard-to-diagnose problems in the future. In my opinion we should try and keep the LLDB-constructed AST as closely aligned to the source AST as possible.

Prior Art

As far as I can tell, GCC generates the same kind of DWARF that Clang does for no_unique_address annotations. I haven’t checked how GDB handles the above examples. Since it doesn’t try to reconstruct the AST like LLDB does, it probably isn’t as much of an issue for them. I found one bug report around NUA handling in GDB here, the description of which suggests it tries to handle overlapping of fields when it parses DWARF (which LLDB currently doesn’t).

Summary

LLDB provides Clang with layout information for record types based on the offsets and sizes parsed out of DWARF. In some cases, however, Clang will derive additional layout information from the structure of the AST. This discrepancy can result in layout invariant violations if LLDB doesn’t create the necessary AST constructs. This RFC discussed how we could create [[no_unique_address]] attributes for ASTs constructed by LLDB.

I see three potential ways forward:

  1. A mix of Solution 1 and 3. Emit DW_AT_byte_size 0x0 for fields that are empty. This won’t address all NUA usages, but does address the ones that Clang actually supports. We can add a new attribute for the potential tail padding re-use case, which could help address another non-NUA layout issue.
  2. Implement Solution 3 only.
  3. Implement Solution 2 only. Fixes this purely in LLDB by deriving the attribute when parsing fields and base classes. As far as I can tell, this would have to be a best-effort algorithm because some cases aren’t derivable purely based on the DWARF offsets. Meaning the LLDB AST wouldn’t necessarily always match what we had in the source.

My preference would be (1) or (2) above, but happy to hear other suggestions/concerns.

(CCing people who might be interested @dblaikie @adrian.prantl @pogo59 @clayborg)

1 Like

[[msvc::no_unique_address]] might be throwing a wrench into solutions that encode the attribute in roundabout ways (in my reading solutions 1 and 2), since now you have to encode two kinds of NUA.

I’m absolutely not insisting on supporting [[msvc::no_unique_address]], but it’d be nice to explicitly consider it in the design (and potential layout mismatches leading to crashes on Windows).

msvc::no_unique_address is effectively an alias to no_unique_address: llvm-project/clang/include/clang/Basic/Attr.td at 464d9d96b3565ead06396ffb8d02b4dcf9cb9556 · llvm/llvm-project · GitHub so hopefully that won’t pose any significant issues for LLDB.

Thanks for resurrecting this discussion/pushing on it - I’m quite looking forward to the removal of __compressed_pair in favor of [[no_unique_address]] (from a DWARF size impact, etc), so this is good to see progress on & it’s non-trivial to figure out.

A somewhat alternative/aggressive idea, maybe @AaronBallman has thoughts on this - what if we removed the assertions from clang? Allowing lldb more freedom to lay out the struct as it wants to/as DWARF dictates without extra checks saying that layout is not possible. I fully appreciate the LLVM projects liberal assertion policy, but maybe if we have ways to specify struct layouts anyway (my understanding of how LLDB generates ASTs - that LLDB uses the DWARF to specify where the members are, etc) - we could argue that we should respect those layouts without the double checks that they were derived from certain properties?

I realize this’d still potentially be an ongoing maintenance issue - perhaps some fuzzing could be added to check Clang’s robustness with novel layouts (or even with valid layouts only, but missing the extra attributes and things that would create those layouts).

Guessing when to apply the attribute is, as you say, inconsistent/not 100% reliable (absolute worst case would be a type with no_unique_address that’s not used in any other type - but in an LLDB expression you want to create a struct that uses the NUA struct and create the correct layout (admittedly in this case, there’s not much reason this LLDB-expression struct needs to match the ABI correctly anyway, but in a really hypothetical scenario, maybe it would - like you want to call some function you don’t have debug info for & you can reconstruct the struct it uses yourself?)) - but the layout in the DWARF is correct/real/what we want to respect…

Though I guess the hypothetical scenario of a newly derived type can’t be addressed by clang respecting the layout more - because we don’t have the layout for a newly created structure… :confused:

Sorry, going round in circles a bit, which I tend to on this issue :confused:

1 Like

Oh, and that layout difference between GCC and Clang is an ABI bug… so I’ll file that. Clang and GCC (on Linux, at least - on MacOS Clang’s ABI is the authority, so no change would be made there) need to agree.

Oh, already filed here: [no_unique_address] Member not stored in previous member's padding · Issue #50766 · llvm/llvm-project · GitHub

1 Like

I think it makes sense to relax the assertions against field overlap for types with externally-provided layouts. Setting aside NUA attributes, we should try to come up with a solution that handles non-sensical DWARF, or DWARF that has overlapping fields for other unforeseen reasons. Presumably, LLDB should not crash in clang on such inputs. CGRecordLowering should give up in these cases and use i8 arrays of appropriate size so Clang can generate GEPs for the correct offsets, and the user just gets the emergent behavior of LLDB with overlapping fields.

We should keep the asserts for non-external layouts, of course.

A somewhat alternative/aggressive idea, maybe @AaronBallman has thoughts on this - what if we removed the assertions from clang? Allowing lldb more freedom to lay out the struct as it wants to/as DWARF dictates without extra checks saying that layout is not possible. I fully appreciate the LLVM projects liberal assertion policy, but maybe if we have ways to specify struct layouts anyway (my understanding of how LLDB generates ASTs - that LLDB uses the DWARF to specify where the members are, etc) - we could argue that we should respect those layouts without the double checks that they were derived from certain properties?

Good point, this would also probably require the least effort to implement. Something I should’ve mentioned in the RFC is that the failure cases only manifest in assertion-enabled builds. I have not observed Clang crashes when these assertions are disabled.

My concerns with disabling the assertions and leaving the LLDB AST as-is are mainly that the existence of the attribute may be checked elsewhere in the front-end (now or in the future). Currently, these assertions help signal to us that something about the AST that LLDB created isn’t correct. Usually the effect of those discrepancies are tricky to diagnose and if we get rid of these checks we’ll have even less of a chance to discover these. Also there’s no guarantee that violating these assertions won’t in the future become a more serious problem. So maybe there’s a middle ground where we do both, on the one hand be more lenient with the layout (for the tricky layout cases, e.g., Scenario 3) but also emit DW_AT_byte_size 0x0 for empty fields that are obviously marked NUA? Though I’m not sure I like the idea of the AST being correct only some of the time…

Setting aside NUA attributes, we should try to come up with a solution that handles non-sensical DWARF, or DWARF that has overlapping fields for other unforeseen reasons. Presumably, LLDB should not crash in clang on such inputs.

True, I’m not sure what safeguards/tests exist currently against malformed offsets. Would be worth exploring how we might test and support such cases.

but also emit DW_AT_byte_size 0x0 for empty fields that are obviously marked NUA? Though I’m not sure I like the idea of the AST being correct only some of the time…

Of course if we do that we could also just emit a new DW_AT_no_unique_address or something like that instead

Not sure I’m following - why would leniance be more relevant to the trickier scenarios? Oh, because the NUA attribute applies/has an effect, but doesn’t make the size zero, so DW_AT_byte_size 0x0 couldn’t be used there.

If anything, it seems like we wouldn’t want the more esoteric cases to rely on the (as you say, perhaps harder to guarantee) relaxed AST invariants, while more common cases don’t - more chance that the relaxed AST invariants would end up undertested/buggy? I’d be more inclined to have the mainstream cases rely on the same strategy as the esoteric cases to help flush out any bugs there.

I agree that making a new path through clang for things isn’t trivial - I’d highly encourage struct layout fuzzing clang (folks working on clang-cl did some great struct layout fuzzing to implement MSVC compatible struct layout - doing something like that for clang+lldb to ensure the layout can be reproduced without assertions, etc, would be good/give some reasonable confidence (possible there are interesting /uses/ of types, even once we’re past the “is the layout OK” that still hit assertions, and testing all the ways a type might be used that could hit a codepath that clang has some excessive assertions on wouldn’t be clear/simple/easy - but I think most of the assertions will be in the layout code itself?)) to ensure this and other layout issues are discovered & regressions are found relatively quickly.

I assume lldb’s super not safe/assertion-free on garbage DWARF… that seems like a whole other question/problem to solve, which I imagine is going to be way down on peoples priority list (again, had we done DWARF fuzzing LLDB from day 1, maybe - but the amount of work required at this time, and the relative payoff unless someone has new uses of LLDB that need to be really stable/safe (like untrusted inputs, etc))

@AaronBallman @adrian.prantl @dblaikie and I met last week to decide on a way forward. Here is a summary of what was discussed:

  • We discussed two approaches: either encoding NUA as DWARF attribute or relax assertions in Clang’s LayoutBuilder

  • We came to the agreement that the attribute would really be only useful for LLDB’s expression evaluator and would only solve this narrow problem. Other instances of “AST discrepancy affects layout” would still remain (e.g., triviality attribute)

  • For now we’ll go ahead and relax the structure layout assertions

  • Remaining concerns/action items:

    • Is this compatible with LLDB’s module loading infra. E.g., what if we load a module that does have the attribute but DWARF doesn’t?
    • NUA is checked in a couple of places in Sema, most notably for structure layout compatibility. Could this bite us?
    • Can we still somehow let LLDB know of these discrepancies (to avoid issues being harder to debug without assertions firing)? Aaron pointed out remarks could do what we want here?
    • David’s stance is that we should be less strict with the assertions around LayoutBuilder for all layout providers (not just LLDB). Aaron points out that we probably rely on those assertions for test-coverage too heavily to remove them entirely. Should talk to Eli/John about what’s feasible here.
    • How should we go about testing/fuzzing layouts that LLDB passes to Clang

TL;DR: I’ll guard the assertions in the LayoutBuilder on whether LLDB provided the layout and follow-up on the action items above. Thanks everyone for their inputs!

1 Like