[RFC][DwarfDebug] Fix and improve handling imported entities, types and static local in subprogram and lexical block scopes

chbessonova · February 17, 2023, 2:22pm

This RFC is to propose an approach to address issues of function-local entities emission and implement support of static variables and types scoped in a lexical (bracketed) block. I’ve made several attempts to address the problem [0][1] before I came up with the proposed solution, so put the history of this discussion into the RFC to have all relevant information to progress with the proposal.

The approach

This changes debug metadata: all local entities are now tracked by DISubprogram’s retainedNodes field. Previously they were in different fields of DICompileUnit. DICompileUnit can no longer track any local entity in its fields with the exception of retainedTypes – it has to persist because local scopes can be fully optimized out. This change allows removing several FIXMEs in the code. It also unifies and simplifies retrieving information about local entities scoped in a particular DISubprogram.
DWARF emission for all (not just function-local) global variables, types and imported declarations get moved from DwarfDebug::beginModule() to DwarfDebug::endModule(). For function-local entities we have to do this to place them into a proper subprogram tree (abstract or concrete): only by DwarfDebug::endModule() it is resolved whether a particular subprogram has an abstract sub-tree. Non function-local (i.e. global) entities handling are moved to endModule() mostly for the sake of unification (to handle all the entities in a single place). Global imported entities, however, also depend on all subprograms being created since they can refer to a function.
Besides unification and enabling function-local debug entities emission, (2) addressed a design issue of the current approach: passes that run after DwarfDebug::beginModule() could change the IR making emitted debug info incorrect (for instance, it was the case for NVPTX generic-to-nvvm lowering).
(2) by itself this is an NFCI-like change; it affects only the order of emitted debug entities.
To emit function-local entities DwarfDebug::endFunctionImpl() parses DISubprgram’s retained nodes and creates a mapping between a lexical scope and function-local entities it contains. Later while visiting scopes, this mapping is used to collect function-local nodes each CU must emit. Physically they are SmallVectors of nodes in every CU object. When emitting in DwarfDebug::endModule() the same algorithm is used for both function-local and global entities, and the previously collected lists are used as additional source of entities to emit.
DwarfCompileUnit::getOrCreateContext() can now handle DILexicalBlock scopes to emit entities scoped in a lexical block:

it chooses an abstract subprogram or lexical block DIE if available (assuming all abstract entities get created by the time we emit function-local entities),
it chooses an existing concrete out-of-line subprogram or a lexical block DIE if there is no abstract counterpart and if available,
it falls back to the most close existing parent DIE if there is no DIE corresponding to the given local scope.

The patchset

I put to review 7 patches, one of which is a review-only and should be squashed with the subsequent one before commit. The patches are:

https://reviews.llvm.org/D143984
NFC. Simplify DISubprogram’s retainedNodes to extend them for the purpose of tracking other function-local declarations.
https://reviews.llvm.org/D143985
Review only. Move imported entities emission to DwarfDebug::endModule(). I have to extract it from the next patch because it significantly affects tests (more than 6k lines) and reviewing the next patch might be problematic. But it’s not possible to commit them separately, because this patch isn’t fully complete and causes issues with tests.
https://reviews.llvm.org/D144004
Does the most of the work of changing debug metadata and fixes function-local imported entities emission. Fixes [2].
https://reviews.llvm.org/D144005
Move types emission to DwarfDebug::endModule().
https://reviews.llvm.org/D144006
Implement support for local types scoped in a lexical block.
https://reviews.llvm.org/D144007
Move global variables emission to DwarfDebug::endModule().
https://reviews.llvm.org/D144008
Implement support of local static variables (globals in terms of LLVM IR) scoped in a lexical block. Fixes [3][[4].

Patches 2, 4 and 6, and 3, 5 and 7 are split from each other to separate test changes due to debug info reordering from test changes due to functional changes.

Discussion

Adding ‘retainedNodes’ to lexical blocks enclosing them (not just to a subprogram) was discussed in [1] as an improvement for the patch set. It would make the implementation simpler and cleaner and allow skipping/deleting scopes/entities that were optimized away. The first show stopper for this idea I’ve encountered is a local type (and I haven’t thoroughly investigated if there are others). If the enclosing lexical scope is optimized out, the information of the type may be lost, despite there being a reference to it, or emitted incorrectly (in a wrong parent entity, in a wrong CU, etc). My attempts to hack around that issue ended up with a 23% compilation time increase on CTMark and the alternative price is memory footprint which doesn’t look good to me either.

Historical perspective

The work on the patches started about a year ago, but previous attempts didn’t account for all the issues and introduced problems with split-dwarf [0][5]. Finally, before coming up with the current set of patches I posted [1]. The patches in essence implemented the same approach I described in this RFC, but they were organized differently and they were harder to review. The current patch set is more elaborated and better organized work which addressed previous concerns.

Conclusion

This RFC is actually a call for action. The comments about the design are appreciated, but if it looks ok, I encourage the reader to assist me by reviewing the patches.

[0] ⚙ D113741 [RFC][DwarfDebug][AsmPrinter] Support emitting function-local declaration for a lexical block
[1] ⚙ D125693 [DebugInfo] Support types, imports and static locals declared in a lexical block (3/5)
[2] Debug info for imported declarations sometimes reference empty DIEs · Issue #51501 · llvm/llvm-project · GitHub
[3] Debug info not generated correctly for function static variables · Issue #19612 · llvm/llvm-project · GitHub
[4] Orphaned DWARF for static local of inlined func · Issue #29985 · llvm/llvm-project · GitHub
[5] ⚙ D109703 [DebugInfo] Fix scope for local static variables

Yashasvi · May 10, 2024, 1:59pm

Hi @chbessonova @avdzhidzhoev ,

Thanks for working on this! I see that patches 5-7 have been accepted. What’s missing to land these patches?

avdzhidzhoev · July 9, 2024, 10:31pm

Hello! Thank you. Sorry for the delay.
Currently, the process stuck a bit on the fourth patch (⚙ D144006 [DebugMetadata][DwarfDebug] Support function-local types in lexical block scopes (4/7)) because of some problems reported by Chromium [CloneFunction][DebugInfo] Avoid cloning DILocalVariables of inlined functions by dzhidzhoev · Pull Request #75385 · llvm/llvm-project · GitHub.
Unfortunately, I had to focus on another project at the moment, but it would be great to eventually make them done.

jmorse · July 15, 2024, 1:28pm

Thanks for working on all of this!,

[…] because of some problems reported by Chromium

Just to note, I don’t think the problems encountered are to do with the patch-set, they’re just unfortunately exposing some kind of latent bug within LLVM. We’ve seen something suspiciously similar (merged debug-info getting the wrong scope-links) in downstream code a few years ago, but could never pin down what was wrong.

tromey · May 2, 2025, 3:04pm

Hi. While working on debugging support for an LLVM-based Ada compiler, I discovered that it’s not possible to use a DILexicalBlock as the scope of a type. Eventually I found this thread.

I looked through Phabricator a bit, and also the github PRs, and I found this PR (which points to the most up-to-date branch for this work I could find).

I’m writing to ask two things.

First, is there anything more recent than this work, and is this still being developed?

Second, does this work account for the scenario where a type may depend on local variables? As one example, in Ada an array type’s bounds may be determined by local variables. If I understand the problem correctly (which isn’t entirely clear), then a type like this can’t really be moved to the block’s abstract origin, because it will refer to variables in the concrete instance. (This scenario does come up in C, with VLAs.)

jmorse · May 2, 2025, 5:49pm

Hi – that PR [0] ended up blocking as there were some block DW_AT_abstract_origin links that weren’t appearing in the output. Handily, @avdzhidzhoev added those links in [1], so it might be feasible again. Realistically I can’t look at it for at least a couple of weeks right now, alas.

Second, does this work account for the scenario where a type may depend on local variables?

For the narrow portion I’ve investigated I would say no; however it might be supported by the larger patch set that @avdzhidzhoev produced.

[0] [DebugInfo] Place local ODR-uniqued types in decl DISubprograms by jmorse · Pull Request #119001 · llvm/llvm-project · GitHub
[1] [DebugInfo][DWARF] Emit DW_AT_abstract_origin for concrete/inlined DW_TAG_lexical_blocks by dzhidzhoev · Pull Request #136205 · llvm/llvm-project · GitHub

avdzhidzhoev · May 5, 2025, 11:11am

As a result of these changes, it should be possible to place local types in corresponding concrete DISubprograms. However, they still won’t allow types inside specific DILexicalBlocks.

Does DISubrange with count: pointing to DILocalVariable work in that case? (As in llvm/test/DebugInfo/X86/vla-multi.ll, for example).

avdzhidzhoev · May 5, 2025, 11:12am

Should I take [DebugInfo] Place local ODR-uniqued types in decl DISubprograms by jmorse · Pull Request #119001 · llvm/llvm-project · GitHub over?

jmorse · May 5, 2025, 12:27pm

Should I take #119001 over?

Yes please; I’m still not certain what’ll happen when there’s a combination of LTO / static-local variables / inlining as mentioned here [0], but the patch overall is a strict improvement IMO.

[0] [DebugInfo] Place local ODR-uniqued types in decl DISubprograms by jmorse · Pull Request #119001 · llvm/llvm-project · GitHub

tromey · May 12, 2025, 5:12pm

I assume so but I haven’t yet implemented this in gnat-llvm. Note that Ada has some more complicated scenarios here, like a field offset might depend on the value of several variables or other fields in combination. I hope DIExpression in conjunction with DW_OP_LLVM_arg will work here, but I haven’t really tried it yet.

avdzhidzhoev · May 14, 2025, 12:40pm

@tromey Sorry, I’ve misinterpreted your original message a bit. Yes, it should be possible to have DICompositeType with scope: of lexical scope after [0]. They should be tracked within DISubprograms (which is what I meant by they still won’t allow types inside specific DILexicalBlocks).

[0] [DebugInfo] Place local ODR-uniqued types in decl DISubprograms by jmorse · Pull Request #119001 · llvm/llvm-project · GitHub

tromey · May 28, 2025, 6:25pm

Thanks for your response. I think full support for Ada will need the lexical block fix as well, though I confess I don’t really have any idea how to approach it right now.

avdzhidzhoev · November 13, 2025, 1:55pm

Since this patchset was initially created back when we were using Discourse, and merging it has taken multiple attempts with new issues appearing each time, it’s becoming harder for the community to keep track of what’s going on. I’m putting together a list of patches and merge attempts to make it easier to track the progress of these changes.

Patches were reordered in the following way:

⚙ D143984 [DebugMetadata] Simplify handling subprogram's retainedNodes field. NFCI (1/7) – merged.
⚙ D143985 [DwarfDebug] Move emission of imported entities from beginModule() to endModule() (2/7) – merged.
⚙ D144004 [DebugMetadata][DwarfDebug] Fix DWARF emisson of function-local imported entities (3/7) – merged.
⚙ D144006 [DebugMetadata][DwarfDebug] Support function-local types in lexical block scopes (4/7) Implement support for local types scoped in a lexical block. – not merged.
⚙ D144007 [DwarfDebug] Move emission of globals from beginModule() to endModule() (5/7) – not merged.
⚙ D144008 [DebugMetadata][DwarfDebug][CodeView] Support function-local static variables in lexical block scopes (6/7) – not merged.
⚙ D144005 [DwarfDebug] Move emission of types from beginModule() to endModule() (7/7) – not merged.

We are currently (still) at the patch #4 ⚙ D144006 [DebugMetadata][DwarfDebug] Support function-local types in lexical block scopes (4/7) .

Here’s the history of its merge attemps and problems that were encountered:

Initial attempt https://reviews.llvm.org/D144006
This change had to be reverted due to a crash with reproducer https://reviews.llvm.org/D144006#4434113.
https://reviews.llvm.org/D155818 - when a function and its DISubprogram are cloned, types from retainedNodes should be cloned too, to avoid types belonging to one subprogram be present in retainedNodes of another subprogram.
Next attempt: https://reviews.llvm.org/D158730 — fixes DIBuilder::retainType, so that local types are not tracked twice, in both DICompileUnit’s retainedTypes and DISubprogram’s retainedNodes.
Reverted due to
Problem 1. LTO crash https://reviews.llvm.org/D144006#4651955 https://issues.chromium.org/issues/40283442#comment4. Due to enableDebugTypeODRUniquing feature, we get multiple DISubprograms having the same local type in their retainedNodes lists.
Problem 2. Problem with function cloning, that led to the situation when we several formal parameters with the same index are in the scope of a single DISubprogram https://bugs.chromium.org/p/chromium/issues/detail?id=1500022.
Next attempt: https://github.com/llvm/llvm-project/pull/68986 — if multiple DISubprograms reference the same local type from their retained nodes list (which can happen due to enableDebugTypeODRUniquing), replace such references with clones of the local types. Solved actually neither problem 1 nor problem 2 from the previous attempt.
Reverted due to https://issues.chromium.org/issues/40288032 (similar problem to https://bugs.chromium.org/p/chromium/issues/detail?id=1500022).
Next attempt: https://github.com/llvm/llvm-project/pull/75385 - Avoid cloning DILocalVariables of inlined functions.
Revert after multiple crash reports https://github.com/llvm/llvm-project/pull/75385#issuecomment-1893362122.
It solved the problem with a function having multiple parameters with the same index, but problem 1 still persisted. Discussions and PRs that followed this patch were mostly dedicated to making local types in DISubprogram’s retainedNodes and ODR type uniquing compatible with each other (problems show up mostly during LTO, when enableDebugTypeODRUniquing is on).
@jmorse proposed to track local types in declaration DISubprograms, instead of definition DISubprograms https://github.com/llvm/llvm-project/pull/119001, https://github.com/llvm/llvm-project/pull/142166. It might have helped to solve problem 1, as declarations are uniqued. However, declarations do not have information about lexical scopes, and the goal is to enable placing local types in lexical scopes of a DISubprogram.
It was proposed to make definition DISubprograms uniqued across LLVM IR in general, or only for inlined out DISubprograms. I’ve made attemps to do that with a chain PRs (links to all of them are in https://github.com/llvm/llvm-project/pull/142166#issuecomment-3390586414). However, it makes dwarf emission, as well as modifying DISubprograms in IR passes, complicated, so I’ve given up on that idea. When it comes to merging definition DISubprograms for inlined-out subprograms from different modules having the same linkageName, I still think it’s an interesting idea, I’ll try to do that separately.
Current attempt: https://github.com/llvm/llvm-project/pull/165032. If a local type is uniqued due to enableDebugTypeODRUniquing in MetadataLoader/LLParser, MetadataLoader/LLParser should clean up the lists of retainedNodes of affected DISubprograms, so that only one DISubprogram will reference the local type. If other functions (which are most likely the clones of the same source code function) reference this local type, we rely on cross-CU references. It might be suboptimal, as we will still have multiple abstract subprograms for the same source code function, and the local type visibility for debugger might be limited from cloned functions. But I’m hoping “DISubprogram clones deduplication” will be handled separately from this patchset.
Another concern about this patch is the case when a lexical scope is optimized-out, but the corresponding local type still persists in retained nodes list, but I’m not sure if it will be a problem.

Topic		Replies	Views
RFC: Supporting all entities declared in lexical scope in LLVM debug info LLVM Dev List Archives	6	124	January 20, 2016
Catching Temporarily Used Types in Debug Info Clang Frontend	8	190	March 3, 2014
[ThinLTO] Reducing imported debug metadata LLVM Dev List Archives	11	174	December 9, 2016
DwarfDebug problems LLVM Dev List Archives	6	159	December 4, 2013
Seeking clarification and way forward on limited scope variables. LLVM Dev List Archives	9	141	June 23, 2020