tl;dr by changing source locations (DILocations) from being global MDNode objects to being efficiently-stored function-local data, memory usage in debug info builds can be significantly cut; this comes with significant disruption to downstream forks and users of LLVM’s API, and so needs careful consideration.
Background
In LLVM’s debug info metadata, source locations of instructions are represented with the DILocation class, a subclass of MDNode, which contains the fields (Scope, InlinedAt, Line, Column, IsImplicitCode). Although it is necessary for debug info, DILocation ends up comprising a significant portion of LLVM’s memory consumption depending on the input. The exact percentage varies greatly depending on the source and build configuration, anywhere from <0.1% to 10% at peak memory usage (and often significantly higher during optimization passes). The reason for the heavy memory consumption can roughly be summarized as follows:
- DILocation uses 16-24 bytes to store source location data, and 24 bytes to store generic MDNode data; this data is useful in some contexts, such as parsing, but is wasted throughout most of compilation.
- DILocations are far more numerous than other MDNodes; in the cases I looked through they generally comprised anywhere from 30-80% of all MDNodes, scaling with the amount of inlining that takes place as we duplicate every inlined DILocation.
- For each DILocation, we must also add a DenseMap entry in the LLVM context object as part of the uniquing behaviour of metadata, which can quickly end up consuming a measurable % of program memory.
It is certainly possible to make a variety of improvements that partially address each of these points, but they can’t be fundamentally changed without breaking some of the behaviours that LLVM currently relies on. With that in mind, we (Sony) decided to experiment with completely reworking source locations, and believe the result is worth implementing in full.
Proposal
In the prototype, we’ve split the DILocation fields above into two separate structs: “Context” data, (Scope, InlinedAt), and “Location” data, (Line, Column, IsImplicitCode). These are stored in two separate arrays owned by a DISubprogram, such that source locations are now function-local metadata. Finally, instead of holding a pointer to a DILocation, each Instruction holds a pair of uint32_t indexes into the context and location arrays.
This has some significant implications for usage of source locations: Instructions no longer have a direct reference to their own source location, so a reference to the owning DISubprogram is needed. Although this is sometimes inconvenient, it is a logical limitation: DILocations are only used in a function-local context, and so everywhere that they are used, a reference to the owning DISubprogram is either present or easily obtainable. In return, the prototype so far reduces the overall memory cost of DILocations significantly, by around 50% for most inputs tested in the CTMark suite. As the current implementation is very much unrefined and unoptimized, we expect further improvements to be made.
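As a rough illustration of the layout described above, here is a minimal standalone C++ sketch. It is not taken from the prototype: every name in it (SrcLocRef, LocContext, LocData, Subprogram, Instr, resolve) is hypothetical, scopes are reduced to a plain integer id, and the field widths and the encoding of InlinedAt as another index pair are assumptions made only to keep the example self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sentinel meaning "no entry"; not an LLVM constant.
constexpr uint32_t NoIndex = UINT32_MAX;

// What each instruction would hold instead of a DILocation pointer:
// a pair of indexes into the arrays owned by its subprogram.
struct SrcLocRef {
  uint32_t ContextIdx;
  uint32_t LocationIdx;
};
static_assert(sizeof(SrcLocRef) == 8, "two 32-bit indexes");

constexpr SrcLocRef NoLoc{NoIndex, NoIndex};

// "Context" data: (Scope, InlinedAt). The scope is an integer id here
// purely for illustration; InlinedAt as an index pair is an assumption.
struct LocContext {
  uint32_t ScopeId;
  SrcLocRef InlinedAt; // NoLoc when the location was not inlined
};

// "Location" data: (Line, Column, IsImplicitCode). Widths are assumed.
struct LocData {
  uint32_t Line;
  uint16_t Column;
  bool IsImplicitCode;
};

// Stand-in for DISubprogram: owns both arrays, so locations are
// function-local and need no entry in a global uniquing map.
struct Subprogram {
  std::vector<LocContext> Contexts;
  std::vector<LocData> Locations;

  uint32_t addContext(LocContext C) {
    Contexts.push_back(C);
    return static_cast<uint32_t>(Contexts.size() - 1);
  }

  SrcLocRef addLocation(uint32_t ContextIdx, LocData L) {
    assert(ContextIdx < Contexts.size() && "unknown context");
    Locations.push_back(L);
    return SrcLocRef{ContextIdx,
                     static_cast<uint32_t>(Locations.size() - 1)};
  }
};

// Stand-in for Instruction.
struct Instr {
  SrcLocRef Loc;
};

struct ResolvedLoc {
  uint32_t ScopeId;
  uint32_t Line;
  uint16_t Column;
  bool IsImplicitCode;
  bool IsInlined;
};

// The indexes mean nothing on their own, so resolving an instruction's
// location requires the owning subprogram to be passed in as well.
ResolvedLoc resolve(const Subprogram &SP, const Instr &I) {
  assert(I.Loc.ContextIdx < SP.Contexts.size() && "bad context index");
  assert(I.Loc.LocationIdx < SP.Locations.size() && "bad location index");
  const LocContext &C = SP.Contexts[I.Loc.ContextIdx];
  const LocData &L = SP.Locations[I.Loc.LocationIdx];
  return ResolvedLoc{C.ScopeId, L.Line, L.Column, L.IsImplicitCode,
                     C.InlinedAt.ContextIdx != NoIndex};
}
```

In this sketch, instructions that share a scope and inlining chain can reuse a single context entry and differ only in their location index. Whether and how the prototype shares or deduplicates entries is not something this example tries to model.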
For this post I’ll leave the technical explanation at just a rough overview, as the implementation is very much a work-in-progress: the MIR backend has yet to be fully implemented, the prototype as a whole is not review-ready, and there are a number of core components that will change before the final implementation. For more details on the implementation, however, see the draft prototype and accompanying documentation here: Prototype: Replace DILocations with function-local source locations by SLTozer · Pull Request #133949 · llvm/llvm-project · GitHub
What comes next
The concept is not fully proven yet, though the prototype fundamentally works: there are still bugs and missing features, but the “tricky” cases are solved or have known solutions. Runtime performance impacts are still unclear - the prototype currently has a high runtime cost in Clang (which will be fixed later), and is about equal during optimizations, but this may change either way as the implementation is finalized.
More challenging than the mere implementation of this change, however, is the rollout: the change fundamentally affects all APIs that interact with source locations, such as the C API, which currently uses opaque MetadataRefs to pass debug locations around. While the in-tree updates to uses of DILocations are mostly trivial, this still creates work for any maintainers of downstream forks; unlike other significant rewrites, such as the replacement of debug intrinsics with debug records, there is no simple runtime fallback for this change. Therefore, if the approach is accepted, the transition would need some careful planning to avoid causing too much disruption.
What we’re interested in right now is input from stakeholders, primarily anyone who:
- has experience working with LLVM’s metadata model,
- has a strong interest in reducing memory consumption in debug info builds,
- consumes any of LLVM’s APIs that may be affected by these changes,
- maintains a downstream fork of LLVM, particularly with any debug-info-related changes.
Any input is welcome, but we would particularly appreciate any technical feedback on the design direction, any issues you foresee with this approach, how this change could/would affect your own usage of or modifications to LLVM, and any suggestions for how this change could be made easier to work with/transition into.
Note also that by packing source location data more efficiently, we free up some headroom to expand our representation of source locations: this work was motivated by our work on Key Instructions, a feature which adds new fields to DILocation to improve stepping in a debugger (more details here). This rewrite reduces the cost of adding these fields, and may similarly benefit any other projects that look to extend LLVM’s representation of source locations.
Pinging a few individuals who this may be particularly relevant to: @adrian.prantl @dblaikie @echristo @nikic