[RFC] Amortizing debug info processing cost in CoroSplit

arr · September 16, 2024, 8:41pm

CoroSplitPass can get very slow for source files compiled with full debug info (we see 10x to 100x slowdown in some cases when switching from -g1 to -g2). The slowdown is caused by:

1. repeated work in metadata processing during coroutine cloning (3 clones per switch coroutine and an arbitrary number of clones for other kinds), and
2. the fact that debug info metadata cloning is currently effectively O(Module) rather than O(Function).

Problem #2 was described in this commit together with the idea to revamp metadata ownership making it easy to identify metadata owned by a Function and cloning it efficiently. This is the right fix conceptually, but it also seems like a larger endeavour.

In the meantime, we can significantly reduce the overhead by pre-calculating and deduping some work. This is not a fundamental fix to the metadata ownership model, but it seems worthwhile doing anyway.

I prepared a patch set that makes CoroSplitPass more efficient (see below). The changes are not too invasive but not trivial either, so I’m looking for feedback on the approach.

Each commit in the patch set is individually buildable and reviewable, and I’m happy to submit them as individual PRs in a stack. With that said, I thought that providing high-level context for the whole changeset together with some commentary for groups of commits could be useful.

Anecdata

These numbers are taken from a traceview of a sample C++ source file (it’s a larger one but this is exactly what ends up on build’s critical path).

Each column corresponds to a certain commit in the patch set (see the commentary below), while rows are time trace scopes.

The final speed-up is 18x. I think it could be made another 2x faster in a further incremental change, if the overall direction makes sense.

Time trace scope   Baseline / 0   IdentityMD set / 1   Prebuilt GlobalDI / 2   Cached CU DIFinder / 3
CoroSplitPass      306ms          221ms                68ms                    17ms
CoroCloner         101ms          72ms                 63ms                    0.5ms
CollectGlobalDI    -              -                    63ms                    13ms
Speed-up           1x             1.4x                 4.5x                    18x
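
The speed-up row can be reproduced from the CoroSplitPass row with a one-line calculation. This is only a sanity check on the numbers above; the helper name `speedUp` is made up for this illustration.

```cpp
#include <cmath>

// Speed-up of a step relative to the baseline, rounded to one decimal place.
// Inputs are CoroSplitPass times in milliseconds, taken from the table.
inline double speedUp(double baselineMs, double stepMs) {
  return std::round(baselineMs / stepMs * 10.0) / 10.0;
}
```

With the table values, `speedUp(306, 221)`, `speedUp(306, 68)` and `speedUp(306, 17)` give 1.4, 4.5 and 18.0 respectively.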

The file has hundreds of coroutines, so the effect on the total compile time is dramatic: 2m30s in coroutine processing before vs 9.5s after.

Patch set commentary

Step 0:

The first group of commits is a step-by-step refactoring of CloneFunctionInto, trying to extract reusable pieces out of it. The resulting APIs are not ideal but hopefully good enough / better and a step in the right direction.

(0) [NFC][Coro] Add helpers for coro cloning with a TimeTraceScope

[NFC][Utils] Extract CloneFunctionAttributesInto from CloneFunctionInto

[Utils] Extract ProcessSubprogramAttachment from CloneFunctionInto

[NFC][Utils] Remove DebugInfoFinder parameter from CloneBasicBlock

[NFC][Utils] Clone basic blocks after we’re done with metadata in CloneFunctionInto

[NFC][Utils] Extract BuildDebugInfoMDMap from CloneFunctionInto

[NFC][Utils] Extract CloneFunctionMetadataInto from CloneFunctionInto

[NFC][Utils] Extract CloneFunctionBodyInto from CloneFunctionInto

[Utils] Eliminate DISubprogram set from BuildDebugInfoMDMap

[NFC] Remove adhoc definition of MDMapT in IRMover (<- this one is not strictly necessary)

Step 1:

This commit changes how we communicate global debug info that shouldn’t be cloned to the ValueMapper. Previously, CloneFunctionInto would eagerly identity-map global debug info in a ValueMap to avoid cloning it, but this is expensive and complicates sharing.

With this commit, such global metadata is passed to ValueMapper separately and is identity-mapped on first use. This is needed for the rest of the patch set to work (unless doing it this way is subtly wrong, of course!).

I tried other ways to prime the MD map, but they ended up being a lot slower / harder to manage.

(1) [Utils] Identity map global debug info on first use in CloneFunction*
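
To make the difference between eager and on-first-use identity mapping concrete, here is a toy model in plain C++. It is not the LLVM ValueMapper: metadata nodes are plain integer ids, and `ToyMapper`, `primeEagerly`, `mapNode` and the clone id numbering are all hypothetical names chosen for the sketch.

```cpp
#include <cassert>
#include <unordered_map>
#include <unordered_set>

// Toy stand-ins: a metadata node is just an integer id (not an LLVM type).
using NodeId = int;
using NodeSet = std::unordered_set<NodeId>;

struct ToyMapper {
  std::unordered_map<NodeId, NodeId> Map; // plays the role of the MD map
  const NodeSet *IdentitySet = nullptr;   // global nodes that must not be cloned
  NodeId NextCloneId = 1000000;           // ids handed out to clones (arbitrary)

  // Eager variant: pre-populate the map with every global node.
  // Costs O(|IdentitySet|) per clone even if few of the nodes are ever used.
  void primeEagerly() {
    assert(IdentitySet && "eager priming needs a set");
    for (NodeId N : *IdentitySet)
      Map.emplace(N, N);
  }

  // Lazy variant: consult the set only when a node is looked up for the
  // first time, so the map only grows with the nodes actually visited.
  NodeId mapNode(NodeId N) {
    auto It = Map.find(N);
    if (It != Map.end())
      return It->second;
    NodeId Result = (IdentitySet && IdentitySet->count(N)) ? N : NextCloneId++;
    Map.emplace(N, Result);
    return Result;
  }
};
```

In this model the set itself is never copied or modified by a mapper, which is what makes it possible to share one set between several clones.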

Step 2:

This is a straightforward continuation of Step 1. All coroutine clones share the same set of global debug info nodes, so we build it once and then pass it directly via the individual CloneFunction* helpers extracted in Step 0.

(2) [Coro] Prebuild a global debug info set and share it between all coroutine clones
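
The build-once, share-many shape of this step might look roughly like the following toy model. Nothing here is LLVM API: `ToyCollector`, `cloneWithSharedSet` and `splitCoroutine` are hypothetical, and the "clone" only counts how many nodes it would keep instead of copying anything.

```cpp
#include <unordered_set>
#include <vector>

using NodeId = int; // toy stand-in for a metadata node
using NodeSet = std::unordered_set<NodeId>;

// Hypothetical stand-in for the expensive walk that finds the global debug
// info nodes. It counts how many times it has been run.
struct ToyCollector {
  int Runs = 0;
  NodeSet collect(const std::vector<NodeId> &ModuleNodes) {
    ++Runs;
    return NodeSet(ModuleNodes.begin(), ModuleNodes.end());
  }
};

// Hypothetical clone step: only reads the shared set. Returns how many of
// the function's nodes would be kept as-is rather than cloned.
int cloneWithSharedSet(const std::vector<NodeId> &FunctionNodes,
                       const NodeSet &Globals) {
  int Kept = 0;
  for (NodeId N : FunctionNodes)
    if (Globals.count(N))
      ++Kept;
  return Kept;
}

// Build the set once per coroutine, then reuse it for every clone.
int splitCoroutine(ToyCollector &C, const std::vector<NodeId> &ModuleNodes,
                   const std::vector<NodeId> &FunctionNodes, int NumClones) {
  const NodeSet Globals = C.collect(ModuleNodes); // single expensive walk
  int Kept = 0;
  for (int I = 0; I < NumClones; ++I)
    Kept += cloneWithSharedSet(FunctionNodes, Globals);
  return Kept;
}
```

For a switch coroutine with its 3 clones, the collector runs once instead of three times in this model.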

Step 3:

All global debug info sets from Step 2 share a common core coming from DICompileUnit. We can build it once, cache it, and then reuse it for each run of CoroSplitPass.

I implemented it as a simple module-level analysis. But I’m not sure if the way I wired it is the best (or even the right one!), so would certainly appreciate input on this.

[Analysis] Add DebugInfoCache analysis
(3) [Coro] Use DebugInfoCache to speed up cloning in CoroSplitPass
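
For readers unfamiliar with the pattern, a per-compile-unit cache of this kind can be sketched as below. This is a generic memoization example and not the DebugInfoCache analysis from the patch set: `ToyDebugInfoCache`, `CompileUnitId` and the `Misses` counter are hypothetical, and the real wiring into the pass manager is not modelled.

```cpp
#include <functional>
#include <map>
#include <unordered_set>
#include <utility>

using NodeId = int;        // toy stand-in for a metadata node
using CompileUnitId = int; // toy stand-in for a DICompileUnit
using NodeSet = std::unordered_set<NodeId>;

// Memoizes the per-compile-unit "core" set so that it is computed at most
// once, no matter how many coroutines are split afterwards.
class ToyDebugInfoCache {
  std::map<CompileUnitId, NodeSet> Cache;
  std::function<NodeSet(CompileUnitId)> Compute; // the expensive walk

public:
  int Misses = 0; // number of times the expensive walk actually ran

  explicit ToyDebugInfoCache(std::function<NodeSet(CompileUnitId)> Fn)
      : Compute(std::move(Fn)) {}

  // Returns a reference that stays valid across later insertions, because
  // std::map does not move its elements.
  const NodeSet &get(CompileUnitId CU) {
    auto It = Cache.find(CU);
    if (It == Cache.end()) {
      ++Misses;
      It = Cache.emplace(CU, Compute(CU)).first;
    }
    return It->second;
  }
};
```

A cache like this is only correct while the cached sets stay in sync with the module, which is one reason the wiring question above matters.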

arr · April 21, 2025, 2:44pm

Quick update: the changes have now been fully merged in a series of PRs. The umbrella PR has all the references. The final version didn’t require caching after all and works for all users of CloneFunctionInto without any changes, rather than just for the CoroSplit pass. The trick was to figure out how to make debug info processing cheaper when cloning a function (specifically, avoiding an expensive traversal and identity mapping of debug info nodes that should NOT be cloned, such as types, enums, etc.)

The final implementation also turned out faster than the initial one. In terms of CoroSplit pass runtime on a sample C++ file with -g2: 306ms → 17ms (18x) initially vs 306ms → 3.8ms (80x) now, which is almost as fast as -g1.

The notable PRs are:

#118627: extend ValueMapper to explicitly accept a set of metadata nodes to identity map, rather than identity-mapping it in VMap.
#129147: replace the set above with a predicate; still use set.contains as the body of the predicate.
#129148: remove the set altogether; instead supply a pure predicate created for the function to be cloned.
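
The last two steps of that progression (set lookup wrapped in a predicate, then a pure predicate) can be illustrated with a small toy model. This is not the code from the PRs: `ToyNode`, `predicateFromSet` and `predicateForFunction` are hypothetical, and the ownership test used for the pure predicate is an assumption made for the sketch, not the criterion LLVM actually applies.

```cpp
#include <functional>
#include <unordered_set>
#include <utility>

// Toy metadata node. OwnerFunction == 0 means "global" (a type, an enum,
// etc.); any other value names the function the node belongs to.
struct ToyNode {
  int Id;
  int OwnerFunction;
};

// Answers: should this node be identity-mapped (i.e. NOT cloned)?
using IdentityPredicate = std::function<bool(const ToyNode &)>;

// Intermediate step: a predicate whose body is still a set lookup, so the
// set must be built up front by traversing the debug info.
IdentityPredicate predicateFromSet(std::unordered_set<int> Ids) {
  return [Ids = std::move(Ids)](const ToyNode &N) {
    return Ids.count(N.Id) != 0;
  };
}

// Final step: a pure predicate created for the function being cloned. It
// decides from the node alone, so no set and no traversal are needed.
// (Assumed rule for this toy: keep everything not owned by that function.)
IdentityPredicate predicateForFunction(int ClonedFunction) {
  return [ClonedFunction](const ToyNode &N) {
    return N.OwnerFunction != ClonedFunction;
  };
}
```

Both predicates give the same answers for a global node and for a node local to the cloned function, but only the first one pays for building a set.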
