[RFC] Clearing Clang AST before running backend optimizations/codegen to save memory

We keep around the Clang AST when we do backend optimizations on the IR. This causes the peak memory usage to be more than necessary since (I believe) generally we don’t need the Clang AST when running optimizations. This gives us more room to work with things like caching analyses, at least for frontend compilations.

Measuring the effects of this when building LLVM’s PassBuilder.cpp (longest LLVM file to compile), I measured a drop of peak memory usage (/usr/bin/time’s max rss) from ~1.3-1.4G to ~1.0G.

There are still a couple issues I haven’t dug too deeply into yet, mostly to do with cleaning things up when freeing memory, so right now it’s only enabled with -disable-free which works around those issues. Most clang tests pass with this patch; there are a couple things that crash (e.g. -Rpass, clang interpreter) where we can investigate further or just disable this feature.

Prototype: https://reviews.llvm.org/D109781
llvm-compile-time-tracker memory metrics

Any concerns with this?

I think it’d be unfortunate if certain features don’t work in this mode, unless we understand why/are pretty sure that’s a fairly fundamental limitation. For instance, at Google we’ve got memory limitations (hence the motivation for this work) and I think we created, or at least have some interest in -Rpass - if -Rpass couldn’t be composed with this feature, then we’d make it harder to investigate performance issues (because -Rpass wouldn’t be available) in larger compiles that need this memory savings to fit into the memory limits we have. I’d guess the issue is that -Rpass I think traffics in Clang source locations. So it’s possible the source location infrastructure/data structures would have to be kept, even though the AST/semantic pieces could be torn down. (unless that source location stuff can refer into ASTs for differentiating template specializations, etc - that’d be the tipping point for me in “OK, it may be worth the benefit to make these incompatible, or reduce the quality of -Rpass diagnostics when using this memory saving technique” - wonder if it’s only the -Rpass diagnostics, or other backend diagnostics that use that infrastructure)

Which is to say I’d be /slightly/ averse to adding this feature as a Clang default or driver flag (& similarly averse to leaving it as a cc1 off-by-default flag indefinitely) without a pretty good answer to those crashing/non-functioning tests.

(lower priority, but fairly nice-to-have would be some answer to the cleaning up issues, -disable-free, etc - sort of weird that we’d have to /disable-free/ to enable freeing things earlier… that seems pretty suspicious)

but if those issues can be resolved

Looking at -Rpass (and various things like warnings for inability to vectorize when we specifically request it with #pragma clang loop vectorize(enable)), it does end up using objects from the AST to approximate the source location (BackendConsumer::getBestLocationFromDebugLoc()) if it can’t find the source location from debug info. So this would affect users who don’t build with debug info. Without debug info or the AST, clang will print a warning/remark without a source location. This is a tradeoff we’d have to decide on.

There’s a similar issue with backend warnings but those don’t even pass debug info to the diagnostic handler (clang/test/Misc/backend-stack-frame-diagnostics.cpp). Perhaps it could be extended to do that though.

I haven’t looked too much into the -disable-free stuff, but the reason it mitigates crashes is because if we clear AST objects we still have dangling references to them that we later attempt to clean up unless we -disable-free.

Updated https://reviews.llvm.org/D109781

Looking at -Rpass (and various things like warnings for inability to vectorize when we specifically request it with #pragma clang loop vectorize(enable)), it does end up using objects from the AST to approximate the source location (BackendConsumer::getBestLocationFromDebugLoc()) if it can’t find the source location from debug info. So this would affect users who don’t build with debug info. Without debug info or the AST, clang will print a warning/remark without a source location. This is a tradeoff we’d have to decide on.

Ah, the AST lookup is only to retrieve the location of a function by name - if we made a mapping/record of all those locations (shouldn’t take up much space, I’d think) then we could use that instead and not need the AST for that callback?

There’s a similar issue with backend warnings but those don’t even pass debug info to the diagnostic handler (clang/test/Misc/backend-stack-frame-diagnostics.cpp). Perhaps it could be extended to do that though.

I haven’t looked too much into the -disable-free stuff, but the reason it mitigates crashes is because if we clear AST objects we still have dangling references to them that we later attempt to clean up unless we -disable-free.

So disabling free is keeping more things alive - perhaps then the RAM savings aren’t as much as they could be if freeing was enabled? But yeah, more to look into.

Drive-by thought, debug-info-for-profiling retains source info, maybe that could be unconditionally on and Rpass could use it?

–paulr

Drive-by thought, debug-info-for-profiling retains source info, maybe that could be unconditionally on and Rpass could use it?

I think -fdebug-info-for-profiling tweaks the DWARF output to include some more info (like mangled names for inlined functions even in gmlt, start of function line numbers so sample profiles can be robust to certain amounts of code change, etc)

But yeah - we do have the https://reviews.llvm.org/D4234 “LocTrackingOnly” mode which looks like it could be used for Rpass diagnostics, for instance. (& removing use of the AST from the LLVM diagnostic system might help make it more consistent behavior even when doing LTO or other separations between AST parsing and IR transformations)

I haven’t looked too much into the -disable-free stuff, but the reason it mitigates crashes is because if we clear AST objects we still have dangling references to them that we later attempt to clean up unless we -disable-free.

So disabling free is keeping more things alive - perhaps then the RAM savings aren’t as much as they could be if freeing was enabled? But yeah, more to look into.

-disable-free only disables freeing things right before we exit to save some runtime during teardown (since the kernel freeing things is probably faster than the userspace allocator freeing things), so it shouldn’t affect the amount of memory savings we get.

Right right, fair enough. (still worries me a bit - reflects that we might be creating a less-than-library-usable code change, which would be best avoided given all the interesting uses of clang as a library)

But yeah - we do have the https://reviews.llvm.org/D4234 “LocTrackingOnly” mode which looks like it could be used for Rpass diagnostics, for instance. (& removing use of the AST from the LLVM diagnostic system might help make it more consistent behavior even when doing LTO or other separations between AST parsing and IR transformations)

Thanks for the pointer to that. I was getting confused because “-Rpass=foo” was triggering LocTrackingOnly but “-Rpass” wasn’t, which I’ve fixed. So -Rpass is no longer a concern.

So we just have to worry about non-Rpass diagnostics, e.g. -fwarn-stack-size. All of these only use the function name for source locations. We can either just give up and have worse diagnostics for these, meaning no demangling and no source location per diagnostic, or create a side table of these right before codegen as you suggested. For now I’ll go with creating a side table to preserve the status quo, hopefully its construction runtime is not measurable.

Inline asm is special in that it carries around a source location token even in the IR, so we don’t need to go through the AST to find a source location which is nice.

That all sounds pretty good to me - thanks for looking into all these nooks and crannies!

Adding a map of mangled names to source locations (only those emitted in the IR since those are the only ones the backend can see) doesn’t noticeably impact compile time, but does noticeably increase memory usage. Of course, we’ll win that back with clearing the Clang AST.

I found the recently implemented “dontcall” attribute which relies on the Clang AST. I have a draft patch to make it not rely on the Clang AST: https://reviews.llvm.org/D110364.
But that and the changes in https://reviews.llvm.org/D109781 highlight an issue in that we don’t only use the Clang AST to find source locations, we might also use it for other things.

For example, we have custom naming of functions/lambdas/etc when printing out their name in a diagnostic with the Clang AST node. I’ve worked around this by calling LLVM’s demangler, but it doesn’t produce the same naming.

And for the “dontcall” attribute, the initial implementation looked into the Clang AST to determine whether to emit a warning or an error. I’ve worked around this by splitting it into two attributes, one for warning and one for error.
But overall, if we go through with this, we’ll end up not having access to the Clang AST for future backend diagnostics. That’s a tradeoff we’ll have to decide on. Forcing this does make backend diagnostics more likely to be consistent when they don’t have a Clang AST (e.g. ThinLTO post-link compiles, IR as input to Clang).

Adding a map of mangled names to source locations (only those emitted in the IR since those are the only ones the backend can see) doesn’t noticeably impact compile time, but does noticeably increase memory usage. Of course, we’ll win that back with clearing the Clang AST.

I guess most of that is from the string data itself? Any chance that could be shared - by referencing (using StringRef) string data in the llvm::Module or otherwise? (bit more nuanced than that because presumably it’d be a problem if the string data were to go away - if a function is optimized out of an llvm::Module, etc - even if that meant the string value would never be queried for in the map anyway, it’d still violate some map invariants/etc)

I found the recently implemented “dontcall” attribute which relies on the Clang AST. I have a draft patch to make it not rely on the Clang AST: https://reviews.llvm.org/D110364.

But that and the changes in https://reviews.llvm.org/D109781 highlight an issue in that we don’t only use the Clang AST to find source locations, we might also use it for other things.

All the more reason to make the AST deleting mode the default (maybe the only mode, if possible)/usable everywhere - to avoid building new features that rely on the AST. (generally this makes situations like LTO better anyway - since they also break the idea that the AST is available during transformations)

For example, we have custom naming of functions/lambdas/etc when printing out their name in a diagnostic with the Clang AST node. I’ve worked around this by calling LLVM’s demangler, but it doesn’t produce the same naming.

And for the “dontcall” attribute, the initial implementation looked into the Clang AST to determine whether to emit a warning or an error. I’ve worked around this by splitting it into two attributes, one for warning and one for error.
But overall, if we go through with this, we’ll end up not having access to the Clang AST for future backend diagnostics. That’s a tradeoff we’ll have to decide on. Forcing this does make backend diagnostics more likely to be consistent when they don’t have a Clang AST (e.g. ThinLTO post-link compiles, IR as input to Clang).

yep!

Adding a map of mangled names to source locations (only those emitted in the IR since those are the only ones the backend can see) doesn’t noticeably impact compile time, but does noticeably increase memory usage. Of course, we’ll win that back with clearing the Clang AST.

I guess most of that is from the string data itself? Any chance that could be shared - by referencing (using StringRef) string data in the llvm::Module or otherwise? (bit more nuanced than that because presumably it’d be a problem if the string data were to go away - if a function is optimized out of an llvm::Module, etc - even if that meant the string value would never be queried for in the map anyway, it’d still violate some map invariants/etc)

Yeah, reusing strings could have issues. However, if we use a hash of the strings as the keys then for the most part we the memory usage numbers look better.

I found the recently implemented “dontcall” attribute which relies on the Clang AST. I have a draft patch to make it not rely on the Clang AST: https://reviews.llvm.org/D110364.

But that and the changes in https://reviews.llvm.org/D109781 highlight an issue in that we don’t only use the Clang AST to find source locations, we might also use it for other things.

All the more reason to make the AST deleting mode the default (maybe the only mode, if possible)/usable everywhere - to avoid building new features that rely on the AST. (generally this makes situations like LTO better anyway - since they also break the idea that the AST is available during transformations)

For example, we have custom naming of functions/lambdas/etc when printing out their name in a diagnostic with the Clang AST node. I’ve worked around this by calling LLVM’s demangler, but it doesn’t produce the same naming.

And for the “dontcall” attribute, the initial implementation looked into the Clang AST to determine whether to emit a warning or an error. I’ve worked around this by splitting it into two attributes, one for warning and one for error.
But overall, if we go through with this, we’ll end up not having access to the Clang AST for future backend diagnostics. That’s a tradeoff we’ll have to decide on. Forcing this does make backend diagnostics more likely to be consistent when they don’t have a Clang AST (e.g. ThinLTO post-link compiles, IR as input to Clang).

yep!

The tradeoff is a real tradeoff though, we may want to use the Clang AST to provide more specific diagnostics when we have access to the Clang AST/sources. I’m waiting for somebody to object, but if nobody objects then perhaps we’re good.

This has been submitted as https://reviews.llvm.org/D111270, with the corresponding llvm-compile-time-tracker max-rss results.
Please let me know if this causes any issues.

Thanks to David for the reviews!