Eliminating global memory roots (or not) to help leak checkers

[Continuing discussion from https://reviews.llvm.org/D69428]

LLVM is fairly conservative when eliminating global variables (or fields of such) that may point to dynamically allocated memory. This behavior exists entirely to help leak-checking tools such as Valgrind, Google’s HeapLeakChecker, and LSan, all of which treat memory that is reachable at exit as “not leaked”, even though it will never be freed. Without these global variables to hold the pointer, the leak checkers can’t determine that the memory is actually reachable, and will report a leak. Global variables that dynamically allocate memory but don’t clean themselves up are fairly common in the wild, and leak checkers have long refrained from reporting them as errors.
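For concreteness, here is a minimal sketch of the pattern in question (all names invented): a global pointer written during initialization and intentionally never freed. As long as the global survives, leak checkers see the allocation as reachable; if the optimizer deletes the unread global, the same allocation is reported as a leak.

```cpp
#include <cassert>
#include <map>
#include <string>

// A global root: written once, possibly never read again, never freed.
// Leak checkers treat the map as "reachable" through g_registry at exit.
static std::map<std::string, int>* g_registry = nullptr;

void InitRegistry() {
  if (!g_registry) g_registry = new std::map<std::string, int>();
  (*g_registry)["answer"] = 42;  // intentionally never deallocated
}
```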

This behavior was added all the way back in 2012 in https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20120625/145646.html.

https://reviews.llvm.org/D69428 removed this behavior, and I subsequently reverted it when many internal Google tests started failing, but I believe many other users who use leak checking will encounter errors when this hits more mainstream releases.

So: What to do?

Preventing a valid transformation (the global variables are never read and can be eliminated) to help the leak checkers leaves some performance and code size on the table. Just how much is unclear.

On the other hand, having leak checkers suddenly start reporting failures where they didn’t before also seems suboptimal. Cleaning this somewhat common scenario up is surprisingly difficult at the user level.

Some possibilities:

  1. Only do this at high optimization levels, say -O3. This would give aggressive users all the performance we can, but leak checkers would then report leaks at some optimization levels and not at others.

  2. Hide it behind a flag or configurable option. Users who care can set it as they prefer. This creates more confusing options and a larger testing matrix, but everyone can get the behaviour they want.

  3. Do it all the time, and users who encounter issues can clean up their code. Users get the most performance they possibly can, but have to clean up code or drop leak checking. Seems a little user hostile.

Other possibilities?:

What was the original motivation for making the change - given the
gains are unclear? Might help frame what the right path forward is.

If the goal is to experiment to see if making this change is
sufficiently valuable to either regress heap leak checkers or create a
divergence where this is configurable/appears only in certain modes -
then I'd say having a flag (maybe even an internal/cc1 only flag) is
the right first step, enabling data to be gathered to help inform the
design discussion.

Hi,

Most (all?) leak checkers support suppression files. Isn’t that sufficient for your use case?
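For reference, an LSan-style suppression looks roughly like this (the `leak:` pattern syntax and `LSAN_OPTIONS` variable are LeakSanitizer’s documented interface; the symbol name below is invented):

```
# suppressions.txt -- silence leaks whose allocation stack mentions the pattern
leak:MyRegistry::Init
```

and the binary would be run with `LSAN_OPTIONS=suppressions=suppressions.txt ./your_binary`.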

Marking your leak roots with __attribute__((used)) is also an alternative.
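A sketch of that attribute-based alternative (variable and function names invented): `__attribute__((used))` asks the compiler not to discard the global even if nothing reads it, so a leak checker can still find the allocation through it.

```cpp
#include <cassert>

// The attribute keeps the global alive through optimization, preserving
// it as a root for leak checkers.
__attribute__((used)) static int* g_leak_root = nullptr;

void InitRoot() {
  g_leak_root = new int(7);  // intentionally never freed
}
```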

I understand that leaking memory on purpose happens because it’s expensive to clean it up. But reachable memory may well be a true leak. So flagging it as such is useful. None of us has data about the % of reachable memory that is a true leak, so it’s not possible to argue what’s user friendly/hostile.

Programs that leak memory on purpose are often sophisticated. And sophisticated devs can handle a little bit of extra effort to hide those smarts I think.

Nuno

P.S.: The original patch went in almost a decade ago when the ecosystem was a bit less developed. It was always meant to be temporary.


It seems to me that to some extent this change may expose more leaks which aren’t necessarily false positives, which is a net benefit. And in cases that are indeed false positives, shouldn’t users just add volatile on these pointers (or __attribute__((used)), though compared to volatile it does not guarantee that the stores won’t be elided, I think)?

James points to me that the more universally accepted definition of leaks is not “memory allocated that isn’t deallocated” as I naively thought, but that the memory isn’t reachable at program termination.
Under this definition, it seems like we ought to be more conservative in the optimizer instead, so I retract what I wrote previously :)


Some possibilities:

  1. Only do this at high optimization levels, say -O3. This would give aggressive users all the performance we can, but also make leak checkers report leaks sometimes, but not others.

This could be disabled under ASan by default, as ASan very likely runs with LSan.

  2. Hide it behind a flag or configurable option. Users who care can set it as they prefer. Creates more confusing options, different testing matrices and such, but everyone can get the behaviour that they want.

  3. Do it all the time, and users who encounter issues can clean up their code. Users get the most performance they possibly can, but have to clean up code or drop leak checking. Seems a little user hostile.

I expect this requires significant cleanup effort, and not just in Google. It’s quite a common pattern, but it would be nice to see some real data.

Other possibilities?:

  1. Maybe replace writes into such a removed global with a no-op callback which the leak checker can intercept? Though this will prevent other useful optimizations.
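A speculative sketch of that idea — nothing below exists in LLVM today, and `__leak_root_store` is an invented name. Instead of deleting the store into a dead global, the compiler could lower it to a call to a no-op hook that a leak-checker runtime could interpose and treat as registering an extra root:

```cpp
#include <cassert>

// Default no-op definition; a hypothetical leak-checker runtime could
// override this weak symbol and record p as a reachability root.
extern "C" __attribute__((weak)) void __leak_root_store(void* p) {
  (void)p;
}

int* AllocRoot() {
  int* p = new int(0);
  __leak_root_store(p);  // stands in for the removed `g_root = p;`
  return p;
}
```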

The motivation in D69428 ([GlobalOpt] Remove valgrind specific hacks (revert r160529)) was a function pointer example. A function pointer should not point to an allocated object, so ignoring it for root-set semantics is totally fine.

D69428 extended the function pointer cases to non-function-pointer cases, which can be problematic.

Most (all?) leak checkers support suppression files. Isn’t that sufficient for your use case?

Marking your leak roots with __attribute__((used)) is also an alternative.

I understand that leaking memory on purpose happens because it’s expensive to clean it up. But reachable memory may well be a true leak. So flagging it as such is useful. None of us has data about the % of reachable memory that is a true leak, so it’s not possible to argue what’s user friendly/hostile.

As is, many code patterns in various projects can be affected by the
aggressive optimization. They may use a global pointer referencing an
allocated object as a replacement for a global variable with a
non-trivial destructor ([[clang::no_destroy]] :). Dynamic destruction is
not ordered across translation units, which can lead to all sorts of
static finalization order fiasco problems. If there are threads not
joined at exit time, some threads may access objects which have already
been destructed.
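A common shape of that pattern, sketched with invented names: instead of a global object with a non-trivial destructor — whose cross-TU destruction order is unspecified — the code keeps a global pointer to a heap object that is deliberately never destroyed, so it remains valid for atexit handlers and for threads not joined at exit.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Never destroyed: avoids static finalization order problems, at the
// cost of looking like a leak if the optimizer removes the pointer.
static std::vector<std::string>* g_names =
    new std::vector<std::string>{"lsan", "valgrind"};
```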

In addition, the leak checker may be registered as an atexit callback
instead of running after all destructors have run; when registered via
atexit, it normally runs before destructors. So even if you have
sophisticated destructors which deallocate objects properly, ignoring
those globals as roots will make the checker report false positives.

Don’t really have an opinion on the question as asked, but want to point out an alternate framing. I will comment that the code being removed looks more than a bit fragile.

From a very quick skim of the original code, it appears to be focused on DCE of globals. If we wanted to keep the “leak detector safe” semantic, but allow more aggressive optimization, we could re-imagine this as global SROA. We could deconstruct the struct/array/etc and keep only the fields which could potentially be allocation roots. We could also write an optimization which leverages the knowledge of the allocation root being otherwise unused to eliminate mallocs which are stored into them.

I haven’t fully thought that through, but it seems like we have quite a bit of room to optimize better without changing our handling for the leak detectors.

Philip

I would expect this change to cause a large cleanup and render lsan unusable on any non-trivial code base.
But indeed, it would be nice to see the data.

#1 is undesirable as it will make asan/lsan behave differently depending on the opt level.
We can disable the transformation under asan (which includes lsan),
but it will be harder to disable it under pure lsan (which is a link-time option).

#2 might be the right first step to demonstrate the impact on code size and on lsan.

Suppressions are not a good way to fix this (and generally, suppressions are almost never a good long term answer).
Creating suppressions will take more work than adding __attribute__((used)), will be harder to maintain,
and will cause true leaks to be lost.

__attribute__((used)) is not a complete solution either: imagine a code base which imports a third_party code base verbatim and can’t add attributes there.

Philip’s suggestion is worth investigating.

For globals that are never read and are written to only once, we may have an lsan-specific solution: calling __lsan_enable/__lsan_disable around the to-be-removed assignment. Globals that are written to multiple times will be harder to handle.
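Concretely, that could look roughly like the following. `__lsan_disable`/`__lsan_enable` are real LeakSanitizer interface functions (declared in `<sanitizer/lsan_interface.h>`); allocations made while disabled are ignored by the leak checker. The stubs below stand in for the runtime so the sketch links without lsan, and the surrounding code is invented:

```cpp
#include <cassert>

// Stubs with the real signatures; the lsan runtime provides the real ones.
extern "C" void __lsan_disable() {}
extern "C" void __lsan_enable() {}

static int* g_root;  // imagine GlobalOpt wants to delete the store below

void Init() {
  __lsan_disable();
  g_root = new int(3);  // ignored by lsan even if g_root is later removed
  __lsan_enable();
}
```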

–kcc

Hi Sterling,

I agree with the others: there are better (and more robust) ways to disable leak checkers, and this only addresses one narrow case. The code is also super verbose and fragile looking. And it blocks an optimization.

I’d recommend dropping the code.

-Chris

There is no problem (no leaks) in the code that users wrote, so adding code annotations (sanitizer suppression file, or attributes) is not a good solution to this issue. The problem is that this optimization introduces a “leak” (from the point of view of the leak checker), which wasn’t there before. And in practice, this seems to cause a large number of false positives.

This can apparently happen simply by having the code which reads from a global variable happen to be unused or optimized away in the current compilation. Users would be effectively randomly annotating variables to work around the compiler, not for any reason apparent in the code they wrote.

I don’t know what the right answer should be – I’ve not looked into it at all. But I don’t think it’s correct to just dismiss this as not-a-problem, even if the current solution is not a good solution.

There may be other ways to disable leak checkers, but they put a burden on users that was not there before. Further, users who try leak checking with and without optimization will get different answers. The bug report will read: “clang at -O2 makes my program leak”. And, as James notes, whether or not you need to suppress the leak depends on whether or not the optimizer does away with the variable. Subtle changes to the code that have nothing to do with memory allocation will appear to add or fix leaks. That is not something I would care to explain or document.

Although the code could definitely be improved, the comments that it is brittle or fragile seem overstated. Google uses this code without issue against many millions of lines of code every day, in tens of thousands of targets, and has since 2012. It may be ugly, but it does work. We are open to improving it, if it is only the ugliness that is of concern.

I am working on quantifying the benefit the change gives, but I won’t have results until tomorrow.

In order to understand how much benefit this change gives to code size, I built clang and related files with and without the patch, both with CMAKE_BUILD_TYPE=Release.

clang itself gets about 0.4% smaller (from 145217124 to 144631078 bytes)
lld gets about 1.85% smaller (from 101129181 to 99243810 bytes)

Spot checking a few other random targets, it seems that in general the benefit depends heavily on coding style, but roughly, the bigger the binary, the less benefit this brings.

I suspect that a more aggressive pass as described by Philip Reames could capture a significant part of the benefit without sacrificing functionality. But that would require a more detailed study to be sure.

+Chris Lattner for the last two updates as well. I really don’t think this optimization is getting the overall win it’s thought to, and it breaks roughly every kind of leak checker mechanism I’ve seen.

-eric

There is no problem (no leaks) in the code that users wrote, so adding code annotations (sanitizer suppression file, or attributes) is not a good solution to this issue. The problem is that this optimization introduces a "leak" (from the point of view of the leak checker), which wasn't there before. And in practice, this seems to cause a large number of false positives.

I think that “from the point of view of the leak checker” is the key thing there.

Code that this triggers *is* leaking memory, it was just silenced because the leak was spuriously reachable from a global. Global variables aren’t a preferred way to silence leak detectors, they have other ways to do so :)

There may be other ways to disable leak checkers, but they put a burden on users that was not there before. Further, users who try leak checking with and without optimization will get different answers. The bug report will read: "clang at -O2 makes my program leak". And, as James notes, whether or not you need to suppress the leak depends on whether or not the optimizer does away with the variable. Subtle changes to the code that have nothing to do with memory allocation will appear to add or fix leaks. That is not something I would care to explain or document.

I can see that concern, but this isn’t a battle that we can win: optimizations in general can expose leaks.

IMO, If someone doesn’t want the global to be removed, they should mark it volatile. If they do want it removable, then they can use leak detector features to silence the warning.
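One concrete “leak detector feature” for silencing an intentional leak is LeakSanitizer’s `__lsan_ignore_object` (declared in `<sanitizer/lsan_interface.h>`), which marks a single allocation as intentionally leaked. The stub below replaces the runtime so the sketch links without lsan; the surrounding code is invented:

```cpp
#include <cassert>

// Stub with the real signature; the lsan runtime provides the real one.
extern "C" void __lsan_ignore_object(const void* p) { (void)p; }

int* MakeImmortal() {
  int* p = new int(9);
  __lsan_ignore_object(p);  // not reported even if it becomes unreachable
  return p;
}
```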

Hello,
Seems like long and interesting thread. I would love to learn what problem were facing here. Maybe im ignorant enough to try to tackle it.

Famous quote: I could make great violin player, i never tried it.

Haha __attribute(immortal) on global variable.

Stupid man from poland would say its volatile keyword abuse but maybe minor and what do i know.

Can someone briefly describe key problematic code, what component fails and how do we want to workaround it? I could study pros and cons and maybe come up with set of proposals.

Best regards,
Pawel Kunio


"spuriously" reachable is quite questionable here. This way of writing
code is to make the allocation quite intentionally reachable from a
global.

For instance, one of the ways to disable global destruction: (
https://google.github.io/styleguide/cppguide.html#Static_and_Global_Variables
)

"If all else fails, you can create an object dynamically and never
delete it by using a function-local static pointer or reference (e.g.,
static const auto& impl = *new T(args...);)."

(all else fails in a large enough codebase rather often... )

+Chris Lattner to make sure Chris sees this response too.

Have any of the proponents of the patch tried lsan against a reasonable large code base both with and with out it?


A sanitizer maintainer has contributed a LeakSanitizer patch to ensure
we don't regress (D100906: [lsan] Test to show lsan dependency on globals).
(The test can also catch the false positive under valgrind which would be
introduced by the original GlobalOpt patch, D69428.)

Just a few days ago, I used this pattern to suppress a leak in the in-tree
project LLDB (D100806: [lldb] Fix one leak in reproducer).

To bring back some potentially lost optimization, we can try making the
GlobalOpt heuristic smarter, e.g. recognize function pointer types
(as I suggested in the very first few comments on D69428).
Then there is also Philip's suggestion to improve global variable SROA.

A simple code removal does not pull its weight and can cause large
friction to various downstream projects.