Should dump methods be LLVM_ATTRIBUTE_USED only in debug builds?

Hi,

r151033 and r151037 marked a few dump() methods as LLVM_ATTRIBUTE_USED, and over time this inspired marking several other dump() methods to be marked as such too (see e.g. 178400, 182186, 159790, 173548).

I understand that having the dump() methods available in the debugger is useful, but these annotations prevent the dump() methods from being dead-stripped, and they end up keeping lots of code alive. For example, clang-format depends on ASTDumper, TypePrinter, StmtVisitor and related stuff solely for these dump methods.

Since binaries now get dead-stripped, this leads to measurable bloat: clang-format goes from 1.7 MB to 1.2 MB if I remove the LLVM_ATTRIBUTE_USEDs on the 17 dump methods in include/clang – an almost 30% reduction.

Does it make sense to only mark dump methods as LLVM_ATTRIBUTE_USED if !NDEBUG? (If so, I’m thankful for naming suggestions for this new macro – LLVM_ATTRIBUTE_USED_IN_DEBUG comes to mind, but is too long. Also, should this be an LLVM macro or a clang macro?)

Thanks,
Nico

I think what makes sense is to not have these methods in !NDEBUG. This is
the way dump methods are being written in LLVM these days:

#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
  void some_helper_function_only_used_by_dump() const;
  void LLVM_ATTRIBUTE_NOINLINE LLVM_ATTRIBUTE_USED dump() const;
#endif

At least, this is how SROA.cpp does it, and I think it has been suggested
to do it consistently in this way.

My preference would be to:

a) Make this pattern easier to reproduce where needed.
b) Try to make existing dump code use it more regularly throughout both
LLVM and Clang to fix just the problem you've identified.

Oh, and a concrete suggestion of what I think would be easier:

#if LLVM_ENABLE_DUMP
  void some_helper_function_only_used_by_dump() const;
  void LLVM_DUMP_METHOD dump() const;
#endif

I've no better name for "LLVM_DUMP_METHOD" though, it's a terrible name.
Anything you come up with will likely be better. =]

I’d say an llvm macro of LLVM_ATTTRIBUTE_DEBUG_USED - if that’s still too long probably abbreviate attribute to attr.

Ah, good point bringing up dump() in llvm! Looking through ` ack -C 3 -i
dump include/` in llvm, it looks like none of the dump methods in llvm are
marked as LLVM_USED – maybe it's reasonable that people who want to call
this from gdb during debugging insert that locally manually?

Argyrios, would just removing all the LLVM_ATTRIBUTE_USEDs on dump methods
in clang be fine with you? That sounds like the simples change, and it
would be consistent with what's done in llvm.

Are you suggesting that one should add ‘LLVM_ATTRIBUTE_USED’ locally and rebuild if s/he wants to dump something in the debugger ?

We should be consistent across LLVM but having the dump methods available at least for !NDEBUG, as Chandler suggested, makes sense for me.

Hi,

r151033 and r151037 marked a few dump() methods as LLVM_ATTRIBUTE_USED,
and over time this inspired marking several other dump() methods to be
marked as such too (see e.g. 178400, 182186, 159790, 173548).

I understand that having the dump() methods available in the debugger is
useful, but these annotations prevent the dump() methods from being
dead-stripped, and they end up keeping lots of code alive. For example,
clang-format depends on ASTDumper, TypePrinter, StmtVisitor and related
stuff solely for these dump methods.

Since binaries now get dead-stripped, this leads to measurable bloat:
clang-format goes from 1.7 MB to 1.2 MB if I remove the
LLVM_ATTRIBUTE_USEDs on the 17 dump methods in include/clang – an almost
30% reduction.

Does it make sense to only mark dump methods as LLVM_ATTRIBUTE_USED if
!NDEBUG?

I think what makes sense is to not have these methods in !NDEBUG. This is
the way dump methods are being written in LLVM these days:

#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
  void some_helper_function_only_used_by_dump() const;
  void LLVM_ATTRIBUTE_NOINLINE LLVM_ATTRIBUTE_USED dump() const;
#endif

At least, this is how SROA.cpp does it, and I think it has been suggested
to do it consistently in this way.

My preference would be to:

a) Make this pattern easier to reproduce where needed.
b) Try to make existing dump code use it more regularly throughout both
LLVM and Clang to fix just the problem you've identified.

Ah, good point bringing up dump() in llvm! Looking through ` ack -C 3 -i
dump include/` in llvm, it looks like none of the dump methods in llvm are
marked as LLVM_USED – maybe it's reasonable that people who want to call
this from gdb during debugging insert that locally manually?

Argyrios, would just removing all the LLVM_ATTRIBUTE_USEDs on dump methods
in clang be fine with you? That sounds like the simples change, and it
would be consistent with what's done in llvm.

Are you suggesting that one should add ‘LLVM_ATTRIBUTE_USED’ locally and
rebuild if s/he wants to dump something in the debugger ?

Yes.

We should be consistent across LLVM but having the dump methods available
at least for !NDEBUG, as Chandler suggested, makes sense for me.

With "available", do you mean "marked LLVM_ATTRIBUTE_USED"? If we remove
LLVM_ATTRIBUTE_USED, dump() could be compiled in both release and debug,
which is nice in a way (and it's unlikely someone will accidentally add a
call to them from live code).

I don’t like this at all, I use the dump methods in the debugger all the time, I don’t want to perpetually have local changes in order to use them.

Not sure I’m following, what is the benefit if the dump methods are always stripped, even in debug build ? Do you care about code-bloat in debug builds ?
How about just do your original simple suggestion for a “LLVM_ATTRIBUTE_USED_IN_DEBUG” macro ?

I very strongly second this. I would be in complete opposition to any proposal which has the effect of worsening the out-of-box debugging experience. I understand and support your desire to reduce size in Release builds. Can you spell out a case for such reduction in Debug builds? Maybe there’s something I’m missing here. Philip

I'd like to re-iterate my concrete suggestion. I think that *all* of the
dump methods in LLVM and Clang should be consistently defined as such:

#if LLVM_ENABLE_DUMP
  void some_helper_function_only_used_by_dump() const;
  LLVM_DUMP_METHOD void dump() const;
#endif

And then I would provide the following defines in Compiler.h or whatever
header seems appropriate:

#define LLVM_ENABLE_DUMP !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
#define LLVM_DUMP_METHOD LLVM_ATTRIBUTE_NOINLINE LLVM_ATTRIBUTE_USED

This ensures that dump methods are always available in assert builds, and
can be enabled explicitly in no-assert builds as needed (LLVM_ENABLE_DUMP
comes from configure or cmake), and when dump methods are disabled both
those methods and everything they use gets pruned out, with no warnings
about unused private methods etc.

What doesn't work with this approach?

I very strongly second this. I would be in complete opposition to any
proposal which has the effect of worsening the out-of-box debugging
experience.

I understand and support your desire to reduce size in Release builds.
Can you spell out a case for such reduction in Debug builds? Maybe there's
something I'm missing here.

I'd like to re-iterate my concrete suggestion. I think that *all* of the
dump methods in LLVM and Clang should be consistently defined as such:

#if LLVM_ENABLE_DUMP
  void some_helper_function_only_used_by_dump() const;
  LLVM_DUMP_METHOD void dump() const;
#endif

And then I would provide the following defines in Compiler.h or whatever
header seems appropriate:

#define LLVM_ENABLE_DUMP !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
#define LLVM_DUMP_METHOD LLVM_ATTRIBUTE_NOINLINE LLVM_ATTRIBUTE_USED

This ensures that dump methods are always available in assert builds, and
can be enabled explicitly in no-assert builds as needed (LLVM_ENABLE_DUMP
comes from configure or cmake), and when dump methods are disabled both
those methods and everything they use gets pruned out, with no warnings
about unused private methods etc.

What doesn't work with this approach?

I think it's a good suggestion, and it's what we should do.

(Not having the attribute at all seemed simpler, but the feedback is clear
that that doesn't work for some folks.)

Unless someone doesn't like Chandler's suggestion (and assuming he doesn't
beat me to it :slight_smile: ) I'll implement this proposal tomorrow afternoon.

Glad to see we reached a conclusion which works for everyone. Thanks for seeing this through.

Philip

HI Nico,

I've attached the fix we had for this in our product branch. As far as I can tell the other posts on this thread are missing the problem and just patching some of the symptoms.

The bug was that attribute 'used' was applied in public headers instead of the function definitions, causing missed dead stripping opportunities that were further compounded by inlining.

The commit message says it resolves two issues, "Missing dump() symbols embedding LLVM SDK" and "Debug functions included in release" so I'm guessing your use case is covered:

Only link dump() functions in debug builds

LLVM_DEBUG_HELPER marks debug functions definitions as 'used' when !NDEBUG or
LLVM_ENABLE_DUMP is defined, otherwise leaves them to be dead stripped by the
linker.

The macro also unconditionally marks the functions 'noinline' to (1) ensure
they're available to debuggers in debug builds and (2) assist linker dead
stripping in release builds.

The patches haven't yet been pushed upstream because they introduce an NDEBUG going against the LLVM 3.5 library-friendly headers goal.

In practice though I've checked and the macro isn't evaluated in any header file, plus it's a net LoC win, so I'm OK to land this if it works for you.

Alp.

debug-helper-clang.patch (11.5 KB)

debug-helper-llvm.patch (2.07 KB)

HI Nico,

I've attached the fix we had for this in our product branch. As far as I
can tell the other posts on this thread are missing the problem and just
patching some of the symptoms.

The bug was that attribute 'used' was applied in public headers instead of
the function definitions, causing missed dead stripping opportunities that
were further compounded by inlining.

I'm not sure why this is (necessarily) a bug? I can hypothesize instances
where it is a bug, but maybe you can explain the particular pattern you're
imagining?

The commit message says it resolves two issues, "Missing dump() symbols

embedding LLVM SDK" and "Debug functions included in release" so I'm
guessing your use case is covered:

Only link dump() functions in debug builds

LLVM_DEBUG_HELPER marks debug functions definitions as 'used' when !NDEBUG
or
LLVM_ENABLE_DUMP is defined, otherwise leaves them to be dead stripped by
the
linker.

The macro also unconditionally marks the functions 'noinline' to (1) ensure
they're available to debuggers in debug builds and (2) assist linker dead
stripping in release builds.

The patches haven't yet been pushed upstream because they introduce an
NDEBUG going against the LLVM 3.5 library-friendly headers goal.

I still don't understand or agree with trying to preclude the use of NDEBUG
in header files. While theoretically it would be nice for NDEBUG to not be
an ABI break for LLVM, that comes at a high cost and I've not seen a
compelling argument for paying that cost.

That said, I dislike this approach. I prefer actually *removing* the
functions in non-asserts builds because then we will get early warnings if
we miss some, and we don't have to rely on linker dead stripping to remove
all of them.

It also ensures that we don't grow accidental dependencies on functions
that were only ever intended to be used inside of assert() and in dump()
methods. Historically, this has happened quite a few times, and has taken
someone some time to track down a compile time performance regression and
separate out the functionality.

So, I would still prefer the pattern I suggested earlier. Also, this was
discussed previously on the llvm mailing list and reached much the same
conclusion.

    HI Nico,

    I've attached the fix we had for this in our product branch. As
    far as I can tell the other posts on this thread are missing the
    problem and just patching some of the symptoms.

    The bug was that attribute 'used' was applied in public headers
    instead of the function definitions, causing missed dead stripping
    opportunities that were further compounded by inlining.

I'm not sure why this is (necessarily) a bug? I can hypothesize instances where it is a bug, but maybe you can explain the particular pattern you're imagining?

    The commit message says it resolves two issues, "Missing dump()
    symbols embedding LLVM SDK" and "Debug functions included in
    release" so I'm guessing your use case is covered:

    Only link dump() functions in debug builds

    LLVM_DEBUG_HELPER marks debug functions definitions as 'used' when
    !NDEBUG or
    LLVM_ENABLE_DUMP is defined, otherwise leaves them to be dead
    stripped by the
    linker.

    The macro also unconditionally marks the functions 'noinline' to
    (1) ensure
    they're available to debuggers in debug builds and (2) assist
    linker dead
    stripping in release builds.

    The patches haven't yet been pushed upstream because they
    introduce an NDEBUG going against the LLVM 3.5 library-friendly
    headers goal.

I still don't understand or agree with trying to preclude the use of NDEBUG in header files. While theoretically it would be nice for NDEBUG to not be an ABI break for LLVM, that comes at a high cost and I've not seen a compelling argument for paying that cost.

I'm not sure what the cost is that you're talking about.

clang has been free of NDEBUG in headers since r196739, and the commits were a neat reduction in complexity touching no more than a couple dozen lines of code. If you remember the original patch was thousands of lines it's remarkable how light this ended up being.

The plugin API can now be used independently of configuration with one last pending patch to config.h.in on the LLVM side. That's a massive step forward for getting documented and supported features usable in the real world.

Before this, you had to rebuild everything, even third party plugins, just to debug an LLVM issue and track down problems. That's not practical -- even if you have a build farm, sometimes you just don't have all plugins and their dependencies and original build environments available.

So, It's great if you yourself just use LLVM / clang internally and don't have to deal with end users, embedding and plugins but these are all supported clang use cases that were broken previously and now more or less fixed.

I don't use the Makefile build system or the Apple buildit script but I appreciate others use them and even try to help out with their maintenance from time to time, certainly never blocking the people who actually do the work in areas I'm not active in :slight_smile:

Besides which, the whole debate seems moot because I actually suggested introducing carefully considered NDEBUG violation that temporarily goes against the 3.5 goal.

That said, I dislike this approach. I prefer actually *removing* the functions in non-asserts builds because then we will get early warnings if we miss some, and we don't have to rely on linker dead stripping to remove all of them.

That's very intrusive and goes beyond the scope of Nico's original problem.

The patches I've posted are incremental, solve the original bug by moving attributes from declaration to definition which was just a bug. They lower the line count and binary size and eliminate ifdef tangle without changing the way everyone uses dump() functions day to day.

That sets the bar pretty high for an alternative solution.

Alp.

That said, I dislike this approach. I prefer actually *removing* the functions in non-asserts builds because then we will get early warnings if we miss some, and we don't have to rely on linker dead stripping to remove all of them.

The benefit of relying on the linker is that a project can easily use any particular dump/print function from the llvm/clang libraries that it sees fit (e.g. for logging purposes) and since the linker has the ultimate knowledge of what gets used or not, it "just works" without unnecessary code-bloat.
For example, a project may just want to use "Decl::dump(raw_ostream &OS)" for logging; with your scheme it'd need to configure & build specially for this and accept the code bloat from _all_ dump functions across llvm/clang, just to use this one function.

But we don't need to hypothesize, we can see a similar example in clang with '-ast-dump'.
Do you really want to disable this option on release builds ? I would much prefer not, I've used it frequently enough and I'd like to continue to be able to use it from our release builds.

It also ensures that we don't grow accidental dependencies on functions that were only ever intended to be used inside of assert() and in dump() methods. Historically, this has happened quite a few times, and has taken someone some time to track down a compile time performance regression and separate out the functionality.

Could you elaborate with a specific example ? Was someone ever printing something by accident ?
I think this could apply to invariant-checking functions, but for dump/print functionality I honestly don't think that there is much "danger" in practice.

Ok, sounds like it’s not clear if folks want to have dump methods behind #ifdef !NDEBUG. Maybe we don’t have to resolve that part in this thread

Is there any opposition to replacing the attributes on just the dump methods with LLVM_DUMP_METHOD? That’s a subset of Chandler’s change and I think also what Alp’s diff is.

(I don’t understand the “This is a problem with 3.5’s goal of not using NDEBUG in headers” – I read the thread about NDEBUG 1, and it seems to be mostly about abi compat – and a) clang itself isn’t deadstripped itself because plugins b) if stuff in a core binary itself calls a dump method it won’t be stripped anyhows c) the thread at 1 sounded more about abi stuff like mvars and virtual functions, there’s no clear support for the assert stuff as far as I can tell. So just varying the presence or absence of attribute((used)) based on NDEBUG sounds fine to me.)

Ok, sounds like it's not clear if folks want to have dump methods behind
#ifdef !NDEBUG. Maybe we don't have to resolve that part in this thread

Is there any opposition to replacing the attributes on just the dump
methods with LLVM_DUMP_METHOD? That's a subset of Chandler's change and I
think also what Alp's diff is.

(I don't understand the "This is a problem with 3.5's goal of not using
NDEBUG in headers" – I read the thread about NDEBUG [1], and it seems to be
mostly about abi compat – and a) clang itself isn't deadstripped itself
because plugins b) if stuff in a core binary itself calls a dump method it
won't be stripped anyhows c) the thread at [1] sounded more about abi stuff
like mvars and virtual functions, there's no clear support for the assert
stuff as far as I can tell. So just varying the presence or absence of
attribute((used)) based on NDEBUG sounds fine to me.)

…which is effectively Alp's patch I suppose. So maybe we could land that
for now and then discuss whether to put dump() methods in !NDEBUG at some
other point?

>
> That said, I dislike this approach. I prefer actually *removing* the
functions in non-asserts builds because then we will get early warnings if
we miss some, and we don't have to rely on linker dead stripping to remove
all of them.

The benefit of relying on the linker is that a project can easily use any
particular dump/print function from the llvm/clang libraries that it sees
fit (e.g. for logging purposes) and since the linker has the ultimate
knowledge of what gets used or not, it "just works" without unnecessary
code-bloat.
For example, a project may just want to use "Decl::dump(raw_ostream &OS)"
for logging; with your scheme it'd need to configure & build specially for
this and accept the code bloat from _all_ dump functions across llvm/clang,
just to use this one function.

But we don't need to hypothesize, we can see a similar example in clang
with '-ast-dump'.
Do you really want to disable this option on release builds ? I would much
prefer not, I've used it frequently enough and I'd like to continue to be
able to use it from our release builds.

Sorry, terminology is getting a bit hard here. I wasn't at all referring to
the functionality of -ast-dump, or methods such as the one you mention
which might legitimately be part of the C++ API that a particular type
intends to expose. The AST is a great example here -- clearly it should
provide ways for library users to print it out in the same nice textual
form as used by -ast-dump.

I'm referring to the more narrow use of the term "dump method" which is
specifically a method that accepts no arguments and prints as much state to
stderr as might conceivably be useful from inside gdb. Decl::dump() is just
such a function, but Decl::dump(raw_ostream &OS) is different. The somewhat
consistent pattern in LLVM is to name the latter functions 'print' to avoid
confusion, but that's neither here nor there.

My stance is essentially that the things which are only used either a) in
GDB or b) inside of !NDEBUG regions (be they assert()s or the LLVM DEBUG
macro) are the things which should be compiled away when NDEBUG (or the
opt-in macro that several folks asked for in LLVM-land to get dump methods
in no-asserts builds) and should be marked as noinline and used to make
them easy to reach via the debugger.

When these routines can be implemented using the existing useful APIs like
in the case of Decl, awesome. In other cases, there are no real users of
the printing infrastructure and so all of it can be hidden in this way.

>
> It also ensures that we don't grow accidental dependencies on functions
that were only ever intended to be used inside of assert() and in dump()
methods. Historically, this has happened quite a few times, and has taken
someone some time to track down a compile time performance regression and
separate out the functionality.

Could you elaborate with a specific example ? Was someone ever printing
something by accident ?
I think this could apply to invariant-checking functions, but for
dump/print functionality I honestly don't think that there is much "danger"
in practice.

It was a traversal function. Specifically, it looked like a nice constant
time query function, but actually did linear work under the hood. That was
totally fine (and better than the complexity of other alternatives) when it
was used as a helper to implement dump functionality. But it got re-used a
few times, and ended up inside of an existing linear algorithm and caused
us to have N^2 explosions every now and then.

Still, not the end of the world. It required a bug, some test cases, and
some easy time with the profiler. But when it was suggested to hide all of
the helper functions the same way as the dump methods in LLVM, making these
compile errors was a really nice result.

I'd like to actually see the patch rather than talk in the hypothetical.
Maybe I misunderstood, but I thought it also moved the attribute to the
definition? I don't understand why that's relevant yet (which is why maybe
I've misunderstood what was happening). I've asked what the bug was that
this addressed. My reading of the spec for the attribute seems to indicate
that there is no difference....