[RFC] __attribute__((internal_linkage))

Hello,

Can I ask for some clarification?

Apparently, C++ doesn't allow to static class methods?

While in C one might write inside a header file:

static inline void mylib_foo_do_method(struct mylib_foo *foo) {
   // implementation here
}

In C++ the typical style would be to write something like:

namespace mylib {

void foo::do_method(void)
{
  // implementation here
}
}

Unfortunately, the C++ case then implies some linkage behaviour that
some might not want.

Apparently, one can't do things like:

namespace mylib {

static void foo::do_method(void)
{
  // implementation here
}
}

Or,

namespace {
namespace mylib {

static void foo::do_method(void)
{
  // implementation here
}
}
}

Also, if you wanted to an attribute to a whole namespace you should do
something like the following I think:

namespace mylib {
[[clang::internal_linkage]];

static void foo::do_method(void)
{
  // implementation here
}
}

Thank you,
Steven Stewart-Gallus

Sounds right. The proposed attribute is a rough equivalent of

static void foo::do_method(void)
{
  // implementation here
}

where "static" does not mean a static class member, but makes the
declaration local to the translation unit. C-style "static".

The patch in http://reviews.llvm.org/D13925 (waiting for review, btw!)
supports this attribute on classes and namespaces, with this syntax:

namespace __attribute__((internal_linkage)) {
}

I haven't been able to figure out from this thread why this attribute is necessary at all.

Anonymous or unnamed namespaces were added to C++ for this very purpose, but the ISO C++ Standard does not discuss "linkage" per-se because it is outside the scope of the language specification. But it does describes it in terms of having a hidden name which is "unique" to the translation unit, and under the "as if" rule, internal linkage is a common convention for achieving this.

This is just Standardese for dealing with what compiler implementers typically handle as "internal linkage" anyway; at the same time, the use of 'static' for this purpose was deprecated. This is specifically stated in C++98 section 7.3.1.1 and unnamed namespaces are still similarly defined in C++17 (Working Paper) although the specific reference to the deprecation of 'static' for this purpose is now gone.

The closest the Standard gets to talking about linkage is with Linkage Specifications, and even here it tries to avoid to avoid treading on the definitions things like internal versus external linkage.

So I am curious, what does this proposed attribute on namespaces achieve that cannot already be achieved with an anonymous or unnamed namespace?

Thanks,

  Martin O'Riordan - Movidius Ltd.

Hi Stewart,

I saw this get brought up at the LLVM devmtg in the context of improving the stack size of libc++ frames.

Have you guys considered a different approach to solving this problem? In the case of libc++, the _LIBCPP_INLINE_VISIBILITY macro currently expands out to "__attribute__ ((__always_inline__))”. It seems reasonable to me to have it also add the “nodebug” attribute as well. This should allow the allocas generated by inlining lots of frames to be promoted to registers (because there is no debug info to pessimize).

This would dramatically shrink the stack frames of code using libc++ at -O0, and would also make it go a lot faster.

-Chris

From: llvm-dev [mailto:llvm-dev-bounces@lists.llvm.org] On Behalf Of Chris
Lattner via llvm-dev
Sent: Saturday, October 31, 2015 2:58 PM
To: sstewartgallus00@mylangara.bc.ca
Cc: LLVM Developers
Subject: Re: [llvm-dev] [cfe-dev] [RFC] __attribute__((internal_linkage))

Hi Stewart,

I saw this get brought up at the LLVM devmtg in the context of improving
the stack size of libc++ frames.

Have you guys considered a different approach to solving this problem? In
the case of libc++, the _LIBCPP_INLINE_VISIBILITY macro currently expands
out to "__attribute__ ((__always_inline__))”. It seems reasonable to me
to have it also add the “nodebug” attribute as well. This should allow
the allocas generated by inlining lots of frames to be promoted to
registers (because there is no debug info to pessimize).

Are you suggesting that 'nodebug' should affect code generation?
It most definitely should not...
--paulr

Sorry, I totally missed updates to this thread.

Anonymous namespace does not work for libc++ because we need to hide
some class members, and export the rest. Namespace would hide all of
them. AFAIK, "static" is un-deprecated in C++11 for multiple reasons,
including this one.

I agree that nodebug should not affect codegen. It would also kill
debug info (ex. all debug locations in libc++ headers).

With respect only to '__attribute__((internal_linkage))', not 'nodebug' and other parts of this topic; does hiding "some" members and not others not introduce a violation of the ODR because some members of the class as it appears in one translation unit are not the same actual definitions as the apparently "same" members of the class in another translation unit? In particular the ODR requirement that would ensure that the address of a method was the same regardless of where the address was taken, including 'static' methods (no 'this' pointer)? Or hidden things like the guard variables for local static objects, default argument initialisers (which are instanced), etc.?

I must admit that having spent years trying to lock down the definition of the ODR, that this seems to fly in the opposite direction of the intent of that rule.

I can (sort of) see why for particular reasons of locality you might want multiple actual definitions, even though they might exhibit the same behaviour, but maybe the answer for locality optimisation would be for the linker technology to evolve so that multiple copies of the same definition (sic) could be made; perhaps even at the Basic Block level [Block Level Linking]. But this too would lie under the auspices of the "as if" rule and respect the ODR.

There is something I am missing in this that is not making me at all comfortable about the use of this attribute for namespaces (apart from the reopening issues raised by others), and in particular the application to parts of a class but not the whole class. I just feel that this is leading to semantic trouble.

Thanks,

  MartinO

In fact, during the code review we've decided against allowing this
attribute on namespaces: http://reviews.llvm.org/D13925

The intended use of this attribute in libc++ is on functions that
programs are not allowed to take address of, like all methods of
standard library classes. Visibility("hidden") attribute (which we are
replacing with internal_linkage) violates ODR the same way.

Also, this attribute (for us) is not about locality or performance.
Its primary purpose is to reliably exclude a class method from the
export table. This is the way libc++ manages ABI stability: all class
methods that are not part of the library ABI are defined in the
headers with a set of attributes that ensure that they become internal
to the translation unit. Always_inline does that in almost all cases,
but almost is not good enough. Internal_linkage is a straighforward
way to achieve the same.

Thanks Evgenii,

I hadn't realised that you had decided not to apply the attribute to namespaces, this certainly eases my mind about the issue.

And thanks also for explaining more clearly the rationale behind its application to symbols that are not intended to be exported.

  MartinO

Hi Evgenii,

I was wondering what the status is of your work to attach “internal_linkage” to methods of standard library classes in libc++. The following piece of code doesn’t link because a symbol (string::empty) is undefined and it sounds like your work might fix the linkage error I’m seeing.

$ cat test1.cpp

#include
#include

int main() {
using namespace std::placeholders;
auto b = std::bind(&std::string::empty, _1);
}

Hi,

it's been put on hold for other higher priority work. I still want to
get it done one day, but it's not clear when that would be.

OK, thanks.

One more question: which part of the standard says you aren’t allowed to take the addresses of methods of classes in the standard library?

Hi Evgenii,

I have one question about this (planned) change: what if a function is not inlined? The linker will not ODR merge them with this change, which isn’t great.

What makes “internal” linkage more desirable than "linkonce_odr + visibility hidden"?

The status quo isn't "linkonce_odr + visibility_hidden". The status quo is "linkonce_odr + visibility_hidden + always_inline". I guess you're suggesting removing the always_inline?

I believe the "always_inline" is protecting against some objects being built separately from (and with a slightly different libc++ than) others, which is common in the case of static archives. Using "internal" gives the same protection.

Aside from address-taken functions (where the address of functions will depend on the translation unit, instead of causing a link error), I think "internal" is a strict improvement over the status quo. Changing to "linkonce_odr + hidden" would be more fragile, although a possible tradeoff for some vendors.

Even with internal linkage, the linker will have the opportunity to de-duplicate the functions if they are optimized the same way.

+llvm-dev

The status quo isn't "linkonce_odr + visibility_hidden". The status quo is "linkonce_odr + visibility_hidden + always_inline". I guess you're suggesting removing the always_inline?

I believe the "always_inline" is protecting against some objects being built separately from (and with a slightly different libc++ than) others, which is common in the case of static archives. Using "internal" gives the same protection.

Aside from address-taken functions (where the address of functions will depend on the translation unit, instead of causing a link error), I think "internal" is a strict improvement over the status quo. Changing to "linkonce_odr + hidden" would be more fragile, although a possible tradeoff for some vendors.

Even with internal linkage, the linker will have the opportunity to de-duplicate the functions if they are optimized the same way.

It will? Don't the addresses still have to be distinct? (I could pass a pointer to an internal linkage function across a TU boundary to some other TU where it should compare inequal to an equivalent but distinct internal linkage function in that other TU, I would think?)

(& do any of our common linkers do that deduplication (rightly (by adding little thunks) or wrongly (well, MSVC does, but possibly with coordination from the compiler with the "ICF" feature))?)

ld64 does de-duplication, but I'm not entirely sure of when.

+cfe-dev, bcc:llvm-dev (added back the wrong list I think?)

+llvm-dev

The status quo isn't "linkonce_odr + visibility_hidden". The status quo is "linkonce_odr + visibility_hidden + always_inline". I guess you're suggesting removing the always_inline?

I believe the "always_inline" is protecting against some objects being built separately from (and with a slightly different libc++ than) others, which is common in the case of static archives. Using "internal" gives the same protection.

Aside from address-taken functions (where the address of functions will depend on the translation unit, instead of causing a link error), I think "internal" is a strict improvement over the status quo. Changing to "linkonce_odr + hidden" would be more fragile, although a possible tradeoff for some vendors.

Even with internal linkage, the linker will have the opportunity to de-duplicate the functions if they are optimized the same way.

It will? Don't the addresses still have to be distinct? (I could pass a pointer to an internal linkage function across a TU boundary to some other TU where it should compare inequal to an equivalent but distinct internal linkage function in that other TU, I would think?)

(& do any of our common linkers do that deduplication (rightly (by adding little thunks) or wrongly (well, MSVC does, but possibly with coordination from the compiler with the "ICF" feature))?)

ld64 does de-duplication, but I'm not entirely sure of when.

I looked into de-duplication more and talked to Nick. Here is the current state on Darwin:

1. If a function in LLVM has "linkonce_odr" linkage and its address is never taken, LLVM emits it with a special flag called "autohide".

2. After coalescing symbols, ld64 will internalize any that was marked with "autohide". This optimization reduces the number of weak symbols. (Since (a) the address wasn't taken, (b) the symbol is discardable, and (c) it's following the ODR rule, there is no semantic change from internalizing it.)

3. ld64 then de-duplicates any of these "autohidden" symbols by content. (Since the compiler only adds "autohide" to symbols whose address isn't taken, there is no semantic change from de-duplication.)

In theory, we could abuse the "autohide" flag by adding it to "internal" symbols, and teach ld64 to de-duplicate those. It's not clear if the link-time overhead would be worth the benefit in general, but it's technically feasible.

As a side note, this corresponds to the recently introduced function attribute “ local_unnamed_addr”, and I think lld is doing this “internalization” with LTO now using this attribute.

+cfe-dev, bcc:llvm-dev (added back the wrong list I think?)

+llvm-dev

The status quo isn't "linkonce_odr + visibility_hidden". The status quo is "linkonce_odr + visibility_hidden + always_inline". I guess you're suggesting removing the always_inline?

I believe the "always_inline" is protecting against some objects being built separately from (and with a slightly different libc++ than) others, which is common in the case of static archives. Using "internal" gives the same protection.

Aside from address-taken functions (where the address of functions will depend on the translation unit, instead of causing a link error

By splitting _LIBCPP_INLINE_VISIBILITY, I think we can work around this.

1. Leave _LIBCPP_INLINE_VISIBILITY unchanged.

2. #define _LIBCPP_INTERNAL_VISIBILITY __attribute__((internal_linkage))

3. Leave externally callable free functions marked with _LIBCPP_INLINE_VISIBILITY (no change). These functions rarely have significant code.

4. Change non-external API (i.e., starting with `__`) to use _LIBCPP_INTERNAL_VISIBILITY.

Results:

- Now the external API calls are the same as the status quo (always_inline + hidden). If you don't take their address, they'll (most likely) be inlined. If you do take their address, they'll get coalesced and address comparisons will work within the same linkage-unit.

- The internal functions will only get inlined if the optimizer thinks it's a good idea. Taking their address from two different translation units will give you a different answer... but users aren't allowed to call internal API anyway.

- Assuming an all-knowing, benevolent inliner, code size and performance should strictly improve over the status quo, with no C++ fragility regressions.

- We can consider, as a follow-up, replacing all uses of _LIBCPP_INLINE_VISIBILITY with _LIBCPP_HIDDEN_VISIBILITY. (I.e., do we need "always_inline" on external API? Or do we just need to protect against C++ fragility of internal functions?)

- If that follow-up is rejected, then: if an external API function is non-trivial, we can change it to a wrapper of an internal function to effectively outline it.

- Long term, if an internal function is never inlined because of large code size (and a copy-per-translation unit is big), we can consider designing a stable interface and moving it to the dylib. Also, we can try to convince linker/object-file owners to de-dup `local_unnamed_addr` functions with internal linkage.

Hi all,

Hi all,

+cfe-dev, bcc:llvm-dev (added back the wrong list I think?)

+llvm-dev

The status quo isn't "linkonce_odr + visibility_hidden". The status quo is "linkonce_odr + visibility_hidden + always_inline". I guess you're suggesting removing the always_inline?

I believe the "always_inline" is protecting against some objects being built separately from (and with a slightly different libc++ than) others, which is common in the case of static archives. Using "internal" gives the same protection.

Aside from address-taken functions (where the address of functions will depend on the translation unit, instead of causing a link error

By splitting _LIBCPP_INLINE_VISIBILITY, I think we can work around this.

1. Leave _LIBCPP_INLINE_VISIBILITY unchanged.

2. #define _LIBCPP_INTERNAL_VISIBILITY __attribute__((internal_linkage))

3. Leave externally callable free functions marked with _LIBCPP_INLINE_VISIBILITY (no change). These functions rarely have significant code.

4. Change non-external API (i.e., starting with `__`) to use _LIBCPP_INTERNAL_VISIBILITY.

Just as a clarification (seems relevant as I'm rereading this), I'm not suggesting that we add _LIBCPP_INTERNAL_VISIBILITY to functions that are already in the ABI, such as std::vector::__push_back_slow_path. Only to change the internal-API function currently using _LIBCPP_INLINE_VISIBILITY to use _LIBCPP_INTERNAL_VISIBILITY...

Results:

- Now the external API calls are the same as the status quo (always_inline + hidden). If you don't take their address, they'll (most likely) be inlined. If you do take their address, they'll get coalesced and address comparisons will work within the same linkage-unit.

- The internal functions will only get inlined if the optimizer thinks it's a good idea. Taking their address from two different translation units will give you a different answer... but users aren't allowed to call internal API anyway.

- Assuming an all-knowing, benevolent inliner, code size and performance should strictly improve over the status quo, with no C++ fragility regressions.

... otherwise we'd see code size regressions here.

- We can consider, as a follow-up, replacing all uses of _LIBCPP_INLINE_VISIBILITY with _LIBCPP_HIDDEN_VISIBILITY. (I.e., do we need "always_inline" on external API? Or do we just need to protect against C++ fragility of internal functions?)

- If that follow-up is rejected, then: if an external API function is non-trivial, we can change it to a wrapper of an internal function to effectively outline it.

- Long term, if an internal function is never inlined because of large code size (and a copy-per-translation unit is big), we can consider designing a stable interface and moving it to the dylib. Also, we can try to convince linker/object-file owners to de-dup `local_unnamed_addr` functions with internal linkage.

The thread died by itself after this proposal.
Should we move forward with this?

I'm in favour.