RFC: A new ABI for virtual calls, and a change to the virtual call representation in the IR

Hi all,

I'd like to make a proposal to implement the new vtable ABI described in
PR26723, which I'll call the relative ABI. That bug gives more details and
justification for that ABI.

The user interface for the new ABI would be that -fwhole-program-vtables
would take an optional value indicating which aspects of the program have
whole-program scope. For example, the existing implementation of whole-program
vcall optimization allows external code to call into translation units
compiled with -fwhole-program-vtables, but does not allow external code to
derive from classes defined in such translation units, so you could request
the current behaviour with "-fwhole-program-vtables=derive", which means
that derived classes are not allowed from outside the program. To request
the new ABI, you can specify "-fwhole-program-vtables=call,derive",
which means that calls and derived classes are both not allowed from
outside the program. "-fwhole-program-vtables" would be short for
"-fwhole-program-vtables=call,derive,anythingelseweaddinfuture".

I'll also make the observation that the new ABI does not require LTO or
whole-program visibility at compile time; to decide whether to use the new
ABI for a class, we just need to check that it and its bases are not in the
whole-program-vtables blacklist.

At the same time, I'd like to change how virtual calls are represented in
the IR. This is for a few reasons:

1) Would allow whole-program virtual call optimization to work well with the
   relative ABI. This ABI would complicate the IR at call sites and make it
   harder to do matching and rewriting.

2) Simplifies the whole-program virtual call optimization pass. Currently we
   need to walk uses in the IR in order to determine the slot and callees for
   each call site. This can all be avoided with a simpler representation.

3) Would make it easier to implement dead virtual function stripping. This would
   involve reshaping any vtable initializers and rewriting call
   sites. Implementing this correctly is harder than it needs to be because
   of the current representation.

My proposal is to add the following new intrinsics:

i32 @llvm.vtable.slot.offset(metadata, i32)

This intrinsic takes a bitset name B and an offset I. It returns the byte
offset of the I'th virtual function pointer in each of the vtables in B.

i8* @llvm.vtable.load(i8*, i32)

This intrinsic takes a virtual table pointer and a byte offset, and loads
a virtual function pointer from the virtual table at the given offset.

i8* @llvm.vtable.load.relative(i8*, i32)

This intrinsic is the same as above, but it uses the relative ABI.

{i8*, i1} @llvm.vtable.checked.load(metadata %name, i8*, i32)
{i8*, i1} @llvm.vtable.checked.load.relative(metadata %name, i8*, i32)

These intrinsics would be used to implement CFI. They are similar to the
unchecked intrinsics, but if the second element of the result is non-zero,
the program may call the first element of the result as a function pointer
without causing an indirect function call to any function other than one
potentially loaded from one of the constant globals of which %name is a member.

To minimize the impact on existing passes, the intrinsics would be lowered
early during the regular pipeline when LTO is disabled, or early in the LTO
pipeline when LTO is enabled. Clang would not use the llvm.vtable.slot.offset
intrinsic when LTO is disabled, as bitset information would be unavailable.

To give the optimizer permission to reshape vtable initializers for a
particular class, the vtable would be added to a special named metadata node
named 'llvm.vtable.slots'. The presence of this metadata would guarantee
that all loads beyond a given byte offset (this range would not include the
RTTI pointer for example) are done using the above intrinsics.

We will also take advantage of the ABI break to split the class's virtual
table group at virtual table boundaries into separate globals instead of
emitting all virtual tables in the group into a single global. This will
not only simplify the implementation of dead virtual function stripping,
but also reduce code size overhead for CFI. (CFI works best if vtables for
a base class can be laid out near vtables for derived class; the current
ABI makes this harder to achieve.)

Example (using the relative ABI):

struct A {
  virtual void f();
  virtual void g();
};

struct B {
  virtual void h();
};

struct C : A, B {
  virtual void f();
  virtual void g();
  virtual void h();
};

void fcall(A *a) {
  a->f();
}

void gcall(A *a) {
  a->g();
}

typedef void (A::*mfp)();

mfp getmfp() {
  return &A::g;
}

void callmfp(A *a, mfp m) {
  (a->*m)();
}

In IR:

@A_vtable = {i8*, i8*, i32, i32} {0, @A::rtti, @A::f - (@A_vtable + 16), @A::g - (@A_vtable + 16)}
@B_vtable = {i8*, i8*, i32} {0, @B::rtti, @B::h - (@B_vtable + 16)}
@C_vtable0 = {i8*, i8*, i32, i32, i32} {0, @C::rtti, @C::f - (@C_vtable0 + 16), @C::g - (@C_vtable0 + 16), @C::h - (@C_vtable0 + 16)}
@C_vtable1 = {i8*, i8*, i32} {-8, @C::rtti, @C::h - (@C_vtable1 + 16)}

define void @fcall(%A* %a) {
  %slot = call i32 @llvm.vtable.slot.offset(!"A", i32 0)
  %vtable = load i8* %a
  %fp = i8* @llvm.vtable.load.relative(%vtable, %slot)
  %casted_fp = bitcast i8* %fp to void (%A*)
  call void %casted_fp(%a)
}

define void @gcall(%A* %a) {
  %slot = call i32 @llvm.vtable.slot.offset(!"A", i32 1)
  %vtable = load i8* %a
  %fp = i8* @llvm.vtable.load.relative(%vtable, %slot)
  %casted_fp = bitcast i8* %fp to void (%A*)
  call void %casted_fp(%a)
}

define {i8*, i8*} @getmfp() {
  %slot = call i32 @llvm.vtable.slot.offset(!"A", i32 1)
  %slotp1 = add %slot, 1
  %result = insertvalue {i8*, i8*} {i8* 0, i8* 0}, 0, %slotp1
  ret {i8*, i8*} %result
}

define @callmfp(%A* %a, {i8*, i8*} %m) {
  ; assuming the call is virtual and no this adjustment
  %slot = extractvalue i8* %m, 0
  %slotm1 = sub %slot, 1
  %vtable = load i8* %a
  %fp = i8* @llvm.vtable.load.relative(%vtable, %slotm1)
  %casted_fp = bitcast i8* %fp to void (%A*)
  call void %casted_fp(%a)
}

!0 = {!"A", @A_vtable, 16}
!1 = {!"B", @B_vtable, 16}
!2 = {!"A", @C_vtable0, 16}
!3 = {!"B", @C_vtable1, 16}
!4 = {!"C", @C_vtable0, 16}
!llvm.bitsets = {!0, !1, !2, !3, !4}

!5 = {@A_vtable, 16}
!6 = {@B_vtable, 16}
!7 = {@C_vtable0, 16}
!8 = {@C_vtable1, 16}
!llvm.vtable.slots = {!5, !6, !7, !8}

Thanks,

Using relative offsets applies to more than just vtables. It would do wonders for constant strings too.

– Sean Silva

That's true, and in fact I found that many of the remaining dynamic relocations
in Chromium were for constant strings and data structures referencing them
(e.g. [1]). Unfortunately I couldn't see a good general way to transform a
program to use relative offsets automatically, as I imagine we would need to
consider every use of the data structure in the program. Possible in theory,
but very difficult to get right.

Peter

[1] https://code.google.com/p/chromium/codesearch#chromium/src/net/base/mime_util.cc&l=64&gs=cpp:net::kPrimaryMappings@chromium/../../net/base/mime_util.cc%257Cdef&gsn=kPrimaryMappings&ct=xref_usages

Based on discussion with John McCall in PR26723, I’d like to change the
user interface for -fwhole-program-vtables, and introduce an interface
specifically to enable the relative ABI. That interface would be based on a
whitelist rather than a blacklist, and together with
-fwhole-program-vtables would enable devirtualization, virtual const prop,
and virtual function stripping for those classes.

The new user interface is as follows:

We would introduce two new attributes, [[clang::unstable_abi]] and
[[clang::stable_abi]], which would be attached to a class and would enable
or disable the unstable ABI for that class. It is an ODR violation to use
[[clang::unstable_abi]] in two translation units compiled with different
versions of Clang (we may consider extending the object format to allow a
linker to diagnose this). Specifically, mixing different head revisions or
major releases is not allowed, but mixing different point releases is fine.
The attribute __declspec(uuid()) (which is used for COM classes on Windows)
would imply [[clang::stable_abi]].

A “dynamic-introducing” class is a class that declares new virtual member
functions or virtual bases, and has no dynamic bases or virtual bases. A
class that is dynamic but not dynamic-introducing would use the same ABI as
its dynamic base classes. The compiler will diagnose if a class has two or
more dynamic bases with different ABIs, or if the bases have a different
ABI from the one explicitly specified by an attribute.

The ABI for a dynamic-introducing class is determined from the attribute,
or if the class has no attribute, from the following flags:

-funstable-c++-abi or -funstable-c++-abi-classes would enable the unstable
C++ ABI for all classes (the idea being that -funstable-c++-abi would also
cover any unrelated ABI breaks we may want to make in future).
-funstable-c++-abi-classes=PATH would enable the unstable C++ ABI for
dynamic-introducing classes specified in the file at PATH.
The -fwhole-program-vtables-blacklist flag would be removed, and I'm no
longer proposing that -fwhole-program-vtables would take a value. The
whole-program blacklist would be replaced by either inference based on
visibility or a new [[clang::no_whole_program]] attribute.

It is effectively an ODR violation to define a class that uses the unstable
ABI in a translation unit compiled with a different set of
-funstable-c++-abi* flags. It is also a violation for a linkage unit other
than the one compiled with -fwhole-program-vtables to define any of the
classes that use the unstable ABI.

The format of the file is a series of lines ending in either * or **.
Preceding that is a namespace specifier delimited by double-colons followed
by “::”, or the empty string to denote the global namespace. Each entry in
the list indicates that dynamic-introducing classes in that namespace,
including nested classes, classes defined in enclosed anonymous namespaces,
and classes defined within member functions of those classes, use the
unstable ABI. If the line ends in “*” this applies to the given namespace
only, while if the line ends in “**” it applies to the given namespace and
any enclosed namespaces.

In Chromium for example, the contents of the file would look like this:

Hi all,

I'd like to make a proposal to implement the new vtable ABI described in
PR26723, which I'll call the relative ABI. That bug gives more details and
justification for that ABI.

The user interface for the new ABI would be that -fwhole-program-vtables
would take an optional value indicating which aspects of the program have
whole-program scope. For example, the existing implementation of whole-program
vcall optimization allows external code to call into translation units
compiled with -fwhole-program-vtables, but does not allow external code to
derive from classes defined in such translation units, so you could request
the current behaviour with "-fwhole-program-vtables=derive", which means
that derived classes are not allowed from outside the program. To request
the new ABI, you can specify "-fwhole-program-vtables=call,derive",
which means that calls and derived classes are both not allowed from
outside the program. "-fwhole-program-vtables" would be short for
"-fwhole-program-vtables=call,derive,anythingelseweaddinfuture".

I'll also make the observation that the new ABI does not require LTO or
whole-program visibility at compile time; to decide whether to use the new
ABI for a class, we just need to check that it and its bases are not in the
whole-program-vtables blacklist.

At the same time, I'd like to change how virtual calls are represented in
the IR. This is for a few reasons:

1) Would allow whole-program virtual call optimization to work well with the
  relative ABI. This ABI would complicate the IR at call sites and make it
  harder to do matching and rewriting.

2) Simplifies the whole-program virtual call optimization pass. Currently we
  need to walk uses in the IR in order to determine the slot and callees for
  each call site. This can all be avoided with a simpler representation.

3) Would make it easier to implement dead virtual function stripping. This would
  involve reshaping any vtable initializers and rewriting call
  sites. Implementing this correctly is harder than it needs to be because
  of the current representation.

My proposal is to add the following new intrinsics:

Thanks, I'm really glad you're moving forward on improving the IR representation so fast after our previous discussion. The use of these intrinsics looks a lot more friendly to me! :slight_smile:
(even if I still does not make sense of the "bitset" terminology to represent the hierarchy for the metadata part)

i32 @llvm.vtable.slot.offset(metadata, i32)

This intrinsic takes a bitset name B and an offset I. It returns the byte
offset of the I'th virtual function pointer in each of the vtables in B.

i8* @llvm.vtable.load(i8*, i32)

Why is the vtable.load taking a byte offset instead of a slot index directly? (the IR could be simpler by not requiring to call @llvm.vtable.slot.offset() for every @llvm.vtable.load())

I decided to split these in order to support virtual member function
pointers correctly. In the Itanium ABI, member function pointers use a byte
offset. The idea is that llvm.vtable.slot.offset would be used to create a
member function pointer, while llvm.vtable.load would be used to call it
(see also the getmfp and callmfp examples).

Thanks,
Peter

Make sense! Thanks.

There's a subtlety about this aspect of the ABI that I should call
attention to. The virtual function references can only be resolved directly
by the static linker if they are defined in the same executable/DSO as the
virtual table. I expect this to be the overwhelmingly common case, as
classes are normally wholly defined within a single executable or DSO, so
our implementation should be optimized around that case.

If we expected cross-DSO references to be relatively common, we could make
vtable entries be relative to GOT entries, but that would introduce an
additional level of indirection and additional relocations, probably
costing us more in binary size and memory bandwidth than the current ABI.

However, it is technically possible to split the implementation of a
class's virtual functions between DSOs, and there are more practical cases
where we might expect to see cross-DSO references:

- one DSO could derive from a class defined in another DSO, and only
override some of its virtual functions
- the vtable could contain a reference to __cxa_pure_virtual which would be
defined by the standard library

We can handle these cases by having the vtable refer to a PLT entry for
each function that is not defined within the module. This can be done by
using a specific type of relative relocation that refers directly to the
symbol if defined within the current module, or to a PLT entry if not. This
is the same type of relocation that is needed to implement relative
branches on x86, so I'd expect it to be generally available on that
architecture (ELF has R_{386,X86_64}_PLT32, Mach-O has X86_64_RELOC_BRANCH,
COFF has IMAGE_REL_{AMD64,I386}_REL32, which may resolve to a thunk [1],
which is essentially the same thing as a PLT entry). It is also present on
ARM (R_ARM_PREL31, which was apparently added to support unwind tables).

We still need some way to create PLT relocations in the vtable's
initializer without breaking the semantics of a load from the vtable.
Rafael and I discussed this and we believe that if the target function is
unnamed_addr, this indicates that the function's address isn't observable
(this is true for virtual functions, as it isn't possible to take their
address), and so it could be substituted with the address of a PLT entry.

One complication is that on ELF the linker will still create a PLT entry if
the symbol has default visibility, in order to support symbol
interposition. We can mitigate against that by using protected visibility
for virtual functions if they would otherwise receive default visibility.

Thanks,
Peter

[1] http://llvm.org/klaus/lld/blob/master/COFF/InputFiles.cpp#L-305

Hi all,

I'd like to make a proposal to implement the new vtable ABI described in
PR26723, which I'll call the relative ABI. That bug gives more details and
justification for that ABI.

The user interface for the new ABI would be that -fwhole-program-vtables
would take an optional value indicating which aspects of the program have
whole-program scope. For example, the existing implementation of
whole-program
vcall optimization allows external code to call into translation units
compiled with -fwhole-program-vtables, but does not allow external code to
derive from classes defined in such translation units, so you could
request
the current behaviour with "-fwhole-program-vtables=derive", which means
that derived classes are not allowed from outside the program. To request
the new ABI, you can specify "-fwhole-program-vtables=call,derive",
which means that calls and derived classes are both not allowed from
outside the program. "-fwhole-program-vtables" would be short for
"-fwhole-program-vtables=call,derive,anythingelseweaddinfuture".

Based on discussion with John McCall in PR26723, I’d like to change the
user interface for -fwhole-program-vtables, and introduce an interface
specifically to enable the relative ABI. That interface would be based on a
whitelist rather than a blacklist, and together with
-fwhole-program-vtables would enable devirtualization, virtual const prop,
and virtual function stripping for those classes.

The new user interface is as follows:

We would introduce two new attributes, [[clang::unstable_abi]] and
[[clang::stable_abi]], which would be attached to a class and would enable
or disable the unstable ABI for that class. It is an ODR violation to use
[[clang::unstable_abi]] in two translation units compiled with different
versions of Clang (we may consider extending the object format to allow a
linker to diagnose this). Specifically, mixing different head revisions or
major releases is not allowed, but mixing different point releases is fine.
The attribute __declspec(uuid()) (which is used for COM classes on Windows)
would imply [[clang::stable_abi]].

A “dynamic-introducing” class is a class that declares new virtual member
functions or virtual bases, and has no dynamic bases or virtual bases. A
class that is dynamic but not dynamic-introducing would use the same ABI as
its dynamic base classes. The compiler will diagnose if a class has two or
more dynamic bases with different ABIs, or if the bases have a different
ABI from the one explicitly specified by an attribute.

The ABI for a dynamic-introducing class is determined from the attribute,
or if the class has no attribute, from the following flags:

-funstable-c++-abi or -funstable-c++-abi-classes would enable the unstable
C++ ABI for all classes (the idea being that -funstable-c++-abi would also
cover any unrelated ABI breaks we may want to make in future).
-funstable-c++-abi-classes=PATH would enable the unstable C++ ABI for
dynamic-introducing classes specified in the file at PATH.
The -fwhole-program-vtables-blacklist flag would be removed, and I'm no
longer proposing that -fwhole-program-vtables would take a value. The
whole-program blacklist would be replaced by either inference based on
visibility or a new [[clang::no_whole_program]] attribute.

It is effectively an ODR violation to define a class that uses the
unstable ABI in a translation unit compiled with a different set of
-funstable-c++-abi* flags. It is also a violation for a linkage unit other
than the one compiled with -fwhole-program-vtables to define any of the
classes that use the unstable ABI.

The format of the file is a series of lines ending in either * or **.
Preceding that is a namespace specifier delimited by double-colons followed
by “::”, or the empty string to denote the global namespace. Each entry in
the list indicates that dynamic-introducing classes in that namespace,
including nested classes, classes defined in enclosed anonymous namespaces,
and classes defined within member functions of those classes, use the
unstable ABI. If the line ends in “*” this applies to the given namespace
only, while if the line ends in “**” it applies to the given namespace and
any enclosed namespaces.

In Chromium for example, the contents of the file would look like this:

*
app::**
base::**
browser::**
[...]
wm::**
zip::**

Wouldn't we want to say "use custom ABI for everything non-exported"
instead of manually tagging everything?

The problem with that is that system headers aren't necessarily tagged as
non-exported. We have a way of identifying system headers, but it seems
like a bad idea to implicitly generate different code for system headers
without a good reason.

One other approach I considered was a "whitelist minus blacklist" approach,
where the file would look like this:

This seems like the best way to handle it.

It would be nice if this could be requested of an arbitrary function without having to rely on it being unnamed_addr — that is, it would be nice to have “the address of an unnamed_addr function in this linkage unit which is equivalent when called to this other function that’s not necessary within this linkage unit”. It’s easy to make an unnamed_addr wrapper function, and maybe we could teach the backend to peephole that to a PLT function reference, but (1) that sounds like some serious backend heroics and (2) it wouldn’t work for variadic functions.

To clarify, this is just a performance problem and doesn’t actually break semantics or feasibility, right?

John.

@A_vtable = {i8*, i8*, i32, i32} {0, @A::rtti, @A::f - (@A_vtable + 16),
@A::g - (@A_vtable + 16)}

There's a subtlety about this aspect of the ABI that I should call
attention to. The virtual function references can only be resolved directly
by the static linker if they are defined in the same executable/DSO as the
virtual table. I expect this to be the overwhelmingly common case, as
classes are normally wholly defined within a single executable or DSO, so
our implementation should be optimized around that case.

If we expected cross-DSO references to be relatively common, we could make
vtable entries be relative to GOT entries, but that would introduce an
additional level of indirection and additional relocations, probably
costing us more in binary size and memory bandwidth than the current ABI.

However, it is technically possible to split the implementation of a
class's virtual functions between DSOs, and there are more practical cases
where we might expect to see cross-DSO references:

- one DSO could derive from a class defined in another DSO, and only
override some of its virtual functions
- the vtable could contain a reference to __cxa_pure_virtual which would
be defined by the standard library

We can handle these cases by having the vtable refer to a PLT entry for
each function that is not defined within the module. This can be done by
using a specific type of relative relocation that refers directly to the
symbol if defined within the current module, or to a PLT entry if not. This
is the same type of relocation that is needed to implement relative
branches on x86, so I'd expect it to be generally available on that
architecture (ELF has R_{386,X86_64}_PLT32, Mach-O has X86_64_RELOC_BRANCH,
COFF has IMAGE_REL_{AMD64,I386}_REL32, which may resolve to a thunk [1],
which is essentially the same thing as a PLT entry). It is also present on
ARM (R_ARM_PREL31, which was apparently added to support unwind tables).

We still need some way to create PLT relocations in the vtable's
initializer without breaking the semantics of a load from the vtable.
Rafael and I discussed this and we believe that if the target function is
unnamed_addr, this indicates that the function's address isn't observable
(this is true for virtual functions, as it isn't possible to take their
address), and so it could be substituted with the address of a PLT entry.

This seems like the best way to handle it.

It would be nice if this could be requested of an arbitrary function
without having to rely on it being unnamed_addr — that is, it would be nice
to have “the address of an unnamed_addr function in this linkage unit which
is equivalent when called to this other function that’s not necessary
within this linkage unit”. It's easy to make an unnamed_addr wrapper
function, and maybe we could teach the backend to peephole that to a PLT
function reference, but (1) that sounds like some serious backend heroics
and (2) it wouldn’t work for variadic functions.

Yes, that seems like far too much for what is needed here.

One complication is that on ELF the linker will still create a PLT entry
if the symbol has default visibility, in order to support symbol
interposition. We can mitigate against that by using protected visibility
for virtual functions if they would otherwise receive default visibility.

To clarify, this is just a performance problem and doesn’t actually break
semantics or feasibility, right?

Performance and code size (I suspect that all the extra PLT entries would
negate most of the binary size savings associated with the relative ABI).
It doesn't break semantics of regular C++ programs, as long as you stick to
the ODR. It can only break things if a program relies on ELF symbol
interposition to replace a DSO's implementation of a virtual function with
something else (which I can't really imagine being very common at all,
given that ELF is the only object format that could support this). I think
the right response to that is "sorry, if you need symbol interposition,
then go use the platform ABI".

Thanks,

Be very careful with using protected visibility, since recent binutils
version have completely broken a bunch of basic use cases of protected
symbols :frowning:

Joerg

It’s a shame, though. We play a similar trick with GOT entries, and it ends up being quite elegant, at least in IR. I think there are similar problems with trying to eliminate the fake symbol in the backend, though.

I mean, I’ve never really liked ELF’s stance on symbol interposition, but taking it as given, I’m not sure I agree that it’s reasonable to carve out virtual functions as a general exception.

Isn’t there some global flag to make (non-weak?) definitions use protected visibility by default? Wouldn’t using that be an overall performance win anyway?

John.

No, you can link with -Bsymbolic, but that creates it own nasty set of
surprises.

Joerg

I mean, I’ve never really liked ELF’s stance on symbol interposition, but
taking it as given, I’m not sure I agree that it’s reasonable to carve out
virtual functions as a general exception.

Given that LLVM does IPO (inlining, funcattrs) on symbols that can be
interposed on ELF, we already don't support interposability very well:
https://llvm.org/bugs/show_bug.cgi?id=23501

Adding another exception for virtual functions, especially under an
off-by-default flag, doesn't seem like a big deal.

Isn’t there some global flag to make (non-weak?) definitions use protected
visibility by default? Wouldn’t using that be an overall performance win
anyway?

Joerg mentioned -Bsymbolic, but I'm surprised that I don't see people use
-Bsymbolic-functions more often. C++ has the ODR, so we can say that all
definitions of the same (C++!) function must be equivalent, and it doesn't
matter which you call. Programs very rarely rely on function pointer
identity across DSOs, and it already isn't preserved on a number of
platforms (MSVC does ICF). Typically only global data needs to be coalesced
across DSOs, so that things like static data members of templates and
static locals in inline functions work.

Okay, so, it sounds to me like LLVM basically treats strong definitions as protected, then. Should we just formalize that?

I guess the proposal here would be:

  1. Remove protected visibility from LLVM. (Or deprecate it as equivalent to default.)
  2. Teach the ELF emitter to emit non-hidden strong definitions using ELF protected visibility.
  3. Teach the ELF emitter to emit non-hidden weak definitions using ELF default visibility (unless they’re unnamed_addr?).

GCC must do some inlining between strong definitions. Do you know what their rule is? Is it just a C++ thing?

John.

Okay, so, it sounds to me like LLVM basically treats strong definitions as
protected, then. Should we just formalize that?

I guess the proposal here would be:
  1. Remove protected visibility from LLVM. (Or deprecate it as
equivalent to default.)
  2. Teach the ELF emitter to emit non-hidden strong definitions using ELF
protected visibility.
  3. Teach the ELF emitter to emit non-hidden weak definitions using ELF
default visibility (unless they’re unnamed_addr?).

LLVM definitely thinks that strong definitions are "ODR", so we could do 2.
I'm sure it would break somebody, somewhere though. We probably don't want
to do 1 and 3, so that users can still mark their weak definitions as
protected if they want to.

GCC must do *some* inlining between strong definitions. Do you know what

their rule is? Is it just a C++ thing?

GCC only inlines strong definitions when -fPIC is disabled, meaning the
code will be linked into the main executable. In this case, no
interposition can occur because dynamic symbol search always starts in the
executable. They also have -fno-semantic-interposition, which allows
inlining strong definitions:

I wouldn't go that far. At least no for now.

It is true that given llvm's view of what it can inline, producing ELF
files with protected visibility would seem reasonable. The problem is
that they don't work well in combination with copy relocations, so we
cannot just produce protected everywhere.

But, there are cases where they are useful, so we should be able to
propagate __attribute__((visibility,("protected"))) to the object
file.

Cheers,
Rafael

Yes, but what I think John was getting at with 3 could work (modulo
possible breakage, as you mention): "Teach the ELF emitter to emit
non-hidden weak unnamed_addr definitions using ELF protected visibility".

Thanks,