weak_odr constant versus weak_odr global

Hi,

We recently narrowed down a clang regression to the following testcase:

struct S {
  static const int x;
};
template<typename T> struct U {
  static const int k;
};
template<typename T> const int U<T>::k = T::x;

#ifdef TU1
extern int f();
const int S::x = 42;
int main() {
  return f() + U<S>::k;
}
#endif

#ifdef TU2
int f() { return U<S>::k; }
#endif

/* Steps to repro:

clang repro.cpp -DTU1 -c -o tu1.o
clang repro.cpp -DTU2 -c -o tu2.o
clang tu1.o tu2.o
./a.out

*/

This segfaults, because... in TU1, we get:

  @_ZN1UI1SE1kE = weak_odr constant i32 42, align 4

and in TU2, we get:

  @_ZN1UI1SE1kE = weak_odr global i32 0, align 4

plus a global constructor which writes to @_ZN1UI1SE1kE. The linker then
selects the symbol from TU1, which ends up in a read-only section, resulting
in the binary crashing when TU2 tries to write to it.

Is this a clang bug (should we be generating a weak_odr global, and losing
optimization opportunities in TU1), or is this an llvm bug (should weak_odr
constants be banned from read-only sections, since another module might write
to them)?

Thanks,
Richard

Hi

[Adding cfe-dev to widen the net]

We recently narrowed down a clang regression to the following testcase:

struct S { static const int x; };
template<typename T> struct U { static const int k; };
template<typename T> const int U<T>::k = T::x;

#ifdef TU1
extern int f();
const int S::x = 42;
int main() { return f() + U<S>::k; }
#endif

#ifdef TU2
int f() { return U<S>::k; }
#endif

/* Steps to repro:

clang repro.cpp -DTU1 -c -o tu1.o
clang repro.cpp -DTU2 -c -o tu2.o
clang tu1.o tu2.o
./a.out

*/

This segfaults, because... in TU1, we get:

@_ZN1UI1SE1kE = weak_odr constant i32 42, align 4

and in TU2, we get:

@_ZN1UI1SE1kE = weak_odr global i32 0, align 4

plus a global constructor which writes to @_ZN1UI1SE1kE. The linker then
selects the symbol from TU1, which ends up in a read-only section, resulting
in the binary crashing when TU2 tries to write to it.

Is this a clang bug (should we be generating a weak_odr global, and losing
optimization opportunities in TU1), or is this an llvm bug (should weak_odr
constants be banned from read-only sections, since another module might write
to them)?

I think this is best fixed in llvm, since that gives maximum optimization
opportunities: it's fine to propagate weak_odr constant values, even if some
other module has the same object as a non-constant. Plus, with LTO, we'd want
to merge the weak_odr constant and weak_odr global symbols into a weak_odr
constant (and simultaneously remove any writes into the global).

Thanks,
Richard

Another solution you haven't mentioned is that we could make clang
generate a guard variable initialized to 1 for all relevant static
data members of class templates. The solutions you mention will
require that we can't put static data members of templates into a
read-only section unless we can prove every translation unit will
statically initialize the definition.

-Eli

I think that's an excellent idea. However, we do not seem to be able to
provide a guarantee that the object and its guard variable will be taken from
the same translation unit, so I'm not sure that it can be made to work
reliably.

Richard

Hi Richard,

This segfaults, because... in TU1, we get:

   @_ZN1UI1SE1kE = weak_odr constant i32 42, align 4

and in TU2, we get:

   @_ZN1UI1SE1kE = weak_odr global i32 0, align 4

here one version has an initial value of 42, and the other an initial
value of 0. Doesn't this break the One Definition Rule (the odr in
weak_odr)? I realize that the constantness is the real issue here,
but I thought I should mention this.

Is this a clang bug (should we be generating a weak_odr global, and losing
optimization opportunities in TU1), or is this an llvm bug (should weak_odr
constants be banned from read-only sections, since another module might write
to them)?

Dragonegg has the following comment in the analogous spot:

   // Allow loads from constants to be folded even if the constant has weak
   // linkage. Do this by giving the constant weak_odr linkage rather than
   // weak linkage. It is not clear whether this optimization is valid (see
   // gcc bug 36685), but mainline gcc chooses to do it, and fold may already
   // have done it, so we might as well join in with gusto.

Ciao, Duncan.

The attached patch fixes this issue by treating weak constants as non-constant
when choosing their section kind. This also fixes the LTO case, although we
generate a merged module containing a store to a weak_odr constant, and would
have to be careful not to optimize that to unreachable in the future (though
simply erasing such stores would be fine if the variable is marked odr).

Is this the right solution (and the right implementation of it)?

Thanks,
Richard

weak-const.diff (1.96 KB)

I tried a small variation:

struct S {
static const int x;
};
template<typename T> struct U {
static const int k;
};
template<typename T> const int U<T>::k = T::x;

#ifdef TU1
extern int f();
const int S::x = 42;
int main() {
  return f() + reinterpret_cast<long>(&U<S>::k);
}
#endif
#ifdef TU2
int f() { return U<S>::k; }
#endif

and it crashes with gcc too :frowning: The original testcase works because
gcc folds the value into main even at -O0.

If we are going to support it for real, I think Richard patch is going
on the right direction, but we should completely drop the idea of a
"weak_odr constant". If we cannot use it for instructing the linker to
put it an o ro section, the "constant" gives us nothing that the "odr"
doesn't.

We could always convert a "weak_odr global" into a plain constant in
LTO if the linker tells us we have all the parts.

Cheers,
Rafael

In cases where the C++ standard requires static initialization,
introducing a write violates the guarantees of the C++ standard for
static initialization. Therefore, I'm not sure the whole "make the
constant writable" approach is actually viable.

-Eli

I tried a small variation:

struct S { static const int x;
};
template<typename T> struct U { static const int k;
};
template<typename T> const int U<T>::k = T::x;

#ifdef TU1
extern int f(); const int S::x = 42; int main() { return f() +
reinterpret_cast<long>(&U<S>::k);
}
#endif
#ifdef TU2
int f() { return U<S>::k; } #endif

and it crashes with gcc too :frowning: The original testcase works because gcc
folds the value into main even at -O0.

Indeed, I filed gcc bug#50968 for this last week.

If we are going to support it for real, I think Richard patch is going
on the right direction, but we should completely drop the idea of a "weak_odr
constant". If we cannot use it for instructing the linker to put it an o ro
section, the "constant" gives us nothing that the "odr" doesn't.

Well, a weak_odr constant's value can be propagated in opt, which is an
optimization we should strive to avoid losing. (I'm not sure whether it's a
conforming optimization, though: if we end up dynamically initializing the
variable, we may propagate the constant into places which are required to see
its value being 0 if it's dynamically initialized).

In cases where the C++ standard requires static initialization,
introducing a write violates the guarantees of the C++ standard for static
initialization. Therefore, I'm not sure the whole "make the constant
writable" approach is actually viable.

There is another problem which afflicts all solutions presented thus far, for
the other kind of weak global values in C++. For a static variable defined
within an inline function, we can select the variable plus guard from a TU
with dynamic initialization, and select the function definition from a TU with
static initialization, with the result that the object doesn't get initialized
at all.

I have two new proposals for fixing this, which I believe actually work.

1) [Requires ABI change] We emit dynamic initialization code for weak globals
(even in TUs where static initialization is required to be performed), unless
we can prove that every translation unit will use static initialization. We
emit the global plus its guard variable as a single object so the linker can't
separate them (this is the ABI change). If we can perform static
initialization in any translation unit, then that TU emits a constant weak
object (in .rodata if we want) containing the folded value and with the guard
variable set to 1 (per Eli's proposal).

2) [No ABI change, but ugly and inefficient] We emit dynamic initialization
code for weak globals, unless we can prove that every translation unit will
use static initialization. In TUs where the value can be folded, we emit that
value as a weak_odr constant. The dynamic initialization code works like this:

  if (((char*)guard)[0] == 0) {
    if (__cxa_guard_acquire(guard)) {
      if (object is zero-initialized) {
        try {
          ... initialize object ...
        } catch (...) {
          __cxa_guard_abort(guard);
          throw;
        }
        ... enqueue dtor with __cxa_atexit ...
      }
      __cxa_guard_release(guard);
    }
  }

The zero-initialization check ensures that we never run both the static and
dynamic initialization, unless the static initialization produces the value 0
(which is equivalent to not having performed static initialization, so is OK).

One remaining issue with this is that the object still needs to be writable if
we fold its initializer to 0 -- we can either emit a weak_odr global instead
for such cases, or force weak_odr constants (at least ones with a zero
initializer) out of .rodata.

Thanks,
Richard

The ABI actually suggests doing exactly this, except using multiple
symbols linked with a COMDAT group. Unfortunately, LLVM doesn't
support that COMDAT feature yet, but it could certainly be taught to.
This guarantees correctness as long as every translation unit emits the
code the same way, which is exactly what we'd get from an ABI change,
except without actually breaking ABI conformance.

Mach-O doesn't support anything like COMDAT, but the Darwin linker
apparently gives significantly stronger guarantees about which object
files it will take symbols from, as long as all objects have all of the
symbols.

John.

Hi Richard,

and it crashes with gcc too :frowning: The original testcase works because gcc
folds the value into main even at -O0.

Indeed, I filed gcc bug#50968 for this last week.

are you sure this is the right gcc bug number?

Ciao, Duncan.

1) [Requires ABI change] We emit dynamic initialization code for weak globals
(even in TUs where static initialization is required to be performed), unless
we can prove that every translation unit will use static initialization. We
emit the global plus its guard variable as a single object so the linker can't
separate them (this is the ABI change). If we can perform static
initialization in any translation unit, then that TU emits a constant weak
object (in .rodata if we want) containing the folded value and with the guard
variable set to 1 (per Eli's proposal).

The ABI actually suggests doing exactly this, except using multiple
symbols linked with a COMDAT group. Unfortunately, LLVM doesn't
support that COMDAT feature yet, but it could certainly be taught to.
This guarantees correctness as long as every translation unit emits the
code the same way, which is exactly what we'd get from an ABI change,
except without actually breaking ABI conformance.

I like this. We already have basic support for COMDATs, but yes, it
needs to be extended. So far we just create trivial COMDATs in codegen
for weak objects.

We also need the IL linker itself needs to work on COMDATs too
otherwise this bug would still exist when doing LTO.

In the "extended" example we would output

@_ZN1UI1SE1kE = weak_odr constant i32 42, align 4, comdat _ZN1UI1SE1kE

for TU1 and

@_ZN1UI1SE1kE = weak_odr global i32 0, align 4, comdat _ZN1UI1SE1kE
...
define internal void @_GLOBAL__I_a() nounwind section ".text.startup"
comdat _ZN1UI1SE1kE {
....
}

for TU2.

Mach-O doesn't support anything like COMDAT, but the Darwin linker
apparently gives significantly stronger guarantees about which object
files it will take symbols from, as long as all objects have all of the
symbols.

John.

Cheers,
Rafael

Unfortunately, making the comdat be for the entire function is not
conformant with the ABI, which says that you either put the variable
and its guard in different comdats or you put them in a single comdat
named for the variable. It also doesn't actually help unless we disable
inlining.

So we still need to emit a guard variable (initialized to 1) into the
comdat for constant-initialized static locals, unless we can somehow
prove to our satisfaction that all translation units don't need this.
And we'd need LLVM to not throw away unused weak_odr globals
that are in a comdat with a used symbol.

John.

Nope.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50986

Jay.

Unfortunately, making the comdat be for the entire function is not
conformant with the ABI, which says that you either put the variable
and its guard in different comdats or you put them in a single comdat
named for the variable. It also doesn't actually help unless we disable
inlining.

I see. Using two comdats would still cause the same problem for us,
no? So the solution in the end is to emit:

TU1:

Exactly.

To sketch out the proposed IR extension a bit more:
1. We add 'comdat "name"' to the global variable and function
productions. I have the COMDAT name in quotes only because
there's no other precedent for a bare identifier in the IR grammar.
I don't think we want to allow this on aliases; I think I could
probably invent reasonable semantics, but it's really not worth
worrying about without cause.
2. A symbol with a COMDAT name must be a definition.
3. All symbols sharing the same COMDAT name are required to
share the same linkage and visibility. Conveniently, this lets us
talk about the COMDAT's linkage / etc.
4. A symbol with a COMDAT name is considered to be referenced
if any symbol with the same COMDAT name is referenced
(ignoring this rule).
5. It's undefined behavior if two modules are linked and they
export different sets of symbols with a given COMDAT name.
6. Otherwise, if two modules are linked and they both export
symbols with a given COMDAT name, all the symbols must be
taken from the same module.

I think that covers it.

The implementation can be optimized around the following
properties of the typical use patterns in the C++ ABI.
a) Most symbols do not need COMDAT names. Or they
don't need "non-trivial" COMDAT names, i.e. COMDAT names
containing other symbols or not matching their own name.
b) When symbols do need COMDAT names, we'll almost
always know exactly how many symbols are going in the group.
That number will usually be two.
c) It's frequently going to be convenient to be able to add
a COMDAT name to a GV after the GV was allocated.
d) Otherwise, symbols will probably never need to change
or remove their COMDAT name, and we probably don't even
need to add API for it.
e) Many clients are going to want to be able to efficiently test
whether a symbol is in a COMDAT group.
f) Those clients will also generally care about efficiently
iterating over all the symbols in that group.

I'd suggest having a bit on GlobalValue and a side-table
on the Module mapping from GVs to COMDAT objects,
where COMDAT objects are allocated as part of the
StringMapEntry on the Module and don't really contain
any data except their name and a list of GV*s, heavily
optimized for the two-element case.

John.

Mostly. How to work the current md node debug implementation into
this will be a bit tricky, but linking debug information is already a
task that needs to be looked at so…

-eric

I see. Using two comdats would still cause the same problem for us,
no? So the solution in the end is to emit:

TU1:
--------------------------------
@_ZN1UI1SE1kE = weak_odr constant i32 42, align 4, comdat _ZN1UI1SE1kE
@_ZGVN1UI1SE1kE = weak_odr global i64 1, comdat _ZN1UI1SE1kE
--------------------------------

TU2:
-----------------------------------
@_ZN1UI1SE1kE = weak_odr global i32 0, align 4, comdat _ZN1UI1SE1kE
@_ZGVN1UI1SE1kE = weak_odr global i64 0, comdat _ZN1UI1SE1kE
...
@llvm.global_ctors = ....
define internal void @_GLOBAL__I_a() nounwind section ".text.startup" ....
-----------------------------------

Restarting a really old thread now that we have comdat support in the IR.

While the above idea would work, there are two problems with it

* Existing compilers (clang and gcc) produce a comdat with just the
constant in TU1. Linking one of those with TU2 can still cause a crash
since the guard variable would be undefined.
* It requires always outputting the guard variable.

Since neither gcc nor clang implement this part of the ABI, I was
thinking if there was a better way to do it. One interesting option is
putting the .init_array of TU2 in the comdat. That is exactly what we
do for windows. In fact, just passing -Xclang -mllvm -Xclang
-enable-structor-comdat will avoids the crash in the above example.

Given that this has been broken since forever, waiting a bit more for
17350 – ld cannot handle .init_array mixed with comdat to be fixed and
then flipping -enable-structor-comdat might be the best way to fix
this. With that done we can add the function and the guard variable in
TU2 to the comdat to remove a bit of bloat (see attached patch).

What is the discussion list for the itanium abi? Should I propose a patch?

Cheers,
Rafael

t.patch (1.34 KB)

Even if this doesn't address the constant vs global crash issues, I think
adding the .init_array entry to the comdat is a great startup time
optimization. Everything continues to work as it does today, but less code
is run during dynamic initialization because all the new TU initializers
get deduplicated by the linker.

It sounds like you are proposing to emit this sort of IR, after the linker
bug is fixed:

// some header:
int f();
template <typename T> struct A { static int var; };
template <typename T> int A<T>::var = f();
template struct A<int>;

New TU1:

Given that this has been broken since forever, waiting a bit more for
17350 – ld cannot handle .init_array mixed with comdat to be fixed and
then flipping -enable-structor-comdat might be the best way to fix
this. With that done we can add the function and the guard variable in
TU2 to the comdat to remove a bit of bloat (see attached patch).

Even if this doesn't address the constant vs global crash issues, I think
adding the .init_array entry to the comdat is a great startup time
optimization. Everything continues to work as it does today, but less code
is run during dynamic initialization because all the new TU initializers get
deduplicated by the linker.

Excellent news: the output from bfd ld was valid. I got confused
because X86_64 uses a Rela relocation and the value in the relocated
location is not used. I reverted the patch disabling the fix and clang
now produces a testcase that doesn't crash :slight_smile:

However, if we wait until we don't care about old object file compatibility,
we can transition to linking new TU1 with this new TU2:
-------
$var = comdat
@var = weak_odr constant i32 42, comdat $var
; no guard, no initializer, no global_ctors entry

Is that what you're thinking?

Yes, in fact that what we get now. So, what do you think of the patch
attached to the previous email? With it we also put the guard variable
and __cxx_global_var_init in the _ZN1UI1SE1kE comdat, saving a bit of
space in the final binary.

What is the discussion list for the itanium abi? Should I propose a patch?

cxx-abi-dev@codesourcery.com

Cool, I will try to subscribe to it.

Cheers,
Rafael