"weak" attribute semantics on const variables

Hi,

with a variable declared (in C) as follows
const char x[] __attribute__ ((weak)) = "X";
clang/llvm will use the initializer value for optimizations.

gcc does not.

In clang, variables with the weak attribute get “weak” linkage, unless they are const, in which case they get “weak_odr” linkage (CodeGenModule::getLLVMLinkageForDeclarator).

And “weak” linkage is “interposable” while “weak_odr” isn’t (GlobalValue::isInterposableLinkage), which allows optimizations to use the value of the initializer in the “weak_odr” case.
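For readers who want something concrete, here is a minimal sketch of the situation. The function name first_char is hypothetical and not taken from the original example. Whether the load is folded depends on the compiler and version, as discussed in the rest of this thread.

```c
/* Weak, const-qualified definition, as in the post. */
const char x[] __attribute__((weak)) = "X";

/* Hypothetical reader function (name is an assumption).
   If the compiler treats x as non-interposable (weak_odr in LLVM IR),
   it may fold this body to "return 'X'". If x is interposable (weak),
   it has to load x[0] at run time, because a strong definition in
   another translation unit could replace x at link time. */
char first_char(void) {
    return x[0];
}
```

With no strong definition of x anywhere in the program, both variants return the same value; the difference only shows up once another translation unit overrides x.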

Is this expected/correct behavior?
Is there some specification describing this behavior?

Unfortunately the “weak” attribute isn’t documented in AttrDocs.td. And gcc’s documentation doesn’t help much, but does use the wording “overriding symbol” which definitely alludes to interposition being allowed.

The closest thing to documentation in clang I can find is this comment in the testcase:
https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGen/global-init.c#L15

// Since this is marked const, it should get weak_odr linkage, since all
// definitions have to be the same.
// CHECK: @d = weak_odr constant i32 0
const int d __attribute__((weak))= 0;

I can easily accept that in the case of multiple weak definitions of a weak symbol, all must have the same value (odr).
However, my intuition (which may be based on a misunderstanding) is that a non-weak (strong) definition of that same symbol does not have to have the same value, even if it is const. I would like to believe that the point of having weak symbols is that they provide a sort of default which can be overridden by a different value during linking. That is how it works for non-const variables and for functions.

Oh btw, the LangRef says

weak linkage has the same merging semantics as linkonce linkage, except that unreferenced globals with weak linkage may not be discarded. This is used for globals that are declared “weak” in C source code.

Which leaves out the part about it not being used if it is a const variable.

The reward for reading this far is an example showing the behavior in compiler explorer: Compiler Explorer

I spoke with Anders about this at the devmtg. I don’t have any recollection of the original patch - it looks like it was back in the days of setting up the MC layer and cleaning up a bunch of linkage issues. It looks like we reject this construct entirely in C++ mode, and I don’t see a reason to preserve this behavior in C mode. I support dropping the “odr-ness” of the generated global variable.

-Chris

For reference, the patch referred to is “weak globals that are const should get weak_odr linkage.” (llvm/llvm-project@f49573d)

We do support this construct in C++ though:
extern const char x[] __attribute__ ((weak)) = "X";
(const gives it internal linkage in C++; adding extern overrides that)

So I guess I’ll have to make sure it still ends up as weak_odr in that case for C++. Or should the fact that it is declared extern there also imply that it can be interposed?

Yeah, I think it makes sense for that to be weak_odr in C++ mode. ‘extern’ with an initializer is crazy. :)


Switching from weak_odr to weak for both C and C++ makes sense to me. It should match the GCC semantics.

Based on the timeline here (2009), I imagine that there were Clang users who were attempting to port Windows code that used __declspec(selectany), which is very similar in function to C++17 inline variables. The definition of a const inline or selectany global variable can be used for optimization purposes, meaning an ODR linkage.

In the year 2022, C++ users now have better, more standards conforming ways to express their intention. We do not need to optimize code using __attribute__((weak)) anymore. I propose we just match GCC here, and drop the ODR linkage bit in both C and C++ modes. Trying to optimize such code is more trouble than it is worth.

Actually, I’m not so sure about GCC semantics any longer when digging deeper into this.

In my initial example const char x[] __attribute__ ((weak)) = "X"; gcc doesn’t use the initializer value for optimization.

But if it is changed to not be an array but rather a plain int like const int x __attribute__ ((weak)) = 42; gcc does actually use it.

And interestingly enough we have a testcase for that very thing. “Check for bug compatibility with gcc”
https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGen/weak_constant.c
(But I can’t find any relevant testcases in the gcc test suite, just ones with volatile weak, which is kind of at the other end of the spectrum.)

(Updated Compiler Explorer link)

The plot thickens. The commit that added that weak_constant.c testcase ( https://github.com/llvm/llvm-project/commit/a09e0afe748ce5d14ceb7d948a2660f6360c845e ) has a commit message basically repeating the weak/weak_odr llvm linkage semantics “weak - value may change so don’t optimize based on that. If you want to optimize based on it use weak_odr”, which makes sense.

But then it adds that testcase with the “Check for bug compatibility with gcc” without further explanation. I can’t seem to find any bug in gcc’s bugzilla related to this.

To me the existence of that testcase (and its “bug compatibility” comment) suggests two things:

  • (bug) The behavior of using that initializer value is incorrect.
  • (compatibility) Someone is relying on this behavior. Making it important to have a testcase cementing this.

So, it was important in 2009, is it still important?

Another thing that I can’t understand is that @clattner’s commit (const+weak => weak_odr) is from Aug 2009, but the commit that adds weak_constant.c is from Mar 2009, so one would think that the testcase was failing between March and August?

As there is no real spec for this, I did some hunting in the wild, grepping in open source code. I can see two different uses of __attribute__((weak)) in combination with const:

  1. To provide some default/fallback config/identifier which is expected to be overridden by the user of the library.
  2. To avoid duplicate definitions of constants defined in header included in multiple translation units.

The first is the one that bit me, and prompted me to investigate this further. This is what I would like to call “true weak”. Here we use “weak” to allow a “strong” definition to replace it. So we definitely don’t want this kind of constant to be used for optimization, as it is only constant after linking.
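A minimal sketch of this “true weak” pattern, under assumptions: the names lib_default_timeout_ms and lib_timeout_ms are hypothetical and not taken from any of the projects that were grepped.

```c
/* Library side: a fallback value. A user of the library may provide a
   strong (non-weak) definition of the same symbol in another
   translation unit, e.g.
       const int lib_default_timeout_ms = 250;
   and the linker will then pick that definition instead. */
const int lib_default_timeout_ms __attribute__((weak)) = 1000;

/* Hypothetical accessor. For the override to take effect, the compiler
   must load the value here at run time rather than fold 1000 into the
   function body. */
int lib_timeout_ms(void) {
    return lib_default_timeout_ms;
}
```

If the compiler folds the initializer into lib_timeout_ms, a strong definition supplied by the user changes the symbol but not the value the library actually uses, which is the problem described above.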

The second one is real constants, and the use of the weak attribute is a way to allow the same symbol to be merged at link time. But as they are real constants, allowing optimization to leverage them makes sense.
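A minimal sketch of this second pattern, again with hypothetical names (kMaxItems and clamp_items are assumptions for illustration only).

```c
/* Imagine this definition lives in a header that is included by several
   translation units. Each one emits a weak definition of the same
   symbol, and the linker merges them instead of reporting a duplicate
   definition. */
const int kMaxItems __attribute__((weak)) = 16;

/* Every definition carries the same value, so folding kMaxItems into
   this function is harmless here and is a useful optimization. */
int clamp_items(int n) {
    return n > kMaxItems ? kMaxItems : n;
}
```

The source is indistinguishable from the override pattern, so the compiler cannot tell from the declaration alone which of the two uses the author had in mind.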

So it seems in the wild there are two conflicting uses of the weak attribute. One where using initializer value is buggy, and one where it is a helpful optimization.

I’d argue that the linktime replacement case is the real one. The attribute is named “weak” which it is in relation to something else (a “strong” symbol). The other avoid-duplicate-definitions case still works with the optimization opportunity removed.

That said, removing that optimization opportunity could mean a performance regression for code relying on this. And I don’t know of any alternative to express this in C: “This constant should exist exactly once as a symbol in the program, and at compile time individual translation units should be allowed to use the value for optimization”. Is there a way to express that?

This is not simply an optimization issue. Per the C++ standard, a constant-initialized variable of reference type, or of const-qualified integral or enumeration types, is usable in a constant expression, similar to if it were marked constexpr. E.g. extern const int x = 123; static_assert(x == 123); is valid.

So if extern const int x __attribute__((weak)) = 123; is not considered a constant expression in C++, that is kinda weird. Since it’s a custom attribute, we could define it that way, of course. But, so far, it hasn’t been defined that way. And changing the behavior now could cause existing code to stop building, not just get a performance regression.

@jyknight Very interesting example with static_assert you provided there.

Behavior in C and C++ is different. So I’ll try to be clear about which language I’m talking about.

In C++:

GCC seems to agree with you. extern const int x __attribute__ ((weak)) = 123; static_assert (x == 123); compiles.

clang however fails with

error: static_assert expression is not an integral constant expression
note: initializer of weak variable 'x' is not considered constant because it may be different at runtime

That behavior seems to date from 2011, so it is not exactly new: “When constant-folding, don't look at the initializer of a global cons…” (llvm/llvm-project@cecf184)
@zygoloid

So gcc seems consistent for integral types. It does consider it constant in frontend (so static_assert thinks it is constant), as well as giving that information to backend so the value takes part in optimizations. (or maybe all that happens in frontend in gcc, dunno)

clang/llvm on the other hand seems to be inconsistent. Frontend thinks it is non-constant (so static_assert complains about it not being constant), but marks it as weak_odr, so that backend can use the value for optimization.

A weak variable of non-integral type (e.g. the array extern const int x[] = {123}, with the weak attribute added) is not constant at the C++ level (so it can’t be used in e.g. static_assert), and gcc doesn’t seem to use the initializer value for optimizations, while clang does. For a non-weak one, gcc does use the initializer value for optimizations.

In C:

Neither clang nor gcc considers a constant-initialized variable constant in frontend (guess there is nothing in C standard mandating that, not even for integral types).

But both clang and gcc use the initializer value of integral-typed weak const variables in optimizations. Only clang does it for arrays.


So I’d say that clang’s behavior is inconsistent. On the frontend side it considers const weak variables non-constant, both for integral and non-integral types (for C++). But then it tells the backend that they are constant (weak_odr).

gcc is somewhat clearer. A weak const of integral type is considered const in the frontend and its value is used for optimization. Non-integral types are not const and not involved in optimization.

Examples in Compiler Explorer

Oh, interesting, I thought I’d tested the example in Clang too, but I guess I messed up my test. Thanks for the corrections!

I don’t think you’ve described the current behavior quite right either, though. E.g. in this example:

extern const int xa[] __attribute__ ((weak)) = {123};
extern const int y = xa[0];

GCC does emit a constant initializer for y in both C and C++. I think it does consider non-integral weak globals as constant for constant evaluation, as long as it’s not required to enforce language rules around what a valid constant expression is.

I wonder if GCC may actually have precisely the opposite inconsistency to Clang: it looks to me as if GCC may be using the value for weak variables of all types during constant-evaluation, but NOT during optimization.

E.g. as shown with:

extern const int x __attribute__ ((weak)) = 123;

int f() {
    return x;
}

int f2() {
    const int *n = &x;
    return *n;
}

Anyhow, exploration of GCC’s behavior aside…

Given that this is actually the current state in Clang, I retract my original concern – it seems like it’ll probably be okay to make the change from weak_odr to weak LLVM IR linkage for these variables in both C and C++.

Right, I shouldn’t have assumed that “static_assert” behavior is same as when used as initializer.

There are way more nuances here than I anticipated, between clang/gcc, between c/c++ and which context it is considered constant in. Would be a quite big table to map out everything.

I’ll finish my patch that does that, and we’ll see if it gets past review.

My two main worries are the weak_constant.c testcase - someone cared enough about that behavior to cement it in a testcase - and that I’m not aware of any alternative way to express the “constants in a header” case (i.e. the constant should exist as a real symbol, the compiler is allowed to use it for optimization, and multiple definitions should be merged at link time).

If anyone finds this thread in the future:
this change landed as “[clang] Allow const variables with weak attribute to be overridden” (llvm/llvm-project@dd2362a, D126324)

Thanks!

(In the future, perhaps mention the patches on Discourse earlier, so that subscribed folks can chime in on the patches before they see that the patch has landed :) )
