Weak symbol/alias semantics

Hi Mehdi, Peter and David (and anyone else who sees this),

I’ve been playing with some examples to handle the weak symbol cases we discussed on IRC earlier this week in the context of D28523. I was going to implement support for turning aliases into copies, so that thinLTOResolveWeakForLinkerGUID can be performed on both aliases and aliasees, as a first step toward being able to drop non-prevailing weak symbols in ThinLTO backends.

I was wondering though what happens if we have an alias, which may or may not be weak itself, to a non-odr weak symbol that isn’t prevailing. In that case, do we eventually want references via the alias to go to the prevailing copy (in another module), or to the original copy in the alias’s module? I looked at some examples without ThinLTO, and am a little confused. Current (non-ThinLTO) behavior in some cases seems to depend on opt level.

Example:

$ cat weak12main.c
extern void test2();
int main() {
  test2();
}

$ cat weak1.c
#include <stdio.h>

void weakalias() __attribute__((weak, alias ("f")));
void strongalias() __attribute__((alias ("f")));

void f () __attribute__ ((weak));
void f()
{
  printf("In weak1.c:f\n");
}
void test1() {
  printf("Call f() from weak1.c:\n");
  f();
  printf("Call weakalias() from weak1.c:\n");
  weakalias();
  printf("Call strongalias() from weak1.c:\n");
  strongalias();
}

$ cat weak2.c
#include <stdio.h>

void f () __attribute__ ((weak));
void f()
{
  printf("In weak2.c:f\n");
}
extern void test1();
void test2()
{
  test1();
  printf("Call f() from weak2.c\n");
  f();
}

If I link weak1.c before weak2.c, nothing is surprising (we always invoke weak1.c:f at both -O0 and -O2):

$ clang weak12main.c weak1.c weak2.c -O0
$ a.out
Call f() from weak1.c:
In weak1.c:f
Call weakalias() from weak1.c:
In weak1.c:f
Call strongalias() from weak1.c:
In weak1.c:f
Call f() from weak2.c
In weak1.c:f

$ clang weak12main.c weak1.c weak2.c -O2
$ a.out
Call f() from weak1.c:
In weak1.c:f
Call weakalias() from weak1.c:
In weak1.c:f
Call strongalias() from weak1.c:
In weak1.c:f
Call f() from weak2.c
In weak1.c:f

If I instead link weak2.c first, so its copy of f() is prevailing, I still get weak1.c:f for the call via weakalias() (at both opt levels), and for strongalias() when building at -O0. At -O2 the compiler replaces the call to strongalias() with a call to f(), so it gets the weak2.c copy in that case.

$ clang weak12main.c weak2.c weak1.c -O2
$ a.out
Call f() from weak1.c:
In weak2.c:f
Call weakalias() from weak1.c:
In weak1.c:f
Call strongalias() from weak1.c:
In weak2.c:f
Call f() from weak2.c
In weak2.c:f

$ clang weak12main.c weak2.c weak1.c -O0
$ a.out
Call f() from weak1.c:
In weak2.c:f
Call weakalias() from weak1.c:
In weak1.c:f
Call strongalias() from weak1.c:
In weak1.c:f
Call f() from weak2.c
In weak2.c:f

I’m wondering what the expected/correct behavior is. Depending on what is correct, we need to handle this differently in ThinLTO mode. Let’s say weak1.c’s copy of f() is not prevailing and I am going to drop it (it needs to be removed completely rather than turned into available_externally, so that it isn’t inlined, since weak isInterposable). If we want the aliases in weak1.c to reference the original version, then copying is correct (e.g. weakalias and strongalias would each become a copy of weak1.c’s f()). If, however, we want them to resolve to the prevailing copy of f(), then we need to turn the aliases into declarations (external linkage in the case of strongalias and external weak in the case of weakalias?).

I also tried the case where f() was in a comdat, because I also need to handle that case in ThinLTO (when f() is not prevailing, drop it from the comdat and remove the comdat from that module). Interestingly, in this case when weak2.c is prevailing, I get the following warning when linking and get a seg fault at runtime:

weak1.o:weak1.o:function test1: warning: relocation refers to discarded section

Presumably the aliases still refer to the copy in weak1.c, which is in the comdat that gets dropped by the linker. So is it not legal to have an alias to a weak symbol in a comdat (i.e. alias from outside the comdat)? We don’t complain in the compiler.

Thanks,
Teresa

‘strongalias’ is resolved to the copy of ‘f’ in weak1.c, and it does not have weak linkage. In other words, the weak symbol ‘f’ in weak1.c gets ‘promoted’ to non-weak through the name ‘strongalias’. Given the above, the -O0 behavior is the more correct one. It is also consistent with what GCC does (at both -O0 and -O2).

David

Hi Teresa,

I think that to answer your question correctly it is helpful to consider what is going on at the object file level. For your weak1.c we conceptually have a .text section containing the body of f, and then three symbols:

.weak f
f = .text
.globl strongalias
strongalias = .text
.weak weakalias
weakalias = .text

Note that f, strongalias and weakalias are not related at all, except that they happen to point to the same place. If f is overridden by a symbol in another object file, it does not affect the symbols strongalias and weakalias, so we still need to make them point to .text. I don’t think it would be right to make strongalias and weakalias into copies of f, as that would be observable through function pointer equality. Most likely all you need to do is to internalize f and keep strongalias and weakalias as aliases of f.
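A minimal single-file sketch of the identity described above, assuming GCC or Clang on an ELF target (the alias attribute is not supported on Darwin). The helper name aliases_share_address is hypothetical and not part of the original example; f is declared with a (void) prototype only to keep the sketch warning-free.

```c
#include <stdio.h>

/* A weak definition: another object file may override the name f. */
__attribute__((weak)) void f(void) { puts("In this file's f"); }

/* Two more names for the same body, as in weak1.c. */
void strongalias(void) __attribute__((alias("f")));
void weakalias(void) __attribute__((weak, alias("f")));

typedef void (*fn_ptr)(void);

/* Returns 1 if both alias names resolve to the same address. */
int aliases_share_address(void) {
    /* volatile forces the comparison to happen on the loaded addresses */
    volatile fn_ptr s = strongalias;
    volatile fn_ptr w = weakalias;
    return s == w;
}
```

When this is the only object file in the link, nothing overrides weakalias, so both names point at the one body. If the aliases were instead turned into separate copies of f, the comparison would report a difference, which is the observable change in behavior being warned about.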

If we’re resolving strongalias to f at -O2, that seems like a bug to me. We can probably only resolve an alias to the symbol it references if we are guaranteed that both symbols will have the same resolution, i.e. we must check at least that both symbols have strong or internal linkage. If we cared about symbol interposition, we might also want to check that both symbols have non-default visibility, but I think that our support for that is still a little fuzzy at the moment.

Thanks,
Peter

Thanks, David and Peter. Some responses to Peter’s email below. Teresa

Hi Teresa,

I think that to answer your question correctly it is helpful to consider
what is going on at the object file level. For your weak1.c we conceptually
have a .text section containing the body of f, and then three symbols:

.weak f
f = .text
.globl strongalias
strongalias = .text
.weak weakalias
weakalias = .text

Note that f, strongalias and weakalias are not related at all, except
that they happen to point to the same place. If f is overridden by a symbol
in another object file, it does not affect the symbols strongalias and
weakalias, so we still need to make them point to .text. I don't think it
would be right to make strongalias and weakalias into copies of f, as that
would be observable through function pointer equality. Most likely all you
need to do is to internalize f and keep strongalias and weakalias as
aliases of f.

Good point on wanting function pointer equality. However, we can't simply
internalize f(). We'll also need to rename the internalized copy. The
reason is that we want the original f() references to resolve to the
prevailing copy in the other module. Summarizing what we just talked about
on IRC, when we have a non-prevailing weak/linkonce symbol f() that has an
alias pointing to it,

except for non-prevailing weak aliases.

we need to:
1) Rename and internalize f()
2) Create a new external decl f()
3) RAUW existing references (other than from the aliases) with the new
local created in 1)

Should this be 'with the new external decl f created in 2)'?

I think if it is however a weak_odr/linkonce_odr we can simplify the
process since all copies will be the same. We can make f()
available_externally (to enable inlining), and simply convert references to
aliases of f() into direct references to f() and drop the aliases - does
that sound right?

Sounds right.

Another tricky thing is if the weak symbol was a variable that is
initialized via a __cxx_global_var_init function in the global_ctors list.
If we have an alias to that symbol, presumably we'll want the new
internalized/renamed version to get initialized instead?

If the initializer references the aliased symbol, then yes. If the original
weak symbol is referenced, I don't see why the prevailing one should not be
used.

Now in the case where we have an alias that is itself a weak
non-prevailing symbol, how we handle it will, I think, depend on what it is
aliased to:
a) aliased to a weak/linkonce non-prevailing symbol -> handle as described
earlier
b) aliased to a weak_odr/linkonce_odr non-prevailing symbol -> handle as
described earlier
c) aliased to a strong symbol or a prevailing symbol -> convert to
external decl (I think this case is only possible if the alias is a non-odr
weak/linkonce)

Does that sound right?

Non-prevailing weak aliases can probably be safely discarded. The
prevailing symbol may or may not be an alias itself.

David

Thanks, David and Peter. Some responses to Peter's email below. Teresa

Hi Teresa,

I think that to answer your question correctly it is helpful to consider
what is going on at the object file level. For your weak1.c we conceptually
have a .text section containing the body of f, and then three symbols:

.weak f
f = .text
.globl strongalias
strongalias = .text
.weak weakalias
weakalias = .text

Note that f, strongalias and weakalias are not related at all, except
that they happen to point to the same place. If f is overridden by a symbol
in another object file, it does not affect the symbols strongalias and
weakalias, so we still need to make them point to .text. I don't think it
would be right to make strongalias and weakalias into copies of f, as that
would be observable through function pointer equality. Most likely all you
need to do is to internalize f and keep strongalias and weakalias as
aliases of f.

Good point on wanting function pointer equality. However, we can't
simply internalize f(). We'll also need to rename the internalized copy.
The reason is that we want the original f() references to resolve to the
prevailing copy in the other module. Summarizing what we just talked about
on IRC, when we have a non-prevailing weak/linkonce symbol f() that has an
alias pointing to it,

except for non-prevailing weak aliases.

Right, this is just dealing with aliased symbols, not the alias itself
(which is discussed below).

we need to:
1) Rename and internalize f()
2) Create a new external decl f()
3) RAUW existing references (other than from the aliases) with the new
local created in 1)

Should this be 'with the new external decl f created in 2)'?

Yep, right I switched that when I wrote it in the email!
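The three steps, with step 3 as corrected just above, can be sketched on a toy symbol table. This is an illustration only and uses none of LLVM's APIs: struct module, enum kind, and drop_nonprevailing_weak are hypothetical names, and the ".llvm.1" suffix is merely borrowed from the renamed-global example later in this thread.

```c
#include <string.h>

#define MAX_SYMS 8
#define NO_ALIASEE (-1)

enum kind { DEF_WEAK, DEF_INTERNAL, DECL_EXTERNAL, ALIAS_STRONG, ALIAS_WEAK };

struct symbol {
    char name[32];
    enum kind kind;
    int aliasee;              /* index into syms, or NO_ALIASEE */
};

struct module {
    struct symbol syms[MAX_SYMS];
    int nsyms;
    int calls[MAX_SYMS];      /* each call site stores the index of its callee */
    int ncalls;
};

/* Apply the three steps to the non-prevailing weak definition at index def.
   Returns the index of the new external declaration, or -1 if the table is full. */
int drop_nonprevailing_weak(struct module *m, int def) {
    if (m->nsyms >= MAX_SYMS)
        return -1;

    /* Step 2: create an external declaration carrying the original name. */
    int decl = m->nsyms++;
    strcpy(m->syms[decl].name, m->syms[def].name);
    m->syms[decl].kind = DECL_EXTERNAL;
    m->syms[decl].aliasee = NO_ALIASEE;

    /* Step 1: rename and internalize the local body. */
    struct symbol *s = &m->syms[def];
    strncat(s->name, ".llvm.1", sizeof s->name - strlen(s->name) - 1);
    s->kind = DEF_INTERNAL;

    /* Step 3: direct references now go to the declaration. Aliases refer to
       the body by index, so they keep pointing at the renamed local copy. */
    for (int i = 0; i < m->ncalls; i++)
        if (m->calls[i] == def)
            m->calls[i] = decl;

    return decl;
}
```

For a module holding f (weak), strongalias and weakalias (both aliasing f), a direct call to f ends up targeting the new declaration, while calls through either alias still reach the renamed local body.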

I think if it is however a weak_odr/linkonce_odr we can simplify the
process since all copies will be the same. We can make f()
available_externally (to enable inlining), and simply convert references to
aliases of f() into direct references to f() and drop the aliases - does
that sound right?

Sounds right.

Another tricky thing is if the weak symbol was a variable that is
initialized via a __cxx_global_var_init function in the global_ctors list.
If we have an alias to that symbol, presumably we'll want the new
internalized/renamed version to get initialized instead?

If the initializer references the aliased symbol, then yes. If the
original weak symbol is referenced, I don't see why the prevailing one
should not be used.

I was thinking of the case where the aliased symbol is the original weak
symbol. I.e. you have something like:

@fv = weak global i8 0, align 8
@strongalias = weak alias i8, i8* @fv

(@fv is the aliased weak symbol). If @fv is initialized via an initializer
in the global_ctors list, and therefore we follow the procedure described
above and convert it to a renamed local like:

@fv.llvm.1 = internal global i8 0, align 8
@strongalias = weak alias i8, i8* @fv.llvm.1

and the original converted to an external decl like:

@fv = external global i8

Then presumably the prevailing copy of @fv will be initialized elsewhere,
and @fv.llvm.1 needs to be initialized here.

Now in the case where we have an alias that is itself a weak
non-prevailing symbol, how we handle it will, I think, depend on what it is
aliased to:
a) aliased to a weak/linkonce non-prevailing symbol -> handle as
described earlier
b) aliased to a weak_odr/linkonce_odr non-prevailing symbol -> handle as
described earlier
c) aliased to a strong symbol or a prevailing symbol -> convert to
external decl (I think this case is only possible if the alias is a non-odr
weak/linkonce)

Does that sound right?

Non-prevailing weak aliases can probably be safely discarded. The
prevailing symbol may or may not be an alias itself.

Meaning just convert it to an external decl?

Teresa

Hi,

I believe we should get it right (and simpler) if (when…) we move to the representation we discussed last spring: https://llvm.org/bugs/show_bug.cgi?id=27866

Our conclusion was that aliases should always point to private anonymous globals and nothing else, so when we currently have:

@a = global i32 0
@b = alias @a

It should always become:

@0 = private i32 0
@a = alias @0
@b = alias @0

Thanks, David and Peter. Some responses to Peter's email below. Teresa

Hi Teresa,

I think that to answer your question correctly it is helpful to
consider what is going on at the object file level. For your weak1.c we
conceptually have a .text section containing the body of f, and then three
symbols:

.weak f
f = .text
.globl strongalias
strongalias = .text
.weak weakalias
weakalias = .text

Note that f, strongalias and weakalias are not related at all, except
that they happen to point to the same place. If f is overridden by a symbol
in another object file, it does not affect the symbols strongalias and
weakalias, so we still need to make them point to .text. I don't think it
would be right to make strongalias and weakalias into copies of f, as that
would be observable through function pointer equality. Most likely all you
need to do is to internalize f and keep strongalias and weakalias as
aliases of f.

Good point on wanting function pointer equality. However, we can't
simply internalize f(). We'll also need to rename the internalized copy.
The reason is that we want the original f() references to resolve to the
prevailing copy in the other module. Summarizing what we just talked about
on IRC, when we have a non-prevailing weak/linkonce symbol f() that has an
alias pointing to it,

except for non-prevailing weak aliases.

Right, this is just dealing with aliased symbols, not the alias itself
(which is discussed below).

we need to:
1) Rename and internalize f()
2) Create a new external decl f()
3) RAUW existing references (other than from the aliases) with the new
local created in 1)

Should this be 'with the new external decl f created in 2)'?

Yep, right I switched that when I wrote it in the email!

I think if it is however a weak_odr/linkonce_odr we can simplify the
process since all copies will be the same. We can make f()
available_externally (to enable inlining), and simply convert references to
aliases of f() into direct references to f() and drop the aliases - does
that sound right?

Sounds right.

Another tricky thing is if the weak symbol was a variable that is
initialized via a __cxx_global_var_init function in the global_ctors list.
If we have an alias to that symbol, presumably we'll want the new
internalized/renamed version to get initialized instead?

If the initializer references the aliased symbol, then yes. If the
original weak symbol is referenced, I don't see why the prevailing one
should not be used.

I was thinking of the case where the aliased symbol is the original weak
symbol. I.e. you have something like:

@fv = weak global i8 0, align 8
@strongalias = weak alias i8, i8* @fv

(@fv is the aliased weak symbol). If @fv is initialized via an initializer
in the global_ctors list, and therefore we follow the procedure described
above and convert it to a renamed local like:

@fv.llvm.1 = internal global i8 0, align 8
@strongalias = weak alias i8, i8* @fv.llvm.1

and the original converted to an external decl like:

@fv = external global i8

Then presumably the prevailing copy of @fv will be initialized elsewhere,
and @fv.llvm.1 needs to be initialized here.

Right, the global initialization is part of the object definition.

Now in the case where we have an alias that is itself a weak
non-prevailing symbol, how we handle it will, I think, depend on what it is
aliased to:
a) aliased to a weak/linkonce non-prevailing symbol -> handle as
described earlier
b) aliased to a weak_odr/linkonce_odr non-prevailing symbol -> handle as
described earlier
c) aliased to a strong symbol or a prevailing symbol -> convert to
external decl (I think this case is only possible if the alias is a non-odr
weak/linkonce)

Does that sound right?

Non-prevailing weak aliases can probably be safely discarded. The
prevailing symbol may or may not be an alias itself.

Meaning just convert it to an external decl?

I believe so.

David

Hi,

I believe we should get it right (and simpler) if (when…) we move to the
representation we discussed last spring:
https://llvm.org/bugs/show_bug.cgi?id=27866

Our conclusion was that we should always have alias pointing to private
anonymous and nothing else, so when we currently have:

@a = global i32 0
@b = alias @a

It should always become:

@0 = private i32 0
@a = alias @0
@b = alias @0

Yes that has some nice properties. I think it makes at least one case
harder though.

Consider:

1) Original:
define linkonce_odr void @a() { ... }
@b = linkonce_odr alias void (), void ()* @a

2) In the proposed canonical form this becomes:
define private void @0() { ... }
@a = linkonce_odr alias void (), void ()* @0
@b = linkonce_odr alias void (), void ()* @0

And let's say @a and @b are both non-prevailing in this module. I think we
lose inlining ability for these functions when we start dropping
non-prevailing copies.

In 1) we could transform this (as described in my earlier email) to:

define available_externally void @a() { ... }
/* RAUW @b with @a */

and we can inline @a into all callsites (which includes calls originally
made via @b), before dropping it.

With 2), I think we would need to transform this to:

define private void @0() { ... }
declare void @a()
declare void @b()

and we no longer can inline calls to @a and @b. I suppose we could
intervene and special case this scenario so that it is transformed as for
1) when there are non-prevailing odr aliases.

I need to think through the various permutations of linkage types and see
how they get transformed in the current and proposed canonical forms to see
if there are other cases that need special handling.

Teresa

The complication is that a cloneFunction() step would be needed, right?

Alternatively I can imagine us allowing available_externally aliases. I’m not sure why they are forbidden today, other than because we map the object-level limitations into the IR, but I may be missing something (we don’t have aliases on Darwin either…).

Thanks, David and Peter. Some responses to Peter's email below. Teresa

Hi Teresa,

I think that to answer your question correctly it is helpful to consider
what is going on at the object file level. For your weak1.c we conceptually
have a .text section containing the body of f, and then three symbols:

.weak f
f = .text
.globl strongalias
strongalias = .text
.weak weakalias
weakalias = .text

Note that f, strongalias and weakalias are not related at all, except
that they happen to point to the same place. If f is overridden by a symbol
in another object file, it does not affect the symbols strongalias and
weakalias, so we still need to make them point to .text. I don't think it
would be right to make strongalias and weakalias into copies of f, as that
would be observable through function pointer equality. Most likely all you
need to do is to internalize f and keep strongalias and weakalias as
aliases of f.

Good point on wanting function pointer equality. However, we can't simply
internalize f(). We'll also need to rename the internalized copy. The
reason is that we want the original f() references to resolve to the
prevailing copy in the other module. Summarizing what we just talked about
on IRC, when we have a non-prevailing weak/linkonce symbol f() that has an
alias pointing to it, we need to:
1) Rename and internalize f()
2) Create a new external decl f()
3) RAUW existing references (other than from the aliases) with the new
external decl created in 2)

I think if it is however a weak_odr/linkonce_odr we can simplify the
process since all copies will be the same. We can make f()
available_externally (to enable inlining), and simply convert references to
aliases of f() into direct references to f() and drop the aliases - does
that sound right?

I think you are right about the _odr -- we should assume that even if the
symbol as we see it is an alias it will be replaced with something with the
same semantics. I think we can also take that into account in the logic for
replacing the alias with its aliasee, as follows:

- if the alias is internal, strong external or odr (i.e. not
isInterposableLinkage): may replace the global with an available_externally
copy of its aliasee
- if both the alias and aliasee are not isInterposableLinkage: may replace
the global reference with a reference to the aliasee
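The two rules as proposed in this message can be restated as predicates over linkage. This is a sketch on a toy enum, not LLVM's API: the enum and function names are hypothetical, and is_interposable only mirrors the weak/linkonce (non-odr) cases of isInterposableLinkage that matter for this thread.

```c
#include <stdbool.h>

/* Toy subset of linkage kinds; the names are illustrative only. */
enum linkage {
    LINK_EXTERNAL,       /* strong definition */
    LINK_INTERNAL,
    LINK_LINKONCE_ANY,
    LINK_LINKONCE_ODR,
    LINK_WEAK_ANY,
    LINK_WEAK_ODR
};

/* True if the linker may pick a definition with different semantics. */
bool is_interposable(enum linkage l) {
    return l == LINK_WEAK_ANY || l == LINK_LINKONCE_ANY;
}

/* First rule: depends only on the linkage of the alias itself. */
bool may_use_available_externally_copy(enum linkage alias) {
    return !is_interposable(alias);
}

/* Second rule: a use of the alias may be forwarded to the aliasee only
   when neither symbol can be replaced at link time. */
bool may_forward_to_aliasee(enum linkage alias, enum linkage aliasee) {
    return !is_interposable(alias) && !is_interposable(aliasee);
}
```

Under the second rule, the strongalias case from the original example (strong alias, weak aliasee) is not forwarded, which matches the view that resolving strongalias to f at -O2 is a bug.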

Another tricky thing is if the weak symbol was a variable that is
initialized via a __cxx_global_var_init function in the global_ctors list.
If we have an alias to that symbol, presumably we'll want the new
internalized/renamed version to get initialized instead?

I believe that the purpose of the global reference in global_ctors is to
control which comdat the .init_array entry appears in. So yes, we will need
to rewrite the reference in global_ctors to point to the renamed global.
However, I think we also need to think about whether to remove the
init_array entry entirely. There are a couple of cases:

1) All symbols in the comdat are weak and have been overridden by strong
symbols in another object file, which may not necessarily be in a comdat.
In that case we need to keep the init_array entry so that we call the
initializer function.
2) The linker has selected another comdat entirely. That means that this
object file's init_array entry has not been chosen and we need to drop it.

The interesting thing about these two cases is that they are
indistinguishable at the LTO API level because the linker will report all
of the comdat symbols as non-prevailing.

I don't think it is possible to arrive at case 1 with regular C++, so maybe
we'd be fine assuming case 2 if all comdat symbols are non-prevailing. But
then again it would seem to make the implementation simpler if we
communicate which comdat has prevailed at the LTO API level.

Now in the case where we have an alias that is itself a weak non-prevailing
symbol, how we handle it will, I think, depend on what it is aliased to:
a) aliased to a weak/linkonce non-prevailing symbol -> handle as described
earlier
b) aliased to a weak_odr/linkonce_odr non-prevailing symbol -> handle as
described earlier
c) aliased to a strong symbol or a prevailing symbol -> convert to
external decl (I think this case is only possible if the alias is a non-odr
weak/linkonce)

Does that sound right?

I think the logic just needs to depend on the linkage of the alias itself,
as I described above.

If we're resolving strongalias to f at -O2, that seems like a bug to me. We

Teresa Johnson via llvm-dev <llvm-dev@lists.llvm.org> writes:

I was wondering though what happens if we have an alias, which may or may
not be weak itself, to a non-odr weak symbol that isn't prevailing. In that
case, do we eventually want references via the alias to go to the
prevailing copy (in another module), or to the original copy in the alias's
module? I looked at some examples without ThinLTO, and am a little
confused. Current (non-ThinLTO) behavior in some cases seems to depend on
opt level.

Example:

$ cat weak12main.c
extern void test2();
int main() {
  test2();
}

$ cat weak1.c
#include <stdio.h>

void weakalias() __attribute__((weak, alias ("f")));
void strongalias() __attribute__((alias ("f")));

void f () __attribute__ ((weak));
void f()
{
  printf("In weak1.c:f\n");
}
void test1() {
  printf("Call f() from weak1.c:\n");
  f();
  printf("Call weakalias() from weak1.c:\n");
  weakalias();
  printf("Call strongalias() from weak1.c:\n");
  strongalias();
}

$ cat weak2.c
#include <stdio.h>

void f () __attribute__ ((weak));
void f()
{
  printf("In weak2.c:f\n");
}
extern void test1();
void test2()
{
  test1();
  printf("Call f() from weak2.c\n");
  f();
}

If I link weak1.c before weak2.c, nothing is surprising (we always invoke
weak1.c:f at both -O0 and -O2):

$ clang weak12main.c weak1.c weak2.c -O0
$ a.out
Call f() from weak1.c:
In weak1.c:f
Call weakalias() from weak1.c:
In weak1.c:f
Call strongalias() from weak1.c:
In weak1.c:f
Call f() from weak2.c
In weak1.c:f

$ clang weak12main.c weak1.c weak2.c -O2
$ a.out
Call f() from weak1.c:
In weak1.c:f
Call weakalias() from weak1.c:
In weak1.c:f
Call strongalias() from weak1.c:
In weak1.c:f
Call f() from weak2.c
In weak1.c:f

If I instead link weak2.c first, so its copy of f() is prevailing, I still
get weak1.c:f for the call via weakalias() (at both opt levels), and for
strongalias() when building at -O0. At -O2 the compiler replaces the call
to strongalias() with a call to f(), so it gets the weak2.c copy in that
case.

$ clang weak12main.c weak2.c weak1.c -O2
$ a.out
Call f() from weak1.c:
In weak2.c:f
Call weakalias() from weak1.c:
In weak1.c:f
Call strongalias() from weak1.c:
In weak2.c:f
Call f() from weak2.c
In weak2.c:f

$ clang weak12main.c weak2.c weak1.c -O0
$ a.out
Call f() from weak1.c:
In weak2.c:f
Call weakalias() from weak1.c:
In weak1.c:f
Call strongalias() from weak1.c:
In weak1.c:f
Call f() from weak2.c
In weak2.c:f

I'm wondering what the expected/correct behavior is? Depending on what is
correct, we need to handle this differently in ThinLTO mode. Let's say
weak1.c's copy of f() is not prevailing and I am going to drop it (it needs
to be removed completely, not turned into available_externally to ensure it
isn't inlined since weak isInterposable). If we want the aliases in weak1.c
to reference the original version, then copying is correct (e.g. weakalias
and strongalias would each become a copy of weak1.c's f()). If we however
want them to resolve to the prevailing copy of f(), then we need to turn
the aliases into declarations (external linkage in the case of strongalias
and external weak in the case of weakalias?).

I also tried the case where f() was in a comdat, because I also need to
handle that case in ThinLTO (when f() is not prevailing, drop it from the
comdat and remove the comdat from that module). Interestingly, in this case
when weak2.c is prevailing, I get the following warning when linking and
get a seg fault at runtime:

weak1.o:weak1.o:function test1: warning: relocation refers to discarded
section

Presumably the aliases still refer to the copy in weak1.c, which is in the
comdat that gets dropped by the linker. So is it not legal to have an alias
to a weak symbol in a comdat (i.e. alias from outside the comdat)? We don't
complain in the compiler.

The rule should be that the alias to aliasee link is never broken. The
reason being that an alias at the file level is just another symbol with
the same value.

So if foo is an alias to bar, accessing that foo will always be the same
as accessing that bar, regardless of either of them being weak. I say
*that* foo and *that* bar because symbol resolution may pick another foo
and another bar.

Cheers,
Rafael

Mehdi Amini via llvm-dev <llvm-dev@lists.llvm.org> writes:

Hi,

I believe we should get it right (and simpler) if (when…) we move to the representation we discussed last spring: https://llvm.org/bugs/show_bug.cgi?id=27866

Our conclusion was that we should always have alias pointing to private anonymous and nothing else, so when we currently have:

@a = global i32 0
@b = alias @a

It should always become:

@0 = private i32 0
@a = alias @0
@b = alias @0

Yes, that would be awesome :)

Cheers,
Rafael

Mehdi Amini via llvm-dev <llvm-dev@lists.llvm.org> writes:

The complication is that a cloneFunction() step would be needed, right?

Alternatively I can imagine us allowing available_externally aliases. I’m not sure why they are forbidden today, other than because we map the object-level limitations into the IR, but I may be missing something (we don’t have aliases on Darwin either…).

We can probably make it work. The meaning is just that we know that in
another file there is an alias with the given definition.

Darwin has a special case alias for swift functions with a prolog, no?

Cheers,
Rafael

I know there is something that is used in Swift, I don’t know the details unfortunately.

Thanks, David and Peter. Some responses to Peter's email below. Teresa

Hi Teresa,

I think that to answer your question correctly it is helpful to consider
what is going on at the object file level. For your weak1.c we conceptually
have a .text section containing the body of f, and then three symbols:

.weak f
f = .text
.globl strongalias
strongalias = .text
.weak weakalias
weakalias = .text

Note that f, strongalias and weakalias are not related at all, except
that they happen to point to the same place. If f is overridden by a symbol
in another object file, it does not affect the symbols strongalias and
weakalias, so we still need to make them point to .text. I don't think it
would be right to make strongalias and weakalias into copies of f, as that
would be observable through function pointer equality. Most likely all you
need to do is to internalize f and keep strongalias and weakalias as
aliases of f.

Good point on wanting function pointer equality. However, we can't
simply internalize f(). We'll also need to rename the internalized copy.
The reason is that we want the original f() references to resolve to the
prevailing copy in the other module. Summarizing what we just talked about
on IRC, when we have a non-prevailing weak/linkonce symbol f() that has an
alias pointing to it, we need to:
1) Rename and internalize f()
2) Create a new external decl f()
3) RAUW existing references (other than from the aliases) with the new
external decl created in 2)

I think if it is however a weak_odr/linkonce_odr we can simplify the
process since all copies will be the same. We can make f()
available_externally (to enable inlining), and simply convert references to
aliases of f() into direct references to f() and drop the aliases - does
that sound right?

I think you are right about the _odr -- we should assume that even if the
symbol as we see it is an alias it will be replaced with something with the
same semantics. I think we can also take that into account in the logic for
replacing the alias with its aliasee, as follows:

- if the alias is internal, strong external or odr (i.e. not
isInterposableLinkage): may replace the global with an available_externally
copy of its aliasee

That doesn't match what we decided for aliases to a weak symbol (e.g. the
strongalias case from above). I thought we decided that these should end up
aliased to an internalized copy of its aliasee (assuming we're still
talking about the aliasee being weak and non-prevailing). If alias becomes
instead an available_externally copy, what happens when that copy is
eliminated - there may not be an external def to resolve the references to
at link time?

- if both the alias and aliasee are not isInterposableLinkage: may replace
the global reference with a reference to the aliasee

What if alias and aliasee are both non-prevailing and the prevailing defs
for each are different? E.g.

@x = weak global ...
@y = weak alias @x

and the prevailing def for @x is in moduleX with a different value than the
prevailing def for @y which comes from moduleY. Just because they are
aliased in this module doesn't mean they must be aliased elsewhere, right?
For this case (weak non-prevailing alias to a weak non-prevailing def) I
think it should eventually become:

@x = external global
@y = external global

which is what we would get as I proposed:
- first by following the above transformation to make @y alias with a
renamed and internalized @x, converting @x to an external decl
- second by following case c) further down (since @y now aliases with a
strong symbol), converting @y to an external decl

Another tricky thing is if the weak symbol was a variable that is
initialized via a __cxx_global_var_init function in the global_ctors list.
If we have an alias to that symbol, presumably we'll want the new
internalized/renamed version to get initialized instead?

I believe that the purpose of the global reference in global_ctors is to
control which comdat the .init_array entry appears in. So yes, we will need
to rewrite the reference in global_ctors to point to the renamed global.
However, I think we also need to think about whether to remove the
init_array entry entirely. There are a couple of cases:

1) All symbols in the comdat are weak and have been overridden by strong
symbols in another object file, which may not necessarily be in a comdat.
In that case we need to keep the init_array entry so that we call the
initializer function.
2) The linker has selected another comdat entirely. That means that this
object file's init_array entry has not been chosen and we need to drop it.

Related: D28737 ([ThinLTO] Don't create a comdat group for a dropped def
with initializer)

The interesting thing about these two cases is that they are
indistinguishable at the LTO API level because the linker will report all
of the comdat symbols as non-prevailing.

I don't think it is possible to arrive at case 1 with regular C++, so
maybe we'd be fine assuming case 2 if all comdat symbols are
non-prevailing. But then again it would seem to make the implementation
simpler if we communicate which comdat has prevailed at the LTO API level.

Yes, another thing I realized is that we will drop the comdat for a
non-prevailing weak that we convert to available_externally (or to a decl
with D28806: [ThinLTO] Drop non-prevailing non-ODR weak to declarations),
but we won't drop any other members of the same comdat group from the
comdat. E.g. the comdat could contain GVs with internal or strong external
linkage. That also needs to be fixed, so we don't end up with incomplete
comdat groups.

We can deduce that the comdat is not selected by the linker when it
contains a weak symbol, since we know whether that is prevailing or not,
but not when a comdat doesn't contain any weak. In that latter case (comdat
doesn't have any weak symbols), I haven't thought through where this could
get us into trouble, since we don't typically drop any non-weak symbols in
ThinLTO compilation (so we wouldn't end up with an incomplete comdat
group), but I wonder if there are potential issues especially in the
distributed case where we do 2 separate links.

Now in the case where we have an alias that is itself a weak
non-prevailing symbol, how we handle it will, I think, depend on what it is
aliased to:
a) aliased to a weak/linkonce non-prevailing symbol -> handle as
described earlier
b) aliased to a weak_odr/linkonce_odr non-prevailing symbol -> handle as
described earlier
c) aliased to a strong symbol or a prevailing symbol -> convert to
external decl (I think this case is only possible if the alias is a non-odr
weak/linkonce)

Does that sound right?
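
Cases a) to c) can be sketched as a small decision function. This is only a model of the proposal above, not LLVM code; the enum and function names are hypothetical, and "as described earlier" is taken to mean the rename-and-internalize steps for non-odr aliasees and the available_externally conversion for odr aliasees.

```c
#include <assert.h>
#include <stdbool.h>

/* What the non-prevailing weak alias points at (hypothetical model). */
enum aliasee_kind { ALIASEE_WEAK_NON_ODR, ALIASEE_WEAK_ODR, ALIASEE_STRONG };

enum action {
    RENAME_AND_INTERNALIZE_ALIASEE,    /* case a */
    MAKE_ALIASEE_AVAILABLE_EXTERNALLY, /* case b */
    CONVERT_ALIAS_TO_EXTERNAL_DECL     /* case c */
};

/* Picks the handling for a weak, non-prevailing alias. */
enum action handle_nonprevailing_weak_alias(enum aliasee_kind kind,
                                            bool aliasee_prevailing) {
    if (kind == ALIASEE_STRONG || aliasee_prevailing)
        return CONVERT_ALIAS_TO_EXTERNAL_DECL;
    if (kind == ALIASEE_WEAK_ODR)
        return MAKE_ALIASEE_AVAILABLE_EXTERNALLY;
    return RENAME_AND_INTERNALIZE_ALIASEE;
}

/* Usage: one check per case. */
void check_cases(void) {
    assert(handle_nonprevailing_weak_alias(ALIASEE_WEAK_NON_ODR, false)
           == RENAME_AND_INTERNALIZE_ALIASEE);
    assert(handle_nonprevailing_weak_alias(ALIASEE_WEAK_ODR, false)
           == MAKE_ALIASEE_AVAILABLE_EXTERNALLY);
    assert(handle_nonprevailing_weak_alias(ALIASEE_STRONG, false)
           == CONVERT_ALIAS_TO_EXTERNAL_DECL);
}
```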

I think the logic just needs to depend on the linkage of the alias itself,
as I described above.

See my comment on this above.

Thanks,
Teresa

Teresa Johnson via llvm-dev <llvm-dev@lists.llvm.org> writes:

> I was wondering though what happens if we have an alias, which may or may
> not be weak itself, to a non-odr weak symbol that isn't prevailing. In that
> case, do we eventually want references via the alias to go to the
> prevailing copy (in another module), or to the original copy in the alias's
> module? I looked at some examples without ThinLTO, and am a little
> confused. Current (non-ThinLTO) behavior in some cases seems to depend on
> opt level.
>
> Example:
>
> $ cat weak12main.c
> extern void test2();
> int main() {
> test2();
> }
>
> $ cat weak1.c
> #include <stdio.h>
>
> void weakalias() __attribute__((weak, alias ("f")));
> void strongalias() __attribute__((alias ("f")));
>
> void f () __attribute__ ((weak));
> void f()
> {
> printf("In weak1.c:f\n");
> }
> void test1() {
> printf("Call f() from weak1.c:\n");
> f();
> printf("Call weakalias() from weak1.c:\n");
> weakalias();
> printf("Call strongalias() from weak1.c:\n");
> strongalias();
> }
>
> $ cat weak2.c
> #include <stdio.h>
>
> void f () __attribute__ ((weak));
> void f()
> {
> printf("In weak2.c:f\n");
> }
> extern void test1();
> void test2()
> {
> test1();
> printf("Call f() from weak2.c\n");
> f();
> }
>
> If I link weak1.c before weak2.c, nothing is surprising (we always invoke
> weak1.c:f at both -O0 and -O2):
>
> $ clang weak12main.c weak1.c weak2.c -O0
> $ a.out
> Call f() from weak1.c:
> In weak1.c:f
> Call weakalias() from weak1.c:
> In weak1.c:f
> Call strongalias() from weak1.c:
> In weak1.c:f
> Call f() from weak2.c
> In weak1.c:f
>
> $ clang weak12main.c weak1.c weak2.c -O2
> $ a.out
> Call f() from weak1.c:
> In weak1.c:f
> Call weakalias() from weak1.c:
> In weak1.c:f
> Call strongalias() from weak1.c:
> In weak1.c:f
> Call f() from weak2.c
> In weak1.c:f
>
> If I instead link weak2.c first, so its copy of f() is prevailing, I still
> get weak1.c:f for the call via weakalias() (both opt levels), and for
> strongalias() when building at -O0. At -O2 the compiler replaces the call
> to strongalias() with a call to f(), so it gets the weak2 copy in that
> case.
>
> $ clang weak12main.c weak2.c weak1.c -O2
> $ a.out
> Call f() from weak1.c:
> In weak2.c:f
> Call weakalias() from weak1.c:
> In weak1.c:f
> Call strongalias() from weak1.c:
> In weak2.c:f
> Call f() from weak2.c
> In weak2.c:f
>
> $ clang weak12main.c weak2.c weak1.c -O0
> $ a.out
> Call f() from weak1.c:
> In weak2.c:f
> Call weakalias() from weak1.c:
> In weak1.c:f
> Call strongalias() from weak1.c:
> In weak1.c:f
> Call f() from weak2.c
> In weak2.c:f
>
> I'm wondering what the expected/correct behavior is? Depending on what is
> correct, we need to handle this differently in ThinLTO mode. Let's say
> weak1.c's copy of f() is not prevailing and I am going to drop it (it needs
> to be removed completely, not turned into available_externally to ensure it
> isn't inlined since weak isInterposable). If we want the aliases in weak1.c
> to reference the original version, then copying is correct (e.g. weakalias
> and strongalias would each become a copy of weak1.c's f()). If we however
> want them to resolve to the prevailing copy of f(), then we need to turn
> the aliases into declarations (external linkage in the case of strongalias
> and external weak in the case of weakalias?).
>
> I also tried the case where f() was in a comdat, because I also need to
> handle that case in ThinLTO (when f() is not prevailing, drop it from the
> comdat and remove the comdat from that module). Interestingly, in this case
> when weak2.c is prevailing, I get the following warning when linking and
> get a seg fault at runtime:
>
> weak1.o:weak1.o:function test1: warning: relocation refers to discarded
> section
>
> Presumably the aliases still refer to the copy in weak1.c, which is in the
> comdat that gets dropped by the linker. So is it not legal to have an alias
> to a weak symbol in a comdat (i.e. alias from outside the comdat)? We don't
> complain in the compiler.

The rule should be that the alias to aliasee link is never broken. The
reason being that an alias at the file level is just another symbol with
the same value.

So if foo is an alias to bar, accessing that foo will always be the same
as accessing that bar, regardless of either of them being weak. I say
*that* foo and *that* bar because symbol resolution may pick another foo
and another bar.
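
A small single-file C illustration of this rule, assuming an ELF target and the GCC/Clang alias attribute: foo is declared as an alias of a weak bar, so within this file both names denote the same storage. The helper function names are made up for the example.

```c
#include <assert.h>

/* bar is weak, so another file could provide a definition that prevails. */
int bar __attribute__((weak)) = 1;

/* foo is just another symbol with the same value (address) as this bar. */
extern int foo __attribute__((alias("bar")));

void store_through_foo(int v) { foo = v; }
int load_through_bar(void) { return bar; }

/* Usage: with no other definition of bar in the link, a store through foo
 * is visible through bar because both names share one address. */
void check_alias_shares_storage(void) {
    assert(&foo == &bar);
    store_through_foo(42);
    assert(load_through_bar() == 42);
}
```

If another object file supplied a prevailing bar, references to bar could resolve there, while this foo would still refer to this file's bar.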

Are you just talking about the comdat case? If this also applies to the
non-comdat case, I'm not sure how this works in the following situation
(copied from an example in my response just now to pcc):

If a module contains the following, and both @x and @y are non-prevailing
in that module:

@x = weak global ...
@y = weak alias @x

and the prevailing def for @x is in moduleX with a different value than the
prevailing def for @y which comes from moduleY. Just because they are
aliased in this module doesn't mean they must be aliased elsewhere, right?
For this case (weak non-prevailing alias to a weak non-prevailing def) I
think it should eventually become:

@x = external global
@y = external global

Trying to understand how this fits with your comment that "The rule should
be that the alias to aliasee link is never broken", unless that is just
referring to the case I mentioned just above about when they are both in
comdats.

Thanks,
Teresa

Hi,

I believe we should get it right (and simpler) if (when…) we move to the
representation we discussed last spring:
https://llvm.org/bugs/show_bug.cgi?id=27866

Our conclusion was that we should always have aliases pointing to private
anonymous globals and nothing else, so when we currently have:

@a = global i32 0
@b = alias @a

It should always become:

@0 = private i32 0
@a = alias @0
@b = alias @0

Yes that has some nice properties. I think it makes at least one case
harder though.

Consider:

1) Original:
define linkonce_odr void @a() { ... }
@b = linkonce_odr alias void (), void ()* @a

2) In the proposed canonical form this becomes:
define private void @0() { ... }
@a = linkonce_odr alias void (), void ()* @0
@b = linkonce_odr alias void (), void ()* @0

And let's say @a and @b are both non-prevailing in this module. I think we
lose inlining ability for these functions when we start dropping
non-prevailing copies.

In 1) we could transform this (as described in my earlier email) to:

define available_externally void @a() { ... }
/* RAUW @b with @a */

and we can inline @a into all callsites (which includes calls originally
made via @b), before dropping it.

With 2), I think we would need to transform this to:

define private void @0() { ... }
declare void @a()
declare void @b()

and we no longer can inline calls to @a and @b. I suppose we could
intervene and special case this scenario so that it is transformed as for
1) when there are non-prevailing odr aliases.

The complication is that a cloneFunction() step would be needed, right?

I hadn't thought about that level of implementation detail, but in any case
a cloneFunction step will be needed with both the current alias format and
with your proposed new form (just in different cases though). My point was
that the proposed format doesn't always make the handling simpler.

Alternatively, I can imagine us allowing available_externally aliases. I’m
not sure why they are forbidden today other than because we map the
object-level limitations in the IR, but I may be missing something (we don’t
have aliases on Darwin either…).

Allowing available_externally alias helps simplify the case where we have a
non-prevailing weak alias. We would still need to address the case where
the aliasee is non-prevailing, which I think is the trickier situation to
handle.

Teresa

What if alias and aliasee are both non-prevailing and the prevailing defs
for each are different? E.g.

@x = weak global ...
@y = weak alias @x

and the prevailing def for @x is in moduleX with a different value than the
prevailing def for @y which comes from moduleY. Just because they are
aliased in this module doesn't mean they must be aliased elsewhere,
right?

Correct.

Cheers,
Rafael

The rule should be that the alias to aliasee link is never broken. The
reason being that an alias at the file level is just another symbol with
the same value.

So if foo is an alias to bar, accessing that foo will always be the same
as accessing that bar, regardless of either of them being weak. I say
*that* foo and *that* bar because symbol resolution may pick another foo
and another bar.

Are you just talking about the comdat case?

No, that is always the case.

If this also applies to the
non-comdat case, I'm not sure how this works in the following situation
(copied from an example in my response just now to pcc):

If a module contains the following, and both @x and @y are non-prevailing
in that module:

@x = weak global ...
@y = weak alias @x

and the prevailing def for @x is in moduleX with a different value than the
prevailing def for @y which comes from moduleY. Just because they are
aliased in this module doesn't mean they must be aliased elsewhere,
right?

Correct.

For this case (weak non-prevailing alias to a weak non-prevailing def) I
think it should eventually become:

@x = external global
@y = external global

When dropping @x and @y during IR linking? I agree.

Trying to understand how this fits with your comment that "The rule should
be that the alias to aliasee link is never broken", unless that is just
referring to the case I mentioned just above about when they are both in
comdats.

In the above example the "link/connection" is not broken. The original
@x and @y still refer to the same data, it is just that neither is used
in the final linked object.

Cheers,
Rafael