Changing the design of GlobalAliases to represent what is actually possible in object files.

Bringing the discussion to llvmdev.

For the purposes of this discussion, object files can be thought of as
having just a few things we care about: data, labels and relocations.

Data is what at the LLVM IR level would be just the contents of variables or functions.

Relocations are utilities to compute the data at link time when it is
not possible to do so earlier. For example, to compute a pcrel
relocation we need to know the offset of a given symbol to the current
position.
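
A minimal sketch of that computation, assuming the usual ELF-style convention that a pc-relative value is S + A - P (symbol address, addend, address of the place being patched). The function name `pcRelValue` is hypothetical and is not an LLVM or linker API:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, for illustration only.
// S: final address of the target symbol
// A: addend stored with the relocation
// P: address of the location being patched
inline std::int64_t pcRelValue(std::uint64_t S, std::int64_t A,
                               std::uint64_t P) {
  // Do the arithmetic on unsigned values so wraparound is well defined,
  // then reinterpret the result as a signed displacement.
  return static_cast<std::int64_t>(S + static_cast<std::uint64_t>(A) - P);
}
```

None of S, A or P is known to the compiler in general, which is why the computation is deferred to the linker.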

Relocations in LLVM IR are represented with ConstantExpr. There is
a point to be made that that representation could be better, but I
think that is not too important for this discussion. Whatever we turn
ConstantExpr into, it will always have to be able to represent the
relocations we want to create.

Now for the main part of this proposal: The labels.

Some labels are implicitly created for other llvm constructs. A
Function or a GlobalVariable will have one for example.

But labels at the object files are not constrained to point to what at
the LLVM level is a Function or a GlobalVariable. We need a way to ask
for other labels. We need to

* Be able to create labels with an absolute value.
* Be able to create labels inside a GlobalVariable or pointing to the
start of a GlobalVariable or Function.

Note that it is still just a label. No relocations are involved.

The tool we have in llvm for creating these extra labels is the GlobalAlias.

One way of representing the above uses is to be explicit: a label is
created with a specific value or at an offset from another.

Another way of representing it is with a ConstantExpr, since those two
cases are a subset of what a ConstantExpr can represent.

My preference is for having an explicit offset that is just an
integer. Using a ConstantExpr seems like a conflation of two different
things: labels and relocations. The fact that some relocations are as
simple as a label plus an offset seems incidental.

From an implementation perspective, having a representation that uses
just a GlobalObject and an offset seems beneficial too. Any attempt to
create a non-representable label (GlobalAlias) will fail immediately,
instead of leaving the IR in a state that will fail down the line.
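
A rough sketch of what such an explicit representation could look like. Everything here is hypothetical: `GlobalObject` is a stand-in struct rather than `llvm::GlobalObject`, and the bounds check is an assumption about what "representable" would mean, not something taken from the patch:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a Function or GlobalVariable.
struct GlobalObject {
  std::string Name;
  std::uint64_t Size; // size in bytes, assumed known for this sketch
};

// A label is either absolute (Base == nullptr, Offset holds the value)
// or located at Base + Offset.
struct Label {
  const GlobalObject *Base;
  std::uint64_t Offset;
};

// Create a label with an absolute value.
inline Label makeAbsolute(std::uint64_t Value) {
  Label L = {nullptr, Value};
  return L;
}

// Create a label at the start of GO (Offset == 0) or inside it.
// Returns false, leaving Out untouched, if the label is not representable,
// so the failure is reported at creation time.
inline bool tryMakeRelative(const GlobalObject &GO, std::uint64_t Offset,
                            Label &Out) {
  if (Offset != 0 && Offset >= GO.Size)
    return false;
  Out.Base = &GO;
  Out.Offset = Offset;
  return true;
}
```

The point of the sketch is only that an invalid label cannot be constructed in the first place; there is no intermediate expression that has to be evaluated later.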

It also makes general IR operations like RAUW easier to reason about.
Since ConstantExprs are uniqued, they have a more complex replace
implementation where they have to be replaced one level at a time. We
would have to wait until the replacement reaches the GlobalAlias to
see if it still is one of the ConstantExprs that happen to be just a
label and an offset, and if it is not we would have no easy way of
knowing what went wrong.

Cheers,
Rafael

Bringing the discussion to llvmdev.

Thanks for doing this.

For the purposes of this discussion, object files can be thought of as
having just a few things we care about: data, labels and relocations.

Data is what at the LLVM IR level would be just the contents of variables or functions.

Relocations are utilities to compute the data at link time when it is
not possible to do so earlier. For example, to compute a pcrel
relocation we need to know the offset of a given symbol to the current
position.

Relocations in LLVM IR are represented with ConstantExpr. There is
a point to be made that that representation could be better, but I
think that is not too important for this discussion. Whatever we turn
ConstantExpr into, it will always have to be able to represent the
relocations we want to create.

Now for the main part of this proposal: The labels.

Some labels are implicitly created for other llvm constructs. A
Function or a GlobalVariable will have one for example.

But labels at the object files are not constrained to point to what at
the LLVM level is a Function or a GlobalVariable. We need a way to ask
for other labels. We need to

* Be able to create labels with an absolute value.
* Be able to create labels inside a GlobalVariable or pointing to the
start of a GlobalVariable or Function.

I agree that this accurately summarizes both (1) what’s expressible in
current object file formats and (2) what we’re likely to need from
global aliases.

The tool we have in llvm for creating these extra labels is the GlobalAlias.

One way of representing the above uses is to be explicit: a label is
created with a specific value or at an offset from another.

Also important: in this model, the label has its own LLVM type, which is
permitted to differ from the LLVM type of the aliasee (if present).

I will note that this model does require absolute symbols to be literal values.
That eliminates a lot of things that are at least theoretically useful.

For example, it would not be possible to define an absolute symbol to be
the offset between two symbols. In some restricted circumstances —
if both symbols are global variables in the same section and defined
in the same translation unit — this could be worked around.

But I’ll gladly admit that I don’t have a use case in mind for that feature.
Absolute symbols are useful, and storing offsets between symbols into
global memory is useful, but I don’t know why you’d combine them.

Another way of representing it is with a ConstantExpr, since those two
cases are a subset of what a ConstantExpr can represent.

My preference is for having an explicit offset that is just an
integer. Using a ConstantExpr seems like a conflation of two different
things: labels and relocations. The fact that some relocations are as
simple as a label plus an offset seems incidental.

I don’t think I accept that ConstantExpr just means “relocation” in IR,
either in principle or as a description of reality. A constant used only as
an instruction operand is definitely not limited to what’s expressible
with relocations.

From an implementation perspective, having a representation that uses
just a GlobalObject and an offset seems beneficial too. Any attempt to
create a non-representable label (GlobalAlias) will fail immediately,
instead of leaving the IR in a state that will fail down the line.

It also makes general IR operations like RAUW easier to reason about.
Since ConstantExprs are uniqued, they have a more complex replace
implementation where they have to be replaced one level at a time. We
would have to wait until the replacement reaches the GlobalAlias to
see if it still is one of the ConstantExprs that happen to be just a
label and an offset, and if it is not we would have no easy way of
knowing what went wrong.

Is this not still true under the global-and-offset model? If you replace
the target of a GlobalAlias with a ConstantExpr, RAUW will have to
evaluate the expression down to a global and an offset in exactly the
way that you’re worried about the backend having to do. Except,
of course, RAUW has to worry about working with a module that
lacks data layout.

John.

I agree that this accurately summarizes both (1) what’s expressible in
current object file formats and (2) what we’re likely to need from
global aliases.

The tool we have in llvm for creating these extra labels is the GlobalAlias.

One way of representing the above uses is to be explicit: a label is
created with a specific value or at an offset from another.

Also important: in this model, the label has its own LLVM type, which is
permitted to differ from the LLVM type of the aliasee (if present).

I will note that this model does require absolute symbols to be literal values.
That eliminates a lot of things that are at least theoretically useful.

For example, it would not be possible to define an absolute symbol to be
the offset between two symbols. In some restricted circumstances —
if both symbols are global variables in the same section and defined
in the same translation unit — this could be worked around.

But I’ll gladly admit that I don’t have a use case in mind for that feature.
Absolute symbols are useful, and storing offsets between symbols into
global memory is useful, but I don’t know why you’d combine them.

That is funny. I, on the other hand, think that this is the best
argument I have seen for keeping aliases pointing to ConstantExpr so
far.

While labels and relocations are very different things at the object
level, llvm is not currently in a position to know when a relocation
is needed or not. I would like for that not to be the case, but that
is a far bigger change. It also points out that an expression being a
valid label definition or not can change in a way that is hard to see
during the change itself: We can have an arbitrarily nested expression
that goes from evaluatable to requiring a relocation when the section
of a global object is changed. That in turn puts the validity check in
the verifier, even if we constrain ConstantExprs.

In other words, another possible representation would be

* GlobalAliases point to a ConstantExpr.
* The expression is completely unconstrained in the current
implementation of ConstantExpr.
* There is no notion of an aliased symbol. Things like detecting
cycles go from "A == A->getAliasedSymbol()" to
"A->getAliasee().uses(A)", but even that seems questionable outside of
special cases like clang, which knows the types of alias it creates.
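
To make the cycle check in the last item concrete, here is a small sketch of asking whether an alias is reachable from its own aliasee expression. It uses a hypothetical `Node` type with operand edges, not LLVM's `Constant` or use-list API, and `reaches` and `isCyclicAlias` are made-up names:

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an IR constant: a node with operands.
// For an alias, the operand is its aliasee expression.
struct Node {
  std::vector<const Node *> Operands;
};

// Returns true if Target can be reached from Root by following operands.
inline bool reaches(const Node *Root, const Node *Target) {
  std::unordered_set<const Node *> Seen;
  std::vector<const Node *> Work;
  Work.push_back(Root);
  while (!Work.empty()) {
    const Node *N = Work.back();
    Work.pop_back();
    if (N == Target)
      return true;
    if (!Seen.insert(N).second)
      continue; // already visited, avoid looping forever
    for (const Node *Op : N->Operands)
      Work.push_back(Op);
  }
  return false;
}

// An alias is cyclic if its aliasee expression refers back to the alias.
inline bool isCyclicAlias(const Node &Alias) {
  for (const Node *Op : Alias.Operands)
    if (reaches(Op, &Alias))
      return true;
  return false;
}
```

With an arbitrary expression the check becomes a graph walk rather than a single pointer comparison, which is part of the cost being weighed here.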

This would greatly diminish our ability to report invalid uses, since
the first thing to notice they are invalid is MC. It would also
require the alias-to-weak-alias problem to be handled directly in the
IR linker. Here we would have to approximate: do our best to
evaluate it, but if the expression still has discarded globals, error.

In other words, painful but reasonable. I will have to look at the
code to see how painful it is to generalize every user of the "aliased
symbol" to work with an arbitrary expression. I will experiment with
it tomorrow and report.

I don’t think I accept that ConstantExpr just means “relocation” in IR,
either in principle or as a description of reality. A constant used only as
an instruction operand is definitely not limited to what’s expressible
with relocations.

Yes. I know there are disagreements about ConstantExpr, but I think we
all agree that they are *at least* as general as any relocation we
want to represent.

It also makes general IR operations like RAUW easier to reason about.
Since ConstantExprs are uniqued, they have a more complex replace
implementation where they have to be replaced one level at a time. We
would have to wait until the replacement reaches the GlobalAlias to
see if it still is one of the ConstantExprs that happen to be just a
label and an offset, and if it is not we would have no easy way of
knowing what went wrong.

Is this not still true under the global-and-offset model? If you replace
the target of a GlobalAlias with a ConstantExpr, RAUW will have to
evaluate the expression down to a global and an offset in exactly the
way that you’re worried about the backend having to do. Except,
of course, RAUW has to worry about working with a module that
lacks data layout.

But here RAUW is seeing the actual replacement. It is seeing the
GlobalObject that is directly used by a GlobalAlias being replaced
with an expression. If the alias points to a Constant, it is seeing
the result of a perfectly valid run of replaceUsesOfWithOnConstant,
which may or may not be a valid aliasee.

Cheers,
Rafael

IMO if we want to support defining symbols at absolute addresses, we should
add a separate construct for this. So far everyone has gotten by with
linker flags and scripts, though.

> For example, it would not be possible to define an absolute symbol to be
> the offset between two symbols. In some restricted circumstances —
> if both symbols are global variables in the same section and defined
> in the same translation unit — this could be worked around.
>
> But I’ll gladly admit that I don’t have a use case in mind for that
> feature.
> Absolute symbols are useful, and storing offsets between symbols into
> global memory is useful, but I don’t know why you’d combine them.

That is funny. I, on the other hand, think that this is the best
argument I have seen for keeping aliases pointing to ConstantExpr so
far.

IMO if we want to support defining symbols at absolute addresses, we should
add a separate construct for this. So far everyone has gotten by with
linker flags and scripts, though.

Well, if we support an arbitrary ConstantExpr it is hard not to
support absolute symbols.

Attached is a work in progress on trying to see how hard it would be
to implement support for arbitrary ConstantExpr. It is fairly
incomplete. In particular, the linker needs to be updated. But it can
already codegen things like

t.patch (32.5 KB)

I'm not there yet, but at some point I'm going to need the notion of a
global callable function like symbol that's resolved at runtime. I've
not given it much thought but I may need a new callable entity here
(this is for the gnu ifunc stuff).

Don't even know if this fits into the discussion, but since we were
talking about weird symbols...

-eric

I'm not there yet, but at some point I'm going to need the notion of a
global callable function like symbol that's resolved at runtime. I've
not given it much thought but I may need a new callable entity here
(this is for the gnu ifunc stuff).

Don't even know if this fits into the discussion, but since we were
talking about weird symbols...

Is it a symbol or a value that is loaded? If it is a symbol, what does it point to? That is, what is the value that shows up in the .o?

It's an IFUNC symbol. It is a relocation resolved by the loader at load
time by calling a designated function and replacing the relocation with the
returned function pointer (roughly).

I'm not there yet, but at some point I'm going to need the notion of a
global callable function like symbol that's resolved at runtime. I've
not given it much thought but I may need a new callable entity here
(this is for the gnu ifunc stuff).

Don't even know if this fits into the discussion, but since we were
talking about weird symbols...

Is it a symbol or a value that is loaded? If it is a symbol, what does it point to? That is, what is the value that shows up in the .o?

Relocation resolved at compile time to one of N symbols that will be
the call that's made.

-eric

According to

http://www.airs.com/blog/archives/403

From the compiler & assembler point of view this is just a special symbol type.

Given that it cannot be used directly I would say it is not a
ConstantExpr, it is a property of a label.

Given that the special IFUNC label has the same value as the resolver,
it sounds like it is not a GlobalObject, just a special type of
GlobalAlias.

In summary, from a very quick look it looks like it is independent
from this discussion: All that is needed is a bit saying it is a
"IFUNC GlobalAlias", not a plain GlobalAlias, regardless of how we end
up representing GlobalAliases.

Judging from what I have seen so far I would be OK with

* Supporting only symbol + offset and representing that directly.
* Supporting "any" ConstantExpr.

Most of the uses of getAliasee were (or are) bugs and missing
features. With those out of the way, the tradeoff of the above options
becomes having the ability to represent any symbol value that the
assembler can at the cost of far weaker checking for the expressions
defining the global alias.

I reported the most annoying IR vs. object file mismatch as pr19848.
Since it is present in both representations, we can handle it after
this.

The attached patches are on top of the patch I emailed for pr19844.

Is everyone sufficiently satisfied with this direction? Can I resend
the patches to llvm-commits and cfe-commits?

Cheers,
Rafael

llvm.patch (48.7 KB)

clang.patch (13 KB)

According to

Airs – Ian Lance Taylor » STT_GNU_IFUNC

From the compiler & assembler point of view this is just a special symbol type.

Given that it cannot be used directly I would say it is not a
ConstantExpr, it is a property of a label.

Given that the special IFUNC label has the same value as the resolver,
it sounds like it is not a GlobalObject, just a special type of
GlobalAlias.

In summary, from a very quick look it looks like it is independent
from this discussion: All that is needed is a bit saying it is a
"IFUNC GlobalAlias", not a plain GlobalAlias, regardless of how we end
up representing GlobalAliases.

Sounds fine. I hadn't thought about it more than "do we want an alias
or a new symbol type" but since the discussion was ongoing here I
thought I'd bring it up.

-eric

This direction looks great to me. Thank you, Rafael!

Not a full review, but typo:

+Since aliasess are only

John.

This direction looks great to me. Thank you, Rafael!

Not a full review, but typo:

+Since aliasess are only

Fixed. I will send the patches to llvm-commits once the dependencies are in.

Cheers,
Rafael