How to represent __attribute__((fastcall)) functions in the IL

Functions with __attribute__((fastcall)) pop their own stack arguments
and take up to two arguments in ecx and edx. Currently we represent
them by just setting the x86_fastcallcc calling convention. The
problem is that the ABI has some strange conventions about when a
register is used or not. For example:

void __attribute__((fastcall)) foo1(int y);

will take 'y' in ecx, but

struct S1 {
  int x;
};
void __attribute__((fastcall)) foo2(struct S1 y);

will use the stack. Even more surprising is that

void __attribute__((fastcall)) foo8(struct S1 a, int b);

will pass 'a' on the stack but 'b' in *edx*. That is, the first
argument consumed ecx but didn't use it. To implement this, the IL
needs to be able to represent that
* an argument goes in a register
* a register was consumed but left unused

A way to do this is to make clang produce byval for anything that
goes on the stack, but that produces worse code, as the caller now has
to use an alloca instead of passing a simple value.
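
For reference, the byval alternative would look something like this (a
sketch only; the type name and the caller are made up to show the
extra alloca):

%struct.S1 = type { i32 }

declare x86_fastcallcc void @foo2(%struct.S1* byval %y)

define void @caller(i32 %v) {
  ; the caller has to spill the value into a temporary and pass its address
  %tmp = alloca %struct.S1
  %field = getelementptr %struct.S1* %tmp, i32 0, i32 0
  store i32 %v, i32* %field
  call x86_fastcallcc void @foo2(%struct.S1* byval %tmp)
  ret void
}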

Another option (which I implemented in a patch just sent to
llvm-commits) is to use the inreg attribute, which we already use for
the regparm C attribute. The above requirements are handled by putting
inreg on the arguments that should go in registers and creating an
inreg padding argument when a register should be consumed but not used.

The above examples would be irgened to:

declare x86_fastcallcc void @foo1(i32 inreg)

declare x86_fastcallcc void @foo2(i32 inreg /*dummy*/, i32)

declare x86_fastcallcc void @foo8(i32 inreg /*dummy*/, i32, i32 inreg)
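
For completeness, a call of foo8 under this scheme might look roughly
like the following (a sketch; passing undef for the padding slot is my
assumption of how a consumed-but-unused register would be expressed):

; the first inreg argument consumes ecx but is never used, 'a' goes on
; the stack, and 'b' ends up in edx
call x86_fastcallcc void @foo8(i32 inreg undef, i32 %a, i32 inreg %b)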

As Eli pointed out on llvm-commits, this would change the meaning of
existing x86_fastcallcc IL files, but given that they are only
produced for fastcall C functions and those are fairly broken without
this fix, I assume that is OK.

Cheers,
Rafael

Personally, I’d love to see a setup where instead of LLVM implementing each calling convention and ABI hack, we provide a means of actually describing them. Specifically, I’d love to see a design for how to specify in the IR which register(s), if any, a particular value should be placed into.

Don’t get me wrong, I don’t have any good ideas about how to do this, I’m just hoping someone does. End result might allow something like:

declare void @foo(double inreg <eax,edx> %x)

Hi Chandler,

We were discussing this at the Cambridge LLVM Social a while ago, and
we felt that there is far too much target-dependent stuff in the
procedure call standards as it is.

Our approach would be to have a PCS layer, either in the front-end or
as a pass, that would know about both the language and the target, so
it could construct calls the way the target expects them (ABI-wise).

David Chisnall proposed a PCSBuilder (similar to IRBuilder, but for
building function calls), where you just pass the basic information
(return type, arguments, flags, name) and it builds the function for
you: mangling names, changing parameters and assigning things to
registers when the ABI is less than helpful, possibly using an inreg
syntax like you describe.
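
To make the division of labour concrete (purely an illustration of the
idea, not an existing interface): the front-end would hand the PCS
layer a source-level signature and get back the fully lowered,
target-specific form, e.g. for the fastcall example above:

; what the front-end describes:  fastcall void foo8(struct S1 a, int b)
; what the PCS layer emits for x86:
declare x86_fastcallcc void @foo8(i32 inreg /*dummy*/, i32, i32 inreg)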

My idea was to make it a pass, so you could delay the PCS mess until a
later stage, when you would possibly already have more information
about the target; but that's not necessarily true, and it might open
the can of worms that is a multi-layered IR. For simplicity, we might
consider David's approach first, and move the code later if the idea
of a multi-layered IR catches on.

Would that make sense in Clang? It should be a matter of code movement
more than new code, and would provide a common place for more
front-ends to use.

Duncan,

Would that make sense in dragonegg?

Hi Renato,

Could we have some sort of metadata input for the front-end, where "fastcall" would be explicitly defined, together with the required targets that implement it?
This way users could provide their own calling convention definitions and use them in a similar fashion to fastcall, e.g. __attribute__((my_convention)).

The bitcode would then only specify which calling convention is being used, and each target's lowering would need to translate it properly.
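
As a purely hypothetical illustration (none of this syntax exists
today; the metadata layout, the "my_convention" name and the string
form of the cc keyword are all invented), the module could describe
the convention once and functions would merely name it:

; hypothetical description: the first two i32 arguments go in ecx/edx,
; everything else goes on the stack
!pcs.my_convention = !{!0}
!0 = metadata !{metadata !"ecx", metadata !"edx"}

; hypothetical use; each target lowering would translate it
declare cc "my_convention" void @bar(i32 %x, i32 %y)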

-Krzysztof

Don't get me wrong, I don't have any good ideas about how to do this, I'm
just hoping someone does. End result might allow something like:

declare void @foo(double inreg <eax,edx> %x)

Not sure I would go all the way to specifying registers in the IL
(although I liked it at some point). What I like most right now is
something along the lines of
http://llvm.org/pr12193. That makes it explicit if something is on the
stack or in registers and that information is correct for both the
caller and callee. What is left for codegen is counting the register
arguments and computing stack offsets.
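
Under that scheme, the foo8 example from the start of the thread might
read something like the following ("onstack" is only the attribute
proposed in the PR, so the exact spelling is illustrative):

; every argument is explicitly inreg or onstack; codegen only has to
; count the register arguments and compute the stack offsets
declare x86_fastcallcc void @foo8(i32 inreg /*dummy*/, i32 onstack, i32 inreg)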

Implementing that requires way more time than I have right now, but I
think this proposal is a small step in the right direction as it makes
clang the one responsible for marking an argument as memory or
register.

Cheers,
Rafael

Hi Rafael,

Don't get me wrong, I don't have any good ideas about how to do this, I'm
just hoping someone does. End result might allow something like:

declare void @foo(double inreg <eax,edx> %x)

Not sure I would go all the way to specifying registers in the IL
(although I liked it at some point). What I like most right now is
something along the lines of
http://llvm.org/pr12193. That makes it explicit if something is on the
stack or in registers and that information is correct for both the
caller and callee. What is left for codegen is counting the register
arguments and computing stack offsets.

I'm 100% in favour of having "onstack" as a complement to "inreg". I'm
not so happy about the more funky changes you suggested in the PR, namely
having the callee no longer match the caller, but "onstack" itself makes
a lot of sense to me.

Implementing that requires way more time than I have right now, but I
think this proposal is a small step in the right direction as it makes
clang the one responsible for marking an argument as memory or
register.

Ciao, Duncan.

Makes a lot of sense to me too.

-Chris

In this case, I think it's even worse than "aapcs" or "fastcall",
which are target dependent, but at a higher level.

Specifying which register each variable will go in forces the
front-ends to know all about every target LLVM supports (the register
names for x86_64 are different from those for x86, which are different
from ARM's, Thumb's, MIPS's, etc.). That is not just a language/ABI
issue, but a hardware architecture one.

Having the PCSBuilder / PCS pass would decouple the front-end from the
back-end, at least on PCS matters.

However, I agree with you that we should not have function signatures
that are different from their calls.

That said, I also don't like the idea of filling the IR with tons of target
specific stuff.

In this case, I think it's even worse than "aapcs" or "fastcall",
which are target dependent, but at a higher level.

Specifying which register each variable will go in forces the
front-ends to know all about every target LLVM supports (the register
names for x86_64 are different from those for x86, which are different
from ARM's, Thumb's, MIPS's, etc.). That is not just a language/ABI
issue, but a hardware architecture one.

I don't really disagree with anyone who finds this a bit distasteful,
but understand that we are already in an even worse situation.

The frontend must in fact reason in extreme detail about the targets
that LLVM supports, and these frontends already have the exact
register positioning (if not naming) encoded in them.

In addition, frontends must currently reason about exactly how each
target in LLVM will choose to lower various arguments and return
values of various types, and try to conjure up an LLVM type and
parameter distribution which each target will lower into the exact
registers which are mandated by the ABI for that target.

So the situation today is that frontends have all of this
target-specific information *and* there is an implicit, unspecified
contract between the frontend, LLVM's IR, and each target backend
about exactly how the lowering through these will occur in order to
ensure that the actual code generated matches the ABI requirements the
frontend is trying to encode. :: sigh ::

While in theory, I would love it if the frontends could be unaware of
such details, it seems impractical. The ABI is really the combination
of language and target, whether we like it or not. I think we just
need a model for explicitly describing the required lowering, and
hopefully in a way orthogonal to the LLVM IR type system so that we
don't have to waste large amounts of IR complexity on shoving bits
into and out of peculiar IR types.

Unfortunately, I have no such concrete design in mind, and I certainly
still think that the onstack thing is a step in the right direction.

-Chandler

That said, I also don't like the idea of filling the IR with tons of target
specific stuff.

In this case, I think it's even worse than "aapcs" or "fastcall",
which are target dependent, but at a higher level.

Specifying which register each variable will go in forces the
front-ends to know all about every target LLVM supports (the register
names for x86_64 are different from those for x86, which are different
from ARM's, Thumb's, MIPS's, etc.). That is not just a language/ABI
issue, but a hardware architecture one.

I don't really disagree with anyone who finds this a bit distasteful,
but understand that we are already in an even worse situation.

The frontend must in fact reason in extreme detail about the targets
that LLVM supports, and these frontends already have the exact
register positioning (if not naming) encoded in them.

In addition, frontends must currently reason about exactly how each
target in LLVM will choose to lower various arguments and return
values of various types, and try to conjure up an LLVM type and
parameter distribution which each target will lower into the exact
registers which are mandated by the ABI for that target.

So the situation today is that frontends have all of this
target-specific information *and* there is an implicit, unspecified
contract between the frontend, LLVM's IR, and each target backend
about exactly how the lowering through these will occur in order to
ensure that the actual code generated matches the ABI requirements the
frontend is trying to encode. :: sigh ::

While in theory, I would love it if the frontends could be unaware of
such details, it seems impractical. The ABI is really the combination
of language and target, whether we like it or not. I think we just
need a model for explicitly describing the required lowering, and
hopefully in a way orthogonal to the LLVM IR type system so that we
don't have to waste large amounts of IR complexity on shoving bits
into and out of peculiar IR types.

Unfortunately, I have no such concrete design in mind, and I certainly
still think that the onstack thing is a step in the right direction.

+1

This is a good description of the situation. Adding more information
to the IR just makes the currently implicit contract explicit. At the
same time, the addition of this contract should give the frontend much
more leverage to be explicit and get the right ABI without abusing the
IR. In practice, that could result in across-the-board wins for
compile time and for execution time in the cases where our abusive IR
doesn't get optimized well.

The other piece of this is that currently the mid-level optimizers are
really inhibited in what they can do to rewrite function call
arguments. Allowing things to be more explicit in the IR would go some
way towards allowing the mid-level optimizers to rewrite function
arguments without changing the ABI.

- Daniel

Chandler Carruth wrote:

That said, I also don't like the idea of filling the IR with tons of
target specific stuff.

In this case, I think it's even worse than "aapcs" or "fastcall",
which are target dependent, but at a higher level.

Specifying which register each variable will go in forces the
front-ends to know all about every target LLVM supports (the register
names for x86_64 are different from those for x86, which are different
from ARM's, Thumb's, MIPS's, etc.). That is not just a language/ABI
issue, but a hardware architecture one.

I don't really disagree with anyone who finds this a bit distasteful,
but understand that we are already in an even worse situation.

The frontend must in fact reason in extreme detail about the targets
that LLVM supports, and these frontends already have the exact
register positioning (if not naming) encoded in them.

In addition, frontends must currently reason about exactly how each
target in LLVM will choose to lower various arguments and return
values of various types, and try to conjure up an LLVM type and
parameter distribution which each target will lower into the exact
registers which are mandated by the ABI for that target.

So the situation today is that frontends have all of this
target-specific information *and* there is an implicit, unspecified
contract between the frontend, LLVM's IR, and each target backend
about exactly how the lowering through these will occur in order to
ensure that the actual code generated matches the ABI requirements the
frontend is trying to encode. :: sigh ::

While in theory, I would love it if the frontends could be unaware of
such details, it seems impractical. The ABI is really the combination
of language and target, whether we like it or not. I think we just
need a model for explicitly describing the required lowering, and
hopefully in a way orthogonal to the LLVM IR type system so that we
don't have to waste large amounts of IR complexity on shoving bits
into and out of peculiar IR types.

Unfortunately, I have no such concrete design in mind, and I certainly
still think that the onstack thing is a step in the right direction.

Why not move the ABI reasoning to the lowering stage instead, and simply
specify which calling convention should be used in the function declaration?

The complex reasoning that's currently in clang to determine the placement
of each parameter could be moved to LLVM and called in the lowering phase.

One problem, as you say, is of course that the ABI is tied to the language,
and so some information is lost in the generation of the IR from the AST;
i.e. it isn't obvious how to get from an LLVM function back to the
function / method / constructor it came from in the source language.
However, it wouldn't be impossible to attach this needed information to
the LLVM function when it's created, so that the lowering code *can* have
all the information it needs.
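
A rough sketch of what attaching that information might look like
(entirely hypothetical; the metadata name, its layout and the function
are invented for illustration):

declare void @method(i8* %this, i32 %x)

; hypothetical side table the lowering code could consult to recover
; the source-level declaration and apply that language's ABI rules
!source.decl.info = !{!0}
!0 = metadata !{void (i8*, i32)* @method, metadata !"C++",
                metadata !"void X::foo(int)"}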

The big win for this approach is lowering the bar for new frontends using
ABIs which have lowering code already written. Another is that language
interoperability is made significantly easier: if I want to call some code
from language X, then all I have to do is tell LLVM to use language X's
calling convention for this function and specify what the declaration in X
looks like. Instead, at the moment, I'd have to duplicate the ABI logic
for X to determine what the declaration looks like for the particular target
I'm compiling on. Similarly, if I'm in language X and want to expose
some functions as C functions, I don't need to know the details of the
C ABI.

Food for thought anyway.
James

I think we just
need a model for explicitly describing the required lowering, and
hopefully in a way orthogonal to the LLVM IR type system so that we
don't have to waste large amounts of IR complexity on shoving bits
into and out of peculiar IR types.

We're not against that. The proposal is to move the complexity that
front-ends currently have to handle out into a PCSBuilder, which would
bridge the language and the target parts of the ABIs.

The IR would probably still have inreg, onstack and others, but at
least in a consistent way. We're not advocating removing information
from the IR, but making it consistent across all front-ends, and
hopefully easing the life of front-end engineers.

This PCSBuilder could have several versions (or a collation of
versions, via policies), so you could join the front and back parts of
ABIs on the fly, but that's irrelevant for the discussion at this
point.

Unfortunately, I have no such concrete design in mind, and I certainly
still think that the onstack thing is a step in the right direction.

No arguments here. I'm happy with the 'onstack' flag.

Not sure I would go all the way to specifying registers in the IL
(although I liked it at some point). What I like most right now is
something along the lines of
http://llvm.org/pr12193. That makes it explicit if something is on the
stack or in registers and that information is correct for both the
caller and callee. What is left for codegen is counting the register
arguments and computing stack offsets.

I'm 100% in favour of having "onstack" as a complement to "inreg". I'm
not so happy about the more funky changes you suggested in the PR, namely
having the callee no longer match the caller, but "onstack" itself makes
a lot of sense to me.

Makes a lot of sense to me too.

-Chris

It is way too late for a formal BOF, but maybe we could meet during
the hacker lab to discuss onstack, explicit regs, and other ideas
regarding how to represent calling conventions?

Cheers,
Rafael

Sure, sounds good. There is also a rumor that the devmtg may have an "UnBOF" section, which is specifically for organized-on-demand BOF-like content. More details to come.

-Chris