Argument Lowering

We all know that LLVM codegen doesn't handle struct arguments...er...at
all. Frontend code has to do a bunch of ABI lowering to make it work.

Can anyone explain why LLVM can't do it? I've read vague hints of, "not
all the information is there," but I'd really like to understand this
better. Is there something we can do to make proper argument handling
possible within LLVM? It would save a lot of duplicated work for a lot
of people.

                            -David

Hi Hal,

> Can anyone explain why LLVM can't do it? I've read vague hints of, "not
> all the information is there," but I'd really like to understand this
> better.

The most compelling example I've heard is the x86_64 ABI. It refers to
POD types, unions and structs with non-aligned fields.

I think POD is a red herring because Clang would have to deal with
that anyway and the solution is essentially a byval pointer. Even the
non-aligned fields thing looks similar: any struct with such gets
passed in MEMORY, so a simple front-end test and a byval pointer
should work.

Unions are a much bigger problem. I can't see how a front-end could
decide how to pass them without implementing the given rules itself
(some unions have to have SSE type, some INTEGER, ...). At that stage
you've got quite large duplication between front-end and back-end
anyway. Probably more than is required by the current register
counting front-ends.
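
To make the union point concrete, here is a minimal C sketch of the class-merging step, assuming a simplified subset of the x86-64 System V classes (no X87 or SSEUP) and a union that fits in a single eightbyte. The names `ArgClass`, `merge_classes` and `classify_union_eightbyte` are hypothetical and exist only for illustration; they are not LLVM or Clang APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified subset of the x86-64 System V argument classes (assumption:
 * X87, SSEUP and friends are left out to keep the sketch short). */
typedef enum { NO_CLASS, INTEGER, SSE, MEMORY } ArgClass;

/* Merge the classes of two members that overlap in the same eightbyte:
 * equal classes stay as they are, NO_CLASS yields the other class,
 * MEMORY wins over everything, then INTEGER wins over SSE. */
static ArgClass merge_classes(ArgClass a, ArgClass b) {
    if (a == b) return a;
    if (a == NO_CLASS) return b;
    if (b == NO_CLASS) return a;
    if (a == MEMORY || b == MEMORY) return MEMORY;
    if (a == INTEGER || b == INTEGER) return INTEGER;
    return SSE; /* not reachable with this reduced set of classes */
}

/* Classify a one-eightbyte union from the classes of its members. */
static ArgClass classify_union_eightbyte(const ArgClass *members, size_t count) {
    assert(members != NULL || count == 0);
    ArgClass result = NO_CLASS;
    for (size_t i = 0; i < count; i++) {
        result = merge_classes(result, members[i]);
    }
    return result;
}
```

Under these assumptions, `union { float f; int i; }` (members SSE and INTEGER) comes out as INTEGER, while `union { float f; double d; }` (both SSE) stays SSE. Neither a front-end nor a backend can make that decision without knowing every member of the union.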

I still think the model could work well for ARM-style procedure call
standards (unions are only distinguished by size at the PCS level
there).

> Is there something we can do to make proper argument handling
> possible within LLVM?

At a purely implementation level, the first problem at the moment is
that IR-level types are discarded by the time call lowering happens.
Structures are split up into their constituent fields and those are
all a backend has available for its decisions.

But, because of the above concerns, the only properly viable solution
would probably be to reintroduce union types to LLVM IR. Which is more
complexity for optimisations to handle. I understand they never really
worked properly in the first place.

Cheers

Tim.

Sorry, not Hal. I don't know why I confused you with him, David.

Tim.

Rather than trying to have LLVM codegen take care of ABI issues (which means
passing all kinds of extra information in the IR), another possibility is to
have LLVM provide a helper library for generating correct IR. You would say
to the library: my parameter is a union type with these fields (described
using C/C++ language concepts such as unions, POD etc), and it would tell you
what IR to output (or output it for you). This library could then be used by
clang and every front-end confronted with ABI issues. There is a bug report
about this somewhere.
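
A rough C sketch of what the entry point of such a helper library might look like, assuming the front-end describes an aggregate in source-language terms and the library answers with how to pass it. Everything here (`AggregateDesc`, `PassingKind`, `abi_lower_aggregate`) is hypothetical, not an existing LLVM interface, and the rule is a deliberately simplified x86-64-style check that ignores vector types and per-field classification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical front-end description of an aggregate parameter. */
typedef struct {
    size_t size_in_bytes;      /* total size, including padding */
    bool has_unaligned_fields; /* e.g. a packed struct */
    bool is_pod;               /* source-language notion, as in the proposal */
} AggregateDesc;

/* Hypothetical answer from the library. */
typedef enum {
    PASS_DIRECT,    /* candidate for registers; still needs classification */
    PASS_IN_MEMORY  /* e.g. emitted as a byval pointer in the IR */
} PassingKind;

/* Simplified assumption: non-POD types, aggregates with unaligned fields,
 * and aggregates larger than 16 bytes all go to memory. */
static PassingKind abi_lower_aggregate(const AggregateDesc *desc) {
    assert(desc != NULL);
    if (!desc->is_pod) return PASS_IN_MEMORY;
    if (desc->has_unaligned_fields) return PASS_IN_MEMORY;
    if (desc->size_in_bytes > 16) return PASS_IN_MEMORY;
    return PASS_DIRECT;
}
```

A front-end would fill in `AggregateDesc` from its own type system and emit whatever IR the answer calls for; a real library would also have to return the per-eightbyte classes for the `PASS_DIRECT` case.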

Ciao, Duncan.

Tim Northover <t.p.northover@gmail.com> writes:

>> Can anyone explain why LLVM can't do it? I've read vague hints of, "not
>> all the information is there," but I'd really like to understand this
>> better.

> The most compelling example I've heard is the x86_64 ABI. It refers to
> POD types, unions and structs with non-aligned fields.
>
> I think POD is a red herring because Clang would have to deal with
> that anyway and the solution is essentially a byval pointer. Even the
> non-aligned fields thing looks similar: any struct with such gets
> passed in MEMORY, so a simple front-end test and a byval pointer
> should work.

This makes sense to me. These kinds of things are special cases and I
think that's ok to leave to the language expert (the frontend).

> Unions are a much bigger problem. I can't see how a front-end could
> decide how to pass them without implementing the given rules itself
> (some unions have to have SSE type, some INTEGER, ...). At that stage
> you've got quite large duplication between front-end and back-end
> anyway. Probably more than is required by the current register
> counting front-ends.

Hmm. Yes, unions are a problem and it seems like the frontend would
have to handle these. At the very least it has to pick a type to
represent each of the overlapping regions. Dang, I'll have to think
on this more. :)

> At a purely implementation level, the first problem at the moment is
> that IR-level types are discarded by the time call lowering happens.
> Structures are split up into their constituent fields and those are
> all a backend has available for its decisions.

When you say, "Structures are split up into their constituent fields,"
what do you mean, exactly?

> But, because of the above concerns, the only properly viable solution
> would probably be to reintroduce union types to LLVM IR. Which is more
> complexity for optimisations to handle. I understand they never really
> worked properly in the first place.

Yeah, I don't think that's an option.

I ask because I just spent several days diagnosing an ABI problem here.
It struck me that a lot of time probably gets wasted fixing these kinds
of bugs over and over again.

                            -David

Duncan Sands <baldrick@free.fr> writes:

> Rather than trying to have LLVM codegen take care of ABI issues (which means
> passing all kinds of extra information in the IR), another possibility is to
> have LLVM provide a helper library for generating correct IR. You would say
> to the library: my parameter is a union type with these fields (described
> using C/C++ language concepts such as unions, POD etc), and it would tell you
> what IR to output (or output it for you). This library could then be used by
> clang and every front-end confronted with ABI issues. There is a bug report
> about this somewhere.

That sounds like an excellent idea. I am basically at the point of
rewriting our argument handling due to some past sub-optimal design
choices. I can't promise anything at the moment (gotta go through
corporate) but it would be in my interest to contribute that code to
such a library.

                         -David

Hi David,

>> At a purely implementation level, the first problem at the moment is
>> that IR-level types are discarded by the time call lowering happens.
>> Structures are split up into their constituent fields and those are
>> all a backend has available for its decisions.

> When you say, "Structures are split up into their constituent fields,"
> what do you mean, exactly?

By the time the backend gets to decide where arguments are going,
instead of seeing a prototype like

declare void @foo({i32, i8, i64} %arg)

it gets presented with (essentially)

declare void @foo(i32 %arg.1, i8 %arg.2, i64 %arg.3)

There's no hint that a structure was ever involved, let alone
information on its layout. I think that would be a fairly simple
matter to resolve though, if a little risky if not handled carefully.
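
As a small illustration of what the flattening loses, here is a C sketch using a struct that corresponds to the `{i32, i8, i64}` example, assuming a typical target where `int64_t` in a struct is placed at the next suitably aligned offset. The helper names `flattened_size` and `padding_bytes` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C counterpart of the IR type {i32, i8, i64}. */
struct Arg {
    int32_t a;
    int8_t  b;
    int64_t c;
};

/* Bytes covered by the three fields when viewed as unrelated scalars. */
static size_t flattened_size(void) {
    return sizeof(int32_t) + sizeof(int8_t) + sizeof(int64_t);
}

/* Padding present in the struct layout that the scalar list no longer
 * records (the guard avoids unsigned underflow on an unusual target). */
static size_t padding_bytes(void) {
    assert(sizeof(struct Arg) >= flattened_size());
    return sizeof(struct Arg) - flattened_size();
}
```

On common 64-bit targets the struct occupies 16 bytes with `c` at offset 8, while the three scalars add up to only 13 bytes. The offsets and the padding between `b` and `c` are exactly the layout information the backend no longer sees once the fields arrive as separate arguments.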

Tim.

Tim Northover <t.p.northover@gmail.com> writes:

>>> At a purely implementation level, the first problem at the moment is
>>> that IR-level types are discarded by the time call lowering happens.
>>> Structures are split up into their constituent fields and those are
>>> all a backend has available for its decisions.

>> When you say, "Structures are split up into their constituent fields,"
>> what do you mean, exactly?

> By the time the backend gets to decide where arguments are going,
> instead of seeing a prototype like
>
> declare void @foo({i32, i8, i64} %arg)
>
> it gets presented with (essentially)
>
> declare void @foo(i32 %arg.1, i8 %arg.2, i64 %arg.3)

Ok, that's what I thought you meant, but wanted to make sure.

> There's no hint that a structure was ever involved, let alone
> information on its layout. I think that would be a fairly simple
> matter to resolve though, if a little risky if not handled carefully.

It's this loss of information I was hoping to avoid by passing the
structs and having CodeGen lower the passing correctly. But unions are
a problem.

Damnable unions.

                        -David

Wasn't Dave Abrahams (CC'd) exploring in this direction?

-- Sean Silva

I was, and I may yet pursue that job further. I'll have to talk to my
new colleagues about that in a couple weeks.