struct passing on X86-64

It appears the X86-64 backend doesn't currently follow the X86-64 ABI
when passing structs by value.

For example:
  %struct.tiny = type { i32, i32 }
  call void @foo(%struct.tiny %1) nounwind

This actually passes two i32 register arguments (%edi, %esi) instead of a
single i64 register argument (%rdi).
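
To make the difference concrete, here is a minimal C sketch of the repacking a frontend has to do for an 8-byte struct. `struct tiny` mirrors `%struct.tiny` above; the helper names `pack_tiny` and `unpack_tiny` are hypothetical, and the code only illustrates the bit-level coercion into one 64-bit value, not how llvm-gcc or clang actually implement it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* C counterpart of %struct.tiny = type { i32, i32 }. */
struct tiny {
    int32_t a;
    int32_t b;
};

/* The whole struct fits in one eightbyte, which is why the ABI wants it
   in a single 64-bit integer register. */
_Static_assert(sizeof(struct tiny) == sizeof(uint64_t),
               "struct tiny must fit in one eightbyte");

/* Hypothetical helper: reinterpret the struct's bytes as one 64-bit value. */
static uint64_t pack_tiny(struct tiny t) {
    uint64_t bits;
    memcpy(&bits, &t, sizeof bits);
    return bits;
}

/* Hypothetical helper: recover the struct on the callee side. */
static struct tiny unpack_tiny(uint64_t bits) {
    struct tiny t;
    memcpy(&t, &bits, sizeof t);
    return t;
}
```

On a little-endian target such as x86-64, `pack_tiny((struct tiny){1, 2})` yields `0x0000000200000001`: the first field lands in the low half of the register.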

The frontends (llvm-gcc, clang) appear to do the argument munging
themselves in order to compensate.

Any plans or suggestions for implementing this properly in the backend?

Thanks,
  David M

David Meyer <pdox@google.com> writes:

The frontends (llvm-gcc, clang) appear to do the argument munging
themselves in order to compensate.

Yep. There are lots of corner cases that the frontend MUST handle
because LLVM does not have the necessary infrastructure.

Any plans or suggestions for implementing this properly in the backend?

I don't think anyone has signed up to do the work.

                         -Dave

Hi Dave,

The frontends (llvm-gcc, clang) appear to do the argument munging
themselves in order to compensate.

Yep. There are lots of corner cases that the frontend MUST handle
because LLVM does not have the necessary infrastructure.

I think it's more like: because LLVM doesn't have the necessary information.
Due to LLVM using structural equivalence, all kinds of types that are different
in the original language end up being the same type in LLVM. Chris once
explained to me, IIRC, that the ABI is defined in terms of the original language
types, and unfortunately some ABIs require different handling for types that
are structurally equivalent. In short, it's not always possible to know how an
LLVM type should be passed, because you don't know which of several possible
front-end types it came from. So you would either have to annotate the IR with
all kinds of front-end type information, or require front-ends to do the ABI
conformance stuff. The current solution is the latter.

See PR4246 for a plan to have generic helper code for ABI lowering.

Ciao, Duncan.

Duncan Sands <baldrick@free.fr> writes:

Yep. There are lots of corner cases that the frontend MUST handle
because LLVM does not have the necessary infrastructure.

I think it's more like: because LLVM doesn't have the necessary information.

True.

Due to LLVM using structural equivalence, all kinds of types that are
different in the original language end up being the same type in LLVM.
Chris once explained to me, IIRC, that the ABI is defined in terms of
the original language types, and unfortunately some ABIs require
different handling for types that are structurally equivalent.

Just so I understand this better, for which ABIs is this the case? It's
not for x86_64. ARM, perhaps?

See PR4246 for a plan to have generic helper code for ABI lowering.

It seems a bit complicated to me. Wouldn't it be simpler to just encode
the language type information in metadata, or, if metadata isn't
appropriate because it can be dropped, in some language information
piece within Type? Then the target ABI stuff can know exactly what to
do and we won't have various other pieces all trying to figure out the
same stuff.

                             -Dave

Duncan Sands <baldrick@free.fr> writes:

Yep. There are lots of corner cases that the frontend MUST handle
because LLVM does not have the necessary infrastructure.

I think it's more like: because LLVM doesn't have the necessary information.

True.

Due to LLVM using structural equivalence, all kinds of types that are
different in the original language end up being the same type in LLVM.
Chris once explained to me, IIRC, that the ABI is defined in terms of
the original language types, and unfortunately some ABIs require
different handling for types that are structurally equivalent.

Just so I understand this better, for which ABIs is this the case? It's
not for x86_64. ARM, perhaps?

Take, for example, _Complex long double on x86-64. Or unions on x86-64. :)
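
The union case can be illustrated with a short C sketch. The two types below are made up for this example; they have the same size and alignment, and a frontend could plausibly represent both as a one-field `{ float }` in LLVM. As I read the System V x86-64 classification rules, though, the struct is class SSE while the union is class INTEGER (merging an integer member with a float member gives INTEGER), so they would travel in different register files.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical example types; neither appears in the thread. */
struct wrapped_float { float f; };       /* one float field: class SSE       */
union float_or_int { float f; int i; };  /* float merged with int: INTEGER   */

/* Same size and alignment, so layout alone cannot tell them apart. */
_Static_assert(sizeof(struct wrapped_float) == sizeof(union float_or_int),
               "sizes match");
_Static_assert(_Alignof(struct wrapped_float) == _Alignof(union float_or_int),
               "alignments match");

/* Returns 1 when both objects hold identical bytes for the same value. */
static int same_bytes(float value) {
    struct wrapped_float s = { value };
    union float_or_int u;
    u.f = value;
    return memcmp(&s, &u, sizeof s) == 0;
}
```

A backend that sees only the structural type has no way to recover which of the two it was given, which is the information gap Duncan describes.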

-Eli

Eli Friedman <eli.friedman@gmail.com> writes:

Just so I understand this better, for which ABIs is this the case? It's
not for x86_64. ARM, perhaps?

Take, for example, _Complex long double on x86-64. Or unions on x86-64. :)

Grr...long double is evil in general. Just use __float128. ;)

                               -Dave

There's a similar problem I've been grappling with in the Pure compiler. Pure can load LLVM bitcode files (and even produce them on the fly, using clang or llvm-gcc, if you inline C/C++/Fortran code in Pure scripts) and make the functions in those files callable from Pure without having to explicitly declare the call interfaces.

This is very convenient and works reasonably well, except that the bitcode loader can't distinguish between void* and char*, which both end up as i8* in bitcode. (The distinction is needed so that Pure strings can be properly marshalled to C and back.)

To solve this, it would be *very* nice to have some kind of annotation in the bitcode which tells me whether an i8* is actually void* or char* on the C side (this would then have to be generated by the C/C++ frontend).
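
As a small illustration of the ambiguity, consider the following C sketch. Both function names are hypothetical. With the typed pointers LLVM uses here, both signatures lower to the same IR function type, `i8* (i8*)`, even though only the second one takes a C string that a loader would want to marshal.

```c
#include <assert.h>
#include <string.h>

/* Takes an opaque pointer: the caller should pass it through untouched.
   Lowers to a function of type i8* (i8*). */
static void *identity_ptr(void *p) {
    return p;
}

/* Takes a NUL-terminated C string: a loader would want to marshal a
   string value here. Also lowers to i8* (i8*). */
static char *skip_spaces(char *s) {
    while (*s == ' ') {
        ++s;
    }
    return s;
}
```

Nothing in the resulting bitcode records which of the two conventions the C author intended.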

The only way to get that kind of information that I currently see is to somehow make clang also spit out the AST along with the bitcode, but that seems wasteful. (If anyone has a better idea, please let me know.)

So this kind of type annotation would be useful for more than ABI lowering, and I'd really welcome that addition.

Albert

David A. Greene wrote:

Duncan Sands <baldrick@free.fr> writes:

Yep. There are lots of corner cases that the frontend MUST handle
because LLVM does not have the necessary infrastructure.

I think it's more like: because LLVM doesn't have the necessary information.

True.

Due to LLVM using structural equivalence, all kinds of types that are
different in the original language end up being the same type in LLVM.
Chris once explained to me, IIRC, that the ABI is defined in terms of
the original language types, and unfortunately some ABIs require
different handling for types that are structurally equivalent.

Just so I understand this better, for which ABIs is this the case? It's
not for x86_64. ARM, perhaps?

See PR4246 for a plan to have generic helper code for ABI lowering.

It seems a bit complicated to me. Wouldn't it be simpler to just encode
the language type information in metadata, or, if metadata isn't
appropriate because it can be dropped, in some language information
piece within Type? Then the target ABI stuff can know exactly what to
do and we won't have various other pieces all trying to figure out the
same stuff.

Why should the backends know about the frontend language? It seems sensible to me that if I create a new language and a new ABI for my language then I can expect to need to teach the backend about my new ABI.

We already have the default ABI for a target and the per-function calling conventions. Let's assume that neither of those is a good choice for solving my problem because the ABI matches the default with only a few special cases. We could add ABI notes to the llvm::Function which specify that parameter 3 is an arm-abi-name "foo-style-passing", and the ARM backend would have to be taught how to handle that. (It would also be nice if TargetData could tell you whether a given ABI note applies to your current target.)

What's nice about this is that it separates the concerns of lowering the IR and the workings of the ABI. What's not so hot is that now we'll have to standardize these notes in LangRef.

That's orthogonal to the concern of PR4246 which talks about providing a generic layer for lowering from the C type system to the ABI.

Does this sound like a sensible start?

Nick

Nick Lewycky <nicholas@mxc.ca> writes:

Why should the backends know about the frontend language? It seems
sensible to me that if I create a new language and a new ABI for my
language then I can expect to need to teach the backend about my new
ABI.

And so the backend has to be taught about the language. To me, it is
about conveying the necessary information in a more portable way so the
mapping code only has to be written once.

with only a few special cases. We could add ABI notes to the
llvm::Function which specify that parameter 3 is an arm-abi-name
"foo-style-passing", and the ARM backend would have to be taught how
to handle that. (It would also be nice if TargetData could tell you
whether a given ABI note applies to your current target.)

How are those ABI notes different from source language information?

That's orthogonal to the concern of PR4246 which talks about providing
a generic layer for lowering from the C type system to the ABI.

Does this sound like a sensible start?

Comment 1 of PR4246 is something like what I was getting at. What's not
specified in the bug is how the type mapping is represented. I'm with
Eli in that it should be a dirt-simple structure to convey the necessary
information. Something that just says, for example, this use of i8 * is
a char * and this other use of i8 * is a void *.

                             -Dave

David A. Greene wrote:

Nick Lewycky <nicholas@mxc.ca> writes:

Why should the backends know about the frontend language? It seems
sensible to me that if I create a new language and a new ABI for my
language then I can expect to need to teach the backend about my new
ABI.

And so the backend has to be taught about the language.

Only if that's what the ABI says. If my hypothetical ABI says that in a Z-context, the CR6 register controls whether floating point values are passed in registers or in memory, then the ABI-note says whether we're in a Z-context or not.

To me, it is about conveying the necessary information in a more portable
way so the mapping code only has to be written once.

Making which part portable? Your proposed solution is specific to C/C++, isn't it? I'd like LLVM to continue abstracting away the high-level language. We even added TBAA without baking any frontend knowledge into LLVM; it's the frontend that defines the alias sets.

with only a few special cases. We could add ABI notes to the
llvm::Function which specify that parameter 3 is an arm-abi-name
"foo-style-passing", and the ARM backend would have to be taught how
to handle that. (It would also be nice if TargetData could tell you
whether a given ABI note applies to your current target.)

How are those ABI notes different from source language information?

In every ABI I know of, that's what they'll be.

I guess what I'm objecting to is tying the ABI decision to a front-end language type system specifically. I also think that storing the ABI details separately from the rest of the IR (i.e., getting rid of "zeroext" -- just use the right (e.g. i8) type please, and leave the fact it's passed in a 32-bit register to the ABI note) is a good idea in general. I meant to throw this out there and collect some feedback, not make a proposal to be implemented as-is.
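
For readers unfamiliar with "zeroext", here is a small C sketch of where it comes from. The function names are hypothetical. As far as I know, clang targeting x86-64 typically emits the first parameter as `i8 zeroext` and the second as `i8 signext`: the IR type stays i8, and the attribute tells the code generator that the caller extends the value as the target ABI requires.

```c
#include <assert.h>

/* Parameter typically lowered as `i8 zeroext %c` on x86-64:
   the caller zero-extends the byte before the call. */
static unsigned widen_unsigned(unsigned char c) {
    return c;
}

/* Parameter typically lowered as `i8 signext %c` on x86-64:
   the caller sign-extends the byte before the call. */
static int widen_signed(signed char c) {
    return c;
}
```

Under the ABI-note idea, the extension requirement would live in the note rather than in a parameter attribute, while the parameter type would remain plain i8.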

Nick

Nick Lewycky <nicholas@mxc.ca> writes:

David A. Greene wrote:

And so the backend has to be taught about the language.

Only if that's what the ABI says. If my hypothetical ABI says that in
a Z-context, the CR6 register controls whether floating point values
are passed in registers or in memory, then the ABI-note says whether
we're in a Z-context or not.

Of course.

To me, it is about conveying the necessary information in a more portable
way so the mapping code only has to be written once.

Making which part portable? Your proposed solution is specific to
C/C++, isn't it? I'd like LLVM to continue abstracting away the
high-level language. We even added TBAA without baking any frontend
knowledge into LLVM; it's the frontend that defines the alias sets.

Oh, absolutely the source language should be abstracted as much as
possible. If we can represent C/C++ ABI requirements without talking
about char * and void *, I'm all for it! I was responding to cases
where the LLVM type is not enough to distinguish between two or more
source language types that are handled differently by the ABI. In those
cases the language/ABI requirements have to be conveyed somehow. Maybe
it's metadata (or something equivalent) that specifies struct field
offsets, for example.

with only a few special cases. We could add ABI notes to the
llvm::Function which specify that parameter 3 is an arm-abi-name
"foo-style-passing", and the ARM backend would have to be taught how
to handle that. (It would also be nice if TargetData could tell you
whether a given ABI note applies to your current target.)

How are those ABI notes different from source language information?

In every ABI I know of, that's what they'll be.

Ok, yep.

I guess what I'm objecting to is tying the ABI decision to a front-end
language type system specifically. I also think that storing the ABI
details separately from the rest of the IR (i.e., getting rid of
"zeroext" -- just use the right (e.g. i8) type please, and leave the
fact it's passed in a 32-bit register to the ABI note) is a good idea
in general. I meant to throw this out there and collect some feedback,
not make a proposal to be implemented as-is.

Sure. I think everything you've said makes sense. We're in violent
agreement. :)

                               -Dave