Structs get decomposed when they shouldn't be

Hi all,

I'm new to the list, so I want to say hello to everybody!

I'm from Hungary and am writing an LLVM backend for the Tile64 processor as
my master's thesis. I'm under a lot of time pressure, so the thesis will
probably describe a backend that only provides an assembly printer, but
development is likely to continue beyond the thesis.

For now, I've run into a very annoying problem while implementing the
calling convention of the Tilera architecture. The ABI says that a struct
can be passed in registers if it fits in them. Moreover, if a struct
is passed in registers, it must be laid out exactly as it would be in
memory, alignment included. A padding register must be inserted
before a double-word-aligned value if needed, and more than one value
can share a single register, e.g. two i16 values in one i32 register.
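
As a rough illustration of the layout rule described above (not taken from
the Tilera ABI document), the following C sketch copies a struct's in-memory
image into register-sized words. The names pack_into_regs, struct pair and
struct padded are made up for the example, and it assumes 32-bit registers
and a host compiler that uses the usual natural-alignment layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { REG_BYTES = 4 };  /* assumption: 32-bit registers */

/* Two 16-bit fields occupy 4 bytes in memory, so they share one register. */
struct pair { int16_t a; int16_t b; };

/* The 8-byte-aligned member forces 4 bytes of padding after `tag`;
   when the struct travels in registers this becomes a padding register. */
struct padded { int32_t tag; _Alignas(8) int64_t wide; };

/* Copy the in-memory image of an object into consecutive register-sized
   words. Returns the number of words used, or 0 if the object does not
   fit into max_regs words. */
size_t pack_into_regs(const void *obj, size_t size,
                      uint32_t *regs, size_t max_regs)
{
    assert(obj != NULL && regs != NULL);
    size_t needed = (size + REG_BYTES - 1) / REG_BYTES;
    if (needed == 0 || needed > max_regs)
        return 0;
    memset(regs, 0, needed * REG_BYTES);  /* zero any trailing bytes */
    memcpy(regs, obj, size);              /* keep fields at memory offsets */
    return needed;
}
```

With this sketch, struct pair needs one word, while struct padded needs four:
the tag, one padding word, and two words for the 8-byte member.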

As far as I understand, LLVM decomposes data types into smaller
components in some circumstances. E.g. automatically decomposing a double
into two i32 arguments is very useful for me because the processor
has only i32 registers. However, this decomposition is a
nightmare for structs that should be passed in registers.
For function arguments, the problem can be mitigated by using a
pointer tagged with the byval attribute and catching such an argument in a
custom CC function. On the other hand, when a function should return a
struct, byval can't be used.

Of course, there is no problem when sret demotion takes place, either
automatically for structs that are too big or forced by the sret attribute.
However, smaller structs are by default decomposed into their component
elements when returned. I searched the net all day but,
unfortunately, couldn't find a solution.

Is there any way to disable this feature of LLVM and get structures
as they are when returning them?

Besides solutions, any suggestions and ideas are much appreciated :)

All the best,
David

Hi David,

> I'm new to the list, so I want to say hello to everybody!

hello!

> I'm from Hungary and am writing an LLVM backend for the Tile64 processor as
> my master's thesis. I'm under a lot of time pressure, so the thesis will
> probably describe a backend that only provides an assembly printer, but
> development is likely to continue beyond the thesis.
>
> For now, I've run into a very annoying problem while implementing the
> calling convention of the Tilera architecture. The ABI says that a struct
> can be passed in registers if it fits in them. Moreover, if a struct
> is passed in registers, it must be laid out exactly as it would be in
> memory, alignment included. A padding register must be inserted
> before a double-word-aligned value if needed, and more than one value
> can share a single register, e.g. two i16 values in one i32 register.
>
> As far as I understand, LLVM decomposes data types into smaller
> components in some circumstances.

Can you please explain in more detail what you are referring to here? LLVM
itself shouldn't be changing function parameters or return types unless the
function has local (internal) linkage (since in that case ABI requirements
don't matter). But perhaps you mean that clang is producing IR with these
kinds of transformations already in it? If so, that's normal: front-ends
are required to produce ABI-conformant IR, so clang is probably producing
IR for some other ABI (e.g. x86). In that case you will need to teach clang
about your ABI.

Ciao, Duncan.

>> As far as I understand, LLVM decomposes data types into smaller
>> components in some circumstances.
>
> Can you please explain in more detail what you are referring to here? LLVM
> itself shouldn't be changing function parameters or return types unless the
> function has local (internal) linkage (since in that case ABI requirements
> don't matter).

This is in the backend of LLVM itself. When converting the LLVM IR to its DAG
representation prior to selection, CodeGen asks the target to take care of
function parameters. Unfortunately the only interface it presents for the
target code to make that decision is a sequence of MVTs: iN, float, double,
vNiM, vNfM. Structs are split into their component members with no indication
that they were originally more than that.

This has affected a couple more people recently (including me):

http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-March/048203.html
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20120326/055577.html

If this interface could be improved, I believe clang could simply apply a
function to its QualType and produce an LLVM type which does the right thing.
Without that improvement clang will have to use a context-sensitive model to
map the whole sequence of arguments.

At least, that's the ARM situation. I'm not sure Ivan's can even be solved
without an improved interface (well, he could probably co-opt byval pointers
too, but that's Just Wrong).

This most recent one, I'm not sure about. Whether a struct can be mapped to a
sane sequence of iN types probably hinges on the various alignment constraints
and whether an argument can be split between regs and memory. (If a split is
allowed then you can probably use [N x iM] where the struct has size N*M and
alignment M (assuming iM has alignment M), otherwise that would be wrong).
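
To make the [N x iM] idea concrete, here is a small C sketch that derives the
element count and element width from a struct's size and alignment. It is
only an illustration: coerce_to_array and struct int_array_type are
hypothetical names, sizes are in bytes, and it assumes (as the paragraph
above does) that an integer of the struct's alignment exists and has that
same alignment.

```c
#include <assert.h>
#include <stddef.h>

/* Describes the LLVM type [count x i<bits>]. */
struct int_array_type { size_t count; size_t bits; };

/* For a struct of `size` bytes with `align`-byte alignment, pick
   [size/align x i(8*align)]. Returns 0 on success, or -1 when the size
   is not a whole multiple of the alignment, in which case this simple
   mapping does not apply. */
int coerce_to_array(size_t size, size_t align, struct int_array_type *out)
{
    assert(out != NULL);
    if (size == 0 || align == 0 || size % align != 0)
        return -1;
    out->count = size / align;
    out->bits = align * 8;
    return 0;
}
```

A 12-byte struct with 4-byte alignment would map to [3 x i32] under this
scheme. Whether that mapping is correct for a given target still depends on
the split-between-registers-and-memory question raised above.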

And Juhasz David wrote:

> the problem can be mitigated by using a
> pointer tagged with the byval attribute and catching such an argument in a
> custom CC function.

That's the approach I've currently adopted for some of my work, but it's
incomplete for my needs and I'm rather concerned about the performance of what
does work: unless we reimplement mem2reg in the backend too, it introduces
what amounts to an argument alloca with associated load/store very late on.

Tim.

Hi Tim,

>>> As far as I understand, LLVM decomposes data types into smaller
>>> components in some circumstances.
>>
>> Can you please explain in more detail what you are referring to here? LLVM
>> itself shouldn't be changing function parameters or return types unless the
>> function has local (internal) linkage (since in that case ABI requirements
>> don't matter).
>
> This is in the backend of LLVM itself. When converting the LLVM IR to its DAG
> representation prior to selection, CodeGen asks the target to take care of
> function parameters. Unfortunately the only interface it presents for the
> target code to make that decision is a sequence of MVTs: iN, float, double,
> vNiM, vNfM. Structs are split into their component members with no indication
> that they were originally more than that.

yup, front-ends have to take care of more complicated ABI details. For example
the front-end should currently use "byval" for any (parts of) structs that need
to be passed on the stack, and explicit scalars for struct bits that should go
in registers.

> This has affected a couple more people recently (including me):
>
> http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-March/048203.html
> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20120326/055577.html
>
> If this interface could be improved, I believe clang could simply apply a
> function to its QualType and produce an LLVM type which does the right thing.

I don't think this is possible; for example, I doubt you can handle the x86-64
ABI in a context-free way.

> Without that improvement clang will have to use a context-sensitive model to
> map the whole sequence of arguments.
>
> At least, that's the ARM situation. I'm not sure Ivan's can even be solved
> without an improved interface (well, he could probably co-opt byval pointers
> too, but that's Just Wrong).

I must have missed that discussion, since I don't know what Ivan's problem is.

> This most recent one, I'm not sure about. Whether a struct can be mapped to a
> sane sequence of iN types probably hinges on the various alignment constraints
> and whether an argument can be split between regs and memory. (If a split is
> allowed then you can probably use [N x iM] where the struct has size N*M and
> alignment M (assuming iM has alignment M), otherwise that would be wrong).
>
> And Juhasz David wrote:
>
>> the problem can be mitigated by using a
>> pointer tagged with the byval attribute and catching such an argument in a
>> custom CC function.
>
> That's the approach I've currently adopted for some of my work, but it's
> incomplete for my needs and I'm rather concerned about the performance of what
> does work: unless we reimplement mem2reg in the backend too, it introduces
> what amounts to an argument alloca with associated load/store very late on.

Byval is designed for the situation in which the callee takes the address of
the struct. Thus it provides a pointer to a block of memory. However there
is also the situation in which the struct is not addressable (just like a
virtual register) and just needs to have bits of it passed on the stack because
the ABI says so (also like virtual registers: the first ones are passed in
registers, the rest on the stack). To make this easier, maybe there should be
an "onstack" parameter attribute (kind of the opposite to "inreg"), which says
that an argument should be passed on the stack. Then you can break your struct
up into bits that should be passed in registers ("inreg" attribute), bits that
should be passed transparently (i.e. not addressably) on the stack ("onstack"
attribute) and bits that should be passed addressably on the stack ("byval").

Ciao, Duncan.

Hi Duncan,

> yup, front-ends have to take care of more complicated ABI details. For
> example the front-end should currently use "byval" for any (parts of)
> structs that need to be passed on the stack, and explicit scalars for
> struct bits that should go in registers.
>
>> If this interface could be improved, I believe clang could simply apply a
>> function to its QualType and produce an LLVM type which does the right
>> thing.
>
> I don't think this is possible, for example I doubt you can handle the
> x86-64 ABI in a context free way.

Interesting. I've taken a look at http://www.x86-64.org/documentation/abi.pdf.

The complex rule appears to be reversion to stack if not all of an aggregate
would fit in registers. Couldn't this be implemented rather easily if LLVM
*did* take note of structs as a whole? Clang would pass its aggregate as
an LLVM struct with each field being an appropriate eightbyte (or larger if
SSEUP fields were involved). LLVM would decide whether enough registers are
free for the whole thing and either use them or shove it on the stack.
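
The all-or-nothing rule can be sketched in a few lines of C. This is a
deliberately simplified illustration, not clang's or LLVM's code: it only
models aggregates whose eightbytes are all of INTEGER class, it ignores the
SSE classes entirely, and assign_aggregate and NUM_INT_ARG_REGS are names
invented here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The x86-64 System V ABI passes integer arguments in six registers:
   rdi, rsi, rdx, rcx, r8, r9. */
enum { NUM_INT_ARG_REGS = 6 };

/* Try to place an aggregate made of `eightbytes` INTEGER-class pieces.
   On success, advance *next_reg and return true. If the pieces do not
   all fit in the remaining registers, leave *next_reg untouched and
   return false: the whole aggregate then goes on the stack. */
bool assign_aggregate(size_t eightbytes, size_t *next_reg)
{
    assert(next_reg != NULL && *next_reg <= (size_t)NUM_INT_ARG_REGS);
    if (eightbytes == 0 || eightbytes > 2)
        return false;  /* simplification: larger aggregates use memory */
    if (eightbytes > (size_t)NUM_INT_ARG_REGS - *next_reg)
        return false;  /* not enough registers left for the whole thing */
    *next_reg += eightbytes;
    return true;
}
```

The same 16-byte struct lands in registers when it is the first argument but
on the stack when five integer registers are already taken, which is the
context dependence discussed above.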

Though, having just seen the X86_64 implementation in clang, it's nowhere near
as horrific as I'd feared it would be. In fact, it's probably significantly
less code than would be involved in moving it to LLVM, even if it does seem
wrong for a front-end to care about registers. I think I'll drop this crusade.

> I must have missed that discussion, since I don't know what Ivan's problem
> is.

Essentially that pointers need to go in different registers to integers (and,
presumably, all other valid MVTs).

Thanks Duncan!

Tim.

Hi guys,

Thanks for the replies.

My basic problem was what Tim described: the backend legalizes
structs into a sequence of MVTs while building the DAG. However, my
situation is not as complicated as the example you wrote about. Values
can't be split between registers and memory according to the Tilera ABI,
and all arguments are passed one after the other in the order they
appear in the function signature. So I need only the size and
alignment of the struct to decide what to do with it.
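
A decision of that kind could look roughly like the C sketch below. It is
only an illustration under stated assumptions: 32-bit registers, a made-up
count of ten argument registers, double-word-aligned values starting at an
even register, and no change to the register cursor when a value goes to
memory. The name assign_regs is hypothetical, and none of these details are
quoted from the Tilera ABI document.

```c
#include <assert.h>
#include <stddef.h>

enum {
    REG_BYTES = 4,     /* assumption: 32-bit registers */
    NUM_ARG_REGS = 10  /* assumption: illustrative register count */
};

/* Decide where a value of `size` bytes with `align`-byte alignment goes.
   Returns the index of its first register, or -1 if it must be passed
   in memory. A value is never split between registers and memory. */
int assign_regs(size_t size, size_t align, int *next_reg)
{
    assert(next_reg != NULL && size > 0);
    int first = *next_reg;
    if (align > REG_BYTES && first % 2 != 0)
        first++;  /* skip one register as padding */
    int needed = (int)((size + REG_BYTES - 1) / REG_BYTES);
    if (first + needed > NUM_ARG_REGS)
        return -1;  /* does not fit entirely: pass in memory */
    *next_reg = first + needed;
    return first;
}
```

For example, a 4-byte value followed by an 8-byte-aligned 8-byte struct would
take register 0, leave register 1 as padding, and put the struct in
registers 2 and 3.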

However, the problem was twofold. I want to use the backend to
compile C programs through LLVM IR, and clang made some wrong
modifications to the program because it compiled against the x86 ABI, just
as Duncan said. So I made my own clang target for my backend and found
a working solution for the problem of broken-up structs.

As I wrote, byval is a viable workaround for me. When a struct goes into
the argument area of the caller, the callee simply grabs the address. When
the struct fits in registers, the caller copies it to its own stack
frame and uses that address.

As for returning a struct, I checked what other clang targets
generate in such a case. In the end, the solution is to bitcast structs
to appropriate integer types when returning them, which prevents LLVM
from touching anything inside the structs. Splitting a returned struct
into register-size parts is not a problem because there is only one
return value anyway. The important thing for me was to keep the
content of the struct padded and aligned.
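
In C terms, the effect of that coercion is roughly what the sketch below
does: the struct's bytes, padding included, travel as a single integer and
are reinterpreted on the other side. This is an illustration only, not the
code of my clang target; struct small, small_to_ret and small_from_ret are
made-up names, and the struct is assumed to be exactly 4 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One byte, one byte of padding, then a 2-byte field: 4 bytes in total
   on common ABIs. */
struct small { int8_t flag; int16_t value; };

static_assert(sizeof(struct small) == sizeof(uint32_t),
              "sketch assumes the struct fills exactly one 32-bit word");

/* Returning side: hand the struct back as one integer, so its fields
   stay at their in-memory offsets and are not split up. */
uint32_t small_to_ret(struct small s)
{
    uint32_t bits = 0;
    memcpy(&bits, &s, sizeof s);
    return bits;
}

/* Receiving side: reinterpret the integer as the original struct. */
struct small small_from_ret(uint32_t bits)
{
    struct small s;
    memcpy(&s, &bits, sizeof s);
    return s;
}
```

Because the bytes are copied as a block, the round trip preserves the fields
regardless of byte order.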

David