ABI: how to let the backend know that an aggregate should be allocated on the stack

Hi All,

I am trying to handle Homogeneous Aggregates (HAs) for ARM-VFP according to the spec (AAPCS):

C.1.vfp If the argument is a VFP CPRC and there are sufficient consecutive VFP registers of the appropriate type unallocated then the argument is allocated to the lowest-numbered sequence of such registers.

C.2.vfp If the argument is a VFP CPRC then any VFP registers that are unallocated are marked as unavailable. The NSAA is adjusted upwards until it is correctly aligned for the argument and the argument is copied to the stack at the adjusted NSAA. The NSAA is further incremented by the size of the argument. The argument has now been allocated.

We currently expand Homogeneous Aggregates in Clang, but that does not conform to the standard when some VFP registers are available but not enough to hold the whole aggregate.

In that case, the leading members of the HA are allocated to VFP registers and the rest go on the stack.

To fix the problem, it would be great if we could let the backend know that the HA goes on the stack, and that later VFP CPRCs go on the stack as well.
This has been discussed before, judging at least from the comments in TargetInfo.cpp:
// This assumption is optimistic, as there could be free registers available
// when we need to pass this argument in memory, and LLVM could try to pass
// the argument in the free register. This does not seem to happen currently,
// but this code would be much safer if we could mark the argument with
// ‘onstack’. See PR12193.

I am just wondering whether it is necessary to add an onstack flag, and whether there are any issues related to that.

Another option, suggested by Daniel, is to convert the HA to a similar type that the backend won't pass in registers.
I tried passing a struct of vector types, but the backend expands the struct;
see llvm::ComputeValueVTs:
// Given a struct type, recursively traverse the elements.

I tried using indirect in Clang, but it does not work out the way I want.

Any suggestion on how to fix this is highly appreciated!

Thanks,
Manman

See MipsABIInfo::getPaddingType; a similar sort of approach should
work here. (Granted, onstack would be more convenient, but it doesn't
exist at the moment.)

-Eli

In llvm-gcc, this decision was handled near llvm-arm.cpp:2737 in llvm_arm_aggregate_partially_passed_in_regs(). Basically, the available registers were counted, and if the HA didn't fit, it was passed byval instead.

I agree that we should unify this sort of logic in one place. I’m not sure that onstack is the best interim step toward that. Does byval work here?

Alex

Byval does not work for me: it tries to split the struct, putting what fits into the available core registers and the rest on the stack.

I will look into that, thanks

At the time, the ARM target didn't actually handle byval. Now it does.

You should be able to get the old struct passing capability if you don't apply an attribute at all.

Alex

Indirect with byval set to false passes the whole struct via the stack, but it occupies one core register to pass the address.

Thanks,
Manman

That is a strange byval implementation. Maybe the llvm ARM backend
should be changed to always pass byval on the stack? Clang can create
regular (integer, fp) arguments for the registers.

Cheers,
Rafael

The problem is that the ABI says the argument *should* be split
between registers and stack. The relevant callbacks in clang only get
to suggest one type (+ a padding dummy going before if they want);
they can't (currently) say "put the first 4 bytes here and the rest
there".

Given that constraint "byval" is probably the sanest option since it's
special anyway.

That could be changed of course, but I'm not convinced Clang would be
improved for it.

Tim.

I see. Clang would then have to split the register and stack parts
itself. I also realized you would still need a padding argument
anyway, in case a following argument does fit in VFP registers but should go
on the stack because the VFP registers are marked unavailable.

Cheers,
Rafael

Byval does not work for me, it will try to split the struct to fit into
available core registers and the rest on stack.

That is a strange byval implementation. Maybe the llvm ARM backend
should be changed to always pass byval on the stack? Clang can create
regular (integer, fp) arguments for the registers.

The current definition of the byval attribute in LangRef says nothing about requiring passing the argument on the stack. It just says it "should really be passed by value". When discussing the alignment, it does refer to a stack slot, but it isn't at all clear that it is required to be on the stack.

From looking at the PowerPC backend, I got the impression that it does not interpret the byval attribute to mean that an argument must go on the stack. It could be entirely in registers or split between registers and stack. For Intel, on the other hand, there seem to be many cases where byval is intentionally used as a substitute for the "on stack" attribute that Manman was looking for.

It would be good to clarify the intention of this in the docs.