[cfe-commits] Fix handling of ARM homogenous aggregates

Hi,

(Forward from cfe-commits, where some backend stuff has come up).

This is an issue I've been thinking about quite a bit recently, and I agree that the biggest problem is the one below:

* The big thing still missing here is that there is no logic to check how many VFP registers have already been used for other arguments. When deciding whether to pass an argument as a homogeneous aggregate, one of the criteria is that the entire aggregate has to fit into the remaining unused argument registers, right?
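
To make that criterion concrete, here is a minimal sketch of the bookkeeping involved. It is a simplified model written for illustration, not code from clang or LLVM: `VfpArgState` and `allocateFloats` are hypothetical names, only single-precision members are modelled (s0-s15), and it assumes my reading of the VFP variant of the AAPCS, under which a candidate that does not fit goes to the stack and the leftover VFP registers become unavailable to later candidates.

```cpp
#include <bitset>
#include <cassert>

// Simplified model of the 16 single-precision VFP argument registers s0-s15.
// Doubles and vectors are deliberately left out of this sketch.
class VfpArgState {
  std::bitset<16> used_;    // registers taken by earlier arguments
  bool exhausted_ = false;  // set once a VFP candidate has gone to the stack

public:
  // Try to place an aggregate of `count` floats in consecutive free registers.
  // Returns the index of the first register, or -1 if the aggregate must be
  // passed on the stack.
  int allocateFloats(unsigned count) {
    assert(count >= 1 && count <= 4 && "an HFA has one to four members");
    if (!exhausted_) {
      for (unsigned first = 0; first + count <= 16; ++first) {
        bool allFree = true;
        for (unsigned i = 0; i < count; ++i)
          allFree = allFree && !used_.test(first + i);
        if (allFree) {
          for (unsigned i = 0; i < count; ++i)
            used_.set(first + i);
          return static_cast<int>(first);
        }
      }
    }
    // No room: the whole aggregate goes to the stack, and (by assumption)
    // later candidates may not use the leftover registers either.
    exhausted_ = true;
    return -1;
  }
};
```

The answer for any one argument depends on every argument before it, which is exactly the state a front-end would have to replicate if the backend cannot do this itself.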

I tend to think that if every front-end has to implement the entire VFP PCS to decide how to pass an HFA, something has gone wrong. So I've come to the conclusion that the real flaw is LLVM not exposing enough information to the target-dependent backend code for it to do the right thing. By the time the target is involved, all that remains of any composite type is:
  * The fields completely separated if it was naturally by value. {float, float} just gives you two "float" parameters for example.
  * i32, the ByValSize and ByValAlign if it was a byval pointer: e.g. "{float, float}* byval".

Even in the first case there's no indication of where a composite type begins and ends. The latter could be bludgeoned to mean "this is an HFA, put it in VFP regs", but it would be unspeakably ugly.

I believe that if the original LLVM Type* pointer is exposed to TargetLowering (perhaps as part of InputArg/OutputArg), then LLVM itself can decide what to do with both Small Structures and HFAs in a sane manner: writing a front-end which adheres to the PCS would be much easier for any source language. The worry is the apparent layering violation of passing a Type* further down. But I'd argue that the TargetLowering functions involved are constructing a DAG from nothing rather than transforming an existing DAG, so giving them LLVM source-level information is justifiable.

Given that, the simpler implementation is via byval pointers, but they have some issues with efficiency (phases like ScalarRepl can't get to work replacing getelementptrs with extracts since the implicit alloca happens during DAG construction -- just look at what happens to mips small structs now). With more work, the truly natural equivalence would be possible and a front-end could simply "call void @foo({float, float} %val)" and everything would work.

Of course, while the second approach is nice in isolation, it may not exactly fit in with what other backends do.

Any thoughts?

Tim.


Hi Tim,

> So I've come to the conclusion that the real flaw is LLVM
> not exposing enough information to the target-dependent
> backend code for it to do the right thing.

We also had this problem. You might find this patch useful as a starting point:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-March/048266.html

/Patrik Hägglund

Thanks. I'd considered using MachineFunction fiddling purely from a
LowerFormalArguments perspective (I hadn't noticed the subtlety that
LowerCall needs this info to be passed in).

Doesn't this mean you have to replicate all the machinations of
SelectionDAGBuilder to work out which argument you're dealing with at any
given moment, though? I'm thinking of how it splits structs and implicitly
adds sret parameters in particular, though there may be more I don't know of.

How does this information get handled by the TableGen calling-conv code in
your situation? The only way I can think of is a custom CCState which gets
told about each argument as it passes by and allows CCCustom functions to
access its special information (or, possibly, a CCIf with a cast).
    
    CCCustom<"TellCCStateAboutArg">,
    [...]
    CCIf<"cast<MyCCState>(State).isPointerArg()", CCAssignToReg<[P1, P2]>>,

Putting that information in the InputArg/OutputArg and incorporating it into
the CCAssignFn interface allows a more straightforward implementation in the
targets, in my view (for both our uses). It's also information that's readily
available when InputArg/OutputArgs are being constructed. In your case:

    CCIf<"SourceTy->isPointerTy()", CCAssignToReg<[P1, P2]>>;

I've got a patch which implements it for ARM and X86, though it doesn't handle
HFAs using the new features yet: I'm still musing on the best interface to
present there ("HFA* byval" for target simplicity or "HFA" for user
simplicity). I'll see if I can clean it up for other targets and send it for
comparison.

The main issue with my approach is that split struct args are still tricky:
they get identical types and another custom CCState is needed to handle them
en-masse (to find out where we are in the struct). Optimal for that case might
be an extra flag similar to isSplit(), but for structs.

Thoughts?

Tim.

> Doesn't this mean you have to replicate all the machinations of
> SelectionDAGBuilder to work out which argument you're dealing with at any
> given moment, though?

We have some logic for matching flattened arguments back to original arguments.
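
As an illustration only (this is not the code from our target), the core of such matching can be sketched as below. It assumes a hypothetical table `partsPerArg` recording how many scalar pieces each original argument was split into; `FlattenedPos` and `locate` are names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Position of one flattened piece, expressed in terms of the original
// argument list. Hypothetical type for this sketch.
struct FlattenedPos {
  std::size_t argIndex;   // which original argument the piece came from
  std::size_t partIndex;  // which piece inside that argument
};

// partsPerArg[i] is the number of scalar pieces original argument i was
// split into, e.g. {1, 2, 1} for (i32, {float, float}, double).
FlattenedPos locate(const std::vector<std::size_t>& partsPerArg,
                    std::size_t flatIndex) {
  std::size_t base = 0;  // flat index of the first piece of the current argument
  for (std::size_t arg = 0; arg < partsPerArg.size(); ++arg) {
    if (flatIndex < base + partsPerArg[arg])
      return {arg, flatIndex - base};
    base += partsPerArg[arg];
  }
  assert(false && "flat index past the last argument");
  return {partsPerArg.size(), 0};
}
```

A real implementation would also have to account for implicitly added parameters such as sret, which this sketch ignores.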

> How does this information get handled by the TableGen calling-conv code in
> your situation?

We only use CCCustom for our target.

/Patrik Hägglund

Here's the patch I was talking about. It's probably not quite ready for
committing yet, but I think it's a step towards supporting the HFA ABI without
excessive work from the front-end or duplication of SelectionDAGBuilder code.

If this is the right approach, it still needs to be decided whether the extra
complexity and inconsistency (with MIPS etc) of passing an argument via
"{float, float}" vs "{float, float}* byval" is worth the simplified interface
and possible extra optimisation opportunities.

Opinions, anyone? (Hint hint).

Tim.

callingconv-sourcetype.diff (39.7 KB)

Tim,

> Opinions, anyone? (Hint hint).

I think this should be considered from several angles. While
providing the source type for an argument might be beneficial, it might
also move code from the front-end into the backend. Consider e.g. passing
a struct by value with crazy padding inside. The ABI might specify that
the padding should be removed and the struct passed field by field.

Also, note that in many cases the ABI rules are worded in terms of the
source language, which might not be preserved during IR generation,
so...

Hi all,

I think that the ABI at the LLVM IR level is different from the ABI on a real
architecture such as ARM or x86. The ABI at the LLVM IR level doesn't consider
register usage. It just describes parameters and padding information related
to the alignment of parameters. As Anton mentioned, LLVM has expressed ABI
information in bitcode via the front-end. If someone wants to keep information
from a high-level language like C or C++ in the IR-level ABI, the backend must
be modified. To convert parameter information from the high-level language
into target-specific information, I recommend modifying
llvm::TargetLowering::LowerCallTo(). There you will be able to generate
padding information for the target.

Good luck,
Jin-Gu Kang

> I think that the ABI at the LLVM IR level is different from the ABI on a real
> architecture such as ARM or x86. The ABI at the LLVM IR level doesn't consider
> register usage. It just describes parameters and padding information related
> to the alignment of parameters.

I'm not sure what you mean here. LLVM's IR certainly doesn't care about
registers and so on, but the LLVM backends have to, and front-ends have to
know to a greater or lesser degree how the backends actually do it so that
they can create ABI compliant code.

My view (possibly biased by the ARM ABI) is that LLVM's primary goal should be
to make writing an ABI-compliant front-end as easy as possible. After that it
should aim to have a sane ABI for hand-written LLVM code, and finally it
should try to follow the ABI itself where possible (the last two are possibly
interchangeable, but the first is primary).

The current situation with HFAs is that, without changes to make the backend
aware of the concept, the front-end needs to know the entire sequence of
previous arguments and how LLVM lowers them to work out how to pass an HFA
correctly.

The goal I'd like to see reached is that a front-end should be able to map one
of its types to an LLVM type and know that if it uses that LLVM type then LLVM
will do the right thing. As far as I'm aware, this is what happens for other
targets already (we *are* a bit weird with the HFAs). I think this is
achievable for the ARM ABI too: LLVM's type system is certainly rich enough to
capture the distinctions necessary.

From Anton:
> I think this should be considered from several angles. While
> providing the source type for an argument might be beneficial, it might
> also move code from the front-end into the backend.

That could certainly go too far, but conceptually it's not necessarily a
massive problem: if multiple front-ends implement the same ABI calling
conventions, then perhaps the shared backend is the right place to put that
common code.

And conversely, I think that if a front-end is worrying about the allocation
of register numbers then something is a little awry.

But I suppose there will be a substantial cost to implementing this, wherever
we put it.

> Consider e.g. passing a struct by value with crazy padding inside.
> The ABI might specify that the padding should be removed and the
> struct passed field by field.

To me that would still be a prime candidate for the front-end doing the work:
it still seems to have an essentially context-free representation as a
(sequence of) LLVM types.

> Also, note that in many cases the ABI rules are worded in terms of the
> source language, which might not be preserved during IR generation,
> so...

I'm not sure I follow this point. Is preserving the source language a bad
thing for some reason I'm missing? Certainly, if it affects optimisation it
would be.

Tim.

Hi Tim

> I'm not sure I follow this point. Is preserving the source language a bad
> thing for some reason I'm missing? Certainly, if it affects optimisation it
> would be.

Let's consider one example:

union {
  float foo[4];
  int bar[3];
};

This is definitely not an HFA. However, such a union can be represented
via several different things in LLVM IR: [4 x float], [4 x i32], [16 x
i8] (all involving bitcasts to access one of the fields of a union).
And here we have a problem: [4 x float] can be thought of as an HFA at IR
level, however it's certainly not one, since the HFA rules are worded using
C-level constructs and not IR-level ones.

So, my point is that IR is not expressive enough to capture all the
source information necessary to model the ABI properly. Do you have a
good solution for this problem?
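
To illustrate the distinction, here is a small sketch of an HFA test applied to source-level type structure rather than to an IR type. The `Ty` model and the helper names are hypothetical and written only for this example, and the rule is simplified: every leaf must be a float, struct members add up, union members overlap, and the result must occupy one to four float slots.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Toy source-level type: a leaf (Float/Int) or a composite.
struct Ty {
  enum Kind { Float, Int, Struct, Union, Array } kind;
  std::vector<Ty> members;  // fields of a Struct/Union, or the Array element type
  unsigned count;           // number of elements, used for Array only
};

Ty floatTy() { return Ty{Ty::Float, {}, 0}; }
Ty intTy() { return Ty{Ty::Int, {}, 0}; }
Ty arrayOf(Ty elem, unsigned n) { return Ty{Ty::Array, {std::move(elem)}, n}; }
Ty structOf(std::vector<Ty> fields) { return Ty{Ty::Struct, std::move(fields), 0}; }
Ty unionOf(std::vector<Ty> fields) { return Ty{Ty::Union, std::move(fields), 0}; }

// Number of float slots the type occupies, or -1 if any leaf is not a float.
int floatSlots(const Ty& t) {
  switch (t.kind) {
  case Ty::Float:
    return 1;
  case Ty::Int:
    return -1;
  case Ty::Array: {
    assert(t.members.size() == 1 && "an array has exactly one element type");
    int elem = floatSlots(t.members[0]);
    return elem < 0 ? -1 : elem * static_cast<int>(t.count);
  }
  case Ty::Struct:
  case Ty::Union: {
    int total = 0;
    for (const Ty& m : t.members) {
      int n = floatSlots(m);
      if (n < 0)
        return -1;  // one non-float leaf disqualifies the whole composite
      // Union members overlap, struct members are laid out one after another.
      total = (t.kind == Ty::Union) ? std::max(total, n) : total + n;
    }
    return total;
  }
  }
  return -1;
}

bool isHFA(const Ty& t) {
  int n = floatSlots(t);
  return n >= 1 && n <= 4;
}
```

With this model, `structOf({floatTy(), floatTy()})` and `arrayOf(floatTy(), 4)` are HFAs, while `unionOf({arrayOf(floatTy(), 4), arrayOf(intTy(), 3)})` is not, even though all three could plausibly be lowered to arrays or structs of float in IR.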

> Hi Tim
>
> > I'm not sure I follow this point. Is preserving the source language a bad
> > thing for some reason I'm missing? Certainly, if it affects optimisation
> > it would be.
>
> Let's consider one example:
>
> union {
>   float foo[4];
>   int bar[3];
> };
>
> This is definitely not an HFA. However, such a union can be represented
> via several different things in LLVM IR: [4 x float], [4 x i32], [16 x
> i8] (all involving bitcasts to access one of the fields of a union).
> And here we have a problem: [4 x float] can be thought of as an HFA at IR
> level, however it's certainly not one, since the HFA rules are worded using
> C-level constructs and not IR-level ones.

I'd say the bulk of the ABI is specified in simpler terms than the C language,
much closer to LLVM's IR (in fact, in at least one respect higher level than
C: arrays can be first-class argument types).

Only after the actual rules have been given does the ABI say what the C/C++
mapping to these concepts is. Presumably other languages that want to be
compatible will define their own mapping. It's a two-phase approach which
seems fairly well-suited to LLVM's IR and structure.

> So, my point is that IR is not expressive enough to capture all the
> source information necessary to model the ABI properly. Do you have a
> good solution for this problem?

I think it's expressive enough to provide an interface for each category the
ABI cares about though:
    + Integer types of various widths and alignments.
    + Floating types, similarly.
    + Vectors as above.
    + Composite types that are HFAs.
    + (In the 64-bit case) Composite types less than 16 bytes in size.
    + Non-HFA, non-small composite types.

In this example I'd say clang's job (ideally) would be to represent the union
using some type in the final category ([4 x i32] is probably sufficient in the
32-bit world right now, because it turns out the ABI doesn't care about
splitting between registers and stack).

This kind of issue is always going to be present: any front-end is going to
have to lower its internal representation to some LLVM type and discard
information doing so, but I think it's neater if that's all it has to do.

We should have a chat about this at the conference later. I'm in favour of the
backend solution, but could certainly live with the other. I think deciding
the correct approach is the most important thing.

Tim.