Controlling parameter alignment

Currently we don't have a way to express in LLVM IR the alignment
requirements of the function arguments. The `align` attribute is
applicable to pointers only, and only for some special ways of passing
arguments (e..g `byval`). When implementing AAPCS32/AAPCS64, clang
resorts to dubious hacks of coercing to types, which naturally have
the needed alignment.

But we don't have enough types to cover all the cases.

The problem appeared when passing over-aligned Homogeneous
Floating-Point Aggregates (HFAs). When we pass a type with increased
alignment requirements, for example

struct S {
  __attribute__ ((__aligned__(16))) double v[4];
};

Clang uses `[4 x double]` for the parameter, which is passed on the stack
at alignment 8, whereas it should be at alignment 16.

The trick of coercing to, say, `[i128, i128]` does not work, because
the argument may end up in GP registers. A hypothetical coercion to
`[f128, f128]` won't work either, because argument needs to be
expanded to consecutive FP registers and they aren't overlapping on
AArch64 (e.g. Q1 does not overlap D2/D3).

There was a deficiency in the ABI
which is addressed in a proposed fix
(https://github.com/ARM-software/abi-aa/pull/67), which matches GCC
behaviour and the original intent.

With this ABI fix, we would need alignments of 8 and 16 to pass HFA
arguments, although we should be ideally looking at a generic
solution.

There are similar issues with AAPCS32.

One proposal was to adopt the `stackalign` attribute to convey these
alignment requirement, https://reviews.llvm.org/D75903.

Another option is to extend the `align` attribute semantics to apply
to non-pointer arguments (I have a patch for that, which looks very
much as the one above).

To which Reid Kleckner commented like:

Mostly I think I meant that this will be a big change in the meaning
of either the align or the alignstack attributes, and that should be
hashed out on llvm-dev.

Right now align is kind of a hybrid between an optimization annotation
attribute, like dereferenceable or nonnull, and an ABI attribute, like
byval or inreg. When align is used with byval, it affects argument
memory layout. When byval is not present, it is just an optimizer
hint. IMO, ideally, we should separate those two roles.

I should be able to control the alignment of the memory used to pass a
pointer argument, at the same time that I annotate which low bits of
the pointer are known to be zero.

Thoughts?

~chill

Ping?

+Johannes, are there other folks who care about LLVM IR attributes that we can add?

WDYT about trying to separate actual argument memory alignment from pointer argument alignment?

To double check, the problem is the last argument here
https://godbolt.org/z/Gqva4Y does not carry the align(16)
information anymore, correct?

If so, I don't see an immediate problem with allowing an "alignment"
be applied to non-pointer values. The semantics would probably say
that if the value is demoted to memory/stack, the associated alignment
has to be guaranteed. As noted before [0], `align` is actually a hint
and not a requirement. I'm unsure if we could not make it binding TBH,
will think about that, interested in opinions.

I take it we can't/don't want to legalize the constraints in the
backend, right? I mean, use `%sturct.S* align(16) byval(%sturct.S) %p`
in the IR. On that thought, what happens if we have a `struct S *`
argument in source which we then promote to pass it by value?

~ Johannes

[0] https://reviews.llvm.org/D75903#2549569

To double check, the problem is the last argument here
https://godbolt.org/z/Gqva4Y does not carry the align(16)
information anymore, correct?

Basically, yes.

If so, I don’t see an immediate problem with allowing an “alignment”
be applied to non-pointer values. The semantics would probably say
that if the value is demoted to memory/stack, the associated alignment
has to be guaranteed. As noted before [0], align is actually a hint
and not a requirement. I’m unsure if we could not make it binding TBH,
will think about that, interested in opinions.

I take it we can’t/don’t want to legalize the constraints in the
backend, right? I mean, use %sturct.S* align(16) byval(%sturct.S) %p
in the IR. On that thought, what happens if we have a struct S *
argument in source which we then promote to pass it by value?

Yes, we wouldn’t want to use byval in the frontend because it is best to use IR arguments that are lowered correctly regardless of their position in the argument list. Using byval usually forces the argument to be passed in memory, and that isn’t correct if the six double arguments are removed from the original example. This has the surprising consequence that we can’t, for example, remove arguments from vararg functions:

https://reviews.llvm.org/rGe6e88f99b35f2fde5008e36f0d93f5231564b7d8
And, generally, byval results in worse code at the call site than passing SSA values does.

If LLVM is able to do argument promotion, usually that means there is no ABI compatibility concern, and the arguments don’t have to be aligned or arranged in any particular way. I think argument promotion in this case would pass each field individually.

To double check, the problem is the last argument here
https://godbolt.org/z/Gqva4Y does not carry the align(16)
information anymore, correct?

Basically, yes.

If so, I don't see an immediate problem with allowing an "alignment"
be applied to non-pointer values. The semantics would probably say
that if the value is demoted to memory/stack, the associated alignment
has to be guaranteed. As noted before [0], `align` is actually a hint
and not a requirement. I'm unsure if we could not make it binding TBH,
will think about that, interested in opinions.

I take it we can't/don't want to legalize the constraints in the
backend, right? I mean, use `%sturct.S* align(16) byval(%sturct.S) %p`
in the IR. On that thought, what happens if we have a `struct S *`
argument in source which we then promote to pass it by value?

Yes, we wouldn't want to use byval in the frontend because it is best to
use IR arguments that are lowered correctly regardless of their position in
the argument list. Using byval usually forces the argument to be passed in
memory, and that isn't correct if the six double arguments are removed from
the original example. This has the surprising consequence that we can't,
for example, remove arguments from vararg functions:
https://reviews.llvm.org/rGe6e88f99b35f2fde5008e36f0d93f5231564b7d8
  And, generally, byval results in worse code at the call site than passing
SSA values does.

If LLVM is able to do argument promotion, usually that means there is no
ABI compatibility concern, and the arguments don't have to be aligned or
arranged in any particular way. I think argument promotion in this case
would pass each field individually.

Right, the [4xdouble] should not be exposed as first class value by arg
promotion.

----

I think what we need here is a new attribute, or a repurposed attribute, to
control argument alignment. The original patch uses alignstack(n) for this
purpose, which to date is used to control the stack pointer alignment for
the whole call frame, and not for arguments. Do people like that, or is it
confusing? If it's OK to use alignstack(n) on an argument, should we start
using it on byval arguments in addition to align(), or is that just churn?
Is it useful to have this kind of strong separation between ABI attributes
and optimization hint attributes?

alignstack seems OK to me. We can't (easily) reuse align because it can
be dropped elsewhere and we might not want to have special rules.

Using alignstack for byval would also eliminate the special rule (for
align) there I guess.

~ Johannes

Just to be clear, the suggestion is to introduce `alignstack` to
affect argument alignment, while retaining
the current semantics for `align`?

Thus, a pointer argument having both `align(A)` and `alignstack(B)`
would itself be allocated at B boundary (if it happens to be passed in
memory),
while it would contain an A-aligned address?

Using alignstack for byval would also eliminate the special rule (for
align) there I guess.

So, `byval` arguments adopt the `stackalign` attribute with the same
semantics as the current `align`
and then `align` is invalid for `byval` arguments ?

Looks good to me.

Just to be clear, the suggestion is to introduce alignstack to
affect argument alignment, while retaining
the current semantics for align?

Thus, a pointer argument having both align(A) and alignstack(B)
would itself be allocated at B boundary (if it happens to be passed in
memory),
while it would contain an A-aligned address?

Yes, that’s the proposal as I understand it.

Using alignstack for byval would also eliminate the special rule (for
align) there I guess.

So, byval arguments adopt the stackalign attribute with the same
semantics as the current align
and then align is invalid for byval arguments ?

I think there is a backwards compatibility consideration here. It also seems reasonable to allow align to appear with byval as an optimization hint: the optimizer can assume the low bits of such a pointer are zero.

So, `byval` arguments adopt the `stackalign` attribute with the same
semantics as the current `align`
and then `align` is invalid for `byval` arguments ?

I think there is a backwards compatibility consideration here. It also seems reasonable
to allow `align` to appear with `byval` as an optimization hint: the optimizer can assume
the low bits of such a pointer are zero.

Updating the FE to emit `stackalign` for `byval` arguments would cause quite a bit of churn in the testsuite, but
the updates can be done automatically.

Then in the BE, `byval` arguments will use `stackalign`, missing that, fall back to uisng `align` to affect
stack slot alignment.

~chill

Something is not quite right here.

We have up to three relevant alignment properties for a parameter:

  • the alignment of the parameter itself (if it happenes to be passed in memory)
  • if it’s a pointer, the actual alignment of the pointed to memory (as an optimisation aid)
  • if it’s a byval or a byref argument, the minimum alignment of the storage, allocated
    for the original argument value (ABI affecting).

For non-pointer arguments alignstack(N) gives stack slot alignment.
For pointer arguments, we retain that use of alignstack(N) and also have align(M) to give
the actual alignment of the contained pointer.

Now when we add byval or byref to the above, there is no attribute left to give the alignment
of the allocated memory. We thought of using alignstack(N), but that would leave us without a way
to specify the pointer alignment itself.

~chill

I hope I’m not missing something obvious, and if I don’t here’s an idea:

Extend the byval(Ty) attribute to byval(Ty [, Align]) (same for byref).

(Most of the attributes take zero or one parameters, but there’s a precedent with allocsize(<EltSizeParam>[, <NumEltsParam>]))

So we end up with :

  • align(N) for pointer content
  • stackalign(N) for the minimum alignment for of the actual argument, if it ends up in memory
  • byval(Ty[, N]) and byref(Ty[, N]) for the original argument value

Thus something like call %f(%struct.S * alignstack(8) align(32) byval(%struct.S, 16) p);
would mean:

  • the caller has allocated a slot for struct S, that slot is at least 16 bytes aligned
  • the caller is passing a pointer to that slot, which pointer itself should be 8 bytes aligned, if it ends up in memory
  • that pointer happens to have the lower five bits clear

~chill

Hello,

Any comments, suggestions, ideas ?

~chill

I have put up for an initial review a draft/prototype implementation of this proposal above, for byval only.

https://reviews.llvm.org/D100397