Illegal vector type replacement for ARM

I was recently updating the Android sources for Clang/LLVM and noticed that some ARM vector types were being altered by Clang (and passed differently than in prior releases of LLVM 3.x). I tracked this down to the following changelist (r166043):

http://llvm.org/viewvc/llvm-project?rev=166043&view=rev

Author: Manman Ren <mren@apple.com>

ARM ABI: passing illegal vector types as varargs.

Because we expand varargs in clang and the call site is handled in the back end,
it is hard to match exactly how illegal vectors are handled in the backend.
Therefore, we legalize the illegal vector types in clang:
if (Size <= 32), legalize to i32.
if (Size == 64), legalize to v2i32.
if (Size == 128), legalize to v4i32.
if (Size > 128), use indirect.

rdar://12439123

The most peculiar part of this ABI change (and it is definitely an ABI change relative to what shipped in Clang/LLVM 3.x) is that some vec3 types are now passed as indirect parameters, whereas they were previously passed directly. I attached an example file that demonstrates the different behavior. When compiled for ARMv7, the <3 x i64> parameter becomes an indirect <3 x i64>* parameter. Note that <4 x i64> is still passed as a plain <4 x i64> (as are <2 x i64> and i64), and that other vec3 types (such as <3 x i32>) keep their original shape with no switch to indirect parameter passing.

Considering that vec3 types are fairly commonly treated as vec4 types, I don’t see why <3 x i64> and <3 x double> should be passed differently than they were before, or differently from the vec4 type with the same element type. This change unfortunately breaks existing Android code that was compiled with Clang, and I doubt we are the only users of vector types on ARM who will run into trouble with this.

The second major difference I noticed (also shown in my sample file) is that we now lose semantic information about the incoming arguments to a function whose vector parameter needs fewer than 32 bits of storage (such as <2 x i8> or <3 x i8>). These are explicitly coerced to a single i32 parameter, which the function body then uses to extract the individual vector components. This is disruptive to optimizers that want to analyze the actual shape of the input data. I am also not sure that it actually simplifies or improves the ARM backend.

I have experimented with removing both of these behaviors from the changelist, and the modified Clang still generates working ARM code. Is it possible to revert this ABI change and apply the illegal vector indirection only to larger (non-power-of-2) vector types? I want to get opinions from the rest of the community before I upload a potential patch, in case there is other relevant information that I am missing. I looked through cfe-dev and cfe-commits and didn’t see any comments on the original change at all.

Thanks,
Steve

t.c (348 Bytes)

Hi Stephen,

Chucking in my tuppence,

> If compiled for ARMv7, the <3 x i64> parameter will become a <3 x i64>*
> indirect parameter.

This is odd from an AAPCS view. Every other type (except where
required to obey C++ semantics) gets passed directly in ARM. I can't
really see any justification for the pointer and would support
reverting this part.

> The second major difference that I noticed (also in my sample file) was that
> we are now losing semantic information about the incoming arguments to
> a function with a vector parameter that requires less than 32 bits of
> storage space (like <2 x i8>, <3 x i8>). In these cases, they are being
> explicitly coerced to a single i32 parameter, which is later used in the
> function body to extract the individual vector components.

Whether by coincidence or not, I believe i32 is the only justifiable
way to pass that type (unless you decide it gets promoted to an <8 x
i8> and goes in vector registers, which also loses size information).

In particular, using i16 is incorrect in the big-endian case. For
example, passing the following object in big-endian:

struct { char c1, c2; } my_obj = {1, 2};

needs to result in r0 = 0x01020000, which is not a valid i16.

Cheers.

Tim.

> In particular, using i16 is incorrect in the big-endian case. For
> example, passing the following object in big-endian:
>
> struct { char c1, c2; } my_obj = {1, 2};
>
> needs to result in r0 = 0x01020000, which is not a valid i16.

I am not actually advocating a switch to i16 here. Instead, I suggest
keeping the original <2 x i8> so that the type and size are preserved
completely. This is more consistent with the other vector types and
gives other bitcode consumers greater flexibility. It also keeps the
ABI consistent with the previous 3.x releases of LLVM, which I thought
was something we were really striving for.

Steve

> Whether by coincidence or not, I believe i32 is the only justifiable
> way to pass that type (unless you decide it gets promoted to an <8 x
> i8> and goes in vector registers, which also loses size information).

Actually, I suppose you *could* argue for <2 x i8> as well, which
would mean it goes in two general-purpose registers when LLVM splits
up the illegal vector. Inefficient, but not invalid.

Tim.

>> If compiled for ARMv7, the <3 x i64> parameter will become a <3 x i64>*
>> indirect parameter.
>
> This is odd from an AAPCS view. Every other type (except where
> required to obey C++ semantics) gets passed directly in ARM. I can't
> really see any justification for the pointer and would support
> reverting this part.

We can either try to match how the back end legalizes these types, or legalize them in the front end by increasing the number of elements (for example, widening <3 x i64> to <4 x i64>).

Thanks,
Manman