Vectors in structures

Yes, there are multiple issues, but they all involve source compatibility.

Hi Bob, then this is a different matter altogether.

I do not have access to ARM's compiler(s) but I'm assuming that the first example will not compile because vadd_u32 expects arguments of type uint32x2_t. Using structs in llvm does not "fix" a compatibility problem, but it helps our users write NEON code that will work with ARM's compiler.

Indeed, the types are different; you will get an incompatible parameter error.
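
For concreteness, here is a minimal sketch of the kind of mismatch being discussed. The original example is not quoted in this thread, so its exact shape is an assumption; the point is that with GCC's typedef-based NEON types, uint32x2_t is a true vector type that supports vector literals and operators, whereas with struct-wrapped types (armcc style) the same code is rejected with invalid-operand or incompatible-parameter errors. Assumes a NEON-enabled GCC target.

#include <arm_neon.h>

/* Builds two NEON values with GCC vector-extension features and adds
 * them both ways.  With GCC's typedef-based uint32x2_t this compiles;
 * with a struct-wrapped uint32x2_t the vector literal and the '+'
 * operator are invalid, and vadd_u32 sees the wrong argument type. */
uint32x2_t add_pairs(void) {
  uint32x2_t a = {1, 2};   /* vector literal: a GCC extension on vector types */
  uint32x2_t b = {3, 4};
  uint32x2_t c = a + b;    /* operator on a vector type: GCC/Clang vector types only */
  return vadd_u32(c, b);   /* vadd_u32 expects uint32x2_t arguments */
}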

I am getting requests that we do that regardless of ARM's opinion, but I've resisted based on the notion that portability and compatibility are worth fighting for. It is pretty ironic and frustrating to me to hear that even people at ARM think these wrapper structs are a bad idea. I would still prefer to have ARM publish specifications and guidelines that make it possible to write portable NEON code. Do you guys even care about that?

Nobody said that the structures are a bad idea, nor that it's not
worth fighting for compatibility.

What I said was:

1. The use of structures is an implementation choice. GCC chooses not
to, we chose to. Simple as that.

2. The use of structures, *in IR*, is not necessary. Even using
structure in the source code, you can easily detect NEON types and
transform the IR accordingly.

We do concern ourselves with compatibility, more than people normally
believe. But there are certain constraints (partners, design issues,
integration) that we simply cannot ignore.

My first proposal, making every NEON call an intrinsic, could help
not only IR generation and codegen, but also make the arm_neon.h
header more compatible with ARM's without the need to reinterpret
structures. I still have to think more about it (I haven't thought
about the header at all so far), but this is something I can do and
am willing to do to help Clang without breaking compatibility with
ARM (the last thing I would want).

We could (maybe should) discuss the intrinsic issue off-list, though.

Nobody said that the structures are a bad idea, nor that it's not
worth fighting for compatibility.

What I said was:

1. The use of structures is an implementation choice. GCC chooses not
to, we chose to. Simple as that.

Really? ARM's specifications of these types show them as structs, and as my example demonstrates, GCC's "implementation choice" allows code that is incompatible with ARM's compiler. I guess you are saying that is OK.

2. The use of structures, *in IR*, is not necessary. Even using
structure in the source code, you can easily detect NEON types and
transform the IR accordingly.

I agree. I mentioned earlier that, as an implementation choice, llvm relies on its SROA optimization to remove the structs from the IR. You may choose to do that in the front-end. It shouldn't matter.
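
To illustrate the point with a small sketch (the type names below are made up; this is not Clang's actual arm_neon.h), a one-member wrapper struct in the source does not have to survive into the IR. SROA, or equivalent lowering in the front end, can reduce it to plain vector operations:

/* Hypothetical wrapper-struct style NEON type; the names are invented. */
typedef unsigned int __u32x2 __attribute__((vector_size(8)));
typedef struct { __u32x2 val; } u32x2_wrapped;

/* At the source level everything goes through the struct... */
u32x2_wrapped wrapped_add(u32x2_wrapped a, u32x2_wrapped b) {
  u32x2_wrapped r;
  r.val = a.val + b.val;
  return r;
}
/* ...but after SROA (or front-end lowering) the IR can operate directly
 * on <2 x i32> values, with no trace of the struct left. */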

We do concern ourselves with compatibility, more than people normally
believe. But there are certain constraints (partners, design issues,
integration) that we simply cannot ignore.

I really don't know what you mean here. It sounds like those "certain constraints" might be referring to the fact that GCC's implementation is widespread and has become a de facto standard. If that's the case it is silly for llvm to continue trying to enforce the stricter type checking required for compatibility with ARM's compiler. We'll just adopt GCC's approach.

We've already had to support GCC compatibility as an option, so dropping compatibility with ARM will make things easier for us. I plan to just go ahead with that. I would have preferred some official specification from ARM acknowledging GCC's approach as a valid alternative, but your statement (1) above seems to say exactly that.

My first proposal, making every NEON call an intrinsic, could help
not only IR generation and codegen, but also make the arm_neon.h
header more compatible with ARM's without the need to reinterpret
structures. I still have to think more about it (I haven't thought
about the header at all so far), but this is something I can do and
am willing to do to help Clang without breaking compatibility with
ARM (the last thing I would want).

We could (maybe should) discuss the intrinsic issue off-list, though.

Adding unnecessary intrinsics goes against the design philosophy of llvm and causes issues in the backend. I don't understand why that matters to you. If you have specific patches to propose for Clang, perhaps I can give you more specific feedback.

Really? ARM's specifications of these types show them as structs, and as my example demonstrates, GCC's "implementation choice" allows code that is incompatible with ARM's compiler. I guess you are saying that is OK.

OK in the sense that it works, yes. OK in the sense that I welcome it, no.

I'm not a long-time ARM employee, so I don't know all the decisions
taken on the NEON specs or why GCC made the design decisions it did.
But I can't possibly believe that they have to do what ARM says they
should.

What I can do, however, is help you keep source compatibility and
reduce the cost of producing IR, if you think it's worth it. I never
said it wasn't, and you seem to think it is, so I guess we agree on
that.

We've already had to support GCC compatibility as an option, so dropping compatibility with ARM will make things easier for us. I plan to just go ahead with that. I would have preferred some official specification from ARM acknowledging GCC's approach as a valid alternative, but your statement (1) above seems to say exactly that.

You're drawing conclusions from your own assumptions... please don't
put words in my mouth.

The only point I made in my first email is that using structures in
the IR was not necessary, regardless of the headers.

If you want to keep source compatibility with armcc, I welcome and can help you.

Can we ask Lee or Richard or somebody for a ruling on why the spec is
the way it is? It would probably also help if armcc's arm_neon.h were
available publicly for us to examine to further the discussion.

What's being debated here seems like an optimization for compile time.
I'm certainly in favor of that, but I can think of a ton of things I'd
rather that we work on first that matter for correctness. For example,
we're lacking proper EH, MCization of ARM attributes, accurate
scheduling, etc.

deep

Can we ask Lee or Richard or somebody for a ruling on why the spec is
the way it is?

Hi Sandeep,

Al already explained what the requirements are (the name mangling
rules) and that they are the only requirement.

As far as I understood (Al could correct me if I'm wrong), the
structures were used to hold the double values inside, since a double
aligns naturally at 64 bits, and you have to pack two doubles in a
structure for quad instructions (128 bits). Also, it's easy to name a
structure of two doubles, so the name mangling comes for free.
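
As a rough illustration of that layout idea (the names below are invented; armcc's real internal definitions are not public), a single double gives a 64-bit, naturally 64-bit-aligned member for the D-register types, and a pair of them gives the 128-bit Q-register types:

/* Invented stand-ins for the kind of container types being described. */
typedef struct { double _d;    } my_simd64_t;   /* D-register sized: 64 bits  */
typedef struct { double _d[2]; } my_simd128_t;  /* Q-register sized: 128 bits */

/* Both get the size (and, on AAPCS, the 64-bit alignment of double)
 * that the NEON container types need.  C11 checks: */
_Static_assert(sizeof(my_simd64_t)  == 8,  "64-bit container");
_Static_assert(sizeof(my_simd128_t) == 16, "128-bit container");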

But this is as far as I go. I'd happily forward any further questions
about the NEON (or any other ARM) ABI decisions.

It would probably also help if armcc's arm_neon.h were
available publicly for us to examine to further the discussion.

Indeed it would. I actually thought it was, to be honest. It could be
protection against patent trolling, I don't know, but I will certainly
ask for a more definite answer.

As I said, I'm new at ARM and there's a lot I have to learn. Please be patient.

What's being debated here seems like an optimization for compile time.
I'm certainly in favor of that, but I can think of a ton of things I'd
rather that we work on first that matter for correctness. For example,
we're lacking proper EH, MCization of ARM attributes, accurate
scheduling, etc.

Indeed! We're getting involved in such matters as well, as you may
have seen in other threads.

Unfortunately, we don't have spare resources to work on it full time
at the moment. We're doing our best with the time we have: at the
least sharing the knowledge, and at most helping to implement the core
missing features. I truly hope you don't take it as a lack of interest
(from me or ARM).

I'm not a long-time ARM employee, so I don't know all the decisions
taken on the NEON specs or why GCC made the design decisions it did.
But I can't possibly believe that they have to do what ARM says they
should.

I was the lead on the NEON intrinsics in the ARM compiler back in 2005,
but I didn't do the gcc implementation or write that bit of the ABI.

We wanted to define types and intrinsics, with readable names (not too
many underscores) but without polluting the user namespace (in the C sense).
So we had arm_neon.h define the user-level types and intrinsics in terms of
types (__simdxx) and intrinsics (__ndp_xxx) in the C implementation namespace.
The __simd types are "intrinsic" types - i.e. types that are recognized
and handled specially by the front end. They're structures because that
was the easiest way to create a specially-named type with a certain size
and alignment, as we didn't have anything like gcc's vector_size attribute.
We never intended programmers to do anything with the __simd structures
directly, and our own compiler documentation (as opposed to our ABI
specification) doesn't even mention them.
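
A hedged sketch of that layering, assuming it looked roughly like the following (armcc's actual arm_neon.h is not public, and the __simd / __ndp_ spellings here are taken only from this description; the double-underscore names live in the C implementation namespace, which is the point):

/* Front-end "intrinsic" type: a specially-named struct with the size and
 * alignment of a D register.  Given a body here only so the sketch parses;
 * in the real compiler the front end recognizes the name and lowers it. */
typedef struct __simd64_uint32_t { double _private; } __simd64_uint32_t;

/* Implementation-namespace intrinsic (compiler-internal in practice). */
extern __simd64_uint32_t __ndp_vadd_u32(__simd64_uint32_t, __simd64_uint32_t);

/* arm_neon.h then maps the readable, user-level names onto the
 * implementation namespace without polluting the user's namespace. */
typedef __simd64_uint32_t uint32x2_t;
static inline uint32x2_t vadd_u32(uint32x2_t a, uint32x2_t b) {
  return __ndp_vadd_u32(a, b);
}

/* GCC, by contrast, had the vector_size attribute, so it could make the
 * user-level type a true vector type with no wrapper struct at all: */
typedef unsigned int gcc_style_uint32x2_t __attribute__((vector_size(8)));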

The name mangling is a consequence of our implementation - in our compiler
the mangling happens to be done at a point in the front end where the types
have the __simd names and before they get lowered to native vectors.
With hindsight, we should have defined a specific name mangling rule.

Several years later, the NEON intrinsics were implemented in gcc.
At that point we had two implementations of the intrinsics, and it was
thought they should work together at the binary level too. So at least
the parameter passing and mangling had to be defined. Appendix A was
added to the AAPCS. Essentially it reverse-engineers enough of the
original implementation so that if you implement it the same way (i.e.
using struct __simdxxx), the parameter passing and name mangling would
fall out naturally. But as we know from gcc, you can make the types
whatever you want as long as you arrange for the parameter passing and
mangling to follow the rules. So the ABI requires more than is needed
for binary compatibility (and more than gcc in fact does) and not enough
for source compatibility.

We're having some discussions here about how we can reword the ABI to
make it clearer what is required of an implementation.

In any case, anything we say about source compatibility or binary
compatibility shouldn't constrain how a compiler works internally.
Though with LLVM bitcode being in some sense an 'interface', there might
be benefit in agreeing on how particular kinds of things (e.g. architecture-
or language-specific features) are represented in LLVM-IR, i.e. a
specialization or binding of LLVM-IR for an architecture.

Al
