[RFC] Opt-in vector of bool type

Hi,

We would like to extend Clang to allow 'bool' as a valid vector element
type in C/C++ code for select targets.

This is the natural type for SIMD masks and would facilitate clean SIMD
intrinsic declarations in C/C++ code.
Vectors of i1 are supported on IR level and below down to many SIMD
ISAs, such as AVX512 or the VE target (NEC SX-Aurora TSUBASA).
We understand the historical reasons for not supporting this (gcc
complicance).
However, this would be an opt-in feature and toolchains/targets that do
not want this will be unaffected by the change.

Looking forward to your feedback.

- Simon

There is a temptation when doing this to try to represent these as a vector of i1 in IR. Don't do this, it still has to be i8s, otherwise it causes a number of problems.

I'll leave it to the rest of the mailing list to judge whether the GCC incompatibility is justified. However, I'm curious as to why this would be opt-in on a target basis? Are there some targets that wouldn't be able to legalize this?

There is a temptation when doing this to try to represent these as a vector of i1 in IR. Don't do this, it still has to be i8s, otherwise it causes a number of problems.

Ok, we specifically want to lower it to <N x i1>.. what could go wrong?

I'll leave it to the rest of the mailing list to judge whether the GCC incompatibility is justified. However, I'm curious as to why this would be opt-in on a target basis? Are there some targets that wouldn't be able to legalize this?

I'd say that some targets may value strict gcc compliance higher than
supporting this type (ie if they have no use for it). Making it an
opt-in simply means less disturbance. In any case, it's again completely
in line with the wording of the gcc documentation to scalarize the type
for targets that do not support it.

Ok, we specifically want to lower it to <N x i1>.. what could go wrong?

I'm having trouble recalling the specifics, but we tried it on SYCL (a downstream) and had a ton of problems to the point we removed it. There isn't a good way to handle it from the ABI perspective, there is no good memory representation as a result, and many of the passes were not handling it correctly. It makes it a huge undertaking.

Ok, we specifically want to lower it to <N x i1>.. what could go wrong?

I'm having trouble recalling the specifics, but we tried it on SYCL (a downstream) and had a ton of problems to the point we removed it. There isn't a good way to handle it from the ABI perspective, there is no good memory representation as a result, and many of the passes were not handling it correctly. It makes it a huge undertaking.

That's actually a point in favor of making it opt out.
We do use <256 x i1> and <512 x i1> in our LLVM fork for SX-Aurora and
it is working fine for us. I guess that there could be issues with 'i1'
being smaller than the smallest addressable unit so things like <3 x i1>
could be problematic. I wonder, shouldn't _ExtInt have the exact same
problem?

_ExtInt doesn't occur in vectors, for which we've seen the problems, only in the vector types from the frontend.

Craig Topper is the one who will know better, he should be along in a few hours.

The main issue were various issues in SelectionDAG with loads/store of vectors that aren’t byte sized. For example PR42803 and PR44902.

The extended vector types also support operator which probably assumes the elements are individually addressable?

~Craig

Simon Moll via cfe-dev <cfe-dev@lists.llvm.org> writes:

We would like to extend Clang to allow 'bool' as a valid vector element
type in C/C++ code for select targets.

This is the natural type for SIMD masks and would facilitate clean SIMD
intrinsic declarations in C/C++ code.
Vectors of i1 are supported on IR level and below down to many SIMD
ISAs, such as AVX512 or the VE target (NEC SX-Aurora TSUBASA).
We understand the historical reasons for not supporting this (gcc
complicance).
However, this would be an opt-in feature and toolchains/targets that do
not want this will be unaffected by the change.

Looking forward to your feedback.

FWIW, this would also be useful for SVE predicates when generating
fixed-length code.

It's convenient that both GCC and clang reject boolean element types
as thing stand, e.g.:

    typedef bool foo __attribute__((vector_size(32)));

I think this means that it should be possible to treat packed boolean
vectors as a pure extension to what exists today. Compatibility with
older compilers shouldn't be a concern, it would just be a new feature.

GCC is a moving target too, and I don't know of any reason in principle
why GCC would never support this. A lot of the necessary plumbing
already exists.

So far the principle has been that vector_size should work the same
way regardless of the current target, with the compiler deciding
how best to implement the code. I think it would be good to follow
that for any extension too, rather than making the feature opt-in and
introducing a new target dependency.

Thanks,
Richard

Ah, the result of operator should be an LValue, so it should be addressable. Unfortunately, I don’t think we currently implement that correctly (and consider that a bug). See:

https://godbolt.org/z/Coo8Ai

Note we don’t allow taking a non-const ref or address of a vector element, but GCC does, though presumably that is something we should fix.

That's interesting, i wasn't aware of those bug reports.

The way i see it you are open to supporting this feature in Clang but there are LLVM bugs for <N x i1> types, which we may hit more often as a result, and then there is this unrelated Clang lvalue bug for attribute((vector_size)).

We will use the bool vector feature exclusively in our intrinsic headers to declare bool vectors that cleanly map to our ISA, so in a very controlled way. Of course, once the bool vector feature for Clang is out there people will use it with a certain expectation. Then again, it is already possible to generate arbitrary <N x i1> types in LLVM today, just not using Clang.

- Simon

Ah, the result of operator should be an LValue, so it should be addressable. Unfortunately, I don’t think we currently implement that correctly (and consider that a bug). See:

Note we don’t allow taking a non-const ref or address of a vector element, but GCC does, though presumably that is something we should fix.

FWIW, this would also be useful for SVE predicates when generating
fixed-length code.

Yes, i imagine that a lot of SIMD/vector targets would benefit from this.

It's convenient that both GCC and clang reject boolean element types
as thing stand, e.g.:

    typedef bool foo __attribute__((vector_size(32)));

I think this means that it should be possible to treat packed boolean
vectors as a pure extension to what exists today. Compatibility with
older compilers shouldn't be a concern, it would just be a new feature.

GCC is a moving target too, and I don't know of any reason in principle
why GCC would never support this. A lot of the necessary plumbing
already exists.

So far the principle has been that vector_size should work the same
way regardless of the current target, with the compiler deciding
how best to implement the code. I think it would be good to follow
that for any extension too, rather than making the feature opt-in and
introducing a new target dependency.

I have to say i am pleasantly surprised by the positive feedback for
this proposal :slight_smile: If there are no principal reservations against going
ahead with this, i'll submit a patch on Phabricator that enables 'bool'
in vector_size by default for all targets, so, basically a revert of:

    commit 8c9795d9fa94a7478d18e3906dff2408f379e8d9
    Author: Aaron Ballman <aaron@aaronballman.com>

        vector_size cannot be applied to Booleans. Updated the semantic
checking logic, as well as the comment and added a test case. Fixes PR12649
   
        llvm-svn: 190721

We can flesh out the details during review.

Thanks
- Simon

The way i see it you are open to supporting this feature in Clang but there are LLVM bugs for types, which we may hit more often as a result, and then there is this unrelated Clang lvalue bug for attribute((vector_size)).

I don’t take this as a proper summary of my position. I was warning you about the issues in LLVM, however the biggest issue is the fact that a vector of i1s isn’t individually addressable. Unless you have a way to produce an address for each individual element (which we don’t, and is why std::vector uses a proxy return type), I don’t think this fits in the type system.

Ok, thanks for clarifying your position. If this is a concern, couldn’t we just introduce a new attribute, similar to ‘vector_size’ or ‘ext_vector_type’ that simply does not allow element addressing, ie its values are more or less opaque and will only be consumed and produced as a whole (eg by intrinsics)? This would be good enough for SIMD builtins since those will only assemble and take masks apart outside of C/C++ language semantics.

I’m not sure how open the group are to adding ANOTHER new vector type, but I’d want an RFC that properly defines the semantics of such a type.

Why would we not treat these like bit fields?

I think that we should support this. I’ve worked with several architectures that have native vector predicate registers and it seems reasonable to support these directly.

-Hal

That is an option, its just different from the rest of the vector types so it is a decision we should make intentionally. I’d also expect said patch/RFC to define what this means in the case of all of the operations. For example:

Bool1 + Bool2 (I’m told) has a different meaning in the case of vector hardware than normal bools. I’d expect those to be clarified.

This isn’t as simple as reverting Aaron’s patch, there is some thought that needs to go into how that works in the case of the variety of operators.

I definitely agree that these should have well-defined/documented semantics.

-Hal

"Keane, Erich via cfe-dev" <cfe-dev@lists.llvm.org> writes:

The way i see it you are open to supporting this feature in Clang but there are LLVM bugs for <N x i1> types, which we may hit more often as a result, and then there is this unrelated Clang lvalue bug for attribute((vector_size)).

I don't take this as a proper summary of my position. I was warning you about the issues in LLVM, however the biggest issue is the fact that a vector of i1s isn't individually addressable. Unless you have a way to produce an address for each individual element (which we don't, and is why std::vector<bool> uses a proxy return type), I don't think this fits in the type system.

FWIW, I think taking the address of a vector_size element is already
an error in clang. E.g.

Right, I mentioned that: