New builtin function to get number of lanes in SIMD vectors

tl;dr:
I’m wondering whether it makes sense to have something like a __builtin_vectorelements() function, which returns the number of elements in a SIMD vector, either at compile time if known (fixed-sized vectors like x86, NEON) or at runtime (variable-sized vectors like SVE, RISC-V V).

I’ve posted this in the LLVM IR topic (see here), but @nikic pointed out that this does not need an LLVM intrinsic and can be handled in Clang.

Long version:
I’ve been playing around with Clang’s vector types (via vector_size(n)) for cross-platform SIMD code. One issue that came up is the distinction between fixed- and variable-sized vectors. Let’s say I want a simple loop that adds two arrays with SIMD instructions, along the lines of this C++ code:

// Can be fixed-sized (e.g., 16-byte x86 register)
using VecT __attribute__((vector_size(16))) = int;
// OR variable-sized vector (e.g., SVE or RISC-V V)
using VecT = svuint32_t;

// Generic loop on both platforms
void add_vectors(int* a, int* b, int* c) {
  for (int i = 0; i < N; i += ???) { // <-- how to increment i?
    (VecT&)c[i] = (VecT&)a[i] + (VecT&)b[i]; // store at the same offset i as the loads
  }
}

Godbolt link for example. There is a bit of loop unrolling in the x86 output; the SVE assembly is a bit easier to read.

The main issue with making this work for both fixed- and variable-sized vectors is that the loop increment is known at compile time only for the former, e.g., via sizeof(VecT) / sizeof(int). For scalable vectors, it is only known at runtime, so we need a way to get the number of vector elements (or lanes) at runtime. Other than that, the code works on, e.g., x86 and SVE, as shown in the Godbolt link.
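For the fixed-sized case, the compile-time increment can be sketched as follows. This is only a minimal illustration: the num_lanes() helper, the extra length parameter n, and the use of memcpy for the loads and stores are assumptions made here, not taken from the Godbolt example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Fixed-sized vector of four 32-bit ints (GCC/Clang vector extension).
typedef int VecT __attribute__((vector_size(16)));

// Hypothetical helper: for fixed-sized vectors the lane count is a
// compile-time constant derived from the vector and element sizes.
constexpr std::size_t num_lanes() { return sizeof(VecT) / sizeof(int); }

// Adds a and b element-wise into c.
// Assumption: n is a multiple of num_lanes(), so no scalar tail is needed.
void add_vectors(const int* a, const int* b, int* c, std::size_t n) {
  assert(n % num_lanes() == 0);
  for (std::size_t i = 0; i < n; i += num_lanes()) {
    VecT va, vb;
    std::memcpy(&va, a + i, sizeof(VecT));  // load without alignment assumptions
    std::memcpy(&vb, b + i, sizeof(VecT));
    VecT vc = va + vb;                      // lane-wise addition
    std::memcpy(c + i, &vc, sizeof(VecT));  // store at the same offset
  }
}
```

The sizeof-based helper is exactly the part that does not carry over to sizeless types such as svuint32_t, since sizeof cannot be applied to them.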

We could write a num_lanes() function for each platform (as shown in the Godbolt example) that does this for us. I was wondering if it makes sense to have a builtin for this instead, something like __builtin_vectorelements() (named along the lines of __builtin_convertvector()). With both fixed-sized and scalable vectors available, writing code that targets both will become more and more common. So maybe it makes sense to provide a builtin so that users do not need their own num_lanes() for each platform and vector type. All the information needed is available in Clang, so it feels unnecessary to duplicate that logic in user code. For fixed-sized vectors, this is rather trivial to implement, and for scalable vectors, I guess we would need to find the right call depending on the size of the vector elements.
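To make the idea concrete, here is a rough sketch of how user code might wrap such a builtin while keeping a fallback for compilers that lack it. The num_lanes template and its ElemType parameter are hypothetical names used only for illustration, and passing a type to __builtin_vectorelements() is an assumption about how the builtin would be called. The fallback branch only covers fixed-sized vectors.

```cpp
#include <cstddef>

// Fixed-sized vector of four 32-bit ints (GCC/Clang vector extension).
typedef int Int4 __attribute__((vector_size(16)));

// Compilers without __has_builtin simply take the fallback path.
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

// Hypothetical wrapper: prefers the proposed builtin when the compiler
// reports it, otherwise derives the lane count from the type sizes.
template <typename VecType, typename ElemType>
constexpr std::size_t num_lanes() {
#if __has_builtin(__builtin_vectorelements)
  return __builtin_vectorelements(VecType);
#else
  // Fallback: valid for fixed-sized vectors only (sizeless types have no sizeof).
  return sizeof(VecType) / sizeof(ElemType);
#endif
}

// For a fixed-sized vector both paths yield a compile-time constant.
static_assert(num_lanes<Int4, int>() == 4, "16-byte vector of int has 4 lanes");
```

With the builtin available, the ElemType parameter becomes redundant, which is the duplication the proposal aims to remove from user code.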

An open question that arose when discussing this from an LLVM IR perspective is how to do this without hard-coding target information in Clang. For fixed-sized vectors (which can be identified via a cast to VectorType), we should be able to just take the vector size and divide it by the element size. However, I did not find a good abstraction for scalable vectors. So this might need isSVESizelessBuiltinType() and isRVVSizelessBuiltinType() checks for the current scalable vectors instead of a cast to VectorType. Is there a more generic way to find out whether a vector is scalable?

On the LLVM side, I found ISD::VSCALE, which should do the right thing. But I did not find a way to generate LLVM IR that calls this for us. Is this possible? In that case, I’d add an if/else in the new builtin that returns the value for fixed-sized vectors and some “call” to VSCALE in the other case.
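The if/else described here can be modelled in plain C++ to show the arithmetic involved. This is only a sketch of the idea, not Clang code: VectorShape and element_count are made-up names, and vscale is passed in as an ordinary parameter, whereas in real generated code it would come from the target at runtime (LLVM IR exposes it through the llvm.vscale intrinsic).

```cpp
#include <cstddef>

// Hypothetical model of a vector type's shape.
// For a scalable type such as <vscale x 4 x i32>, min_lanes is 4.
struct VectorShape {
  std::size_t min_lanes;  // lane count, or the known minimum if scalable
  bool scalable;          // true for sizeless types such as SVE or RVV vectors
};

// Fixed-sized: the lane count is the compile-time constant itself.
// Scalable: the lane count is the known minimum multiplied by vscale.
constexpr std::size_t element_count(VectorShape shape, std::size_t vscale) {
  return shape.scalable ? shape.min_lanes * vscale : shape.min_lanes;
}
```

For example, on an SVE implementation with 256-bit registers vscale is 2, so a vector with a known minimum of four 32-bit lanes holds eight lanes at runtime.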

I’m happy to discuss ideas and thoughts, and I’m also willing to implement this if there is interest in it. This is not fully fleshed out conceptually in my mind yet, so I’d appreciate any feedback or comments.

Best,
Lawrence

I can see the usefulness of this, and I don’t think there is any real problem with adding it as a builtin.

For the SVE sizeless types, there is just Type::isSizelessType (which dispatches to isSizelessBuiltinType, https://clang.llvm.org/doxygen/Type_8cpp_source.html#l02371). However, you might have to use the isSVE and isRVV variants of this to avoid the WebAssembly types (at least when checking in Sema whether it is a valid type for this builtin).

As you mentioned, the number of elements in a VectorType is known at compile time, so that is easy enough for the compile-time version.

I might suggest limiting the argument to taking a type ONLY, as despite how confusing decltype (or typeof) can be, expressions end up having reference types that get confusing as well. It is perhaps valuable to have the builtin strip reference types and qualifiers as well (for the same reason).
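The reference-type issue can be seen with standard type traits alone. The snippet below is a small illustration: stripped_t is a hypothetical alias that mimics the suggested stripping of references and qualifiers, and it says nothing about how the builtin itself would be implemented.

```cpp
#include <type_traits>

// Fixed-sized vector of four 32-bit ints (GCC/Clang vector extension).
typedef int Int4 __attribute__((vector_size(16)));

// Hypothetical alias: removes references and cv-qualifiers from a type.
template <typename T>
using stripped_t = std::remove_cv_t<std::remove_reference_t<T>>;

const Int4 kVec = {1, 2, 3, 4};

// decltype of the name keeps the qualifier...
static_assert(std::is_same<decltype(kVec), const Int4>::value,
              "declared type is const-qualified");
// ...while decltype of the parenthesized expression is a reference type.
static_assert(std::is_same<decltype((kVec)), const Int4&>::value,
              "expression type is an lvalue reference");
// After stripping, both forms name the plain vector type.
static_assert(std::is_same<stripped_t<decltype((kVec))>, Int4>::value,
              "stripping yields the underlying vector type");
```

A builtin that performs this stripping itself would let callers pass decltype(expr) without having to think about which of these forms they end up with.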

Thanks for the input. I’ll start looking into implementing this.

This has been implemented and merged in: [Clang] Add __builtin_vectorelements to get number of elements in vector by lawben · Pull Request #69010 · llvm/llvm-project · GitHub