tl;dr:
I’m wondering whether it makes sense to have something like a new @llvm.vector.numelements() intrinsic function, which returns the number of elements in a SIMD vector, either at compile time if known (fixed-sized vectors like x86, NEON) or at run time (variable-sized vectors like SVE, RISC-V V).
Long version:
I’ve been playing around with LLVM’s vector types (via vector_size(n)) for cross-platform SIMD code. One issue that came up is the distinction between fixed- and variable-sized vectors. Let’s say I want a simple loop that adds two arrays with SIMD instructions, along the lines of this C++ code:
```cpp
#ifdef __ARM_FEATURE_SVE
#include <arm_sve.h>
// Variable-sized vector (e.g., SVE; RISC-V V is analogous)
using VecT = svint32_t;
#else
// Fixed-sized vector (e.g., a 16-byte x86 register)
using VecT __attribute__((vector_size(16))) = int;
#endif

// Generic loop on both platforms (assumes N is a multiple of the lane count)
void add_vectors(int* a, int* b, int* c, int N) {
  for (int i = 0; i < N; i += ???) { // <-- how to increment i?
    *(VecT*)&c[i] = (VecT&)a[i] + (VecT&)b[i];
  }
}
```
Godbolt link for example. There is a bit of loop unrolling on x86; the SVE assembly is a bit easier to read.
The main issue with making this work for both fixed- and variable-sized vectors is the loop increment. For fixed-sized vectors it is known at compile time, e.g., via sizeof(VecT) / sizeof(int). For scalable vectors it is only known at run time, so we need a way to get the number of vector elements (or lanes) at run time. Other than that, the code works on, e.g., x86 and SVE, as shown in the Godbolt link.
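For the fixed-sized case alone, a compilable version of the loop might look like the following sketch. It assumes a 16-byte vector of int, uses memcpy instead of the pointer casts so the arrays need not be 16-byte aligned, and adds a scalar tail loop for lengths that are not a multiple of the lane count; these are choices made for this illustration, not part of the original example.

```cpp
#include <cstddef>
#include <cstring>

// Fixed-sized GCC/Clang vector of four 32-bit ints (16 bytes).
typedef int VecT __attribute__((vector_size(16)));

// For fixed-sized vectors the lane count is a compile-time constant.
constexpr std::size_t kLanes = sizeof(VecT) / sizeof(int);
static_assert(kLanes == 4, "a 16-byte vector of int has 4 lanes");

// Adds a and b element-wise into c. memcpy performs the vector loads and
// the store, so the pointers do not have to be 16-byte aligned.
void add_vectors(const int* a, const int* b, int* c, std::size_t n) {
  std::size_t i = 0;
  for (; i + kLanes <= n; i += kLanes) {
    VecT va, vb;
    std::memcpy(&va, a + i, sizeof(VecT));
    std::memcpy(&vb, b + i, sizeof(VecT));
    VecT vc = va + vb;  // one SIMD add over all lanes
    std::memcpy(c + i, &vc, sizeof(VecT));
  }
  // Scalar remainder when n is not a multiple of kLanes.
  for (; i < n; ++i) c[i] = a[i] + b[i];
}
```

With a scalable type, kLanes cannot be a constexpr constant, which is exactly the gap the proposal is about.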
We could write a num_lanes() function for each platform (as shown on Godbolt) that does this for us. I was wondering whether it makes sense to have an LLVM intrinsic that does this instead. Something like @llvm.vector.numelements, which could be wrapped in Clang as something like __builtin_vectorelements() (naming similar to __builtin_convertvector()). With both fixed and scalable vectors available, programming for both will become more and more common. So it may make sense to give users a way to avoid writing a num_lanes() function for every platform and vector type. All the information needed is already available in LLVM, so duplicating that logic in user code feels unnecessary. For fixed-sized vectors this is fairly trivial to implement; for scalable vectors, I guess we would need to pick the right call depending on the size of the vector elements (e.g., svcntw() for 32-bit elements on SVE).
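A per-platform num_lanes() helper of the kind described above could look roughly like this sketch. The name num_lanes, its template signature, and the 16-byte fallback width are assumptions for illustration; on SVE it dispatches on element size to the ACLE counting functions svcntb/svcnth/svcntw/svcntd, which return the number of 8/16/32/64-bit elements in a vector at run time.

```cpp
#include <cstddef>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

#if defined(__ARM_FEATURE_SVE)
// Hypothetical helper: lanes of type T in one SVE vector, known only at run time.
template <typename T>
inline std::size_t num_lanes() {
  static_assert(sizeof(T) == 1 || sizeof(T) == 2 || sizeof(T) == 4 ||
                    sizeof(T) == 8,
                "unsupported element size");
  if constexpr (sizeof(T) == 1) return svcntb();
  else if constexpr (sizeof(T) == 2) return svcnth();
  else if constexpr (sizeof(T) == 4) return svcntw();
  else return svcntd();
}
#else
// Hypothetical fixed-width fallback, assuming 16-byte vectors (e.g., SSE, NEON).
template <typename T>
constexpr std::size_t num_lanes() {
  return 16 / sizeof(T);
}
#endif
```

The ??? in the loop would then become i += num_lanes<int>(). This per-type, per-target dispatch is the duplication a single compiler-provided builtin would remove.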
Does this make sense at all? Is an intrinsic function the right tool here, or is there already a way to express this? Maybe we only need a Clang wrapper rather than an LLVM intrinsic?
I’m happy to discuss some ideas/thoughts and I’m also willing to implement this if there is interest in it.
Best,
Lawrence