[Subject changed; it was "Vectors with non-power-of-2 elements"]
I've looked at this a bit more, and it looks like it is a bug. The 'clang'
front-end permits vectors to be declared with a non-power-of-2 number of
elements, while 'gcc' forbids vectors whose size in bytes is not a power
of 2.
But beyond permitting the declaration it does not appear to follow through
on the logical semantics.
When LLVM sees these, it will either split a vector which exceeds the size
of a natural vector register, or widen it if it is too small.
This is okay for lowering arithmetic and other operations within the
processor, but both 'sizeof' and the generated memory accesses are then
inconsistent with the declared vector type.
For programs that iterate over images, it is very common to process the
image 3, 5 or 7 elements at a time. The underlying frame buffer is
typically an array of the corresponding scalar type, but the programmer
wants to take advantage of explicit vectorisation when accessing it. For
example:
    for (char3 *x = (char3 *)(row + 1); x < endtest; ++x)
But this surprisingly accesses 16 bytes at a time from memory, and not 12.
For reads this is not a big problem provided the access stays within
valid addressable memory; but for writes the extra bytes clobber
adjacent data, which is a serious problem.
Does anyone know how this is supposed to behave? The IR for the memory
accesses and the 'sizeof' are generated by 'clang', so it is already too
late for the target to fix it up. I haven't been able to find a
target-configurable feature in 'clang' that would allow me to get the
behaviour I need.