Vectors with non-power-of-2 elements

We are experiencing a number of problems with handling vectors whose number
of elements is not a power-of-2, and in particular 3-element vectors. With
the following example:

  #include <stdio.h>

  typedef float __attribute__((ext_vector_type(3))) float3; // Clang Only
  // typedef float __attribute__((vector_size(12))) float3; // For GCC

  volatile float3 v3f32;

  int main() {
    float3 f3 = { 1.1f, 2.2f, 3.3f };
    printf ( "Sizeof 'float3' is %zu\n", sizeof(float3)); // size_t needs %zu

    v3f32 = f3; // Force a write
    f3 = v3f32; // Force a read

    return 0;
  }

'clang' reports the size as 16 bytes, and transfers the object to and from
memory as 16 bytes. Also, when 3-element vectors are passed through varargs,
I have to use 'va_arg' with the 4-element variant, or the compiler crashes
when validating the types.
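
The 'va_arg' workaround mentioned above can be sketched as follows. This is
only an illustration, not code from our target: 'sum3' is a hypothetical
helper, and the 4-element type is declared with 'vector_size(16)' on the
assumption that both GCC and Clang accept a power-of-2 size. Callers widen
each 3-element value to 4 elements before passing it.

```c
#include <assert.h>
#include <stdarg.h>

/* 4-element variant used in place of the 3-element type (assumption:
   vector_size(16) is accepted by both GCC and Clang). */
typedef float __attribute__((vector_size(16))) float4;

/* Hypothetical helper: sums lanes 0..2 of 'count' vector arguments.
   Each argument is fetched as the 4-element type; lane 3 is ignored. */
static float sum3(int count, ...) {
  va_list ap;
  float total = 0.0f;

  assert(count >= 0);
  va_start(ap, count);
  for (int i = 0; i < count; ++i) {
    float4 v = va_arg(ap, float4); /* read as 4 elements, not 3 */
    total += v[0] + v[1] + v[2];
  }
  va_end(ap);
  return total;
}
```

A call would look like 'sum3(1, (float4){ 1.0f, 2.0f, 3.0f, 0.0f })', with
the fourth lane acting purely as padding.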

We have no special code for handling 3-element vectors, and I have
subsequently tried this with the X86 binary distributions of 'clang' v3.5.2
and v3.7.0 and I observe the same issue as we are seeing in our SHAVE
target.

With 'gcc' and the 'vector_size' variant, I get an error complaining that
the number of bytes is not a power-of-2, but a comment in
'tools/clang/lib/Sema/SemaType.cpp' says:

  // Success! Instantiate the vector type, the number of elements is > 0, and
  // not required to be a power of 2, unlike GCC.

which would lead me to believe that 3-element vectors should be fine.

Is there something I have to describe in my target machine implementation or
target transform information that will allow 'float3' above to be 12 bytes,
and to be transferred to and from memory using 12-byte accesses? Or is this
a more general bug in the implementation? I have experimented with
DataLayout changes such as:

  -v96:32
  -v48:16
  -v12:8

but this just results in crashes in LLVM.
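
For reference, the 12-byte memory behaviour being asked for can be imitated
at source level by keeping values in the 4-element type and copying only
three lanes to memory. This is a minimal sketch of that stopgap, not a fix
for the compiler: 'packed3', 'store3' and 'load3' are hypothetical names,
and the 12-byte size of 'packed3' assumes a typical ABI with 4-byte 'float'
and no struct padding.

```c
#include <assert.h>
#include <stddef.h>

/* 4-element register type (assumption: accepted by both GCC and Clang). */
typedef float __attribute__((vector_size(16))) float4;

/* Hypothetical in-memory layout for a 3-element vector: three floats,
   12 bytes on typical ABIs. */
typedef struct { float lane[3]; } packed3;

/* Write only lanes 0..2 to memory; lane 3 is never stored. */
static void store3(packed3 *dst, float4 v) {
  assert(dst != NULL);
  for (size_t i = 0; i < 3; ++i)
    dst->lane[i] = v[i];
}

/* Rebuild the 4-element value from memory; the unused lane is zeroed. */
static float4 load3(const packed3 *src) {
  assert(src != NULL);
  float4 v = { src->lane[0], src->lane[1], src->lane[2], 0.0f };
  return v;
}
```

Whether the backend then emits a single 12-byte access or three scalar
accesses is up to the target; the sketch only constrains which bytes are
touched.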

With the types of algorithms that are developed for our platform, 3-element
vectors are quite common. Less common, but still fairly frequent, are
5-element and 7-element vectors (pixel analysis and 2D convolutions).
OpenCL provides for 2-, 3-, 4-, 8- and 16-element vectors, but it is not
clear to me that the 3-element vector support for OpenCL is working either.
Longer term, it would be valuable to us if Clang/LLVM supported 3-, 5- and
7-element vectors as first-class citizens of the compiler (e.g. v3f32, v7i8,
etc.), but that is a topic for another day. For now I am happy if I can get
the 'v3X' types working.

Thanks,

  MartinO - Movidius Ltd.