Here is a sample file that compiles correctly on g++ and MSVC:
#include <cstdio>
#include <xmmintrin.h>

template <class V> struct foo {
    V m;
    operator V() const { return m; }
};

int main()
{
    foo<__m128> x = {{1, 2, 3, 4}};
    __m128 y = _mm_shuffle_ps(x, x, 0);
}
The latest clang (fresh from SVN) on Linux x86 gives the following error:
src/pipo.cpp:16:14: error: first two arguments to __builtin_shufflevector must be vectors
__m128 y = _mm_shuffle_ps(x, x, 0);
^~~~~~~~~~~~~~~~~~~~~~~
In file included from src/pipo.cpp:2:
/usr/local/lib/clang/2.0/include/xmmintrin.h:726:10: note: instantiated from:
(__builtin_shufflevector(a, b, (mask) & 0x3, ((mask) & 0xc) >> 2, \
^
1 error generated.
This error doesn't appear if we use _mm_add_ps or other intrinsics. The behavior is the same for __m128i and __m128d.
I think we actually need to introduce explicit casts into xmmintrin.h;
__builtin_shufflevector is a type-variant builtin, so it's impossible
to figure out the type of the shuffle if the inputs are classes.
A quick follow-up on this matter. Everything went fine afterwards, and we have some news on the actual SIMD performance of clang-generated code.
Some of our tests don't run properly for other reasons and return meaningless results due to some quirks in the timing functions.
BUT!! Most of our results are within a 5% margin of the g++-4.5 generated executables using our algorithms.