Consider this simple piece of code, which takes the product of an array of complex numbers.

#include <complex.h>

/* x must be an array (of at least 32 elements), since it is indexed below. */
complex float f(complex float x[]) {
    complex float p = 1.0;
    for (int i = 0; i < 32; i++)
        p *= x[i];
    return p;
}

If I compile it with -O3 -march=bdver2 -ffast-math using clang 3.9.1, I get the following unvectorised assembly:

.LCPI0_0:
        .long   1065353216              # float 1
f:                                      # @f
        vxorps  xmm1, xmm1, xmm1
        vmovss  xmm0, dword ptr [rip + .LCPI0_0] # xmm0 = mem[0],zero,zero,zero
        xor     eax, eax
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        vmovss  xmm2, dword ptr [rdi + 8*rax] # xmm2 = mem[0],zero,zero,zero
        vmovss  xmm3, dword ptr [rdi + 8*rax + 4] # xmm3 = mem[0],zero,zero,zero
        vmulss  xmm4, xmm2, xmm1
        vmulss  xmm5, xmm3, xmm1
        vfmaddss        xmm1, xmm3, xmm0, xmm4
        vfmsubss        xmm0, xmm2, xmm0, xmm5
        inc     rax
        cmp     rax, 32
        jne     .LBB0_1
        vinsertps       xmm0, xmm0, xmm1, 16 # xmm0 = xmm0[0],xmm1[0],xmm0[2,3]
        ret

Am I using the wrong flags, or is this simply a feature that is currently missing? The target CPU is the AMD FX-8350.

As a test I also tried icc (the Intel compiler), which does appear to produce vectorised code, so it is at least possible in principle.
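
To make concrete what a vectorised version would have to do, here is a minimal hand-written sketch. It breaks the single dependency chain on p into four independent partial products and combines them at the end. This changes the order of the multiplications, which is the kind of reassociation -ffast-math already permits. The names product_serial and product_split are hypothetical and exist only for this illustration, and the sketch assumes the length is a multiple of 4. It does not establish what any particular compiler will emit for it.

```c
#include <assert.h>
#include <complex.h>

/* Reference: the straightforward serial product, as in the example above. */
static complex float product_serial(const complex float x[], int n) {
    complex float p = 1.0f;
    for (int i = 0; i < n; i++)
        p *= x[i];
    return p;
}

/* Hypothetical restructured version: four independent accumulators.
 * Assumption: n is a multiple of 4 (checked by the assert below).
 * The multiplications are reassociated, so results can differ from
 * the serial version in the last bits for general inputs. */
static complex float product_split(const complex float x[], int n) {
    assert(n % 4 == 0);
    complex float p0 = 1.0f, p1 = 1.0f, p2 = 1.0f, p3 = 1.0f;
    for (int i = 0; i < n; i += 4) {
        /* These four updates do not depend on one another. */
        p0 *= x[i];
        p1 *= x[i + 1];
        p2 *= x[i + 2];
        p3 *= x[i + 3];
    }
    /* Combine the partial products at the end. */
    return (p0 * p1) * (p2 * p3);
}
```

Calling product_split(x, 32) would take the place of f(x) in the example above. The four accumulators are the scalar analogue of four vector lanes, so this is roughly the transformation a vectoriser has to prove legal before it can use packed multiplies.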

Raphael