clang fails to vectorise the product of a complex array

Consider this simple piece of code which takes the product of an array
of complex numbers.

#include <complex.h>
complex float f(complex float x[]) {
  complex float p = 1.0;
  for (int i = 0; i < 32; i++)
    p *= x[i];
  return p;
}

If I compile it with -O3 -march=bdver2 -ffast-math using clang 3.9.1 I
get the following assembly, which is not vectorised: the loop handles
one array element per iteration using scalar instructions.

.LCPI0_0:
        .long 1065353216 # float 1
f: # @f
        vxorps xmm1, xmm1, xmm1
        vmovss xmm0, dword ptr [rip + .LCPI0_0] # xmm0 = mem[0],zero,zero,zero
        xor eax, eax
.LBB0_1: # =>This Inner Loop Header: Depth=1
        vmovss xmm2, dword ptr [rdi + 8*rax] # xmm2 = mem[0],zero,zero,zero
        vmovss xmm3, dword ptr [rdi + 8*rax + 4] # xmm3 = mem[0],zero,zero,zero
        vmulss xmm4, xmm2, xmm1
        vmulss xmm5, xmm3, xmm1
        vfmaddss xmm1, xmm3, xmm0, xmm4
        vfmsubss xmm0, xmm2, xmm0, xmm5
        inc rax
        cmp rax, 32
        jne .LBB0_1
        vinsertps xmm0, xmm0, xmm1, 16 # xmm0 = xmm0[0],xmm1[0],xmm0[2,3]
        ret
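
For reference, here is my reading of what the loop body above computes,
written out on separate real and imaginary parts. This is only a sketch:
the name f_split and its output parameters are made up for illustration,
and it relies on the C99 guarantee that a complex float has the same
layout as float[2].

```c
#include <complex.h>

/* Hypothetical helper (not part of the original code): the same product
   as f(), with the complex multiply expanded by hand. */
static void f_split(const float complex *x, float *re_out, float *im_out) {
  /* View the array as interleaved (re, im) pairs of floats. */
  const float *xf = (const float *)x;
  float p_re = 1.0f, p_im = 0.0f;
  for (int i = 0; i < 32; i++) {
    float x_re = xf[2 * i];
    float x_im = xf[2 * i + 1];
    /* (p_re + i p_im) * (x_re + i x_im) */
    float t_re = x_re * p_re - x_im * p_im; /* cf. vmulss + vfmsubss */
    float t_im = x_im * p_re + x_re * p_im; /* cf. vmulss + vfmaddss */
    p_re = t_re;
    p_im = t_im;
  }
  *re_out = p_re;
  *im_out = p_im;
}
```

Each iteration depends on the p_re and p_im produced by the previous
one, which matches the serial chain of scalar multiplies in the listing.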

Am I using the wrong flags or is this simply a missing feature
currently? The target CPU is the AMD FX-8350.

As a test I also tried icc (the Intel Compiler) which does appear to
give vectorised code so it is at least possible in principle.

Raphael