Vectorization in multiple threads

I want to create multiple threads and vectorize the work within each one. For example, if my loop has 2048 iterations, it should first be divided among the threads: with 4 threads, each thread should handle 512 iterations using vector instructions of length 64.
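
To make the intent concrete, here is a minimal sketch of the decomposition described above. The function name vec_add, the arrays a, b and c, and the float element type are assumptions for illustration only, not the actual contents of 1.c.

```c
#include <assert.h>

#define N 2048          /* total iterations, as in the question */
#define NUM_THREADS 4   /* desired thread count, as in the question */

/* Hypothetical kernel: c[i] = a[i] + b[i] for i in [0, n). */
void vec_add(const float *restrict a, const float *restrict b,
             float *restrict c, int n)
{
    /* Assumption: n divides evenly among the threads (2048 / 4 = 512). */
    assert(n % NUM_THREADS == 0);

    /* schedule(static) without a chunk size splits the iteration space
     * into contiguous, roughly equal blocks, one per thread; the simd
     * part asks the compiler to vectorize the iterations of each block.
     * Without -fopenmp the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for simd num_threads(NUM_THREADS) schedule(static)
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

The idea is that calling vec_add(a, b, c, N) gives each of the 4 threads one block of 512 iterations, and the vectorizer then picks the vector width inside each block.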

How can I achieve this? I have tried using #pragma omp parallel for simd and compiled with the following commands:

clang -S -emit-llvm 1.c -march=knl -O3 -fopenmp -mllvm -disable-llvm-optzns -o 1.ll
opt -S -O3 -force-vector-width=64 1.ll -o 1_o3.ll

I am getting this error:

remark: :0:0: loop not vectorized: could not determine number of loop iterations
warning: :0:0: loop not vectorized: failed explicitly specified loop vectorization

How can I resolve this?