Loop Vectorizer Update

Hi,

LLVM now has a new loop vectorizer. It can vectorize loops such as this:

    for (i=0; i<n; i++) {
      a[i] = b[i+1] + c[i+3] + i;
      sum += d[i];
    }
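
To make it concrete what vectorizing this loop involves, here is a rough hand-written model in plain C. It is only a sketch, not the code the vectorizer emits: the width VF = 4, the int element type, the per-lane partial sums and all function names are assumptions made for illustration. The points it shows are that the induction variable i becomes a different value in each lane, the reduction on sum is split into per-lane partial sums that are combined after the loop, and a scalar loop handles the iterations left over when n is not a multiple of the width.

```c
#include <assert.h>
#include <string.h>

#define VF 4   /* assumed vector width, chosen only for illustration */

/* The loop from the message, written as a plain scalar function. */
int scalar_loop(int *a, const int *b, const int *c, const int *d, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        a[i] = b[i + 1] + c[i + 3] + i;
        sum += d[i];
    }
    return sum;
}

/* Hand-written model of a VF-wide version: VF iterations per trip,
   one partial sum per lane, then a scalar loop for the leftovers. */
int blocked_loop(int *a, const int *b, const int *c, const int *d, int n) {
    int partial[VF] = {0};
    int i = 0;
    for (; i + VF <= n; i += VF) {
        for (int lane = 0; lane < VF; lane++) {
            /* Each lane sees its own value of the induction variable. */
            a[i + lane] = b[i + lane + 1] + c[i + lane + 3] + (i + lane);
            partial[lane] += d[i + lane];
        }
    }
    /* Combine the per-lane partial sums into one scalar. */
    int sum = 0;
    for (int lane = 0; lane < VF; lane++)
        sum += partial[lane];
    /* Scalar remainder loop for the last n % VF iterations. */
    for (; i < n; i++) {
        a[i] = b[i + 1] + c[i + 3] + i;
        sum += d[i];
    }
    return sum;
}

/* Runs both versions on the same input and checks that they agree.
   N = 10 is deliberately not a multiple of VF. */
void check_versions_agree(void) {
    enum { N = 10 };
    int b[N + 1], c[N + 3], d[N], a1[N], a2[N];
    for (int i = 0; i < N + 3; i++) {
        if (i < N + 1) b[i] = 2 * i;
        c[i] = 3 * i;
        if (i < N) d[i] = i + 1;
    }
    int s1 = scalar_loop(a1, b, c, d, N);
    int s2 = blocked_loop(a2, b, c, d, N);
    assert(s1 == s2);
    assert(memcmp(a1, a2, sizeof a1) == 0);
}
```

Calling check_versions_agree() from a main function exercises both versions. Integer elements are used on purpose: with floating-point data, splitting the sum into partial sums changes the order of the additions and can change the rounded result.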

The loop vectorizer is disabled by default. You can enable it in clang with the "-mllvm -vectorize" flag (or "-loop-vectorize" in opt). It is far from ready, and the feature should be considered highly experimental.

The work on the loop vectorizer has just begun, and there is a lot of work ahead. If you find bugs or opportunities for improvement, please open a Bugzilla bug report and CC me. If you run the loop vectorizer on public benchmarks or on your own workloads, please share the results. This information is important because it helps us decide where to focus our efforts.

We already know of a number of areas where we can improve. At the moment the vectorizer vectorizes every loop it can, because we do not have a cost model to estimate whether vectorization is profitable. Implementing a cost model is a high priority for us, and until it is ready you should expect to see slowdowns on many loops. Another area that needs improvement is the memory dependence check: at the moment we only have a very basic memory legality check. Additionally, there are a number of cases where we generate poor vector code or suffer from phase-ordering problems. Once we solve these problems we can continue to implement additional features.
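
As an illustration of why a memory legality check is needed at all, consider a loop in which each iteration reads the value written by the previous one. The sketch below is not taken from the vectorizer; the width VF = 4, the function names and the "load the whole block, then store the whole block" model of a vector operation are assumptions made for illustration. It shows that blindly processing such a loop four iterations at a time produces a different result from the scalar loop, which is exactly the situation a dependence check has to detect and reject.

```c
#include <assert.h>
#include <string.h>

#define VF 4   /* assumed vector width, chosen only for illustration */

/* Scalar loop with a loop-carried dependence:
   iteration i reads the value that iteration i-1 just wrote. */
void prefix_scalar(int *a, const int *b, int n) {
    for (int i = 1; i < n; i++)
        a[i] = a[i - 1] + b[i];
}

/* Naive VF-wide model that ignores the dependence: all inputs of a
   block are loaded before any result of that block is stored, the
   way a single vector load followed by a vector store would behave. */
void prefix_naive_blocked(int *a, const int *b, int n) {
    int i = 1;
    for (; i + VF <= n; i += VF) {
        int loaded[VF];
        for (int lane = 0; lane < VF; lane++)
            loaded[lane] = a[i + lane - 1];            /* "vector load" */
        for (int lane = 0; lane < VF; lane++)
            a[i + lane] = loaded[lane] + b[i + lane];  /* "vector add and store" */
    }
    /* Scalar remainder loop. */
    for (; i < n; i++)
        a[i] = a[i - 1] + b[i];
}

/* Runs both versions on the same input and checks that they disagree. */
void check_dependence_breaks_blocking(void) {
    int b[9] = {0, 1, 1, 1, 1, 1, 1, 1, 1};
    int x[9] = {0}, y[9] = {0};
    prefix_scalar(x, b, 9);          /* x becomes 0, 1, 2, ..., 8 */
    prefix_naive_blocked(y, b, 9);   /* lanes read stale values of a[] */
    assert(memcmp(x, y, sizeof x) != 0);
}
```

In the blocked version, lanes 1 to 3 of each block read elements of a[] that the same block has not yet updated, so the running sum is lost. A legality check has to prove that no such overlap exists between the loads and stores of one vector iteration before the loop can be widened.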

Thanks,
Nadav