I have been analyzing the LLVM vectorizer by running some benchmarks. For vectorization, I have used the following flags:
Am I missing any other flags which will improve vectorizer performance?
These flags should be enough for Clang to vectorise your code based on
If that doesn't give you the vectorisation you want, you can try to
force the vector width or unroll factor by either command line options
or pragmas in the code:
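As a rough illustration of the pragma route, here is a minimal sketch. The function name `saxpy` and the factors 4 and 2 are only placeholders I made up for the example, not tuned recommendations, and the pragma hints assume a Clang recent enough to support `#pragma clang loop`:

```c
#include <stddef.h>

/* Scale-and-add over two arrays. `restrict` promises the compiler that
   x and y do not overlap, so it does not need runtime alias checks. */
void saxpy(float *restrict y, const float *restrict x, float a, size_t n)
{
    /* Hint: vectorise with 4 lanes and interleave 2 vector iterations.
       Both numbers are illustrative assumptions; try several values. */
#pragma clang loop vectorize(enable) vectorize_width(4) interleave_count(2)
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

The command line counterpart in current LLVM should be something like `-mllvm -force-vector-width=4 -mllvm -force-vector-interleave=2`, which applies to every loop in the translation unit rather than to a single one; older releases may spell the second option differently, so check your version.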
If you don't understand why a loop is not being vectorised, you can
try the Clang diagnostics:
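For instance, the optimisation remarks can be tried on a small loop that the vectoriser is likely to reject. This is only a sketch: the file name `example.c` and the function `running_sum` are made up for illustration, and the exact wording of the remarks depends on the Clang version:

```c
#include <stddef.h>

/* Compile with, for example:
 *   clang -O3 -c example.c -Rpass=loop-vectorize \
 *         -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize
 *
 * -Rpass reports loops that were vectorised, -Rpass-missed reports loops
 * that were not, and -Rpass-analysis tries to explain why.
 */

/* Each iteration reads the element written by the previous iteration,
   so the loop carries a dependence from one iteration to the next. */
void running_sum(float *a, const float *b, size_t n)
{
    for (size_t i = 1; i < n; ++i)
        a[i] = a[i - 1] + b[i];
}
```

The analysis remark, if one is emitted, is usually the most useful of the three, since it points at the statement that blocked vectorisation.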
Loop vectorization was first introduced in LLVM 3.2 and turned on by default in LLVM 3.3. It has been discussed previously on this blog in 2012 and 2013, as well as at FOSDEM 2014 and at Apple's WWDC 2013.
And, of course, if you spot a loop or a basic block that could have
been vectorised but wasn't, please open a bug on our bugzilla with the
results of the diagnostics and your experimentation with widths and
Hope that helps.
Thanks a lot for your help! I will experiment with the pragma directives as you suggested.