Recently I compiled the attached .c file using Clang with “-mavx2 -mfma -m32 -O3” optimization flags.
First I used -emit-llvm and inspected the LLVM IR, and there are no vector instructions in it. Then I generated the assembly output for the file, and there I can clearly see vector instructions.
However, neither the SLPVectorizer nor the LoopVectorizer is doing any vectorization (I also checked this using the -debug-only flag), as the LLVM IR dump shows.
Therefore, the vectorization should happen in the backend(?).
Can I know whether the x86 backend does additional vectorization of scalar code and if so in which passes?
NB - I posed the same question with the source files in a previous email, but the limit of 100kB was reached.
The X86 backend shouldn’t be doing any additional vectorization. If there are no vector types in IR, I don’t think the X86 backend will create any.
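For contrast, here is a minimal sketch of a loop where the LoopVectorizer would normally be expected to act. The function `saxpy` is a hypothetical example and is not taken from the attached file. Compiling it with the same flags plus -S -emit-llvm should show vector types such as <8 x float> in the IR, although the exact width depends on the Clang version and its cost model.

```c
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i].
 * `restrict` promises that x and y do not overlap, so the vectorizer
 * does not need runtime aliasing checks before widening the loop. */
void saxpy(float *restrict y, const float *restrict x, float a, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

If the IR for a function like this contains vector types, the packed instructions in the assembly come from the middle-end vectorizers, not from the backend.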
I isolated the LLVM IR and the X86 instructions emitted for the function; both are attached, and the assembly clearly contains vector instructions. I am having a hard time figuring out where the vector instructions are created. The SLP and Loop vectorizers are definitely not doing anything.
bzip2.ll (20.2 KB)
bzip2.s (11 KB)
Almost all of those instructions end in “sdl”, which marks scalar floating-point instructions: they operate only on the low element of the xmm registers (the low 64 bits for double precision), not on a whole vector. The only one that ends in “ps” is an xor of a register with itself, which is the idiom for zeroing a register.
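A small way to reproduce this pattern, as a sketch only: the function `mean_of_two` below is hypothetical and is not from bzip2. It contains no loop and nothing to vectorize, yet compiling it with “-mavx2 -mfma -m32 -O3 -S” would typically give scalar instructions on xmm registers, such as an int-to-double conversion ending in “sdl” and an add ending in “sd”. It may also give a self-xor of an xmm register before the conversion. The exact instruction selection is an assumption and varies with the Clang version.

```c
/* Hypothetical example: purely scalar double-precision arithmetic.
 * Each int is converted to double and the two values are averaged;
 * nothing here operates on more than one element at a time. */
double mean_of_two(int a, int b) {
    return ((double)a + (double)b) / 2.0;
}
```

Seeing xmm registers in the output for a function like this does not mean that anything was vectorized. The suffix of the mnemonic (scalar “ss”/“sd” versus packed “ps”/“pd”) is what distinguishes the two cases.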
Oh yes, my bad, I just forgot that scalar floating-point instructions use the xmm registers too, not the old x87 stack!
Thanks for the help!