I am doing optimization work for a new backend, and I have read the auto-vectorization guide, https://llvm.org/docs/Vectorizers.html.
The hardware supports vector operations. I wonder how I can get the loop vectorizer tuned for my backend.
Are there any documents or suggestions? Thanks!
The way I understand it, the auto-vectorizer decides whether to “widen” an instruction into a vector instruction by calling one of the “getCost” methods of the TargetTransformInfo (TTI) class. For example, take a look at the line in LoopVectorize.cpp that calls TTI.getArithmeticInstrCost for all the basic arithmetic instruction cases, passing in the instruction opcode, the vector type, and any other info needed. These getCost calls eventually end up in one of the target’s TargetTransformInfo methods; for example, the ARM backend provides its own implementation for calculating vector arithmetic instruction costs.
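To illustrate the idea (this is a toy model, not LLVM’s actual API or cost values), the decision boils down to asking the target for a cost at a candidate vectorization factor (VF) and widening only when the vector version beats VF scalar copies:

```cpp
#include <cassert>

// Toy stand-in for a TTI cost query. The costs below are made-up
// assumptions for illustration: a scalar add costs 1, a 4-wide
// vector add costs 2 on this imaginary target.
struct ToyCostModel {
    unsigned getArithmeticInstrCost(unsigned vf) const {
        return vf == 1 ? 1 : 2; // hypothetical numbers only
    }
};

// Widen only if one vector instruction is cheaper than vf scalar ones.
bool shouldWiden(const ToyCostModel &tti, unsigned vf) {
    unsigned scalarCost = tti.getArithmeticInstrCost(/*vf=*/1) * vf;
    unsigned vectorCost = tti.getArithmeticInstrCost(vf);
    return vectorCost < scalarCost;
}
```

With these made-up costs, `shouldWiden` returns true at VF = 4 (cost 2 vs. 4) and false at VF = 1, which is the same kind of comparison the vectorizer’s cost model performs, just with much more detail.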
So I believe all you need to do to tune the auto-vectorizer for your backend is to implement a subclass of BasicTTIImplBase, as done for ARM, and then implement the appropriate get*Cost methods to return proper costs for your backend, along with the getST and getTLI methods that BasicTTIImplBase needs. A few other methods you might need to implement for your TargetTransformInfo are getNumberOfRegisters (to return the number of registers in a given register class, e.g. the vector class) and getMinimumVF (the minimum vectorization factor for your subtarget). You can take a look at any of the existing backend TTI implementations to get an idea of how these methods have to be implemented.
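To make the subclassing pattern concrete, here is a self-contained sketch (plain C++, not the real LLVM headers) of the CRTP arrangement BasicTTIImplBase uses: the base class supplies conservative defaults, and the target’s TTI class shadows only the hooks its hardware handles differently. All names and cost values here are illustrative assumptions:

```cpp
#include <cassert>

// Simplified stand-in for llvm::BasicTTIImplBase<T>: a CRTP base that
// provides default answers a target can shadow with its own.
template <typename TargetTTI>
struct ToyTTIImplBase {
    // Pessimistic defaults: vector ops scale linearly, no vector registers.
    unsigned getArithmeticInstrCost(unsigned vf) const { return vf; }
    unsigned getNumberOfRegisters(bool isVector) const {
        return isVector ? 0 : 32;
    }
};

// A hypothetical backend's TTI, analogous in spirit to ARMTTIImpl:
// it overrides the cost hooks to describe its (made-up) vector unit.
struct MyTargetTTIImpl : ToyTTIImplBase<MyTargetTTIImpl> {
    unsigned getArithmeticInstrCost(unsigned vf) const {
        return vf <= 4 ? 1 : vf; // pretend 4-wide ops take one cycle
    }
    unsigned getNumberOfRegisters(bool isVector) const {
        return isVector ? 16 : 32; // pretend there are 16 vector registers
    }
};
```

The real classes have far richer signatures (opcode, type, operand info, cost kind, and so on), but the shape is the same: your backend’s TTI class inherits the defaults and overrides the queries where your subtarget’s answer differs.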
P.S. I’m still relatively new to LLVM (a little over 7 months of experience working on an LLVM backend), so I apologize if I’ve made any mistakes.