According the source code *1 and my experiment, LLVM implements a transform that change the division to multiplication and shift right.
In my experiment, this optimization are applied at the backend ( because I saw that change on X86 assembly code instead of LLVM IR ).
I know this change may be associated with the hardware. In my point, in some hardware, the multiplication and shift right maybe more expensive than a single division operator. So this optimization is implemented in backend.
But when I search the DAGCombiner.cpp, I saw a function named
isIntDivCheap(). And in the definition of this function, there are some comments point that the decision of cheap or expensive depends on optimizing base on code size or the speed.
That is, if I always optimize the code base on the speed, the division will convert to multiplication and shift right. On the contrary, the division will not convert.
In the other hand, a single division is always slower than multiplication and shift right or the function will do more thing to decide the cost.
So, why this optimization is NOT implemented in LLVM IR if a single division always slower?
*1: LLVM: lib/Support/DivisionByConstantInfo.cpp File Reference
This is because a division is easier to analyze and compose with other transforms than some kind of magic multiply-shift sequence.
Something useful to keep in mind is that IR transformations are mostly about canonicalization and not optimization (with the exception of some late optimization passes like vectorization and runtime unrolling). The backend is responsible for emitting non-canonical machine IR if the canonical form would perform worse.
Thanks for your reply.
That’s help me a lot.
But I can’t understand what you mean about “ LLVM IR transformation are not optimization”.
I think some optimizations that speed up for the whole machines like peephole, constant folding, etc applied on LLVM IR for modularity.
Maybe I have some misunderstandings for that.
That’s true of InstCombine, not DAGCombiner?
I think Nikita’s statement here does more to obfuscate than it does to illuminate, so let me explain in a different way.
At the level of LLVM IR, it is extremely difficult to predict whether one short chain of instructions will be faster than another short chain of instructions; LLVM IR is just too high-level compared to the nitty-gritty of actual machine instruction sets. The natural consequence of this is that peephole optimization at the LLVM IR isn’t reliable, on its own, to implement a peephole optimizer. So while InstCombine is the main peephole optimization pass at the LLVM IR level, its purpose is as much to implement a canonical form for LLVM IR as it is to do common peepholes.
The second thing to note of LLVM IR is that preserving comprehensible IR in optimizations tends to be more preferable to providing the fastest variants. So divide-by-constant isn’t an IR-level pass (it is SelectionDAG, IIRC) because the resulting form destroys the ability of subsequent analyses to recognize that it’s a division. Also keep in mind the kind of instructions that LLVM IR lacks: there is no multiply-high or wide-multiply instruction in LLVM IR (although there is in the instruction selection phase), so algorithms that crucially rely on such operations being fast have to rely on matching particular LLVM IR patterns.
So rather than Nikita’s statement that IR passes are about canonicalization and not optimization, I would instead say that LLVM IR passes are about analysis-preserving optimizations, and not maximal-performance optimizations, especially in the realm of peephole optimizations. Something like redundancy elimination or code motion that doesn’t really deleteriously affect reasoning about code would tend to be an IR-level pass, while something that aggressively replaces expensive instructions with more complicated would tend to be pushed down to the machine code generation or optimization phase.