When I compile to llvm (using -emit-llvm -O2 -S) the following piece of code:
int f(int *ptr, int incr, int n)
int r = n+1;
r = n;
ptr += incr;
clang notices the index can be generated using a multiplication ( ptr[incr * loopindex] ) and generates some code which is at least as complicated (a multiplication is more or at least as complex as an addition + the generated code uses indirect addressing instead of direct addressing which is sometimes more complex).
I am new to clang but I was thinking a compiler should do some strength reduction (and not the other way !).
Could you explain me why clang replaces this addition+direct addressing by a multiplication + indirect addressing ?
Thank you !
The optimizer is canonicalizing the loop, which the backend then lowers to a more efficient form:
_f: ## @f
## BB#0: ## %entry
movq %rsp, %rbp
leal -1(%rdx), %ecx
movslq %esi, %rsi
shlq $2, %rsi
leal 1(%rdx), %eax
.align 4, 0x90
LBB0_1: ## %do.body
## =>This Inner Loop Header: Depth=1
cmpl $0, (%rdi)
cmovnel %edx, %eax
addq %rsi, %rdi
## BB#2: ## %do.end
LLVM's backend optimizers are quite powerful; don't assume that the IR is the last word on the subject.
OK, I wanted to simplify the example and it turns out that the backend optimizer is able to generate code without any multiplication.
But what if the backend cannot derive an IR without multiplication ? What was a simple code at the beginning might become more complicated…
I will simplify my example (but not that much) and post it on the mailing list.
There are people who can speak to this better than I can, but my understanding is that the optimizers only do this transformation for loops that fit the narrow definition of canonical form that they expect the target-independent optimizer to be able to lower efficiently. By all means, though, if you can find a counter-example, that's something we want to know about.
OK, I am wrong to a large extent.
For the following piece of code (see at the end of the message), the backend still does generate a multiplication, but not inside the inner loop and does not use indirect addressing as I said.
But still, it does use a multiplication (some targets might not even support multiplications: e.g. some microcontrollers).
Basically the multiplication is done to update the pointer at the end of the loop, the backend converts:
n times: ptr+=incr
1 time: ptr += n * incr;
And the backend is using a copy of the pointer for the inner loop that is incremented by:
ptr_copy += incr;
so that there is no multiplication nor indirect addressing inside the inner loop.
Anyway, thank you for the explanations on canonicalization.
===== Sample Code =====
Basically, this piece of code iterates on a list of concatenated arrays and find the last array that has at least a value different from zero.
Compiled with llvm2.8 using:
clang -emit-llvm -O2 -S myfile.c -o myfile.ll
llc -O2 myfile.ll
int f(int *ptr, int incr, int *array_length, int narray)
int r = narray+1;
int n = *array_length;
r = narray;
ptr += incr;