Loop invariant and strength reduction

When I compile the following piece of code to LLVM IR (using -emit-llvm -O2 -S):

int f(int *ptr, int incr, int n)
{
    int r = n+1;

    do
    {
        if(*ptr!=0)
            r = n;
        ptr += incr;
        n--;
    } while(n!=0);
    return r;
}

clang notices that the index can be generated using a multiplication ( ptr[incr * loopindex] ) and generates code that is at least as complicated: a multiplication is at least as complex as an addition, and the generated code uses indirect addressing instead of direct addressing, which is sometimes more expensive.
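Written back as C, here is roughly how I read the transformed loop (my own sketch; loopindex is a name I invented, and the real transformation is of course on the IR, not on C source):

int f_canonical(int *ptr, int incr, int n)
{
    int r = n+1;
    int loopindex;

    /* Hand-written rendering of the canonicalized loop, assuming n > 0
       (the do/while guarantees at least one iteration anyway). */
    for (loopindex = 0; loopindex != n; loopindex++)
    {
        if (ptr[(long)incr * loopindex] != 0)   /* multiplication + indirect addressing */
            r = n - loopindex;                  /* the value n had on this iteration */
    }
    return r;
}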
I am new to clang, but I thought a compiler was supposed to do strength reduction, not the other way around!

Could you explain to me why clang replaces this addition + direct addressing with a multiplication + indirect addressing?

Thank you !

Damien

The optimizer is canonicalizing the loop, which the backend then lowers to a more efficient form:

_f: ## @f
Leh_func_begin0:
## BB#0: ## %entry
  pushq %rbp
Ltmp0:
  movq %rsp, %rbp
Ltmp1:
  leal -1(%rdx), %ecx
  movslq %esi, %rsi
  shlq $2, %rsi
  incq %rcx
  leal 1(%rdx), %eax
  .align 4, 0x90
LBB0_1: ## %do.body
                                        ## =>This Inner Loop Header: Depth=1
  cmpl $0, (%rdi)
  cmovnel %edx, %eax
  addq %rsi, %rdi
  decl %edx
  decq %rcx
  jne LBB0_1
## BB#2: ## %do.end
  popq %rbp
  ret

LLVM's backend optimizers are quite powerful; don't assume that the IR is the last word on the subject.

John.

Thanks.

OK, I had wanted to simplify the example, and it turns out that for this version the backend optimizer is able to generate code without any multiplication.
But what if the backend cannot produce code without a multiplication? What was simple code at the beginning might end up more complicated…

I will simplify my example (but not that much) and post it on the mailing list.

Damien

There are people who can speak to this better than I can, but my understanding is that the optimizers only do this transformation for loops that fit the narrow definition of canonical form that they expect the backend to be able to lower efficiently. By all means, though, if you can find a counter-example, that's something we want to know about.

John.

OK, I was wrong to a large extent.
For the following piece of code (see the end of this message), the backend does still generate a multiplication, but not inside the inner loop, and it does not use indirect addressing as I claimed.
Still, it does use a multiplication, and some targets (e.g. some microcontrollers) do not even support multiplication.

Basically, the multiplication is used to update the pointer at the end of the loop; the backend converts:
n times: ptr += incr;
into
1 time: ptr += n * incr;

And for the inner loop the backend uses a copy of the pointer that is incremented with:
ptr_copy += incr;
so that there is no multiplication and no indirect addressing inside the inner loop.
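
In other words, if I write back in C what the backend seems to generate, I get something like this (my reconstruction; ptr_copy is a name I made up, and the real work obviously happens on machine code, not C):

int f_lowered(int *ptr, int incr, int *array_length, int narray)
{
    int r = narray+1;

    do
    {
        int n = *array_length;
        int *ptr_copy = ptr;        /* the inner loop walks a copy of the pointer */

        do
        {
            if (*ptr_copy != 0)
                r = narray;
            ptr_copy += incr;       /* only an addition inside the hot loop */
            n--;
        } while (n != 0);

        ptr += (long)*array_length * incr;  /* the single multiplication, once per array */
        array_length++;
        narray--;
    } while (narray != 0);

    return r;
}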

Anyway, thank you for the explanations on canonicalization.

Damien

===== Sample Code =====

Basically, this piece of code iterates over a list of concatenated arrays and finds the last array that contains at least one nonzero value.

Compiled with LLVM 2.8 using:
clang -emit-llvm -O2 -S myfile.c -o myfile.ll
llc -O2 myfile.ll

int f(int *ptr, int incr, int *array_length, int narray)
{
    int r = narray+1;

    do
    {
        int n = *array_length;

        do
        {
            if(*ptr!=0)
                r = narray;
            ptr += incr;
            n--;
        } while(n!=0);

        array_length++;
        narray--;
    } while(narray!=0);

    return r;
}