I have run into the following strange behavior and wanted to ask for
some advice. For the C program below, function sum() gets inlined in
foo() but the code generated looks very suboptimal (the code is an
extract from a larger program).

Below I show the 32-bit x86 assembly as produced by the demo page on
the llvm home page ("Output A"). As you can see from the assembly,
after sum() is inlined and the loop unrolled, the generated code
loads all values of array v (aka &x[i]) into registers before adding
any numbers up -- in the process it runs out of registers and starts
spilling (in essence, copying the doubles from one area of memory to
another). Only after that does it proceed to add the numbers up.

But why not add the numbers into a single register directly? Clearly
this is what the C code is doing -- nothing could have been more explicit.

The really strange thing is that if the assignment to p[i] is removed
(line marked with "xxx..."), then the code produced is optimal and
exactly what one expects. I show this result in "Output B", where you
get a beautiful sequence of addsd into register xmm2.

It's all very strange and it points to some questionable decision
making on the part of llvm. I tried different versions of the sum()
function (eliminating the loop, for example) but it does not help.
Another observation is that the loop variable i (in foo) must be
involved: if one does *p = 5 (instead of p[i] = 5), the problem also
goes away.

I would appreciate some advice on how to get around this problem.
Thank you for any help,

double sum(double *v, int v_siz)
{
    double sum = 0.0;
    int i = 0;
    for (; i != v_siz; ++i)
        sum += v[i];
    return sum;
}

double foo(double *x, int *p, int k)
{
    double s = 0.0;
    for (int i = 0; i != k; ++i) {
        s += sum(&x[i], 18);
        p[i] = 5; // xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    }
    return s;
}
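
For reference, the *p = 5 variant mentioned above could be sketched
roughly as follows. The name foo_const_store is made up for this
example, and the return statements are assumed, since the extract does
not show them. sum() is repeated so the snippet compiles on its own.

```c
/* Same summation helper as in the program above. */
double sum(double *v, int v_siz)
{
    double sum = 0.0;
    int i = 0;
    for (; i != v_siz; ++i)
        sum += v[i];
    return sum;
}

/* Hypothetical name: identical to foo() except that the store
   no longer depends on the loop variable i. */
double foo_const_store(double *x, int *p, int k)
{
    double s = 0.0;
    for (int i = 0; i != k; ++i) {
        s += sum(&x[i], 18);
        *p = 5; /* was: p[i] = 5 */
    }
    return s;
}
```

Note that x must hold at least k + 17 doubles, since each call to sum()
reads 18 elements starting at x[i].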
====== Output A ======