Duplicate loading of double constants

Hi,

I found that in some cases llvm generates duplicate loads of double constants,
e.g.

$ cat t.c
double f(double* p, int n)
{
double s = 0;
if (n)
s += *p;
return s;
}
$ clang -S -O3 t.c -o -

f: # @f
.cfi_startproc

BB#0:

xorps %xmm0, %xmm0
testl %esi, %esi
je .LBB0_2

BB#1:

xorps %xmm0, %xmm0
addsd (%rdi), %xmm0
.LBB0_2:
ret

Note that there are 2 xorps instructions, the one in BB#1 being clearly redundant
as it’s dominated by the first one. Two xorps come from 2 FsFLD0SD generated by
instruction selection and never eliminated by machine passes. My guess would be
machine CSE should take care of it.

A variation of this case without indirection shows the same problem, as well as
not commuting addps, resulting in an extra movps:

$ cat t.c
double f(double p, int n)
{
double s = 0;
if (n)
s += p;
return s;
}
$ clang -S -O3 t.c -o -

f: # @f
.cfi_startproc

BB#0:

xorps %xmm1, %xmm1
testl %edi, %edi
je .LBB0_2

BB#1:

xorps %xmm1, %xmm1
addsd %xmm1, %xmm0
movaps %xmm0, %xmm1
.LBB0_2:
movaps %xmm1, %xmm0
ret

Thanks,
Eugene

Hi,

I found that in some cases llvm generates duplicate loads of double
constants,
e.g.

$ cat t.c
double f(double* p, int n)
{
    double s = 0;
    if (n)
        s += *p;
    return s;
}
$ clang -S -O3 t.c -o -
...
f: # @f
        .cfi_startproc
# BB#0:
        xorps %xmm0, %xmm0
        testl %esi, %esi
        je .LBB0_2
# BB#1:
        xorps %xmm0, %xmm0
        addsd (%rdi), %xmm0
.LBB0_2:
        ret
...

Thanks. Please file a bug for this on llvm.org/bugs .

The crux of the problem is that machine CSE runs before register allocation
and is consequently extremely conservative when doing CSE to avoid
potentially increasing register pressure. Of course, with such a small
testcase, register pressure isn't a problem. MachineCSE might be able to do
a better job here.

Nick

Note that there are 2 xorps instructions, the one in BB#1 being clearly

Thanks. Please file a bug for this on llvm.org/bugs .

Done (PR16938).

The crux of the problem is that machine CSE runs before register
allocation and is consequently extremely conservative when doing CSE to
avoid potentially increasing register pressure. Of course, with such a
small testcase, register pressure isn't a problem. MachineCSE might be able
to do a better job here.

Nick

I figured it was trying to avoid adding register pressure, but shouldn't it
be more aggressive with constants? Isn't register allocator smart enough to
spill constants when it runs out of registers?

Also, what do you think on commuting addps in the second example?

Thanks,
Eugene

I figured it was trying to avoid adding register pressure, but shouldn't it
be more aggressive with constants? Isn't register allocator smart enough to
spill constants when it runs out of registers?

Also, what do you think on commuting addps in the second example?

I think this is a problem in CSE and we should have handled it there
and submitted a patch yesterday for this :slight_smile:

-eric