[ARM] Thumb code-gen for 8-bit imm arguments results in extra reg copies

Hi,
For the following test-case:

void foo(unsigned, unsigned);
void f()
{
  foo(10, 20);
  foo(10, 20);
}

clang --target=arm-linux-gnueabi -mthumb -O2 generates:

        push {r4, r5, r7, lr}
        movs r4, #10
        movs r5, #20
        movs r0, r4
        movs r1, r5
        bl foo
        movs r0, r4
        movs r1, r5
        bl foo
        pop {r4, r5, r7}
        pop {r0}
        bx r0

Is there any particular reason for loading constants in r4, r5 and then copying
them into r0, r1 rather than loading them directly in r0, r1 before both calls ?

I suppose for higher immediate values, that require either loading from memory,
or need multiple instructions to construct, it is reasonable to pre-compute them
into registers, but for 8-bit immediate values, would it be more
beneficial to load
them directly in argument registers instead ?

Looking at the ISel dump, for the above test-case:
  %0:tgpr, dead $cpsr = tMOVi8 10, 14, $noreg
  %1:tgpr, dead $cpsr = tMOVi8 20, 14, $noreg
  $r0 = COPY %0:tgpr
  $r1 = COPY %1:tgpr

IIUC, there are a couple of reasons why this happens:
(a) tMOVi8 pattern isn't marked with isRematerializable, isAsCheapAsMove,
and isMoveImm.
(b) After annotating the pattern with above flags,
RegisterCoalescer::reMaterializeTrivialDef still bails out because
the above assignment has 2 definitions, with only one live definition.

To address this issue, I attached a hackish patch that
(a) Marks tMOVi8 pattern with:
let isReMaterializable = 1, isAsCheapAsAMove = 1, isMoveImm = 1
I am not sure if this is entirely correct ?

(b) Modifies RegisterCoalescer::reMaterializeTrivialDef and
TargetInstrInfo::isReallyTriviallyReMaterializableGeneric to check
for single live def, instead of single def.

Does the patch look in the right direction ?
For the above test, it generates:

        push {r7, lr}
        movs r0, #10
        movs r1, #20
        bl foo
        movs r0, #10
        movs r1, #20
        bl foo
        pop {r7}
        pop {r0}
        bx r0

However I am not sure if the patch causes correctness issues.
Testing with make check-llvm showed a relatively larger fallout (36 fail tests),
which I am investigating. Just wanted to ask, is there something
obviously wrong with the patch or the idea ?

Thanks,
Prathamesh

llvm-612-2.diff (2.08 KB)

This seems dodgy to me. The instruction does also change CPSR so for
the transformation to be valid you have to know that register is dead
where the new instruction is being inserted. As far as I can tell the
hasOneDef check in reMaterializeTrivialDef is a simple heuristic to
keep the analysis local and avoid dealing with such issues.

I also don't know how other users of the rematerializable property
will cope with a wider range of
isReallyTriviallyReMaterializableGeneric instructions.

Once you start having to consider the surroundings, is it still
"trivial"? I honestly don't know right now.

Cheers.

Tim.

> (b) Modifies RegisterCoalescer::reMaterializeTrivialDef and
> TargetInstrInfo::isReallyTriviallyReMaterializableGeneric to check
> for single live def, instead of single def.

This seems dodgy to me. The instruction does also change CPSR so for
the transformation to be valid you have to know that register is dead
where the new instruction is being inserted. As far as I can tell the
hasOneDef check in reMaterializeTrivialDef is a simple heuristic to
keep the analysis local and avoid dealing with such issues.

I also don't know how other users of the rematerializable property
will cope with a wider range of
isReallyTriviallyReMaterializableGeneric instructions.

Once you start having to consider the surroundings, is it still
"trivial"? I honestly don't know right now.

Hi Tim,
Thanks for the response!
Sorry, I didn't entirely understand why it isn't safe to propagate
while checking for single live def, could you please elaborate ?
IIUC, for above case, tmovi8 (movs) would update cpsr n/z flags, but
since it's overwritten by following instructions, it's marked as dead
in both cases ?
The patch just checks that only one def is live and remaining defs of
DefMI are dead in reMaterializeTrivialDef, in which case I assume it
should be safe to
propagate value of CopyMI into DefMI ?
If reMaterializeTrivialDef is not the right place to handle the
transform, could you please suggest where should I look for adding it
?
Thanks!

Regards,
Prathamesh

Sorry, I didn't entirely understand why it isn't safe to propagate
while checking for single live def, could you please elaborate?

Imagine something like:

    %1:gpr = tMOVi8 123, def dead %cpsr
    [...]
    tCMPr %2:gpr, %3:gpr, def %cpsr
    %4:gpr = COPY %1:gpr
    tBcc %bb.1, 1, implicit %cpsr

The initial mov only has one live use, but reinserting it between the
cmp and the bne would make the branch go to the wrong place. You need
%cpsr to be dead at the place you're intending to insert the mov too.

If reMaterializeTrivialDef is not the right place to handle the
transform, could you please suggest where should I look for adding it?

It might be the right place, otherwise it might fit in with a separate
ARM pass of its own, or a known more expensive computation that hasn't
been written yet. I don't know, but we should at least consider
whether we want to complicate the interface for these functions.

Cheers.

Tim.

> Sorry, I didn't entirely understand why it isn't safe to propagate
> while checking for single live def, could you please elaborate?

Imagine something like:

    %1:gpr = tMOVi8 123, def dead %cpsr
    [...]
    tCMPr %2:gpr, %3:gpr, def %cpsr
    %4:gpr = COPY %1:gpr
    tBcc %bb.1, 1, implicit %cpsr

The initial mov only has one live use, but reinserting it between the
cmp and the bne would make the branch go to the wrong place. You need
%cpsr to be dead at the place you're intending to insert the mov too.

Hi Tim,
Ah indeed, thanks for the clarification!
Would it make sense to restrict the transform where:
(a) DefMI and CopyMI are in same basic block
(b) Check that at least one of the instructions following DefMI till
end of block define cpsr
before it is used ? So the value of cpsr at DefMI is still dead.

Is there any API I could use to check if a particular instruction
defines / uses cpsr ?

Thanks,
Prathamesh