Sink redundant spill after RA

Hi All,

I found some cases where a spill of a live range in a block is reloaded in only one of its successors, with no reload on the paths through the other successors. Since the spilled value is reloaded on only one path, it should be safe to sink the spill closer to its reloads. In the AArch64 code below, x2 is spilled in the entry block, but the value is reloaded only in %bb.1, not in .LBB2_32. If we sink the spill (str x2, [sp, #120]) from the entry block into its successor (%bb.1), load-from-store promotion might catch it and replace the ldr in %bb.1 with a mov. Moving the spill down into the successor may also enable more shrink-wrapping.

.globl _mytest
// %bb.0: // %entry
        sub sp, sp, #224 // =224
        stp x28, x27, [sp, #128] // 8-byte Folded Spill
        stp x26, x25, [sp, #144] // 8-byte Folded Spill
        stp x24, x23, [sp, #160] // 8-byte Folded Spill
        stp x22, x21, [sp, #176] // 8-byte Folded Spill
        stp x20, x19, [sp, #192] // 8-byte Folded Spill
        stp x29, x30, [sp, #208] // 8-byte Folded Spill
        ldrsw x8, [x0, #4424]
        sxtw x10, w2 <------------- w2 is the use of spilled value before spill.
        sxtw x12, w1
        madd x8, x8, x10, x12
        ldr x9, [x0, #8]
        add x9, x9, x8, lsl #2
        ldrh w11, [x9]
        ldrh w10, [x0, #16]
        str x2, [sp, #120] // 8-byte Folded Spill <------------- spill !!!
        cmp w11, w10
        b.eq .LBB2_32
// %bb.1: // %if.end
        ldr x13, [sp, #120] // 8-byte Folded Reload <-------------- reload !!
        < omitted >
        :
.LBB2_32: // %cleanup <----- no reload from [sp, #120]
        ldp x29, x30, [sp, #208] // 8-byte Folded Reload
        ldp x20, x19, [sp, #192] // 8-byte Folded Reload
        ldp x22, x21, [sp, #176] // 8-byte Folded Reload
        ldp x24, x23, [sp, #160] // 8-byte Folded Reload
        ldp x26, x25, [sp, #144] // 8-byte Folded Reload
        ldp x28, x27, [sp, #128] // 8-byte Folded Reload
        add sp, sp, #224 // =224
        ret

Unless there are hidden issues that prevent the spill from being sunk, I think this sinking should be done after RA, because sinking during RA would extend the live range of the spilled value. Please let me know if there is any hidden issue that I am missing here. I would be happy to hear any opinions about it.

Thanks,

Jun
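
For illustration, the condition described in the message above can be sketched on a toy CFG. This is only a rough model under stated assumptions, not LLVM code: the Block class, the (op, reg, slot) instruction tuples, and the reaches_reload and sink_target helpers are all hypothetical names made up for this sketch, and it ignores details such as a slot being re-spilled later with a different value.

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    name: str
    insts: list = field(default_factory=list)  # hypothetical (op, reg, slot) tuples
    succs: list = field(default_factory=list)  # names of successor blocks

def reaches_reload(cfg, start, slot):
    """Return True if some reload of `slot` is reachable from block `start`."""
    seen, work = set(), [start]
    while work:
        name = work.pop()
        if name in seen:
            continue
        seen.add(name)
        block = cfg[name]
        if any(op == "reload" and s == slot for op, _, s in block.insts):
            return True
        work.extend(block.succs)
    return False

def sink_target(cfg, block_name, spill_index):
    """Return the successor the spill could be sunk into, or None."""
    block = cfg[block_name]
    op, reg, slot = block.insts[spill_index]
    assert op == "spill"
    # After the spill, this block must neither redefine the register
    # nor read the slot, otherwise moving the store changes behavior.
    for later_op, later_reg, later_slot in block.insts[spill_index + 1:]:
        if later_op == "def" and later_reg == reg:
            return None
        if later_op == "reload" and later_slot == slot:
            return None
    # Exactly one successor may lead to a reload of the slot.
    needing = [s for s in block.succs if reaches_reload(cfg, s, slot)]
    if len(needing) != 1:
        return None
    target = needing[0]
    # The target must be entered only from this block, so the sunk
    # store still executes on every path that reaches a reload.
    preds = [b.name for b in cfg.values() if target in b.succs]
    if preds != [block_name]:
        return None
    return target

# Toy version of the listing: entry spills x2, only bb1 reloads it.
cfg = {
    "entry": Block("entry", [("def", "x10", None), ("spill", "x2", 120)],
                   ["bb1", "cleanup"]),
    "bb1": Block("bb1", [("reload", "x13", 120)], ["cleanup"]),
    "cleanup": Block("cleanup", [("ret", None, None)], []),
}
print(sink_target(cfg, "entry", 1))
```

In this toy CFG the only successor that leads to a reload is bb1, and bb1 has entry as its sole predecessor, so the sketch picks bb1 as the place to sink the spill.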

// %bb.1: // %if.end

Presumably there is a redefinition of x2 somewhere in here, otherwise it wouldn’t need to be spilled at all?

ldr x13, [sp, #120] // 8-byte Folded Reload <-------------- reload !!

FROM: llvm-dev [mailto:llvm-dev-bounces@lists.llvm.org] ON BEHALF OF
Jun Lim via llvm-dev
SENT: Thursday, February 22, 2018 11:05 AM

// %bb.1: // %if.end

Presumably there is a redefinition of x2 somewhere in here, otherwise
it wouldn't need to be spilled at all?

In the test case I’m looking at, x2 is redefined in later blocks, but there is no redefinition of x2 before the reload in %bb.1.

From: junbuml@codeaurora.org [mailto:junbuml@codeaurora.org]
Sent: Thursday, February 22, 2018 11:39 AM

In the test case I’m looking at, x2 is redefined in later blocks, but there
is no redefinition of x2 before the reload in %bb.1.

That seems odd. Are there other reloads of this spilled value that you aren't showing? I'm trying to understand why this register is being spilled at all in this case.

Yes, there are other reloads of the spilled value in other blocks. Some of them come after x2 is redefined on the path, but some come without any redefinition of x2 (e.g., the case in %bb.1). My guess is that since x2 is a function parameter, a copy must be placed in the entry block, so RA might have placed the spill there and then placed a reload at every use of the value. On some paths x2 is never redefined, but on others it is.
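
The path-dependent behavior described above can be illustrated with a small sketch. It is a toy model under assumptions, not what LLVM does: classify_reloads and the (op, reg, slot) tuples are hypothetical names for this example. It walks one straight-line path and reports, for each reload, whether the spilled register still holds the value (so the reload could become a register copy) or whether a real load from the slot is needed.

```python
def classify_reloads(path):
    """For each reload on `path`, return (dest_reg, "copy" or "load")."""
    mirrors = {}  # stack slot -> register that still holds the spilled value
    kinds = []

    def kill(reg):
        # A write to `reg` means it no longer mirrors any slot.
        for slot in [s for s, r in mirrors.items() if r == reg]:
            del mirrors[slot]

    for op, reg, slot in path:
        if op == "spill":
            mirrors[slot] = reg
        elif op == "def":
            kill(reg)
        elif op == "reload":
            src = mirrors.get(slot)
            kinds.append((reg, "copy" if src is not None else "load"))
            if src != reg:
                kill(reg)            # the reload overwrites its destination
            if src is None:
                mirrors[slot] = reg  # destination now holds the slot's value
    return kinds

# Path through %bb.1: x2 is not redefined before the reload.
path_bb1 = [("spill", "x2", 120), ("reload", "x13", 120)]
# Hypothetical later path: x2 is redefined before the slot is reloaded.
path_later = [("spill", "x2", 120), ("def", "x2", None), ("reload", "x13", 120)]

print(classify_reloads(path_bb1))
print(classify_reloads(path_later))
```

On the first path the reload is classified as a copy, which corresponds to the ldr in %bb.1 that could become a mov; on the second path the redefinition of x2 forces a real load, which is why the stack slot is still needed on those paths.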