ScheduleDAGInstrs computes deps using IR Values that may be invalid

Hi All,
I've encountered an issue where tail merging MIs is causing a problem with
the post-RA MI scheduler dependency analysis and I'm not sure of the best
way to address the problem.

In my case, the branch folding pass (lib/CodeGen/BranchFolding.cpp) is
merging common code from BB#14 and BB#15 into BB#16. It's clear that
there are 4 common instructions (marked with an *) in BB#14 and BB#15:

From: "Chad Rosier" <mcrosier@codeaurora.org>
To: llvmdev@cs.uiuc.edu
Sent: Thursday, February 19, 2015 12:06:10 PM
Subject: [LLVMdev] ScheduleDAGInstrs computes deps using IR Values that may be invalid

Hi All,
I've encountered an issue where tail merging MIs is causing a problem
with
the post-RA MI scheduler dependency analysis and I'm not sure of the
best
way to address the problem.

In my case, the branch folding pass (lib/CodeGen/BranchFolding.cpp)
is
merging common code from BB#14 and BB#15 into BB#16. It's clear that
there are 4 common instructions (marked with an *) in BB#14 and
BB#15:

--------------------------------------------------------------------------------
BB#14: derived from LLVM BB %if.end.1
    Live Ins: %X16 %X17 %X18 %X7 %X0 %X6 %X4 %W8 %X15 %X14 %W3 %W2
    %X1
%X10 %X11 %X12 %X13 %X9
    Predecessors according to CFG: BB#12
        %X5<def> = ADDXrr %X16, %X13
* %W19<def> = LDRBBui %X5, 1;
mem:LD1[%scevgep95](tbaa=<0x6e02518>)
* %W3<def> = MADDWrrr %W2<kill>, %W3<kill>, %WZR
* %W2<def> = SUBWrr %W3<kill>, %W19<kill>
* STRBBui %W2<kill>, %X5<kill>, 1;
mem:ST1[%scevgep95](tbaa=<0x6e02518>)
    Successors according to CFG: BB#16

BB#15: derived from LLVM BB %L20.1
    Live Ins: %X16 %X17 %X18 %X7 %X0 %X6 %X4 %W8 %X15 %X14 %W3 %W2
    %X1
%X10 %X11 %X12 %X13 %X9
    Predecessors according to CFG: BB#12
        %X5<def> = ADDXrr %X17, %X13
* %W19<def> = LDRBBui %X5, 1;
mem:LD1[%scevgep91](tbaa=<0x6e02518>)
* %W3<def> = MADDWrrr %W2<kill>, %W3<kill>, %WZR
* %W2<def> = SUBWrr %W3<kill>, %W19<kill>
* STRBBui %W2<kill>, %X5<kill>, 1;
mem:ST1[%scevgep91](tbaa=<0x6e02518>)
    Successors according to CFG: BB#16
--------------------------------------------------------------------------------

The pass splits the edges between BB#14/BB#15 and inserts the common
code
into a new common successor BB#17. BB#17 is then merged into BB#16:

--------------------------------------------------------------------------------
BB#16: derived from LLVM BB %L30.1
    Predecessors according to CFG: BB#15 BB#14
        %W19<def> = LDRBBui %X5, 1;
        mem:LD1[%scevgep91](tbaa=<0x6e02518>)
    (BB#15)
        %W3<def> = MADDWrrr %W2<kill>, %W3<kill>, %WZR
    (BB#15)
        %W2<def> = SUBWrr %W3<kill>, %W19<kill>
    (BB#15)
        STRBBui %W2<kill>, %X5<kill>, 1;
mem:ST1[%scevgep91](tbaa=<0x6e02518>) (BB#15)

        %W7<def> = LDRBBui %X7<kill>, 1;
mem:LD1[%scevgep99](tbaa=<0x6e02518>)
        %W0<def> = LDRSBWui %X0<kill>, 1;
mem:LD1[%scevgep101](tbaa=<0x6e02518>)
        %W6<def> = LDRBBui %X6<kill>, 1;
mem:LD1[%scevgep103](tbaa=<0x6e02518>)
        %W5<def> = MADDWrrr %W6<kill>, %W0<kill>, %W7<kill>
        %X9<def> = ADDXri %X9<kill>, 2, 0
        %X13<def> = ADDXri %X13<kill>, 2, 0
        %WZR<def,dead> = SUBSWrr %W4, %W8, %NZCV<imp-def>,
        %X4<imp-use,kill>
        STRBBui %W5<kill>, %X18<kill>, 1;
        mem:ST1[%31](tbaa=<0x6e02518>)
        Bcc 1, <BB#9>, %NZCV<imp-use>
    Successors according to CFG: BB#13(4) BB#9(124)
--------------------------------------------------------------------------------

This transformation looks fine, IMO. However, notice that the merged
instructions are derived from BB#15. The memory operations in BB#15
were
accessing array 'C', while the memory operations in BB#14 were
accessing
array 'B'.

The next two loads

        %W7<def> = LDRBBui %X7<kill>, 1;
mem:LD1[%scevgep99](tbaa=<0x6e02518>)
        %W0<def> = LDRSBWui %X0<kill>, 1;
mem:LD1[%scevgep101](tbaa=<0x6e02518>)

load from array 'B' and 'C', respectively. The problem occurs
because
ScheduleDAGInstrs does not correctly identify the true dependency
between
the

        STRBBui %W2<kill>, %X5<kill>, 1;
mem:ST1[%scevgep91](tbaa=<0x6e02518>)

and

        %W0<def> = LDRSBWui %X0<kill>, 1;
mem:LD1[%scevgep101](tbaa=<0x6e02518>)

instructions. AFAICT, the problem is that the Values associated with
the
MI's memory operands are used to determine the underlying objects
accessed
by a memory operation (See getUnderlyingObjectsForInstr() in
ScheduleDAGInstrs.cpp). In turn the dependency analysis relies on
these
Values.

When the STRBBui is merged into BB#16 it is derived from BB#15.
Thus, the
STRBBui associated Value is 'C'. The dependency analysis detects the
dependency on array 'C', but misses the dependency on 'B'. Later,
the
scheduler hoists the LDRSBWui above the STRBBui and violates a true
dependency. Boo..

Assuming my analysis is correct, the first solution that comes to
mind is
to disable tail merging when we encounter a memory operation (unless
they
access the same objects). Thoughts?

I'm not particularly fond of inspecting the Values associated with MI
operands for dependency analysis, but it's what we have now..

I would like to know everyones thoughts. Is there a more robust
solution
I'm missing? What happens if another MI-level pass performs a
similar
type of merging?

You can also remove the MMOs when you tail merge, or add multiple MMOs. Currently the two are equivalent because we ignore all MMOs on instructions with multiple MMOs in ScheduleDAGInstrs.cpp. It would be really nice, however, to support multiple MMOs, so work here would certainly be appreciated.

-Hal

From: "Chad Rosier" <mcrosier@codeaurora.org>
To: llvmdev@cs.uiuc.edu
Sent: Thursday, February 19, 2015 12:06:10 PM
Subject: [LLVMdev] ScheduleDAGInstrs computes deps using IR Values that
may be invalid

Hi All,
I've encountered an issue where tail merging MIs is causing a problem
with
the post-RA MI scheduler dependency analysis and I'm not sure of the
best
way to address the problem.

In my case, the branch folding pass (lib/CodeGen/BranchFolding.cpp)
is
merging common code from BB#14 and BB#15 into BB#16. It's clear that
there are 4 common instructions (marked with an *) in BB#14 and
BB#15:

--------------------------------------------------------------------------------
BB#14: derived from LLVM BB %if.end.1
    Live Ins: %X16 %X17 %X18 %X7 %X0 %X6 %X4 %W8 %X15 %X14 %W3 %W2
    %X1
%X10 %X11 %X12 %X13 %X9
    Predecessors according to CFG: BB#12
        %X5<def> = ADDXrr %X16, %X13
* %W19<def> = LDRBBui %X5, 1;
mem:LD1[%scevgep95](tbaa=<0x6e02518>)
* %W3<def> = MADDWrrr %W2<kill>, %W3<kill>, %WZR
* %W2<def> = SUBWrr %W3<kill>, %W19<kill>
* STRBBui %W2<kill>, %X5<kill>, 1;
mem:ST1[%scevgep95](tbaa=<0x6e02518>)
    Successors according to CFG: BB#16

BB#15: derived from LLVM BB %L20.1
    Live Ins: %X16 %X17 %X18 %X7 %X0 %X6 %X4 %W8 %X15 %X14 %W3 %W2
    %X1
%X10 %X11 %X12 %X13 %X9
    Predecessors according to CFG: BB#12
        %X5<def> = ADDXrr %X17, %X13
* %W19<def> = LDRBBui %X5, 1;
mem:LD1[%scevgep91](tbaa=<0x6e02518>)
* %W3<def> = MADDWrrr %W2<kill>, %W3<kill>, %WZR
* %W2<def> = SUBWrr %W3<kill>, %W19<kill>
* STRBBui %W2<kill>, %X5<kill>, 1;
mem:ST1[%scevgep91](tbaa=<0x6e02518>)
    Successors according to CFG: BB#16
--------------------------------------------------------------------------------

The pass splits the edges between BB#14/BB#15 and inserts the common
code
into a new common successor BB#17. BB#17 is then merged into BB#16:

--------------------------------------------------------------------------------
BB#16: derived from LLVM BB %L30.1
    Predecessors according to CFG: BB#15 BB#14
        %W19<def> = LDRBBui %X5, 1;
        mem:LD1[%scevgep91](tbaa=<0x6e02518>)
    (BB#15)
        %W3<def> = MADDWrrr %W2<kill>, %W3<kill>, %WZR
    (BB#15)
        %W2<def> = SUBWrr %W3<kill>, %W19<kill>
    (BB#15)
        STRBBui %W2<kill>, %X5<kill>, 1;
mem:ST1[%scevgep91](tbaa=<0x6e02518>) (BB#15)

        %W7<def> = LDRBBui %X7<kill>, 1;
mem:LD1[%scevgep99](tbaa=<0x6e02518>)
        %W0<def> = LDRSBWui %X0<kill>, 1;
mem:LD1[%scevgep101](tbaa=<0x6e02518>)
        %W6<def> = LDRBBui %X6<kill>, 1;
mem:LD1[%scevgep103](tbaa=<0x6e02518>)
        %W5<def> = MADDWrrr %W6<kill>, %W0<kill>, %W7<kill>
        %X9<def> = ADDXri %X9<kill>, 2, 0
        %X13<def> = ADDXri %X13<kill>, 2, 0
        %WZR<def,dead> = SUBSWrr %W4, %W8, %NZCV<imp-def>,
        %X4<imp-use,kill>
        STRBBui %W5<kill>, %X18<kill>, 1;
        mem:ST1[%31](tbaa=<0x6e02518>)
        Bcc 1, <BB#9>, %NZCV<imp-use>
    Successors according to CFG: BB#13(4) BB#9(124)
--------------------------------------------------------------------------------

This transformation looks fine, IMO. However, notice that the merged
instructions are derived from BB#15. The memory operations in BB#15
were
accessing array 'C', while the memory operations in BB#14 were
accessing
array 'B'.

The next two loads

        %W7<def> = LDRBBui %X7<kill>, 1;
mem:LD1[%scevgep99](tbaa=<0x6e02518>)
        %W0<def> = LDRSBWui %X0<kill>, 1;
mem:LD1[%scevgep101](tbaa=<0x6e02518>)

load from array 'B' and 'C', respectively. The problem occurs
because
ScheduleDAGInstrs does not correctly identify the true dependency
between
the

        STRBBui %W2<kill>, %X5<kill>, 1;
mem:ST1[%scevgep91](tbaa=<0x6e02518>)

and

        %W0<def> = LDRSBWui %X0<kill>, 1;
mem:LD1[%scevgep101](tbaa=<0x6e02518>)

instructions. AFAICT, the problem is that the Values associated with
the
MI's memory operands are used to determine the underlying objects
accessed
by a memory operation (See getUnderlyingObjectsForInstr() in
ScheduleDAGInstrs.cpp). In turn the dependency analysis relies on
these
Values.

When the STRBBui is merged into BB#16 it is derived from BB#15.
Thus, the
STRBBui associated Value is 'C'. The dependency analysis detects the
dependency on array 'C', but misses the dependency on 'B'. Later,
the
scheduler hoists the LDRSBWui above the STRBBui and violates a true
dependency. Boo..

Assuming my analysis is correct, the first solution that comes to
mind is
to disable tail merging when we encounter a memory operation (unless
they
access the same objects). Thoughts?

I'm not particularly fond of inspecting the Values associated with MI
operands for dependency analysis, but it's what we have now..

I would like to know everyones thoughts. Is there a more robust
solution
I'm missing? What happens if another MI-level pass performs a
similar
type of merging?

You can also remove the MMOs when you tail merge, or add multiple MMOs.
Currently the two are equivalent because we ignore all MMOs on
instructions with multiple MMOs in ScheduleDAGInstrs.cpp. It would be
really nice, however, to support multiple MMOs, so work here would
certainly be appreciated.

Thanks for the advice, Hal. That sounds like a much better solution now
that I better understand how ScheduleDAGInstrs.cpp uses MMOs.