This example is on AMDGPU:

**Code:**

```
SU(0): %3588:sreg_32 = nsw S_MUL_I32 %163:sreg_32_xm0_xexec, 50
...
%3049:vgpr_32 = V_ADD3_U32_e64 %673:vgpr_32, %6970:vgpr_32, %674:vgpr_32, implicit $exec
%3050:vgpr_32 = V_ADD3_U32_e64 %3049:vgpr_32, %706:vgpr_32, %707:vgpr_32, implicit $exec
%3051:vgpr_32 = V_ADD3_U32_e64 %3050:vgpr_32, %739:vgpr_32, %740:vgpr_32, implicit $exec
%3052:vgpr_32 = V_ADD3_U32_e64 %3051:vgpr_32, %772:vgpr_32, %773:vgpr_32, implicit $exec
%3053:vgpr_32 = V_ADD3_U32_e64 %3052:vgpr_32, %807:vgpr_32, %808:vgpr_32, implicit $exec
%3054:vgpr_32 = V_ADD3_U32_e64 %3053:vgpr_32, %842:vgpr_32, %843:vgpr_32, implicit $exec
%3055:vgpr_32 = V_ADD3_U32_e64 %3054:vgpr_32, %877:vgpr_32, %878:vgpr_32, implicit $exec
%3056:vgpr_32 = V_ADD3_U32_e64 %3055:vgpr_32, %912:vgpr_32, %913:vgpr_32, implicit $exec
%3057:vgpr_32 = V_ADD3_U32_e64 %3056:vgpr_32, %947:vgpr_32, %948:vgpr_32, implicit $exec
%3058:vgpr_32 = V_ADD3_U32_e64 %3057:vgpr_32, %982:vgpr_32, %983:vgpr_32, implicit $exec
%3059:vgpr_32 = V_ADD3_U32_e64 %3058:vgpr_32, %1017:vgpr_32, %1018:vgpr_32, implicit $exec
%3060:vgpr_32 = V_ADD3_U32_e64 %3059:vgpr_32, %1052:vgpr_32, %1053:vgpr_32, implicit $exec
%3061:vgpr_32 = V_ADD3_U32_e64 %3060:vgpr_32, %1087:vgpr_32, %1088:vgpr_32, implicit $exec
%3062:vgpr_32 = V_ADD3_U32_e64 %3061:vgpr_32, %1122:vgpr_32, %1123:vgpr_32, implicit $exec
...
```

**Debug statements for the first 2 iterations of bidirectional scheduling**

```
** ScheduleDAGMILive::schedule picking next node
Queue BotQ.P:
Queue BotQ.A: 2626 2410
Queue TopQ.P:
Queue TopQ.A: 0 2 51 52 304 561 818 1089 1095 1380 1637 1894 2610 2611 2614
TopQ.A + Remain MOps: 2627
BotQ.A RemLatency SU(2626) 131c
BotQ.A + Remain MOps: 2628
TopQ.A RemLatency SU(0) 131c
Picking from Bot:
Cand SU(2626) ORDER
Picking from Top:
Cand SU(0) ORDER
Top Cand: Cand SU(0) ORDER
Bot Cand: Cand SU(2626) ORDER
Picking: Cand SU(2626) ORDER
Scheduling SU(2626) BUFFER_STORE_BYTE_OFFEN_exact
--------------------------------------------------------------------------------------------------------------
** ScheduleDAGMILive::schedule picking next node
Queue BotQ.P:
Queue BotQ.A: 2410 2543 2623 2622 2625
Queue TopQ.P:
Queue TopQ.A: 0 2 51 52 304 561 818 1089 1095 1380 1637 1894 2610 2611 2614
TopQ.A + Remain MOps: 2626
BotQ.A RemLatency SU(2623) 131c
BotQ.A + Remain MOps: 2628
TopQ.A RemLatency SU(0) 131c
Picking from Bot:
Cand SU(2410) ORDER
Cand SU(2543) ORDER
Cand SU(2623) ORDER
Cand SU(2625) ORDER
Picking from Top:
Cand SU(0) NOCAND
Top Cand: Cand SU(0) NOCAND
Bot Cand: Cand SU(2625) ORDER
Picking: Cand SU(2625) ORDER
Scheduling SU(2625) %6477:vgpr_32 = V_ADD_U32_e32 4, %6476:vgpr_32, implicit $exec
```

In the example above, the scheduler enqueues SU(0) into TopQ. It decides to pick a `buffer_store_byte `

instruction with reason `Order`

. SU(0) is also the `TopCand`

in the next iteration, however this time it’s reason is set to `NoCand`

because no heuristic in `tryCandidate`

could alter the reason to anything else. So the scheduler ends up picking the alternative until the very end when it picks SU(0) from `BotQ`

after dequeuing it from `TopQ`

. This pattern is problematic because often the scheduler is forced to pick nodes which lead to excessive register pressure. Remark that the V_ADD3’s operands above depend on three operands: result from the previous V_ADD3 and from V_MUL’s that are about 2000 instructions above. This leads to too many long live ranges interfering with each other which further causes too many vgpr spills.

I tried forcing the scheduler to pick the `TopCand`

if it’s SU# is <= 100. Yet, the scheduler ends up scheduling bottom-up like the above case.

Thanks,

Anshil