machine scheduler: pre-RA bidirectional scheduling

Hi,

I would like to get some feedback on the current status of bidirectional scheduling in the pre-RA machine scheduler.

I have tried enabling this on SystemZ and found that it slightly increases spilling in general (by as much as 5% in one benchmark). Benchmarks do not indicate that bidirectional scheduling would be a win, either. Is this within expectations, or does it indicate something in the backend that could be fixed?

Theoretically, it should be better since it enables pre-RA resource balancing and the second latency heuristic, right? It should also be better to take the best candidate from either of the two sides instead of from just one, since this exposes the heuristics to more nodes and finds more good candidates.
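To make the "best candidate from two sides" idea concrete, here is a toy sketch. It is not the MachineScheduler's actual code: the `Candidate` struct, the single integer `Score`, and both pick functions are hypothetical simplifications, and the real heuristics compare candidates on several criteria rather than one number.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical candidate: a node id plus one score standing in for
// whatever the real heuristics would compute.
struct Candidate {
  int Node;
  int Score;
  bool FromTop;
};

// Highest-scoring candidate in a ready queue (assumed non-empty).
inline Candidate bestOf(const std::vector<Candidate> &Queue) {
  return *std::max_element(
      Queue.begin(), Queue.end(),
      [](const Candidate &A, const Candidate &B) { return A.Score < B.Score; });
}

// Bottom-up only: the heuristics see the bottom ready queue and nothing else.
inline Candidate pickBottomUp(const std::vector<Candidate> &Bot) {
  return bestOf(Bot);
}

// Bidirectional: take the best of each side, then compare the two winners.
// In this sketch, ties go to the bottom candidate.
inline Candidate pickBidirectional(const std::vector<Candidate> &Top,
                                   const std::vector<Candidate> &Bot) {
  Candidate T = bestOf(Top);
  Candidate B = bestOf(Bot);
  return T.Score > B.Score ? T : B;
}
```

With a top queue holding one node scored 5 and a bottom queue holding nodes scored 3 and 4, `pickBottomUp` returns the node scored 4, while `pickBidirectional` returns the top node scored 5. The toy model only shows that more candidates are considered; it says nothing about register pressure, which is where the extra spilling would come from.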

AArch64, AMDGPU and PowerPC seem to be the only targets enabling bidirectional scheduling, which indicates that bottom-up is still the norm.
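For context, a target normally makes this choice in its subtarget's `overrideSchedPolicy` hook by clearing both direction flags of the scheduling policy. The following is a minimal standalone sketch of that idea, not LLVM code: `SchedPolicySketch` is a hypothetical stand-in that borrows only the `OnlyTopDown` and `OnlyBottomUp` field names from LLVM's `MachineSchedPolicy`, and the helper functions are invented for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the two direction flags of a scheduling policy.
struct SchedPolicySketch {
  bool OnlyTopDown = false;
  bool OnlyBottomUp = false;
};

// Assumed generic default: bottom-up unless the target overrides it.
inline SchedPolicySketch defaultPolicy() {
  SchedPolicySketch P;
  P.OnlyBottomUp = true;
  return P;
}

// What a target's policy hook would do to opt in to bidirectional scheduling:
// clear both restrictions so either end of the region may be scheduled.
inline void enableBidirectional(SchedPolicySketch &P) {
  P.OnlyTopDown = false;
  P.OnlyBottomUp = false;
}

// Human-readable name of the resulting direction.
inline std::string describeDirection(const SchedPolicySketch &P) {
  assert(!(P.OnlyTopDown && P.OnlyBottomUp) && "inconsistent policy");
  if (P.OnlyTopDown)
    return "top-down";
  if (P.OnlyBottomUp)
    return "bottom-up";
  return "bidirectional";
}
```

Starting from `defaultPolicy()`, `describeDirection` reports "bottom-up"; after `enableBidirectional` it reports "bidirectional". The exact signature of the real hook has changed between LLVM releases, so check the headers of the version in use.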

So I would like to ask: what is the general advice on when to enable it? What were the original intentions behind it? Is there any particular type of target that would benefit from it?

Thanks for any advice and explanation,

Jonas

For PowerPC, bidirectional scheduling was a net win in benchmarking,
and we use post-RA scheduling with aggressive anti-dependency breaking
for the same reason.

-Hal

Thanks for the reply, Hal! It seems that PowerPC uses the pre-RA machine scheduler together
with PostRASchedulerList and aggressive anti-dependency breaking. I wonder, then, whether there
is any particular reason not to run the anti-dep breaker post-RA when using the post-RA machine
scheduler instead (is the idea that it might not be needed if the pre-RA machine scheduler is
enabled, while PPC in particular still benefits from it)?

I suppose that if benchmarks do not benefit from bidirectional pre-RA scheduling on SystemZ,
there isn't any principled argument at the moment for enabling it?

/Jonas

I believe that the machine scheduler still does not support anti-dep
breaking.

-Hal

One of the reasons bidirectional scheduling was enabled for some in-tree targets is just so that the code paths are well tested. If it doesn’t help your target then it's probably a net loss just because it makes debugging the scheduling heuristics a bit more difficult.

FWIW: At the time the MachineScheduler was being developed, the critical anti-dependence breaking pass was a severe source of bugs. I considered it to be unmaintainable, a terrible approach, and effectively deprecated as soon as targets could move to preRA scheduling with postRA rescheduling of spill code. I was also hoping that the MachineScheduler would be able to break anti-dependencies on-the-fly to form better instruction groups. That never happened, partly because I never worked on an in-order target. I’m not sure why PPC still benefits from anti-dependence breaking, so I can’t say if there is a better approach for that target.

-Andy