Scheduler Roadmap

Hi everyone,

As I've mentioned before, we are doing some work on the LLVM scheduler to
improve it. Specifically, we are working to get loads and stores to
move past each other when possible.

When I asked about enhancing scheduler heuristics a month or so ago, I
got a response about a MachineInstr scheduler and that that was the way
of the LLVM future. Is that so? Is the ScheduleDAG going away?

We found a thread started by Hal late last year with a patch from him to
do exactly what we are trying to do - free up load/store motion. The
discussion didn't really seem to come to a resolution and the patch
doesn't appear in trunk.

So we are in a quandary. Do we continue our ScheduleDAG enhancements or
do we wait for a MachineInstr scheduler? Will we have to throw away
work on ScheduleDAG schedulers?

Is there a roadmap for the scheduler? When should we expect a
MachineInstr scheduler to be usable? Can it be backported to 3.1?

In general, it would be very helpful for folks to post about major
architectural changes like this before work begins so the rest of us can
plan our work. After all, we are likely to run into the same problems
that need solutions. As it is, it gets pretty frustrating to do a bunch
of work in the current release only to have to throw it away because
someone else did something different.

Thanks for any insight you can provide!

                                -Dave

Hi everyone,

As I've mentioned before, we are doing some work on the LLVM scheduler to
improve it. Specifically, we are working to get loads and stores to
move past each other when possible.

When I asked about enhancing scheduler heuristics a month or so ago, I
got a response about a MachineInstr scheduler and that that was the way
of the LLVM future. Is that so? Is the ScheduleDAG going away?

You sent a lengthy RFC on Apr 20 that demonstrated you aren't following developments on trunk. That's perfectly fine, but if you want to use the new scheduler before it is mature, you'll need to follow trunk.

We found a thread started by Hal late last year with a patch from him to
do exactly what we are trying to do - free up load/store motion. The
discussion didn't really seem to come to a resolution and the patch
doesn't appear in trunk.

Feel free to use the patch and send your thanks to Hal. It doesn't serve any purpose to mainline a partial solution only to replace it before it can ever be enabled by default, which would require a major performance investigation and introduces a huge risk (AliasAnalysis in CodeGen has not been well tested).

So we are in a quandary. Do we continue our ScheduleDAG enhancements or
do we wait for a MachineInstr scheduler? Will we have to throw away
work on ScheduleDAG schedulers?

If you're basing development on anything other than today's trunk, then please continue working on the SelectionDAG scheduler.

Either way, if the scheduler means that much to you, you'll be developing your own target-specific algorithm. You can do that and be fairly decoupled from target independent changes.

Either way, you won't need to throw much away. Target-specific heuristics should be easy to transfer to an MI ScheduleDAG from an SD ScheduleDAG. ScheduleDAG doesn't change.

Is there a roadmap for the scheduler? When should we expect a
MachineInstr scheduler to be usable? Can it be backported to 3.1?

Only use the MachineInstr scheduler now if it solves a pressing problem for you. If you need it in 3.1, then develop your own MachineInstr scheduler using the trunk implementation as a helpful reference.

If you're developing based on 3.1, don't expect to upstream changes.

In general, it would be very helpful for folks to post about major
architectural changes like this before work begins so the rest of us can
plan our work. After all, we are likely to run into the same problems
that need solutions. As it is, it gets pretty frustrating to do a bunch
of work in the current release only to have to throw it away because
someone else did something different.

There's a difference between having an intention to replace the scheduler (for over a year now) and planning the work. The actual planning coincided nicely with your RFC, providing a good opportunity to air it on llvm-dev.

-Andy

Andrew Trick <atrick@apple.com> writes:

When I asked about enhancing scheduler heuristics a month or so ago, I
got a response about a MachineInstr scheduler and that that was the way
of the LLVM future. Is that so? Is the ScheduleDAG going away?

You sent a lengthy RFC on Apr 20 that demonstrated you aren't
following developments on trunk. That's perfectly fine, but if you
want to use the new scheduler before it is mature, you'll need to
follow trunk.

Ok, but that doesn't answer the question. Is ScheduleDAG going away?
If so, what's the timeframe for that? 3.2?

Feel free to use the patch and send your thanks to Hal. It doesn't
serve any purpose to mainline a partial solution only to replace it
before it can ever be enabled by default, which would require a major
performance investigation and introduces a huge risk (AliasAnalysis in
CodeGen has not been well tested).

Er, but as I understand it the MachineInstr scheduler will also use
alias analysis.

So we are in a quandary. Do we continue our ScheduleDAG enhancements or
do we wait for a MachineInstr scheduler? Will we have to throw away
work on ScheduleDAG schedulers?

If you're basing development on anything other than today's trunk,
then please continue working on the SelectionDAG scheduler.

But is that going to be throwaway work? That's what we need to
understand.

Either way, you won't need to throw much away. Target-specific
heuristics should be easy to transfer to an MI ScheduleDAG from an SD
ScheduleDAG. ScheduleDAG doesn't change.

Ok, if that's the case it will help. But we also need the flexibility
provided by Hal's patch. I see that there is some work in that area in
the MachineInstr scheduler. A timeline would be helpful as to when you
expect this all to be ready.

Is there a roadmap for the scheduler? When should we expect a
MachineInstr scheduler to be usable? Can it be backported to 3.1?

Only use the MachineInstr scheduler now if it solves a pressing problem
for you. If you need it in 3.1, then develop your own MachineInstr
scheduler using the trunk implementation as a helpful reference.

If you're developing based on 3.1, don't expect to upstream changes.

Again, this doesn't answer the questions:

- What's the expected release time for the new scheduler? 3.2?

- How difficult do you expect a backport to 3.1 to be? We have to work
  from 3.1. Trunk is too buggy.

In general, it would be very helpful for folks to post about major
architectural changes like this before work begins so the rest of us can
plan our work. After all, we are likely to run into the same problems
that need solutions. As it is, it gets pretty frustrating to do a bunch
of work in the current release only to have to throw it away because
someone else did something different.

There's a difference between having an intention to replace the
scheduler (for over a year now) and planning the work.

Ok, but I don't recall any post about replacing the scheduler. If there
was I missed it. That's easy to do on a list with hundreds of messages
a day. It would be helpful to have a roadmap page on the LLVM web site
to call attention to decisions like this. Such a page could be updated
as timelines become firmer.

The actual planning coincided nicely with your RFC, providing a good
opportunity to air it on llvm-dev.

I'm not sure what you mean by "planning." Do you mean design work? I
think that's quite different from a roadmap with timelines. It's really
the latter that we non-Apple people need to have to effectively plan our
work. The last thing we want to do is duplicate what's already planned
to happen.

                              -Dave

Andrew Trick <atrick@apple.com> writes:

When I asked about enhancing scheduler heuristics a month or so ago, I
got a response about a MachineInstr scheduler and that that was the way
of the LLVM future. Is that so? Is the ScheduleDAG going away?

You sent a lengthy RFC on Apr 20 that demonstrated you aren't
following developments on trunk. That's perfectly fine, but if you
want to use the new scheduler before it is mature, you'll need to
follow trunk.

Ok, but that doesn't answer the question. Is ScheduleDAG going away?
If so, what's the timeframe for that? 3.2?

Feel free to use the patch and send your thanks to Hal. It doesn't
serve any purpose to mainline a partial solution only to replace it
before it can ever be enabled by default, which would require a major
performance investigation and introduces a huge risk (AliasAnalysis in
CodeGen has not been well tested).

Er, but as I understand it the MachineInstr scheduler will also use
alias analysis.

So we are in a quandary. Do we continue our ScheduleDAG enhancements or
do we wait for a MachineInstr scheduler? Will we have to throw away
work on ScheduleDAG schedulers?

If you're basing development on anything other than today's trunk,
then please continue working on the SelectionDAG scheduler.

But is that going to be throwaway work? That's what we need to
understand.

Either way, you won't need to throw much away. Target-specific
heuristics should be easy to transfer to an MI ScheduleDAG from an SD
ScheduleDAG. ScheduleDAG doesn't change.

Ok, if that's the case it will help. But we also need the flexibility
provided by Hal's patch. I see that there is some work in that area in
the MachineInstr scheduler. A timeline would be helpful as to when you
expect this all to be ready.

Is there a roadmap for the scheduler? When should we expect a
MachineInstr scheduler to be usable? Can it be backported to 3.1?

Only use the MachineInstr scheduler now if it solves a pressing problem
for you. If you need it in 3.1, then develop your own MachineInstr
scheduler using the trunk implementation as a helpful reference.

If you're developing based on 3.1, don't expect to upstream changes.

Again, this doesn't answer the questions:

- What's the expected release time for the new scheduler? 3.2?

- How difficult do you expect a backport to 3.1 to be? We have to work
from 3.1. Trunk is too buggy.

You've stated that trunk is too buggy for you to work from on multiple occasions. Can you elaborate? That doesn't match my experience, as I install a new compiler on my workstation from a trunk build every night and have only had a problem as a result once in the last year or more. It sounds like you've had a much different experience, which is unfortunate, and perhaps indicates a hole in our buildbot and nightly test infrastructure that could be fixed.

-Jim

Jim Grosbach <grosbach@apple.com> writes:

- How difficult do you expect a backport to 3.1 to be? We have to work
from 3.1. Trunk is too buggy.

You've stated that trunk is too buggy for you to work from on multiple
occasions. Can you elaborate? That doesn't match my experience, as I
install a new compiler on my workstation from a trunk build every
night and have only had a problem as a result once in the last year or
more. It sounds like you've had a much different experience, which is
unfortunate, and perhaps indicates a hole in our buildbot and nightly
test infrastructure that could be fixed.

We do a lot of Fortran which doesn't get covered all that well by the
nightly tester. We also have a completely different frontend and
optimizer meaning that we present code to LLVM that it's never seen
before. We've seen major scalability problems in the past. We compile
some absolutely gigantic codes here. Fortunately it appears most of
these have been fixed but I think LoopStrengthReduce is still a problem
(we've turned it off). Scheduling, regalloc, instcombine and dagcombine
have all presented problems in the past. We've sent patches for the
most egregious cases. Many of those patches get dropped, leading to
less confidence on our side that issues will be fixed quickly. Others
in the LLVM community have fixed other such problems.

Again, these issues get fixed with new releases (and with some hacking
by us in the interim) but the experience makes me very hesitant to make
our development depend on trunk. We cannot afford to waste even a few
days of interrupted release development work tracking down a new
regression introduced in trunk.

We also can't afford to have compiler builds break because someone
upstream decided to change an API we rely on. We make monthly releases.
We have to have some control over when we introduce such changes.

We run hundreds of thousands of tests each week, both simple regression
tests and large applications. I don't think the nightly tester will
ever be capable of covering that.

I have a long-term goal of setting up a tree that uses LLVM trunk for
our builds both so I can track what's happening on trunk to more easily
know what it means for us and to test out some of the new features, like
the MachineInstr scheduler in this case. Unfortunately, time is limited
so I have to do that piecemeal, 10 minutes here and there. The git
mirror will make that much easier.

                                -Dave

There’s no great solution. But you’ll have to make a tradeoff between being engaged in the development process vs. insulating your development from external changes. I’ve spent plenty of time dealing with a similar situation, and I found it more efficient to keep all development upstream from integration and release, as in: LLVM trunk → Cray trunk → Cray release. This way you’re only backporting critical bug fixes and you’re never reverse-engineering a pile of new features to find out why something broke.

-Andy

Jim Grosbach <grosbach@apple.com> writes:

- How difficult do you expect a backport to 3.1 to be? We have to work
from 3.1. Trunk is too buggy.

You've stated that trunk is too buggy for you to work from on multiple
occasions. Can you elaborate? That doesn't match my experience, as I
install a new compiler on my workstation from a trunk build every
night and have only had a problem as a result once in the last year or
more. It sounds like you've had a much different experience, which is
unfortunate, and perhaps indicates a hole in our buildbot and nightly
test infrastructure that could be fixed.

We do a lot of Fortran which doesn't get covered all that well by the
nightly tester. We also have a completely different frontend and
optimizer meaning that we present code to LLVM that it's never seen
before. We've seen major scalability problems in the past. We compile
some absolutely gigantic codes here. Fortunately it appears most of
these have been fixed but I think LoopStrengthReduce is still a problem
(we've turned it off). Scheduling, regalloc, instcombine and dagcombine
have all presented problems in the past. We've sent patches for the
most egregious cases. Many of those patches get dropped, leading to
less confidence on our side that issues will be fixed quickly. Others
in the LLVM community have fixed other such problems.

Again, these issues get fixed with new releases (and with some hacking
by us in the interim) but the experience makes me very hesitant to make
our development depend on trunk. We cannot afford to waste even a few
days of interrupted release development work tracking down a new
regression introduced in trunk.

We also can't afford to have compiler builds break because someone
upstream decided to change an API we rely on. We make monthly releases.
We have to have some control over when we introduce such changes.

This has nothing to do with the scheduler roadmap. If you want to engage the community in an API compatibility discussion, it should be a different thread. But given how long you have been involved in LLVM, you should know the philosophy: LLVM development is optimized for people who stay on trunk.

Evan

Jim Grosbach <grosbach@apple.com> writes:

>> - How difficult do you expect a backport to 3.1 to be? We have to
>> work from 3.1. Trunk is too buggy.

> You've stated that trunk is too buggy for you to work from on
> multiple occasions. Can you elaborate? That doesn't match my
> experience, as I install a new compiler on my workstation from a
> trunk build every night and have only had a problem as a result
> once in the last year or more. It sounds like you've had a much
> different experience, which is unfortunate, and perhaps indicates a
> hole in our buildbot and nightly test infrastructure that could be
> fixed.

We do a lot of Fortran which doesn't get covered all that well by the
nightly tester. We also have a completely different frontend and
optimizer meaning that we present code to LLVM that it's never seen
before. We've seen major scalability problems in the past. We
compile some absolutely gigantic codes here. Fortunately it appears
most of these have been fixed but I think LoopStrengthReduce is still
a problem (we've turned it off). Scheduling, regalloc, instcombine
and dagcombine have all presented problems in the past. We've sent
patches for the most egregious cases. Many of those patches get
dropped, leading to less confidence on our side that issues will be
fixed quickly. Others in the LLVM community have fixed other such
problems.

Again, these issues get fixed with new releases (and with some hacking
by us in the interim) but the experience makes me very hesitant to
make our development depend on trunk. We cannot afford to waste even
a few days of interrupted release development work tracking down a new
regression introduced in trunk.

We also can't afford to have compiler builds break because someone
upstream decided to change an API we rely on. We make monthly
releases. We have to have some control over when we introduce such
changes.

We run hundreds of thousands of tests each week, both simple
regression tests and large applications. I don't think the nightly
tester will ever be capable of covering that.

I have a long-term goal of setting up a tree that uses LLVM trunk for
our builds both so I can track what's happening on trunk to more
easily know what it means for us and to test out some of the new
features, like the MachineInstr scheduler in this case.
Unfortunately, time is limited so I have to do that piecemeal, 10
minutes here and there. The git mirror will make that much easier.

Out of curiosity, can your frontend be configured to produce LLVM IR
that could be fed through the trunk backend passes, or is a tighter
integration required? If it is possible to produce LLVM IR, then it
might be possible to test a lot, although not all, of the upstream
changes using your internal test suite in combination with the generic
trunk backend in a fairly straightforward manner. It might even be
possible to setup a system to warn the upstream developers when we've
broken something.

-Hal

Andrew Trick <atrick@apple.com> writes:

    We also can't afford to have compiler builds break because someone
    upstream decided to change an API we rely on. We make monthly releases.
    We have to have some control over when we introduce such changes.

I've spent plenty of time dealing with a similar situation, and I
found it more efficient to keep all development upstream from
integration and release as in: LLVM trunk -> Cray trunk -> Cray
release.

No way this will work. It takes FAR too long to get any code accepted
upstream.

Unfortunately, the LLVM project has some pretty major barriers to
contributors. These have been discussed ad nauseam in the past, so I
won't repeat them here.

                               -Dave

Evan Cheng <evan.cheng@apple.com> writes:

We also can't afford to have compiler builds break because someone
upstream decided to change an API we rely on. We make monthly releases.
We have to have some control over when we introduce such changes.

This has nothing to do with the scheduler roadmap. If you want to
engage the community in an API compatibility discussion, it should be a
different thread.

I was asked why we don't follow trunk. I answered the question.

But given how long you have been involved in LLVM, you should know the
philosophy: LLVM development is optimized for people who stay on
trunk.

It's far from optimal for people outside of Apple who have critical
release schedules.

                                   -Dave

Hal Finkel <hfinkel@anl.gov> writes:

Out of curiosity, can your frontend be configured to produce LLVM IR
that could be fed through the trunk backend passes, or is a tighter
integration required?

Yes, we have this capability.

If it is possible to produce LLVM IR, then it might be possible to
test a lot, although not all, of the upstream changes using your
internal test suite in combination with the generic trunk backend in a
fairly straightforward manner.

Unfortunately our test system isn't set up to accept LLVM IR files.
This is something I've wanted to have implemented for a long time but
we're all stretched pretty thin here.

It might even be possible to setup a system to warn the upstream
developers when we've broken something.

I would like that very much. It's a resource issue, not a technical
one.

I think it's pretty common for teams producing production-quality code
to want to base pieces off of tested and released software. I had never
heard of any company doing otherwise until Apple with LLVM. Producing
releases off an unstable trunk is not the normal way of doing things.

                               -Dave

Hal Finkel <hfinkel@anl.gov> writes:

> Out of curiosity, can your frontend be configured to produce LLVM IR
> that could be fed through the trunk backend passes, or is a tighter
> integration required?

Yes, we have this capability.

> If it is possible to produce LLVM IR, then it might be possible to
> test a lot, although not all, of the upstream changes using your
> internal test suite in combination with the generic trunk backend
> in a fairly straightforward manner.

Unfortunately our test system isn't set up to accept LLVM IR files.
This is something I've wanted to have implemented for a long time but
we're all stretched pretty thin here.

> It might even be possible to setup a system to warn the upstream
> developers when we've broken something.

I would like that very much. It's a resource issue, not a technical
one.

I think that it might be useful to think about changing the parameters
of this optimization problem. Could you release enough of the testing
system so that others could help? Could you release (some subset of) the
tests as LLVM IR so that they could be integrated with the existing
buildbots? I understand that you may view these things as valuable IP,
but in reality, the opportunity cost of not sharing may far outweigh
any competitive advantage you get by not sharing.

I think it's pretty common for teams producing production-quality code
to want to base pieces off of tested and released software. I had
never heard of any company doing otherwise until Apple with LLVM.
Producing releases off an unstable trunk is not the normal way of doing things.

As far as I can tell, Apple tries very hard to keep trunk stable; it is
one of the most stable trunks with which I've worked. Obviously there
are regressions, but these tend to have a short lifespan. One of the
advantages of keeping a relatively stable trunk is that they get more
trunk testers, and that, in turn, helps keep trunk stable. As a result,
they can take releases from trunk, and this gives them a short
turn-around time on new features with low backporting overhead.

The disadvantage for outsiders is, however, that it forces your
releases to follow their releases (because of additional stabilization
activity prior to releases), and that may not be practical if your
release cycle is shorter than theirs. However, if the release cycle is
too long for the project's users, then we should think about
shortening it.

Thanks again,
Hal

Hal Finkel <hfinkel@anl.gov> writes:

I would like that very much. It's a resource issue, not a technical
one.

I think that it might be useful to think about changing the parameters
of this optimization problem. Could you release enough of the testing
system so that others could help?

I can't make that decision and I very much doubt we could even do it.
It's quite tied in to our entire release mechanism. Remember, this is a
process that has been going for 30 years or more.

Could you release (some subset of) the tests as LLVM IR so that they
could be integrated with the existing buildbots? I understand that you
may view these things as valuable IP, but in reality, the opportunity
cost of not sharing may far outweigh any competitive advantage you get
by not sharing.

Actually, we don't have any problem releasing tests. We have done so
before when sending patches. The problem is the people we got the tests
from. Some are from proprietary test suites, others are from sensitive
codes, etc. It's often not up to us at all.

A larger problem, I think, is that patches often get dropped and/or they
take forever to get approved. The red tape is astounding. You ran into
that with your scheduler change, which is obviously a Good Thing to us
and would support users of 3.0 and 3.1 very well even if it were to be
deprecated in 3.2. And yet it hasn't made it in, and it looks like it
would not be accepted no matter what. That is a symptom of the problem.

I think it's pretty common for teams producing production-quality code
to want to base pieces off of tested and released software. I had
never heard of any company doing otherwise until Apple with LLVM.
Producing releases off an unstable trunk is not the normal way of doing things.

As far as I can tell, Apple tries very hard to keep trunk stable; it is
one of the most stable trunks with which I've worked. Obviously there
are regressions, but these tend to have a short lifespan. One of the
advantages of keeping a relatively stable trunk is that they get more
trunk testers, and that, in turn, helps keep trunk stable. As a result,
they can take releases from trunk, and this gives them a short
turn-around time on new features with low backporting overhead.

Yes, I completely understand the reasoning and in an ideal world I would
like to live off trunk. But if trunk is meant to be stable, why have
releases at all? There must be value added to releases or we just
shouldn't do them. It's that added value that we're counting on.

The disadvantage for outsiders is, however, that it forces your
releases to follow their releases (because of additional stabilization
activity prior to releases), and that may not be practical if your
release cycle is shorter than theirs. However, if the release cycle is
too long for the project's users, then we should think about
shortening it.

As I said, we release monthly updates. Major releases are generally once
a year but sometimes more frequently. LLVM doesn't do point releases
and that's another issue we have to deal with.

As I said, I am working on integrating trunk into our development tree
via git as an experiment. Maybe it will work out really well and we'll
be able to switch but I'm not counting on that. I think the LLVM
project is mature enough that we really should be considering supporting
release users much better. But of course that's coming from a selfish
position. :-) Still, I think it is a valid discussion to have.

                              -Dave

The disadvantage for outsiders is, however, that it forces your
releases to follow their releases (because of additional stabilization
activity prior to releases), and that may not be practical if your
release cycle is shorter than theirs. However, if the release cycle is
too long for the project's users, then we should think about
shortening it.

Just to clarify here, "Apple" releases have nothing to do with llvm.org releases. The LLVM.org schedule is purely time-driven (a ~6 month cycle), and done by volunteers - including the @apple.com people who contribute to the releases.

The Apple schedule is quite variable, and often a lot more frequent than every 6 months. This is why Apple releases have always been based off a random revision number, usually irritatingly right in the middle of an llvm.org release.

As far as I can tell, Apple tries very hard to keep trunk stable; it is
one of the most stable trunks with which I've worked. Obviously there

This is just one more reason that it is important to me for the trunk to remain stable.

-Chris

Hal Finkel <hfinkel@anl.gov> writes:

>> I would like that very much. It's a resource issue, not a
>> technical one.
>
> I think that it might be useful to think about changing the
> parameters of this optimization problem. Could you release enough
> of the testing system so that others could help?

I can't make that decision and I very much doubt we could even do it.
It's quite tied in to our entire release mechanism. Remember, this
is a process that has been going for 30 years or more.

> Could you release (some subset of) the tests as LLVM IR so that they
> could be integrated with the existing buildbots? I understand that
> you may view these things as valuable IP, but in reality, the
> opportunity cost of not sharing may far outweigh any competitive
> advantage you get by not sharing.

Actually, we don't have any problem releasing tests. We have done so
before when sending patches. The problem is the people we got the
tests from. Some are from proprietary test suites, others are from
sensitive codes, etc. It's often not up to us at all.

I completely understand. Why don't we start by having you prepare LLVM
IR files, and associated outputs, for x86_64 from your frontends, using
only open-source codes. As a first step, you could even just generate
LLVM IR files for us from the codes in the LLVM test suite. We could
set up a buildbot based on those files (which I believe would be easy to
do), and then we can actively test trunk LLVM against those files.

A larger problem, I think, is that patches often get dropped and/or
they take forever to get approved. The red tape is astounding. You
ran into that with your scheduler change, which is obviously a Good
Thing to us and would support users of 3.0 and 3.1 very well even if
it were to be deprecated in 3.2. And yet it hasn't made it in, and it
looks like it would not be accepted no matter what. That is a
symptom of the problem.

To be fair, the reason that my patch was not accepted was that it
caused test-suite failures on x86. Does the patch work for you? If it
does, then maybe the situation has changed, and we should reconsider
the status of the patch. The patch actually had two parts: the IR->DAG
modifications and the changes to the ILP scheduling heuristic. Changes
to the ILP scheduling heuristic may be required regardless of how or
where the critical chain is relaxed.

Given that the patch caused test-suite failures on x86, I did not want
to commit it as-is. I would have loved it if someone else had worked to
diagnose and/or fix the remaining problems (which may have been
scattered among different backends), but it is hard to ask people to do
that for a feature that would be deprecated in six months time.

-Hal

Andrew Trick <atrick@apple.com> writes:

When I asked about enhancing scheduler heuristics a month or so ago, I
got a response about a MachineInstr scheduler and that that was the way
of the LLVM future. Is that so? Is the ScheduleDAG going away?

You sent a lengthy RFC on Apr 20 that demonstrated you aren't
following developments on trunk. That's perfectly fine, but if you
want to use the new scheduler before it is mature, you'll need to
follow trunk.

Ok, but that doesn't answer the question. Is ScheduleDAG going away?
If so, what's the timeframe for that? 3.2?

ScheduleDAG is used for both SD scheduling and MI scheduling. It's not going away.

SD scheduling is not going away in 3.2--it will be the first release with MI scheduling on by default.

If all goes well, I expect SD scheduling to be removed by 3.3. That has not been discussed.

Consider this the preliminary announcement. I'll post another announcement as soon as we have something that's more broadly interesting. In the current state it's only interesting for someone just beginning to write their own custom scheduler.

Here's a more complete list of the implementation steps, but the real effort will be spent on the performance analysis required before flipping the switch. Don't expect it to be an adequate replacement out of the box for your benchmarks before 3.2.

- Target pass configuration: DONE
- MachineScheduler pass framework: DONE
- MI Scheduling DAG: DONE
- AliasAnalysis aware DAG option: In review (Sergei)
- Bidirectional list scheduling: DONE
- LiveInterval Update: WIP (simple instruction reordering is supported)
- Target-independent precise modeling of register pressure: DONE
- Register pressure reduction scheduler: WIP
- Support for existing HazardChecker plugin
- New target description feature: buffered resources
- Modeling min latency, expected latency, and resources constraints
- Heuristics that balance interlocks, regpressure, latency and buffered resources

For targets where scheduling is critical, I encourage developers who stay in sync with trunk to write their own target-specific scheduler based on the pieces that are already available. Hexagon developers are doing this now. The LLVM toolkit for scheduling is all there--not perfect, but ready for developers.

- Pluggable MachineScheduler pass
- Scheduling DAG
- LiveInterval Update
- RegisterPressure tracking
- InstructionItinerary and HazardChecker (to be extended)
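
To make the "pluggable" part concrete, here is a rough sketch of what
wiring a custom strategy into the MachineScheduler pass looks like. This
is written against trunk as of this writing and the interfaces are still
moving, so treat the exact signatures as assumptions; the strategy body
is a stand-in, not a recommendation:

#include "llvm/CodeGen/MachineScheduler.h"
#include <memory>
#include <vector>

using namespace llvm;

namespace {
// Stand-in strategy: schedules bottom-up, always taking the most recently
// released node. A real target strategy would weigh latency, register
// pressure, and hazards in pickNode().
class SourceOrderStrategy : public MachineSchedStrategy {
  std::vector<SUnit *> Avail;
public:
  void initialize(ScheduleDAGMI *DAG) override { Avail.clear(); }
  SUnit *pickNode(bool &IsTopNode) override {
    if (Avail.empty())
      return nullptr;              // region is fully scheduled
    SUnit *SU = Avail.back();
    Avail.pop_back();
    IsTopNode = false;             // always place at the bottom boundary
    return SU;
  }
  void schedNode(SUnit *SU, bool IsTopNode) override {}
  void releaseTopNode(SUnit *SU) override {}
  void releaseBottomNode(SUnit *SU) override { Avail.push_back(SU); }
};

ScheduleDAGInstrs *createSourceOrderSched(MachineSchedContext *C) {
  // Note: the ScheduleDAGMI constructor has varied across revisions.
  return new ScheduleDAGMI(C, std::make_unique<SourceOrderStrategy>(),
                           /*RemoveKillFlags=*/false);
}

// Registers the strategy so it is selectable with -misched=source-order.
MachineSchedRegistry SourceOrderRegistry("source-order",
                                         "Bottom-up source-order scheduler",
                                         createSourceOrderSched);
} // end anonymous namespace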

If you would simply like improved X86 scheduling without rolling your own, then providing feedback and test cases is useful so we can incorporate improvements into the standard scheduler while it's being developed.

Feel free to use the patch and send your thanks to Hal. It doesn't
serve any purpose to mainline a partial solution only to replace it
before it can ever be enabled by default, which would require a major
performance investigation and introduces a huge risk (AliasAnalysis in
CodeGen has not been well tested).

Er, but as I understand it the MachineInstr scheduler will also use
alias analysis.

AliasAnalysis is important. We want it fully supported and enabled by default, but that requires effort beyond simply enabling it. Today, that effort is in the MI scheduler.

The last thing we want to do is duplicate what's already planned
to happen.

The information I provided above is the best I can do, and as early as I could provide this level of detail. If you follow trunk, you can see the direction things are heading, but until recently I would not have been able to tell you "plans" in the form of dates or release goals.

-Andy

> Andrew Trick <atrick@apple.com> writes:
>
>>> When I asked about enhancing scheduler heuristics a month or so
>>> ago, I got a response about a MachineInstr scheduler and that
>>> that was the way of the LLVM future. Is that so? Is the
>>> ScheduleDAG going away?
>>
>> You sent a lengthy RFC on Apr 20 that demonstrated you aren't
>> following developments on trunk. That's perfectly fine, but if you
>> want to use the new scheduler before it is mature, you'll need to
>> follow trunk.
>
> Ok, but that doesn't answer the question. Is ScheduleDAG going
> away? If so, what's the timeframe for that? 3.2?

ScheduleDAG is used for both SD scheduling and MI scheduling. It's
not going away.

SD scheduling is not going away in 3.2--it will be the first release
with MI scheduling on by default.

If all goes well, I expect SD scheduling to be removed by 3.3. That
has not been discussed.

Consider this the preliminary announcement. I'll post another
announcement as soon as we have something that's more broadly
interesting. In the current state it's only interesting for someone
just beginning to write their own custom scheduler.

Here's a more complete list of the implementation steps, but the real
effort will be spent on the performance analysis required before flipping
the switch. Don't expect it to be an adequate replacement out of the box
for your benchmarks before 3.2.

- Target pass configuration: DONE
- MachineScheduler pass framework: DONE
- MI Scheduling DAG: DONE
- AliasAnalysis aware DAG option: In review (Sergei)
- Bidirectional list scheduling: DONE
- LiveInterval Update: WIP (simple instruction reordering is
supported)
- Target-independent precise modeling of register pressure: DONE
- Register pressure reduction scheduler: WIP
- Support for existing HazardChecker plugin

Is support for the existing hazard detectors working now? [it does not
say DONE or WIP here, but your comment below implies, I think, that it
is at least partially working].

- New target description feature: buffered resources
- Modeling min latency, expected latency, and resources constraints

Can you comment on how min and expected latency will be used in the
scheduling?

- Heuristics that balance interlocks, regpressure, latency and
buffered resources

For targets where scheduling is critical, I encourage developers who
stay in sync with trunk to write their own target-specific scheduler
based on the pieces that are already available. Hexagon developers
are doing this now. The LLVM toolkit for scheduling is all there--not
perfect, but ready for developers.

- Pluggable MachineScheduler pass
- Scheduling DAG
- LiveInterval Update
- RegisterPressure tracking
- InstructionItinerary and HazardChecker (to be extended)

If you would simply like improved X86 scheduling without rolling your
own, then providing feedback and test cases is useful so we can
incorporate improvements into the standard scheduler while it's being
developed.

Does this mean that we're going to see a new X86 scheduling paradigm,
or is the existing ILP heuristic, in large part, expected to stay?

Thanks again,
Hal

- Target pass configuration: DONE
- MachineScheduler pass framework: DONE
- MI Scheduling DAG: DONE
- AliasAnalysis aware DAG option: In review (Sergei)
- Bidirectional list scheduling: DONE
- LiveInterval Update: WIP (simple instruction reordering is
supported)
- Target-independent precise modeling of register pressure: DONE
- Register pressure reduction scheduler: WIP
- Support for existing HazardChecker plugin

Is support for the existing hazard detectors working now? [it does not
say DONE or WIP here, but your comment below implies, I think, that it
is at least partially working].

Glad you're interested. I can explain. We have several important tools in LLVM that most schedulers will need. That's what I was listing below (Configurable pass, DAG, LI update, RegPressure, Itinerary, HazardChecker--normally called a reservation table).

I really should have also mentioned the DFAPacketizer developed by the Hexagon team. It's being used by their VLIW scheduler, but not by the new "standard" scheduler that I'm working on.

Now that I mentioned that, I should mention MachineInstrBundles, which was a necessary IR feature to support the VLIW scheduler, but has other random uses--sometimes we want to glue machine instructions.

HazardChecker was already being used by the PostRA scheduler before I started working on infrastructure for a new scheduler. So it's there, and can be used by custom schedulers.

My first goal was to complete all of these pieces. They're in pretty good shape now but not well tested. The target-independent model for register pressure derived from arbitrary register definitions was by far the most difficult aspect. Now I need to develop a standard scheduling algorithm that will work reasonably well for any target given the register description and optionally a scheduling itinerary.

The register pressure reduction heuristic was the first that I threw into the standard scheduler because it's potentially useful by itself. It's WIP.

I haven't plugged in the HazardChecker, but it's quite straightforward.
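
For illustration, the consultation is roughly this shape (a sketch only;
HazardRec would come from TargetInstrInfo::CreateTargetMIHazardRecognizer,
and Avail is an assumed ready list maintained by the caller):

#include "llvm/CodeGen/ScheduleDAG.h"
#include "llvm/CodeGen/ScheduleHazardRecognizer.h"
#include <vector>

using namespace llvm;

// Issue the first available SUnit that the target's reservation table says
// is hazard-free this cycle; if nothing can issue, stall for one cycle.
// The caller removes the returned SUnit from Avail.
static SUnit *pickHazardFree(ScheduleHazardRecognizer &HazardRec,
                             const std::vector<SUnit *> &Avail) {
  for (SUnit *SU : Avail) {
    if (HazardRec.getHazardType(SU, /*Stalls=*/0) ==
        ScheduleHazardRecognizer::NoHazard) {
      HazardRec.EmitInstruction(SU);  // occupy its pipeline resources
      return SU;
    }
  }
  HazardRec.AdvanceCycle();           // advance the reservation table
  return nullptr;
}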

At that point, I'll have two competing scheduling constraints and will begin implementing a framework for balancing those constraints. I'll also add fuzzy constraints such as expected latency and other cpu resources. When I get to that point, I'll explain more, and I hope you and others will follow along and help with performance analysis and heuristics.

I will point out one important aspect of the design now. If scheduling is very important for your target's performance, and you are highly confident that you model your microarchitecture effectively and have just the right heuristics, then it might make sense to give the scheduler free rein to shuffle the instructions. The standard MachineScheduler will not make that assumption. It certainly can be tuned to be as aggressive as we like, but unless there is a high level of confidence that reordering instructions will be beneficial, we don't want to do it. Rule #1 is not to follow blind heuristics that reschedule reasonable code into something pathologically bad. This notion of confidence is not something schedulers typically have, and is fundamental to the design.

For example, most schedulers have to deal with opposing constraints of register pressure and ILP. An aggressive way to deal with this is by running two separate scheduling passes. First top-down to find the optimal latency, then bottom-up to minimize resources needed to achieve that latency. Naturally, after the first pass, you've shuffled instructions beyond all recognition. Instead, we deal with this problem by scheduling in both directions simultaneously. At each point, we know which resources and constraints are likely to impact the cpu pipeline in both the top and bottom of the scheduling region. Doing this doesn't solve any fundamental problem, but it gives the scheduler great freedom at each point, including the freedom to do absolutely nothing, which is probably exactly what you want for a fair amount of X86 code.
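
A self-contained toy (plain C++, not LLVM code) may make the mechanics
clearer: ready lists are kept at both boundaries, each placement releases
new candidates toward the middle, and the heuristic, here just alternating
ends, decides where the next instruction lands:

#include <cstdio>
#include <vector>

struct Node {
  const char *Name;
  std::vector<int> Preds, Succs;
  int UnschedPreds = 0, UnschedSuccs = 0;
  bool Done = false;
};

int main() {
  // A diamond: a -> b, a -> c, b -> d, c -> d.
  std::vector<Node> G = {{"a"}, {"b"}, {"c"}, {"d"}};
  auto AddEdge = [&](int P, int S) {
    G[P].Succs.push_back(S);
    G[S].Preds.push_back(P);
  };
  AddEdge(0, 1); AddEdge(0, 2); AddEdge(1, 3); AddEdge(2, 3);

  std::vector<int> Top, Bot; // ready at the top/bottom boundary
  for (int I = 0; I < (int)G.size(); ++I) {
    G[I].UnschedPreds = (int)G[I].Preds.size();
    G[I].UnschedSuccs = (int)G[I].Succs.size();
    if (G[I].UnschedPreds == 0) Top.push_back(I);
    if (G[I].UnschedSuccs == 0) Bot.push_back(I);
  }

  int Placed = 0;
  bool FromTop = true; // trivial heuristic: alternate boundaries
  while (Placed < (int)G.size()) {
    std::vector<int> &Q =
        (FromTop && !Top.empty()) || Bot.empty() ? Top : Bot;
    bool IsTop = &Q == &Top;
    int N = -1;
    while (!Q.empty()) { // skip nodes already placed from the other end
      N = Q.back();
      Q.pop_back();
      if (!G[N].Done) break;
      N = -1;
    }
    FromTop = !FromTop;
    if (N < 0) continue;
    G[N].Done = true;
    ++Placed;
    std::printf("%-6s <- %s\n", IsTop ? "top" : "bottom", G[N].Name);
    if (IsTop) {
      for (int S : G[N].Succs)
        if (--G[S].UnschedPreds == 0) Top.push_back(S);
    } else {
      for (int P : G[N].Preds)
        if (--G[P].UnschedSuccs == 0) Bot.push_back(P);
    }
  }
  return 0;
}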

- New target description feature: buffered resources
- Modeling min latency, expected latency, and resources constraints

Can you comment on how min and expected latency will be used in the
scheduling?

In the new scheduler's terminology, min latency is an interlocked resource, and expected latency is a buffered resource. Interlocked resources are used to form instruction groups (for performance only, not correctness). For out-of-order targets with register rename, we can use zero-cycle min latency so there is no interlock within an issue group. Instead, we know the expected latency of the scheduled instructions relative to the critical path. We can balance the schedule so that neither the expected latency of the top nor the bottom scheduled instructions exceeds the overall critical path. This way, we will slice up two very long independent chains into neat chunks, instead of the random shuffling that we do today.
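
As a toy illustration of the balancing arithmetic (my reading of the
above, not LLVM code): track the accumulated expected latency at each
boundary and refuse to grow a boundary past the critical path while the
other side still fits:

#include <cstdio>

// Decide which boundary receives the next instruction.
static bool placeAtTop(int TopLat, int BotLat, int InstLat, int CritPath) {
  bool TopFits = TopLat + InstLat <= CritPath;
  bool BotFits = BotLat + InstLat <= CritPath;
  if (TopFits != BotFits)
    return TopFits;            // only one side still fits; use it
  return TopLat <= BotLat;     // otherwise fill the shorter side
}

int main() {
  // Six latency-3 ops against a 9-cycle critical path: the two boundaries
  // grow in step and neither stretches past 9 cycles.
  int Top = 0, Bot = 0;
  for (int I = 0; I < 6; ++I) {
    if (placeAtTop(Top, Bot, 3, 9))
      Top += 3;
    else
      Bot += 3;
    std::printf("top %d / bottom %d\n", Top, Bot);
  }
  return 0;
}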

- Heuristics that balance interlocks, regpressure, latency and
buffered resources

For targets where scheduling is critical, I encourage developers who
stay in sync with trunk to write their own target-specific scheduler
based on the pieces that are already available. Hexagon developers
are doing this now. The LLVM toolkit for scheduling is all there--not
perfect, but ready for developers.

- Pluggable MachineScheduler pass
- Scheduling DAG
- LiveInterval Update
- RegisterPressure tracking
- InstructionItinerary and HazardChecker (to be extended)

If you would simply like improved X86 scheduling without rolling your
own, then providing feedback and test cases is useful so we can
incorporate improvements into the standard scheduler while it's being
developed.

Does this mean that we're going to see a new X86 scheduling paradigm,
or is the existing ILP heuristic, in large part, expected to stay?

It's a new paradigm but not a change in focus--we're not modeling the microarchitecture in any greater detail, although other contributors are encouraged to do that.

Both schedulers will be supported for a time. In fact it will make sense to run both in the same compile, until MISched is good enough to take over. It will be easy to determine when one scheduler is doing better than the other. I'm relying on you to tell me when it's doing the wrong thing.

-Andy

Hal Finkel <hfinkel@anl.gov> writes:

Actually, we don't have any problem releasing tests. We have done so
before when sending patches. The problem is the people we got the
tests from. Some are from proprietary test suites, others are from
sensitive codes, etc. It's often not up to us at all.

I completely understand. Why don't we start by having you prepare LLVM
IR files, and associated outputs, for x86_64 from your frontends, using
only open-source codes. As a first step, you could even just generate
LLVM IR files for us from the codes in the LLVM test suite. We could
set up a buildbot based on those files (which I believe would be easy to
do), and then we can actively test trunk LLVM against those files.

I like this idea. It'll work for C/C++ but not Fortran. Since there is
no Fortran ABI, one has to link with our Fortran compiler & libraries to
get an executable that actually works.

But let me think about this some more. I would really like to expand
the LLVM testbase if we can. It will be a long process since I'll have
to get all these tests approved for release. I can't give a timeline on
that at all. I think it will be a very gradual process.

To be fair, the reason that my patch was not accepted was that it
caused test-suite failures on x86. Does the patch work for you?

I'm hopefully going to try it within the next few days.

If it does, then maybe the situation has changed, and we should
reconsider the status of the patch. The patch actually had two parts:
the IR->DAG modifications and the changes to the ILP scheduling
heuristic. Changes to the ILP scheduling heuristic may be required
regardless of how or where the critical chain is relaxed.

Ok, I will take a look at that.

Given that the patch caused test-suite failures on x86, I did not want
to commit it as-is.

Yes, I understand that. But from the discussion I got the impression
that the patch wasn't wanted because ScheduleDAG is going to be
deprecated. If that's not the case I will certainly work to get it
going!

I would have loved it if someone else had worked to
diagnose and/or fix the remaining problems (which may have been
scattered among different backends), but it is hard to ask people to do
that for a feature that would be deprecated in six months time.

Yeah, I understand. But for those of us working off releases it would
not be deprecated in six months. That's probably moot now since 3.1 is
almost out the door but I think the patch will still be useful for us.

Believe me, I would really like to be able to work off trunk but I have
to convince a lot of people here that that is possible. Starting with
myself. :-)

                                -Dave

Andrew Trick <atrick@apple.com> writes:

Ok, but that doesn't answer the question. Is ScheduleDAG going away?
If so, what's the timeframe for that? 3.2?

ScheduleDAG is used for both SD scheduling and MI scheduling. It's not going away.

Oh! That's good news!

SD scheduling is not going away in 3.2--it will be the first release with MI scheduling on by default.

Ok, that is helpful. Thanks!

If all goes well, I expect SD scheduling to be removed by 3.3. That has not been discussed.

Is there any particular reason to remove it? Something has to convert
from SDNodes to MachineInstrs so we'll at least need the "original
order" SUnit scheduler, yes?

Here's a more complete list of the implementation steps, but the real
effort will be spent on the performance analysis required before flipping
the switch. Don't expect it to be an adequate replacement out of the box
for your benchmarks before 3.2.

Understood.

- AliasAnalysis aware DAG option: In review (Sergei)

This is going to be very important to us. I believe this accomplishes
the same thing as Hal's patch and the work we've done here. I'm really glad
to see this here!

If you would simply like improved X86 scheduling without rolling your
own, then providing feedback and test cases is useful so we can
incorporate improvements into the standard scheduler while it's being
developed.

Yep.

Er, but as I understand it the MachineInstr scheduler will also use
alias analysis.

AliasAnalysis is important. We want it fully supported and enabled by
default, but that requires effort beyond simply enabling it. Today,
that effort is in the MI scheduler.

Yes of course. Does this mean alias analysis in general will be
available at the MachineInstr level? I've run into a desire for that
multiple times.

The information I provided above is the best I can do, and as early as
I could provide this level of detail. If you follow trunk, you can see
the direction things are heading, but until recently I would not have
been able to tell you "plans" in the form of dates or release goals.

This is really helpful, thanks!

                                 -Dave