[RFC] Porting MachinePipeliner to AArch64+SVE

Hi,

I am extending LLVM for HPC applications.
As part of that work, I am trying to make MachinePipeliner available on
AArch64 with the Scalable Vector Extension (SVE).

MachinePipeliner is currently used only by the Hexagon target.
Since its implementation is very portable, I think it will work on
many CPUs with only a small amount of additional code (see Code [2]).

The current MachinePipeliner is written on the premise that
DFAPacketizer is used for resource management.
However, I'd like to use MachinePipeliner without DFAPacketizer,
for the reasons described below (*).
Only a small part of the MachinePipeliner implementation depends on
DFAPacketizer or instruction itineraries.
Therefore, I think that one of the following implementations is
possible:

(a) creating a path in MachinePipeliner that does not use DFAPacketizer
(b) making MachinePipeliner inheritable so that anyone can write code
that does not use DFAPacketizer

Since an implementation could use instruction itineraries without
DFAPacketizer, I don't think TargetSchedModel::hasInstrItineraries
can be used to select the execution path.
Personally, I think (b) is the better choice.
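
To illustrate option (b), here is a minimal sketch of how resource
tracking could sit behind a virtual interface, so that a target can
supply a model that does not use DFAPacketizer. All names in it
(PipelinerResourceModel, ModuloTableModel, tryPlace) are hypothetical;
they are not part of LLVM or of the patch [2], and the modulo
reservation table is only one possible non-DFA model.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical interface (not an LLVM class): the pipeliner would ask
// this object whether a functional unit is free in a given cycle.
class PipelinerResourceModel {
public:
  virtual ~PipelinerResourceModel() = default;
  virtual bool canReserve(unsigned Unit, unsigned Cycle) const = 0;
  virtual void reserve(unsigned Unit, unsigned Cycle) = 0;
};

// One possible implementation without a DFA: a modulo reservation
// table with one usage counter per (cycle mod II, unit kind).
class ModuloTableModel : public PipelinerResourceModel {
  unsigned II;                             // initiation interval
  std::vector<unsigned> Capacity;          // units available per kind
  std::vector<std::vector<unsigned>> Used; // Used[slot][unit kind]

public:
  ModuloTableModel(unsigned II, std::vector<unsigned> Cap)
      : II(II), Capacity(std::move(Cap)),
        Used(II, std::vector<unsigned>(Capacity.size(), 0)) {}

  bool canReserve(unsigned Unit, unsigned Cycle) const override {
    return Used[Cycle % II][Unit] < Capacity[Unit];
  }

  void reserve(unsigned Unit, unsigned Cycle) override {
    ++Used[Cycle % II][Unit];
  }
};

// Scheduler-side code sees only the interface, so a DFA-based model
// and a table-based model would be interchangeable here.
inline bool tryPlace(PipelinerResourceModel &RM, unsigned Unit,
                     unsigned Cycle) {
  if (!RM.canReserve(Unit, Cycle))
    return false;
  RM.reserve(Unit, Cycle);
  return true;
}
```

With II = 2 and a single unit of kind 0, placing at cycle 0 succeeds,
and a second placement at cycle 2 is rejected because both cycles map
to the same slot of the table.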

Also, if predicated instructions such as those in SVE are available,
it may be possible to generate the prologue and epilogue code using
predicated execution, as shown in reference [1].
In that case, if we choose (b) and
SwingSchedulerDAG::generatePipelinedLoop can be overridden, I think
this extension can be added easily.
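
As a rough illustration of the idea, the sketch below simulates in
scalar C++ a three-stage software pipeline that has no separate
prologue or epilogue: the kernel runs N + 2 times and each stage is
guarded by its own predicate. The function name, the three stages, and
the arithmetic are made up for this example; this is not code from the
patch or from reference [1], and real SVE code would use predicate
registers instead of `if' statements.

```cpp
#include <cstddef>
#include <vector>

// Computes Out[i] = In[i] * 2 + 1 using a kernel-only pipelined loop.
// Stage 0 loads, stage 1 multiplies, stage 2 adds and stores.
std::vector<int> pipelinedKernelOnly(const std::vector<int> &In) {
  const std::size_t N = In.size();
  const std::size_t Stages = 3;
  std::vector<int> Out(N);
  int X = 0; // value passed from stage 0 to stage 1
  int Y = 0; // value passed from stage 1 to stage 2

  // At kernel iteration K, stage S works on source iteration K - S.
  for (std::size_t K = 0; K + 1 < N + Stages; ++K) {
    // Stage predicates: true only while the source iteration exists.
    const bool P0 = K < N;
    const bool P1 = K >= 1 && K - 1 < N;
    const bool P2 = K >= 2 && K - 2 < N;

    // Later stages run first so that they consume X and Y before the
    // earlier stages overwrite them.
    if (P2)
      Out[K - 2] = Y + 1;
    if (P1)
      Y = X * 2;
    if (P0)
      X = In[K];
  }
  return Out;
}
```

During the first two kernel iterations P1 and P2 are false, which plays
the role of the prologue; during the last two P0 and then P1 are false,
which plays the role of the epilogue.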

Comments or suggestions are welcome.

Thank you very much.

Best regards,

sample-code.c (468 Bytes)

Hi,

Masaki Arai via llvm-dev <llvm-dev@lists.llvm.org> writes:

> Code:
>
> The sample patch for origin/release_60 [2], which doesn't use
> DFAPacketizer, can generate executable files from sample-code.c for
> both AArch64 and x86_64.
>
>   ...
>
> [2] https://reviews.llvm.org/D47943

I am sorry; I mistakenly thought that `origin/release_60' meant
`LLVM 6.0.0', and as a result the above link includes many irrelevant
differences.

I have created a new review,

   https://reviews.llvm.org/D47948

so please check that one instead.

Best regards,

Hi Masaki,

You can update the diff on the old review, I think it'll be easier, as
we don't have to keep adding all the people to it.

Also, make sure the review is against trunk, not a release.

Hi Renato,

Renato Golin <renato.golin@linaro.org> writes:

> You can update the diff on the old review, I think it'll be easier, as
> we don't have to keep adding all the people to it.

Thank you very much for your advice.
I updated https://reviews.llvm.org/D47943.
I will also delete https://reviews.llvm.org/D47948.

> Also, make sure the review is against trunk, not a release.

OK.
I will also update it after running tests on trunk.

Thanks.

Best regards,

Hi,

> Hi,
>
> I am extending LLVM for HPC applications.
> As one of them, I am trying to make MachinePipeliner available on
> AArch64 + Scalable Vector Extension environment.

Great, thanks for looking into that.

IIUC from having a first look at your patch, there is nothing SVE specific there so far. Although it potentially will be very useful for SVE, it should also be beneficial for AArch64 without SVE and X86, right? As there are no scheduling models available for SVE in LLVM yet, I suppose it would be a good motivation if you could show some benefit on existing AArch64 or X86 cores with your proposed modelling.

> MachinePipeliner is currently used only by Hexagon CPU.
> Since it is a very portable implementation, I think that it will
> actually work just by adding a little code for many CPUs(See Code [2]).
>
> The current MachinePipeliner is written on the premise that
> DFAPacketizer is used for resource management.
> However, I'd like to use MachinePipeliner in a way that does not use
> DFAPacketizer for the reasons described below(*).
> In MachinePipeliner implementation, only a small part is dependent on
> DFAPacketizer or Instruction itineraries.
> Therefore, I think that one of the following implementations is
> possible:
>
> (a) creating a path in MachinePipeliner that does not use DFAPacketizer
> (b) making MachinePipeliner inheritable so that anyone can write code
> that does not use DFAPacketizer
>
> Since implementations using only Instruction itineraries without
> DFAPacketizer are possible, I don't think that I can use
> TargetSchedModel::hasInstrItineraries to select the execution path.
> Personally, I think that implementation of (b) is better.

IMO it makes sense to go with (b), given that the dispatch overhead should be tiny compared to the other work that's going on and we also added similar hooks to the generic machine scheduler recently. But it seems like this is a smaller implementation detail and making sure we are getting the modelling aspect right is more important.

Thanks,
Florian

Hi,

Thank you very much for your comments.

Florian Hahn <florian.hahn@arm.com> writes:

> IIUC from having a first look at your patch, there is nothing SVE
> specific there so far. Although it potentially will be very useful for
> SVE, it should also be beneficial for AArch64 without SVE and X86,
> right?

Yes.
Our main target is FUJITSU's AArch64+SVE CPU, but I think
MachinePipeliner is beneficial for AArch64 without SVE and for other
ILP-oriented RISC CPUs.
However, I'm not sure about x86.

> As there are no scheduling models available for SVE in LLVM
> yet, I suppose it would be a good motivation if you could show some
> benefit on existing AArch64 or X86 cores with your proposed modelling.

It is easy to make a small test set that confirms a performance
improvement.
However, I think there are many challenges in making MachinePipeliner
really beneficial on AArch64 without SVE for actual applications.
For example:
(a) Preparing the appropriate machine model for scheduling
(b) Consideration of register pressure in AArch64
(Coordination with register allocation pass)
(c) Extending iteration dependence distance (2 or more)
(d) Consideration of the impact of VPlan’s estimation
(Coordination with VPlan)
(e) Consideration of the impact of loop optimizations
(especially loop distribution)
(f) Consideration of the impact of flang

Until these issues are solved, I would like it to run only when the
`-enable-pipeliner' option is specified.

> IMO it makes sense to go with (b), given that the dispatch overhead
> should be tiny compared to the other work that’s going on and we also
> added similar hooks to the generic machine scheduler recently. But it
> seems like this is a smaller implementation detail and making sure we
> are getting the modelling aspect right is more important.

One of the reasons for posting this RFC is that MachinePipeliner is
updated frequently.
Therefore, I would like to hear the opinions of the MachinePipeliner
developers.
I am glad to prepare any patches, but since I do not have a Hexagon
environment, I am worried about whether I can test them thoroughly.

Best regards,