Running GlobaISel passes after SelectionDAG instruction selection

Hi,

I've been experimenting with global isel over the last few weeks and it
is such a vast improvement over the SelectionDAG for the AMDGPU target
that I would really like to begin using it as soon as possible.

Given the lack of a replacement for SelectionDAG's legalizer / combiner,
and how much work this will be to implement, I think the fastest path to
doing this would be to run some of the GlobalISel passes after SelectionDAG
instruction selection.

What I would like to do is modify the AMDGPU target to select SelectionDAG
nodes to generic opcodes, and then run the InstructionSelector pass on
the resulting MachineFunction.

I would start by doing this with loads/stores and then move on to other
opcodes one at a time.

These changes will be a big improvement to AMDGPU, because it will allow
us to replace our SIFixSGPRCopies pass which is essentially just another
instruction selector that re-selects instructions based on their register
classes and also let us do a much better job of matching addressing modes,
which is very important for the AMDGPU target.

I already have a working prototype where I run SelectionDAG ISel and
then the InstructionSelector pass right afterwards, but I think to
make this approach work long-term, we will need to agree that this is a
supported use case of GlobalISel, and we will also need to start
building the GlobalISel code by default.

What do people think about this?

Thanks,
Tom

Enabling GlobalISel by default should be fine now. I think it was mostly disabled because we used to add extra type data to the MachineInstr class, which penalized existing memory usage. Now we store that in MachineRegisterInfo and there's no overhead unless you're actually using GlobalISel.

My main worry is that this form of plumbing would seem to turn the fallback code path into an infinite loop if you're not careful (and not actually simplify SelectionDAG if you are).

Other than that, I suppose the biggest issue would be divergence between what InstructionSelect expects (from RegBankSelect) and what your SelectionDAG shim is providing. I think that should be fairly minimal though (InstructionSelect needs to be able to cope with fully constrained VRegs anyway IMO), so no real objections here.

Tim.

I already have a working prototype where I run SelectionDAG ISel and
then the InstructionSelector pass right afterwards, but I think to
make this approach work long-term, we will need to agree that this is a
supported use case of GlobalISel, and we will also need to start
building the GlobalISel code by default.

Enabling GlobalISel by default should be fine now. I think it was mostly disabled because we used to add extra type data to the MachineInstr class, which penalized existing memory usage. Now we store that in MachineRegisterInfo and there's no overhead unless you're actually using GlobalISel.

My main worry is that this form of plumbing would seem to turn the fallback code path into an infinite loop if you're not careful (and not actually simplify SelectionDAG if you are).

Other than that, I suppose the biggest issue would be divergence between what InstructionSelect expects (from RegBankSelect) and what your SelectionDAG shim is providing. I think that should be fairly minimal though (InstructionSelect needs to be able to cope with fully constrained VRegs anyway IMO), so no real objections here.

+1

Q.

Hi Tom,

Hi,

I've been experimenting with global isel over the last few weeks and it
is such a vast improvement over the SelectionDAG for the AMDGPU target
that I would really like to begin using it as soon as possible.

Given the lack of a replacement for SelectionDAG's legalizer / combiner,

Could you elaborate on what is missing for the legalizer?
As always, we welcome contribution or PR to help guide the development of the framework.

And yeah, the combiner part is missing :).

and how much work this will be to implement, I think the fastest path to
doing this would be to run some of the GlobalISel passes after SelectionDAG
instruction selection.

What I would like to do is modify the AMDGPU target to select SelectionDAG
nodes to generic opcodes, and then run the InstructionSelector pass on
the resulting MachineFunction.

I would start by doing this with loads/stores and then move on to other
opcodes one at a time.

These changes will be a big improvement to AMDGPU, because it will allow
us to replace our SIFixSGPRCopies pass which is essentially just another
instruction selector that re-selects instructions based on their register
classes and also let us do a much better job of matching addressing modes,
which is very important for the AMDGPU target.

I already have a working prototype where I run SelectionDAG ISel and
then the InstructionSelector pass right afterwards, but I think to
make this approach work long-term,

When you say long-term what do you have in mind?
Put differently, I would recommend you help improving the framework instead of having this a supported solution. Indeed, the plan is still to kill SDISel :).

Right now, we are not investing a lot of ressources into the investigation of how do we do combines. If you are interested in that aspect, let me know I will share my vision on those.

Cheers,
-Quentin

I already have a working prototype where I run SelectionDAG ISel and
then the InstructionSelector pass right afterwards, but I think to
make this approach work long-term, we will need to agree that this is a
supported use case of GlobalISel, and we will also need to start
building the GlobalISel code by default.

What do people think about this?

I’m concerned.

The global isel was put in as a way of doing a prototype of the code without having to worry about long term maintenance of a branch, but accordingly I don’t think it’s had the level of scrutiny that we’d have when wanting to make something the default and supported functionality of any target.

-eric

Hi Tom,

>
> Hi,
>
> I've been experimenting with global isel over the last few weeks and it
> is such a vast improvement over the SelectionDAG for the AMDGPU target
> that I would really like to begin using it as soon as possible.
>
> Given the lack of a replacement for SelectionDAG's legalizer / combiner,

Could you elaborate on what is missing for the legalizer?
As always, we welcome contribution or PR to help guide the development of the framework.

And yeah, the combiner part is missing :).

I probably shouldn't have grouped the two together, but I' was mostly
thinking about the combiner. I haven't had much chance to look at the
legalizer yet.

> and how much work this will be to implement, I think the fastest path to
> doing this would be to run some of the GlobalISel passes after SelectionDAG
> instruction selection.
>
> What I would like to do is modify the AMDGPU target to select SelectionDAG
> nodes to generic opcodes, and then run the InstructionSelector pass on
> the resulting MachineFunction.
>
> I would start by doing this with loads/stores and then move on to other
> opcodes one at a time.
>
> These changes will be a big improvement to AMDGPU, because it will allow
> us to replace our SIFixSGPRCopies pass which is essentially just another
> instruction selector that re-selects instructions based on their register
> classes and also let us do a much better job of matching addressing modes,
> which is very important for the AMDGPU target.
>
> I already have a working prototype where I run SelectionDAG ISel and
> then the InstructionSelector pass right afterwards, but I think to
> make this approach work long-term,

When you say long-term what do you have in mind?
Put differently, I would recommend you help improving the framework instead of having this a supported solution. Indeed, the plan is still to kill SDISel :).

long-term means until we can switch to global-isel exclusively.

I'm really just looking for a way to speed our work on global-isel. My
proposal gives me a way to work on global-isel and certain improvements
to our existing backend in parallel. Otherwise, the global-isel work
would have to be given lower priority.

But this is really more of a project planning problem than a technical
issue. I don't want to force a bad technical solution on the community
if people don't think it's useful.

-Tom

Hi Tom,

Hi,

I've been experimenting with global isel over the last few weeks and it
is such a vast improvement over the SelectionDAG for the AMDGPU target
that I would really like to begin using it as soon as possible.

Given the lack of a replacement for SelectionDAG's legalizer / combiner,

Could you elaborate on what is missing for the legalizer?
As always, we welcome contribution or PR to help guide the development of the framework.

And yeah, the combiner part is missing :).

I probably shouldn't have grouped the two together, but I' was mostly
thinking about the combiner. I haven't had much chance to look at the
legalizer yet.

and how much work this will be to implement, I think the fastest path to
doing this would be to run some of the GlobalISel passes after SelectionDAG
instruction selection.

What I would like to do is modify the AMDGPU target to select SelectionDAG
nodes to generic opcodes, and then run the InstructionSelector pass on
the resulting MachineFunction.

I would start by doing this with loads/stores and then move on to other
opcodes one at a time.

These changes will be a big improvement to AMDGPU, because it will allow
us to replace our SIFixSGPRCopies pass which is essentially just another
instruction selector that re-selects instructions based on their register
classes and also let us do a much better job of matching addressing modes,
which is very important for the AMDGPU target.

I already have a working prototype where I run SelectionDAG ISel and
then the InstructionSelector pass right afterwards, but I think to
make this approach work long-term,

When you say long-term what do you have in mind?
Put differently, I would recommend you help improving the framework instead of having this a supported solution. Indeed, the plan is still to kill SDISel :).

long-term means until we can switch to global-isel exclusively.

I'm really just looking for a way to speed our work on global-isel. My
proposal gives me a way to work on global-isel and certain improvements
to our existing backend in parallel. Otherwise, the global-isel work
would have to be given lower priority.

There is still plenty of work in getting the framework ready to replace FastIsel by default on Aarch64. IMHO it makes more sense to revisit this in a few months.