Assign different RegClasses to a virtual register based on 'uniform' attribute?

Hi,

I am working on a new LLVM target for Intel GPUs, which have the same kind of scalar/vector register classes as the AMDGPU target. For example, an i32 virtual register should be held in a scalar register if its value is uniform across a wavefront/warp; otherwise it should be held in a vector register. Does AMDGPU already do this? I read the code, but I could not figure out how it is done. Does anybody have an idea about this?

-Ruiling

In the AMDGPU backend we select everything we can to scalar
instructions, and then after instruction selection, we move
non-uniform values to the vector ALU. This is done by
the SIFixSGPRCopiesPass, which relies heavily on
SIInstrInfo::moveToVALU().

-Tom

In reply to Tom Stellard <tom@stellard.net>, 2016-12-20 22:14 GMT+08:00:

Hi Tom,

I took a look at the code, and it looks like a good idea. It really helps me a lot. Thanks, Tom! I have a question about the code: why does it pass only copy-like instructions as TopInst to moveToVALU()? Is there a special reason for this? I thought that iterating through all the MIs and fixing the register class where needed would be enough. Am I thinking about it too simply?

-Ruiling

The instruction selector will insert these copies to satisfy the register operand constraints, so by finding all users (and users of users) of the illegal copies you find the same thing. The instruction set is different, so we’re really replacing the instructions and not exactly just changing the register classes.
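
The "users, and users of users" walk described above can be sketched as a small worklist over a toy instruction list. This is not LLVM code and not how moveToVALU() is actually written: the Inst class, the opcode table, and the register naming convention (names starting with "s" are scalar, names starting with "v" are vector) are all hypothetical, chosen only to show that the opcode is replaced rather than just the register class.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical scalar-to-vector opcode table for a toy ISA.
SCALAR_TO_VECTOR = {"S_ADD": "V_ADD", "S_MUL": "V_MUL", "S_MOV": "V_MOV"}


@dataclass
class Inst:
    opcode: str
    dst: str    # register name: "s..." is scalar, "v..." is vector
    srcs: list  # source register names


def is_vector(reg):
    return reg.startswith("v")


def move_users_to_vector(insts):
    """Rewrite everything reachable from illegal vector-to-scalar copies.

    Returns a map from each replaced scalar register to its vector register.
    """
    renamed = {}
    # Seed the worklist only with copies that write a scalar register
    # from a vector source; nothing else is scanned up front.
    worklist = deque(
        i for i in insts
        if i.opcode == "COPY" and not is_vector(i.dst) and is_vector(i.srcs[0])
    )
    while worklist:
        inst = worklist.popleft()
        if is_vector(inst.dst):
            continue  # already moved to the vector unit
        old, new = inst.dst, "v_" + inst.dst
        renamed[old] = new
        inst.dst = new
        # The instruction itself is replaced, not only its register class.
        inst.opcode = SCALAR_TO_VECTOR.get(inst.opcode, inst.opcode)
        # Every user of the old scalar result now has a vector operand,
        # so it has to be revisited too (users of users).
        for user in insts:
            if old in user.srcs:
                user.srcs = [new if s == old else s for s in user.srcs]
                worklist.append(user)
    return renamed
```

Given a list containing one copy from a vector register into a scalar one, only that copy and the instructions that transitively consume its result are touched; scalar instructions that never see the copied value keep their scalar opcodes.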

I think this process logically makes sense: things are moved to vector only as forced. However, I’m uncertain whether this is the best approach. I’ve debated going the other direction: selecting everything to vector instructions and having an optimization pass move parts to scalar. This is what the AMD compiler does. There are different trade-offs, but one advantage is that you immediately have something resembling a legal program to begin with.

-Matt
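
For comparison, the other direction can be sketched in the same toy style: everything starts out as a vector instruction, and a later pass demotes only what it can prove uniform. Again, this is an illustration under assumptions, not the AMD compiler's algorithm: the Inst class, both opcode sets, and the restriction to straight-line code in def-before-use order are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical vector-to-scalar opcode table for a toy ISA.
VECTOR_TO_SCALAR = {"V_ADD": "S_ADD", "V_MUL": "S_MUL", "V_MOV": "S_MOV"}
# Hypothetical opcodes whose result may differ per lane.
DIVERGENT_SOURCES = {"V_LANE_ID", "V_LOAD"}


@dataclass
class Inst:
    opcode: str
    dst: str
    srcs: list


def demote_uniform_to_scalar(insts):
    """Turn provably uniform vector instructions into scalar ones.

    Assumes straight-line code listed in def-before-use order.
    Returns the set of registers found to be uniform.
    """
    uniform = set()
    for inst in insts:
        if inst.opcode in DIVERGENT_SOURCES:
            continue  # per-lane value: must stay on the vector unit
        if inst.opcode not in VECTOR_TO_SCALAR:
            continue  # unknown pattern: conservatively leave it alone
        if all(src in uniform for src in inst.srcs):
            inst.opcode = VECTOR_TO_SCALAR[inst.opcode]
            uniform.add(inst.dst)
    return uniform
```

In this sketch the input is already a legal (all-vector) program, and skipping any instruction the pass does not recognize still leaves a legal program; the pass only affects how much work ends up on the scalar unit.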

I'm not sure how far along you are in the backend, but the new
GlobalISel solves this problem pretty well by assigning register
banks to instructions before instruction selection.

If you're just getting started, you may want to look at using GlobalISel
from the start; I think it will make things much easier for you.

-Tom
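
The idea of fixing the register bank before selection can be illustrated with a toy two-step pipeline. This does not use the GlobalISel API: the function names, the tuple encoding of instructions, the SGPR/VGPR bank labels, and the rule that unknown inputs default to the scalar bank are all assumptions made for the sketch. Only the generic opcode names G_ADD and G_MUL are borrowed from LLVM.

```python
# Hypothetical selection table: (generic opcode, bank of result) -> target opcode.
SELECT_TABLE = {
    ("G_ADD", "SGPR"): "S_ADD",
    ("G_ADD", "VGPR"): "V_ADD",
    ("G_MUL", "SGPR"): "S_MUL",
    ("G_MUL", "VGPR"): "V_MUL",
}


def assign_banks(insts, divergent_inputs):
    """Assign a bank to every register before any opcode is chosen.

    insts is a list of (opcode, dst, srcs) tuples in def-before-use order.
    Registers not listed in divergent_inputs and not defined by an
    instruction are assumed uniform (an assumption of this sketch).
    """
    bank = {reg: "VGPR" for reg in divergent_inputs}
    for _opcode, dst, srcs in insts:
        any_vector = any(bank.get(src, "SGPR") == "VGPR" for src in srcs)
        bank[dst] = "VGPR" if any_vector else "SGPR"
    return bank


def select(insts, bank):
    """Pick the target opcode from the already-assigned bank of the result."""
    return [(SELECT_TABLE[(op, bank[dst])], dst, srcs) for op, dst, srcs in insts]
```

Because the bank is known when the opcode is picked, the selector in this sketch never emits a scalar instruction that later has to be rewritten into a vector one.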

In reply to Matt Arsenault <arsenm2@gmail.com>, Wednesday, December 21, 2016:

Matt wrote: "The instruction selector will insert these copies to satisfy the register operand constraints, so by finding all users (and users of users) of the illegal copies you find the same thing."

Checking only copy-like MIs and their transitive users does sound more reasonable. Iterating through all MIs and checking the source and destination register classes of each one to find the instructions that need fixing, which is what I previously had in mind, would just waste compile time.

Matt wrote: "The instruction set is different, so we’re really replacing the instructions and not exactly just changing the register classes. I think this process logically makes sense, moving things to vector as forced. However I’m uncertain if this is the best approach. I’ve debated going the other direction and selecting everything to vector instruction, and having an optimization pass move parts to scalars. This is what the AMD compiler does. There are different trade offs, but one advantage is you immediately have something resembling a legal program to begin with."

I don't know what made you consider the change. Is "the other direction" safer because it only optimizes known patterns? Or have you run into situations that the current solution cannot handle well? I will experiment with this in my target; maybe we can discuss it further once I know more about how LLVM handles this.

-Ruiling

In reply to Tom Stellard <tom@stellard.net>, Thursday, December 22, 2016:

Hi Tom,

I am still at an early stage, so I will take a look at GlobalISel. Thanks for pointing this out.

-Ruiling