Dealing with illegal operand mappings in RegBankSelect

Hi,

Some operations on AMDGPU require operands which must be in a register bank that is impossible to copy from another. The operation needs to be rewritten in a complex way to avoid the illegal copy.

Currently if I correctly report the required register banks in the operand mapping, RegBankSelect happily inserts the illegal copies, somehow concluding they are cheap (I would at least hope there’s an assert or something). So far I’ve worked around this by lying and reporting all of the invalid source register banks as legal. applyMapping then checks these registers and creates the new vregs and rewrites as necessary. Is this the way this is intended to work? This sort of makes sense, since it is legal in the sense that applyMapping can deal with it. The cases that need to be rewritten can then have a very high, but not impossible cost in the case of getInstrAlternativeMappings.

Lying about the actually legal banks seems counterintuitive to me. Should there be something like a new RepairingPlacement type for this to just create the vregs without inserting the copy?

-Matt

Hi Matt,

Hi,

Some operations on AMDGPU require operands which must be in a register bank that is impossible to copy from another. The operation needs to be rewritten in a complex way to avoid the illegal copy.

Currently if I correctly report the required register banks in the operand mapping, RegBankSelect happily inserts the illegal copies, somehow concluding they are cheap (I would at least hope there’s an assert or something).

I would have hoped that we end up with a huge cost.
The assumption is that you can copy to/from any pair of register bank, then the idea is that the copy should be lowered in something that makes sense (e.g., a pair of store/load) by the target.
I am guessing that we could directly ask the target what pseudo-copy should be inserted instead of always adding plain copy.

Would that work for you?

So far I’ve worked around this by lying and reporting all of the invalid source register banks as legal. applyMapping then checks these registers and creates the new vregs and rewrites as necessary. Is this the way this is intended to work?

I am not sure I understand the work around.
Let me try to illustrate this with an example.
Let say we have:
= inst a(bank1)
And inst only accepts bank2.
Are you saying that you lie by saying that bank1 is legal in that case?
If so, why applyMapping would do any rewriting since this is legal.

This sort of makes sense, since it is legal in the sense that applyMapping can deal with it. The cases that need to be rewritten can then have a very high, but not impossible cost in the case of getInstrAlternativeMappings.

Lying about the actually legal banks seems counterintuitive to me.

Agree, that’s not the way it’s supposed to work.

Should there be something like a new RepairingPlacement type for this to just create the vregs without inserting the copy?

See my previous comment on what the copies are supposed to mean. I guess we could imagine a new kind of repairing placement but I am not sure what it would bring us, in the sense of lowering the (pseudo) copy differently should already do what we want.

Cheers,
-Quentin

Hi Matt,

Hi,

Some operations on AMDGPU require operands which must be in a register bank that is impossible to copy from another. The operation needs to be rewritten in a complex way to avoid the illegal copy.

Currently if I correctly report the required register banks in the operand mapping, RegBankSelect happily inserts the illegal copies, somehow concluding they are cheap (I would at least hope there’s an assert or something).

I would have hoped that we end up with a huge cost.
The assumption is that you can copy to/from any pair of register bank, then the idea is that the copy should be lowered in something that makes sense (e.g., a pair of store/load) by the target.
I am guessing that we could directly ask the target what pseudo-copy should be inserted instead of always adding plain copy.

Would that work for you?

You can’t legitimately copy from vector to scalar. It conceptually doesn’t work, and going through memory doesn’t help. The use instruction needs to be rewritten to (in the worst case) scalarize the operation for every workitem. A pseudocopy would still be some illegal operation which cannot exist which would need to be guaranteed to be removed, so I don’t think this would be any cleaner than allowing the illegal copies.

So far I’ve worked around this by lying and reporting all of the invalid source register banks as legal. applyMapping then checks these registers and creates the new vregs and rewrites as necessary. Is this the way this is intended to work?

I am not sure I understand the work around.
Let me try to illustrate this with an example.
Let say we have:
= inst a(bank1)
And inst only accepts bank2.
Are you saying that you lie by saying that bank1 is legal in that case?
If so, why applyMapping would do any rewriting since this is legal.

Yes. It’s what happens here: https://reviews.llvm.org/D58511

For extract_vector_elt, the indirect register index type must be an SGPR. I’ve implemented the mapping as just copying the original source bank and allowing applyMapping to deal with it. There might be cases that aren’t correctly being rewritten, but I haven’t finished handling every case to run into them yet.

This sort of makes sense, since it is legal in the sense that applyMapping can deal with it. The cases that need to be rewritten can then have a very high, but not impossible cost in the case of getInstrAlternativeMappings.

Lying about the actually legal banks seems counterintuitive to me.

Agree, that’s not the way it’s supposed to work.

Should there be something like a new RepairingPlacement type for this to just create the vregs without inserting the copy?

See my previous comment on what the copies are supposed to mean. I guess we could imagine a new kind of repairing placement but I am not sure what it would bring us, in the sense of lowering the (pseudo) copy differently should already do what we want.

It would avoid producing a totally illegal operation that something would need to be responsible for ensuring is deleted

-Matt

Hi Matt,

Hi,

Some operations on AMDGPU require operands which must be in a register bank that is impossible to copy from another. The operation needs to be rewritten in a complex way to avoid the illegal copy.

Currently if I correctly report the required register banks in the operand mapping, RegBankSelect happily inserts the illegal copies, somehow concluding they are cheap (I would at least hope there’s an assert or something).

I would have hoped that we end up with a huge cost.
The assumption is that you can copy to/from any pair of register bank, then the idea is that the copy should be lowered in something that makes sense (e.g., a pair of store/load) by the target.
I am guessing that we could directly ask the target what pseudo-copy should be inserted instead of always adding plain copy.

Would that work for you?

You can’t legitimately copy from vector to scalar. It conceptually doesn’t work, and going through memory doesn’t help. The use instruction needs to be rewritten to (in the worst case) scalarize the operation for every workitem. A pseudocopy would still be some illegal operation which cannot exist which would need to be guaranteed to be removed, so I don’t think this would be any cleaner than allowing the illegal copies.

So far I’ve worked around this by lying and reporting all of the invalid source register banks as legal. applyMapping then checks these registers and creates the new vregs and rewrites as necessary. Is this the way this is intended to work?

I am not sure I understand the work around.
Let me try to illustrate this with an example.
Let say we have:
= inst a(bank1)
And inst only accepts bank2.
Are you saying that you lie by saying that bank1 is legal in that case?
If so, why applyMapping would do any rewriting since this is legal.

Yes. It’s what happens here: https://reviews.llvm.org/D58511

For extract_vector_elt, the indirect register index type must be an SGPR. I’ve implemented the mapping as just copying the original source bank and allowing applyMapping to deal with it.

I don’t get what you mean by “applyMapping to deal with it”. Won’t applyMapping will insert copies to rewrite these, hence how is this different from what the repairing code does?
Unless, maybe, you’re talking about the target specific RegisterBankInfo::applyMapping not RegBankSelect::applyMapping.

If that’s the case what do you do in here that we could maybe generalize?

I don’t get what you mean by “applyMapping to deal with it”. Won’t applyMapping will insert copies to rewrite these, hence how is this different from what the repairing code does?
Unless, maybe, you’re talking about the target specific RegisterBankInfo::applyMapping not RegBankSelect::applyMapping.

If that’s the case what do you do in here that we could maybe generalize?

Sorry, I mean report them as legal so they are left as-is with no copy inserted. I can see the operand is illegal and do everything needed in the target applyMappingImpl (first sample case is https://reviews.llvm.org/D58512)

Wow, that’s quite a lot of code to fix a couple of operands :slight_smile:

Couldn’t we do that expansion as part of the selection of the “pseudo copy”?

That would require looking for all the uses of the copy and applying this to them. It’s really the use needs to be rewritten, and not a property of the copy. The only use I would have for the copy is as as a means of passing which registers were already created for the new mapping, after which point I would need to delete it.

-Matt

I don’t get what you mean by “applyMapping to deal with it”. Won’t applyMapping will insert copies to rewrite these, hence how is this different from what the repairing code does?
Unless, maybe, you’re talking about the target specific RegisterBankInfo::applyMapping not RegBankSelect::applyMapping.

If that’s the case what do you do in here that we could maybe generalize?

Sorry, I mean report them as legal so they are left as-is with no copy inserted. I can see the operand is illegal and do everything needed in the target applyMappingImpl (first sample case is https://reviews.llvm.org/D58512)

Wow, that’s quite a lot of code to fix a couple of operands :slight_smile:

Couldn’t we do that expansion as part of the selection of the “pseudo copy”?

That would require looking for all the uses of the copy and applying this to them. It’s really the use needs to be rewritten, and not a property of the copy.

Ok so what you’re saying is:
1 sgpr = copy vgpr
2 = use sgpr

We wouldn’t change the expansion of the copy in 1, but the use in 2. That’s interesting because from RBS point of view, the use in 2 is supposed to be final (in the sense not needed any rewriting).
I am starting to think that the way you handle it is probably the right way since it is so off of what RBS’ expectations are.

Let me sleep on this to see if I can come up with something else.

The only use I would have for the copy is as as a means of passing which registers were already created for the new mapping, after which point I would need to delete it.

Could you describe in pseudo code what the expansion of vgpr into sgpr looks like?
e.g., = use vgpr
And you only support = use sgpr

It’s serializing the vector operation. There’s an additional optimization to reduce the number of loop iterations when multiple work items/lanes/threads have the same value in them which happens in practice, but essentially it does:

Save Execution Mask
For (Lane : Wavefront/Warp) {
Enable Lane, Disable all other lanes
SGPR = read SGPR value for current lane from VGPR
VGPRResult[Lane] = use_op SGPR
}
Restore Execution Mask

Eventually it might be nice to have optimizations to only emit one of these loops when multiple consecutive instructions need the same register handled (which I suspect will happen frequently with image samplers), but I haven’t really thought about what that should look like yet.

-Matt

Thanks for the details Matt, now I think I can suggest an alternative or provide more reasonable explanation why what you’re doing is right!

Assuming we stick to your current approach, it makes sense to mark the operation legal for vgpr because from RBS point of view the boundaries of the expansion of the instruction are indeed vgpr.

Now, if we want to leverage the infrastructure provided by RBS for this case, a possible alternative would require to unroll the loop (or use a pseudo that would fake the loop unrolling).
Here is what it would look like:
Essentially we would replace:

= use <NxsM> vgpr

into

sgpr1, …, sgprN = extract_reg vgpr
= use sM sgpr1

…
= use sM sgprN

In terms of mapping, you would describe vgpr as being broken down in N element of sgpr. The applyMapping would insert the extract_reg for repairing and expand the vector use into the scalar uses (or one pseudo with N sgpr as input).

Honestly, I don’t know how much benefit you would get to expose these details to RBS today.
The advantages I see moving forward are:

  • At some point I wanted to teach RBS to materialize the repairing code next to the definition (as opposed to the use like we do right now) so that we can reuse that repairing for something else (what you’ve pointed out in your previous reply)
  • In the future RBS is supposed to be smart enough to decide that something needs to be scalarized because its uses are cheaper that way. I.e., in your case, the definition of vgpr could be scalarized from the start and we wouldn’t have to insert this repairing code (in other words, RBS would have chosen to scalarize the def of the the vgpr into sgpr instead of repairing the use of the vgpr into sgpr).

Therefore, if we choose not to expose these details to RBS, we shut the door to potential improvements on how the repairing points are inserted/shared and how the cost model is able to deal with choosing the best instructions based on its uses.

Obviously, I haven’t spent a lot of time thinking about for your case, hence it could be completely bogus!

Cheers,
-Quentin