RFC: [GlobalISel] propagating int/float type information

Hi,

GlobalISel currently drops all type information relating to the integer/FP distinction during the IR translation pass, as LLT types only represent whether a value is a scalar/vector/pointer and its size/shape. To compensate, later passes look at the FP operations that use those values to guess what kind of value is stored in a given virtual register.

This means that i32/float loads get translated into the same thing, and only when that value is used, say by an fadd, do we know that it was an FP value. The regbankselect pass on AArch64 currently tries to walk uses/defs in order to guess what kind of regbank to assign to vregs. This however doesn’t work all the time, and most commonly, it doesn’t work when a load of an FP value is used in a loop. In that case, the FP users are obscured by PHIs, which makes it difficult (although not strictly impossible) to guess what regbank to assign. This has drastic consequences for performance on FP workloads.

But this isn’t the first time we’ve had this kind of issue, and it probably won’t be the last [1]. I propose that we have some form of type hint propagation done at the IRTranslator stage in order to make this whole situation easier (and faster in compile time).

Option 1) We use some form of metadata on the MIR instructions like G_LOADs to signify that the vreg defined likely has an FP IR type. IIUC the current Metadata MachineOperand type is only intended for debug info. This approach is probably the cheapest in compile time/complexity and is the least invasive, but we’d need to find somewhere in MachineInstr to store this extra information.

Option 2) Store the type hints in an analysis. In its simplest form at translation time we could keep a set of all the vregs that we know have FP types and then try to maintain that as new vregs are created to replace those throughout the pipeline. Keeping it updated might turn out to be expensive during passes like the legalizer.
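
To make Option 2 a little more concrete, here is a minimal sketch of such a side table. It is written in Python rather than LLVM's C++ purely for brevity, and the class and method names (`FPHintTracker`, `mark_fp`, `replace`) are hypothetical, not existing LLVM APIs. The point it illustrates is the maintenance cost: every pass that replaces a vreg has to remember to call something like `replace`, otherwise the hint is silently lost.

```python
class FPHintTracker:
    """Toy side table of virtual registers believed to hold FP values.

    Hypothetical illustration only; nothing here mirrors a real LLVM class.
    """

    def __init__(self):
        self._fp_vregs = set()

    def mark_fp(self, vreg):
        # Called at translation time when the IR value had an FP type.
        self._fp_vregs.add(vreg)

    def is_fp(self, vreg):
        return vreg in self._fp_vregs

    def replace(self, old_vreg, new_vreg):
        # Called by any later pass that substitutes new_vreg for old_vreg.
        # If the pass forgets to call this, the hint is dropped, which is
        # harmless for correctness but loses the optimization.
        if old_vreg in self._fp_vregs:
            self._fp_vregs.discard(old_vreg)
            self._fp_vregs.add(new_vreg)


hints = FPHintTracker()
hints.mark_fp("%0")          # %0 came from a `load float`
hints.replace("%0", "%5")    # a later pass rewrites %0 as %5
```

Because the table only carries hints, a stale or missing entry would degrade the regbank choice rather than break the program.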

Any thoughts?

Cheers,
Amara

[1] Currently we have a workaround for the specific case in https://reviews.llvm.org/D79207, but as Matt correctly points out, this isn’t viable in the long term because using the IR value type from the MachineMemOperand won’t work when opaque pointers finally land.

[AMD Public Use]

It seems to me like you’re looking for a workaround for the fact that nobody has put any serious optimization effort into RegBankSelect.

-Matt

Practically speaking, we have a compile-time budget, and spending it on reconstructing information which we willingly dropped doesn’t make sense when the solution can be cheap. I don’t propose that we force the type distinction back, just that we allow RBS to make fast, reasonably optimal decisions in most cases. If we then want to spend the rest of that compile-time budget on making even better decisions, then great.

The other thing we could do is to assign speculative regbanks to vregs during translation (if the target wants to opt-in), and then RBS can finalize the regbanks, changing some if it deems it necessary/optimal.

This seems reasonable to me. It wouldn’t require any new infrastructure.

-Matt

How would that work with illegal operations?

Would the legalizer have to preserve them?

Ditto with combines.

What I am saying is that although this does not require any new infrastructure, it is not free to adopt.

Cheers,
-Quentin

My thinking was no, any regbanks assigned before RBS are only hints and do not have to be preserved. But for good code it would help if the legalizer preserves them.

This issue would be there with the other approaches too. Propagating information through destructive passes like the legalizer/combiner is not going to be free in any case.

Amara

Hmm, I feel we would be better off fixing RBS.

I don’t think it would necessarily be expensive compile-time-wise to do something more sensible in RBS when running in fast mode:
We could defer the assignment of a register bank until we see the first real use (i.e., not a PHI or copy).

As with every heuristic, there are downsides too, but it feels to me that this is more robust than channeling metadata through a bunch of passes.
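
The "first real use" heuristic can be sketched roughly as follows. This is a toy model in Python, not RegBankSelect code: instructions are plain `(opcode, def, uses)` tuples, the helper name `first_real_use_bank` and the `"gpr"`/`"fpr"` strings are made up for illustration, and the opcode sets are deliberately incomplete. It walks users of a vreg, looks through PHIs and copies, and decides on the first other user it finds.

```python
from collections import deque

# Opcodes that tell us nothing about the value kind, so we look through them.
LOOK_THROUGH = {"G_PHI", "COPY"}
# A small, intentionally incomplete set of FP opcodes for the example.
FP_OPCODES = {"G_FADD", "G_FSUB", "G_FMUL", "G_FDIV"}


def first_real_use_bank(vreg, insts, default="gpr"):
    """Pick a bank for vreg from its first user that is not a PHI or copy."""
    # Map each vreg to the (opcode, def) pairs of the instructions using it.
    users = {}
    for opcode, dst, srcs in insts:
        for src in srcs:
            users.setdefault(src, []).append((opcode, dst))

    seen = {vreg}            # guards against cycles through loop PHIs
    worklist = deque([vreg])
    while worklist:
        cur = worklist.popleft()
        for opcode, dst in users.get(cur, []):
            if opcode in LOOK_THROUGH:
                if dst not in seen:
                    seen.add(dst)
                    worklist.append(dst)
                continue
            return "fpr" if opcode in FP_OPCODES else "gpr"
    return default


# A loaded value that only reaches its FP user through a loop PHI.
loop = [
    ("G_LOAD", "%0", ["%ptr"]),
    ("G_PHI", "%1", ["%0", "%2"]),
    ("G_FADD", "%2", ["%1", "%3"]),
]
bank = first_real_use_bank("%0", loop)
```

One downside is visible even in the toy version: the first real user found decides the bank, so a value with mixed integer and FP users gets whichever one the walk reaches first.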

I don't know too much about GlobalISel, but I'm working on adding a new bfloat IR/MVT type (16-bit float type) to LLVM, and on one of the patches Amara raised the issue of what we would do to disambiguate between a half and a bfloat for GlobalISel.

Just wanted to highlight that BFloat might be another use-case for this.

Cheers,
/Ties

I don’t think bfloat should be handled this way. What Amara is suggesting is an optimization, i.e., if we drop the information we are still correct.
With bfloat, if we do an operation on float16 instead of bfloat16 this is a correctness problem.

So that means that either we need to have new opcodes for bfloat or we need to carry around the floating point type in MIR. I think it would be more manageable to have the floating point type long term.
That said, it also depends on what we decide to do at the IR level. For instance, if bfloat support in the IR is limited to intrinsics, we wouldn’t need to go down that road.

I don't know too much about GlobalISel, but I'm working on adding a new bfloat IR/MVT type (16-bit float type) to LLVM, and on one of the patches Amara raised the issue of what we would do to disambiguate between a half and a bfloat for GlobalISel.

I don’t think it was me who said that but it’s a good point.

I see no good reason not to re-use the existing IR instructions for computation on BFloat, given that we do so for other uncommon FP formats as well. Which I think in turn leaves us with little choice but to mirror the IR here (adding separate opcodes for this case just seems wrong).

Quentin: Thanks for the info. I was under the impression that the LLVM community at large would prefer to extend the IR type to a bfloat MVT type. I've made a number of patches to implement this up to a point for AArch64. I can post those on Phab and start a thread to sample opinions.

Amara: Ah yes, I see now it was Matt Arsenault who made the comment. And I see he happens to be CC'ed on this issue. Your Phab abbreviations are quite similar.

I see no good reason not to re-use the existing IR instructions for computation on BFloat, given that we do so for other uncommon FP formats as well. Which I think in turn leaves us with little choice but to mirror the IR here (adding separate opcodes for this case just seems wrong).

Agree!

Quentin: Thanks for the info. I was under the impression that the LLVM community at large would prefer to extend the IR type to a bfloat MVT type. I've made a number of patches to implement this up to a point for AArch64. I can post those on Phab and start a thread to sample opinions.

Sounds good to me!

I also think having bfloat (all floating point types, actually) in GISel (i.e., LLT) is the right long-term plan. Maybe something like a pair of numbers, for the number of bits in the mantissa and the number of bits in the exponent, would be enough to represent all of them.
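
As a rough illustration of the mantissa/exponent pair idea, the sketch below describes a format by those two numbers. It is Python for brevity and `FPFormat` is a hypothetical name, not a proposed LLT API. The field widths are the standard ones for these formats; the sketch assumes a single sign bit and counts only the explicitly stored fraction bits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FPFormat:
    """A floating-point format described by its exponent and mantissa widths."""
    exponent_bits: int
    mantissa_bits: int  # explicitly stored fraction bits

    @property
    def size_in_bits(self):
        # One sign bit plus the stored exponent and fraction fields.
        return 1 + self.exponent_bits + self.mantissa_bits


HALF = FPFormat(exponent_bits=5, mantissa_bits=10)
BFLOAT = FPFormat(exponent_bits=8, mantissa_bits=7)
FLOAT = FPFormat(exponent_bits=8, mantissa_bits=23)
DOUBLE = FPFormat(exponent_bits=11, mantissa_bits=52)
```

`HALF` and `BFLOAT` have the same total size but compare unequal, which is exactly the distinction a size-only scalar type cannot express. A format such as ppcf128, which is a pair of doubles, would not fit this encoding without some extension.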

+1 to this. We have a few hacks in our out of tree backend to infer if a reg is floating point or not. It would be good to get rid of that.

Considering LLVM IR is going with bfloat as its own type, it might make sense to go beyond just storing scalar size in LLT (preferably something that allows easy extensions going forward).

Looks like we’re converging on a decision to add (extensible) fp types to LLT. Matt: any objections to this?

Amara

I thought about this a bit and I think adding separate LLTs is probably the right approach. We were ignoring the existence of ppcf128 anyway, so bfloat16 doesn’t really introduce a new issue. However, I do want to deviate from the IR and SelectionDAG’s treatment of integer vs. FP operations to preserve the current property GlobalISel has where integer operations are allowed to freely operate directly on FP values. As long as I’m not required to insert bitcasts to/from integer LLTs just to operate on the bits of an FP value, I’m OK with it. We would only consider these different types in floating point contexts, and they would implicitly behave as the equivalent sized integers elsewhere. The current intermediate bitcasts needed in various FP legalization code are quite annoying, and they’ve always been an obstacle to some combines.

-Matt
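
One way to picture the property described above, where FP types only matter in floating point contexts and otherwise act as same-sized integers, is the toy check below. It is a Python sketch under stated assumptions: `Scalar`, `fp_kind`, and `accepts` are hypothetical names, and treating a plain scalar as unacceptable where a specific FP format is expected is a choice made for this example, not something the thread settles.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scalar:
    """A scalar type: a bit width plus an optional FP format tag."""
    bits: int
    fp_kind: Optional[str] = None  # e.g. "half" or "bfloat"; None = plain scalar


def accepts(op_is_fp, operand, expected):
    """Would an operation expecting `expected` accept `operand`?"""
    if operand.bits != expected.bits:
        return False
    if not op_is_fp:
        # Integer/bitwise context: only the size matters, so no bitcast
        # is needed to operate on the bits of an FP value.
        return True
    # FP context: the format has to match exactly (assumption of this sketch).
    return operand.fp_kind == expected.fp_kind


s16 = Scalar(16)
half = Scalar(16, "half")
bfloat = Scalar(16, "bfloat")
```

With these definitions, an integer operation on `s16` accepts a `bfloat` operand directly, while an FP operation expecting `half` rejects it.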

I realize I’m very late to this thread, but while adding an extra set of opcodes is more in keeping with the existing design, I’m inclined to agree that things like bfloat make that look like the wrong direction. I also want to keep most things using scalars rather than ints/floats, though, so that they’re specializations of scalar rather than completely independent types.