[GISel][X86] FP vs Int distinction for load/store

msanghi · March 7, 2024, 4:57am

Hi,

We are working on porting X86 backend from SelectionDag to GlobalISel. This is about a problem relating to mapping of load and store operations to the correct data type.

On X86, when we have a load/store operation, we do not know in register bank selection if we should map it to a float to an integral value. We conservatively map it to integer as the first choice and insert cross register class copies when it is used as a floating-point type. We then will have to fix the load/store in later passes.

This is a known problem and I see it has been discussed in the past:
[llvm-dev] RFC: [GlobalISel] propagating int/float type information

This looks like a general issue any architecture would face.

A summary of the discussion on the old RFC:

It seems they converged on adding fp type to LLType. However I don’t think it was implemented. Please correct me if I am wrong.
Another suggestion was new GMIR opcodes for fp load/stores similar to G_ADD vs G_FADD. This was rejected.

I see that AARCH64 implements 2 solutions for load

Looking into MachineMemoryOperand to derive the type to load. This doesn’t work in the presence of opaque pointers.
Traversing the users of the load to see if it maps to a floating point. They prioritize fp mapping if true. A thing to note here is that the users are not yet assigned register bank and so the inference is only based on opcodes.

For store, they look at the type of the defining MachineInstr

RISCV has similar solution.

I myself do not have better ideas to implement this. I don’t see a reason why the8 traversal based solution on Aarch64/RISCV solution wouldn’t work in the majority of cases. The only shortcoming I see is the compile time impact due to multiple traversals of use-def chains.

The questions I have are

Are there cheaper ways to capture the float/int type information? As discussed in the referred RFC, It seems that we have this information in the IR, we lose it in IRTranslator and we recreate in RegisterBankSelect. Please correct me if I am wrong.
If not, should the aarch64 method be “generalized” i.e. RegisterBankSelect have virtual functions for onlyDefinesFP, onlyUsesFP and maybe traverseDUToInferType ?

Tagging @amara @arsenm @davemgreen @topperc @tschuett @qcolombet @e-kud for inputs.

Thanks,
Malay

topperc · March 7, 2024, 5:05am

I haven’t looked at this closely, but why doesn’t this work with opaque ptrs? The load/store in IR have the type available independent of the pointer.

msanghi · March 7, 2024, 6:03am

I am not the best person to answer this but I’ll provide my understanding.

MachineMemOperand (MMO) only holds information about the location where we load from/store to. So it contains a pointer to the base memory location with byte size and offset. There is an associated type (LLT) to infer the scalar vs vector information.

The load/store in IR are associated with the data, not the location. So MMO is not passed (or concerned with) this information.

topperc · March 7, 2024, 6:20am

Oh right. I was thinking of SelectionDAG where we also store MemoryVT separate from MachineMemOperand. MachineMemOperand only has LLT as you noted.

qcolombet · March 7, 2024, 9:55am

This is correct and I believe it is still the right thing to do.
I think the problem is that nobody had the time to go implement it.

jayfoad · March 7, 2024, 10:36am

Isn’t this the kind of problem that you’re supposed to solve by defining different register banks for integer and floating point, and letting RegBankSelect sort it out?

msanghi · March 7, 2024, 10:46am

I think you mean the greedy mode of RegBankSelect? I haven’t worked with greedy mode yet and I don’t know if it always accurately recreates the correct type.

I have 2 concerns:

We are recomputing (expensively) information we had in the IR.
We also have a default assignment (fast mode only has the default assignment) where RegBankSelect won’t attempt to sort out based on copy costs.

amara · March 7, 2024, 10:26pm

Yes we do need to add FP types to LLT. Matt IIRC already started laying a bit of groundwork but the main part needs a lot of time.

qcolombet · March 8, 2024, 10:30am

Another thing that I would mention is in terms of design, one thing that was mentioned by several people is we don’t want to have to add a bunch of bitcasts everywhere or have to tell something like a load is valid for i16, bf16, fp16. It is valid for a size, period.

One way we could do that (I haven’t given it much thoughts though) is by keeping the scalar type but having a specialized version of it (The idea comes from a comment that @dsanders made in the past). E.g., s16 is the based type, but it can be specialized in i16, bf16, fp16.

E.g.,

v1(bf16) = G_LOAD
v2(i16) = G_LOAD
v3(bf16) = G_FCONSTANT 1.0
v4(bf16) = G_FADD v1(bf16), v3(bf16) # G_FADD requires a typed scalar with a floating-point semantic 
v5(s16) = G_OR v4(bf16), v2(i16) # this is okay G_OR accepts s16 kind of type (the base type is the same)
v6(bf16) = G_BITCAST v5(s16) to bf16 # re-qualify v5's type.

I don’t know if it is something we’d like as the end state, but this may help the transition. The issue I see is it makes RegBankSelect potentially more important/complicated since it will potentially have to break down things like:

a(bf16) = G_OR b(bf16), c(i16)

The BITCAST-everything road is well understood but I heard several times that people did not want that because it becomes too verbose.

e-kud · March 18, 2024, 4:19pm

The problem is that during RegBankSelect we may not have all the information required to assign a certain register bank. That’s the reason why we bring this problem up after 4 years. We want to understand whether we need to follow the current approach of inspecting def-use chains or we want to provide extra info to simplify RegBankSelect process for all backends.

e-kud · March 18, 2024, 4:30pm

Interesting, it was my first concern, how much different float formats we want to support and how to provide an opporunity for backends to support them incrementally. So, we may introduce a enum field or inherit LLT and backends may ignore it. But it seems we have to introduce i16 and fp16 simultaneusly, to be able to distinguish them.

Is it correct that these types exist only between IRTranslator and RegBankSelect and have no impact on SelectionDAG pattern matching? It seems so but maybe I’m missing something.

I’d like to start from adding float types for loads and stores on X86 but such approach may deteriorate into something unclear when I’m only concentrated on several opcodes. So @msanghi and I are inclined to repeat def-use approach for the beginning.

qcolombet:

The issue I see is it makes RegBankSelect potentially more important/complicated since it will potentially have to break down things like:
a(bf16) = G_OR b(bf16), c(i16) 
The BITCAST-everything road is well understood but I heard several times that people did not want that because it becomes too verbose.

Hmm, yeah, we may end up with operands from incompatible register banks in scope of a single instruction. So, we’ll have cross register bank copies in other words – casts. Or it will be extra efforts for combiners to clean them up.

Topic		Replies	Views
RFC: [GlobalISel] propagating int/float type information LLVM Dev List Archives	16	281	June 29, 2020
[RFC][GlobalISel] Adding FP type information to LLT Common CodeGen Infrastructure globalisel	40	973	September 11, 2025
[RFC][GlobalISel] Encoding type information into FP operations LLVM Project	13	562	November 23, 2023
[GlobalISel] Legalize generic instructions that also depend on type of scalar, not only scalar size LLVM Dev List Archives	3	110	September 28, 2018
Why can't atomic loads and stores handle floats? LLVM Dev List Archives	6	125	May 27, 2014

[GISel][X86] FP vs Int distinction for load/store

Related topics