We are working on porting X86 backend from SelectionDag to GlobalISel. This is about a problem relating to mapping of load and store operations to the correct data type.
On X86, when we have a load/store operation, we do not know in register bank selection if we should map it to a float to an integral value. We conservatively map it to integer as the first choice and insert cross register class copies when it is used as a floating-point type. We then will have to fix the load/store in later passes.
This looks like a general issue any architecture would face.
A summary of the discussion on the old RFC:
It seems they converged on adding fp type to LLType. However I don’t think it was implemented. Please correct me if I am wrong.
Another suggestion was new GMIR opcodes for fp load/stores similar to G_ADD vs G_FADD. This was rejected.
I see that AARCH64 implements 2 solutions for load
Looking into MachineMemoryOperand to derive the type to load. This doesn’t work in the presence of opaque pointers.
Traversing the users of the load to see if it maps to a floating point. They prioritize fp mapping if true. A thing to note here is that the users are not yet assigned register bank and so the inference is only based on opcodes.
For store, they look at the type of the defining MachineInstr
RISCV has similar solution.
I myself do not have better ideas to implement this. I don’t see a reason why the8 traversal based solution on Aarch64/RISCV solution wouldn’t work in the majority of cases. The only shortcoming I see is the compile time impact due to multiple traversals of use-def chains.
The questions I have are
Are there cheaper ways to capture the float/int type information? As discussed in the referred RFC, It seems that we have this information in the IR, we lose it in IRTranslator and we recreate in RegisterBankSelect. Please correct me if I am wrong.
If not, should the aarch64 method be “generalized” i.e. RegisterBankSelect have virtual functions for onlyDefinesFP, onlyUsesFP and maybe traverseDUToInferType ?
I haven’t looked at this closely, but why doesn’t this work with opaque ptrs? The load/store in IR have the type available independent of the pointer.
I am not the best person to answer this but I’ll provide my understanding.
MachineMemOperand (MMO) only holds information about the location where we load from/store to. So it contains a pointer to the base memory location with byte size and offset. There is an associated type (LLT) to infer the scalar vs vector information.
The load/store in IR are associated with the data, not the location. So MMO is not passed (or concerned with) this information.
Isn’t this the kind of problem that you’re supposed to solve by defining different register banks for integer and floating point, and letting RegBankSelect sort it out?
I think you mean the greedy mode of RegBankSelect? I haven’t worked with greedy mode yet and I don’t know if it always accurately recreates the correct type.
I have 2 concerns:
We are recomputing (expensively) information we had in the IR.
We also have a default assignment (fast mode only has the default assignment) where RegBankSelect won’t attempt to sort out based on copy costs.
Another thing that I would mention is in terms of design, one thing that was mentioned by several people is we don’t want to have to add a bunch of bitcasts everywhere or have to tell something like a load is valid for i16, bf16, fp16. It is valid for a size, period.
One way we could do that (I haven’t given it much thoughts though) is by keeping the scalar type but having a specialized version of it (The idea comes from a comment that @dsanders made in the past). E.g., s16 is the based type, but it can be specialized in i16, bf16, fp16.
E.g.,
v1(bf16) = G_LOAD
v2(i16) = G_LOAD
v3(bf16) = G_FCONSTANT 1.0
v4(bf16) = G_FADD v1(bf16), v3(bf16) # G_FADD requires a typed scalar with a floating-point semantic
v5(s16) = G_OR v4(bf16), v2(i16) # this is okay G_OR accepts s16 kind of type (the base type is the same)
v6(bf16) = G_BITCAST v5(s16) to bf16 # re-qualify v5's type.
I don’t know if it is something we’d like as the end state, but this may help the transition. The issue I see is it makes RegBankSelect potentially more important/complicated since it will potentially have to break down things like:
a(bf16) = G_OR b(bf16), c(i16)
The BITCAST-everything road is well understood but I heard several times that people did not want that because it becomes too verbose.
The problem is that during RegBankSelect we may not have all the information required to assign a certain register bank. That’s the reason why we bring this problem up after 4 years. We want to understand whether we need to follow the current approach of inspecting def-use chains or we want to provide extra info to simplify RegBankSelect process for all backends.
Interesting, it was my first concern, how much different float formats we want to support and how to provide an opporunity for backends to support them incrementally. So, we may introduce a enum field or inherit LLT and backends may ignore it. But it seems we have to introduce i16 and fp16 simultaneusly, to be able to distinguish them.
Is it correct that these types exist only between IRTranslator and RegBankSelect and have no impact on SelectionDAG pattern matching? It seems so but maybe I’m missing something.
I’d like to start from adding float types for loads and stores on X86 but such approach may deteriorate into something unclear when I’m only concentrated on several opcodes. So @msanghi and I are inclined to repeat def-use approach for the beginning.
Hmm, yeah, we may end up with operands from incompatible register banks in scope of a single instruction. So, we’ll have cross register bank copies in other words – casts. Or it will be extra efforts for combiners to clean them up.