[RFC][GlobalISel] Adding FP type information to LLT

TLDR: We propose modifying LLT by introducing separate integer and floating point kinds to enable non-IEEE floating point types for instruction selection and register bank selection.

Background:

GlobalISel is currently unable to represent the multitude of different floating point types available on modern hardware like BF16, TF32, and FP8 (E5M2 / E4M3).
GlobalISel uses LLTs to represent the type information of virtual registers. When LLVM IR is translated to gMIR, an LLT captures only the kind (scalar, pointer, or vector) and the size or shape of a value.
Information about the concrete floating point type is inferred from the operation used and the size of the operands.
For example, a G_FMUL on two S32 values implies that the operands must be 32-bit IEEE floating point numbers. This way of deducing the actual type breaks down, however, with newer floating point types such as BF16: a G_FMUL on two S16 values could operate on either IEEE half or BF16 operands.

Information about the concrete floating point type is relevant to both register bank selection and instruction selection. Restoring this information is costly and often lossy. On AArch64, for example, the lost information is recovered by walking uses/defs to guess which virtual registers may hold floating point values.
There is currently no way to restore the actual floating point type. Whenever the IRTranslator reaches a BF16 value in LLVM IR, we just bail out (see "[GlobalISel] Fall back for bf16 conversions." by aemerson, llvm/llvm-project pull request #71470).

Proposed changes:

This proposal is loosely based on @bogner’s original RFC with some slight modifications, mainly to ease adoption:

  • Replace IsPointer, IsVector and IsScalar with a new kind enumeration that allows for 4 new kinds: FLOAT, INTEGER, VECTOR_FLOAT, VECTOR_INTEGER.
  • Keep SCALAR and VECTOR_SCALAR kinds for incremental adoption
  • Add a pass that drops integer and float kinds back to scalar kinds for incremental adoption.
  • Type conversion between scalar and float / integer is not legal. Passes should either only use scalar kinds or only use integer / float kinds.
  • Conversion between integer and float kinds requires G_BITCAST.

We propose reusing the 3 bits currently used to encode the kind of LLT (IsScalar, IsPointer, IsVector) more efficiently to encode a total of 8 different LLT kinds: POINTER, INTEGER, FLOAT, SCALAR, VECTOR_POINTER, VECTOR_INTEGER, VECTOR_FLOAT, VECTOR_SCALAR. We can additionally use the fact that RawData will never be zero for the above kinds as an extra 4th bit to encode some additional kinds like INVALID, TOMBSTONE, EMPTY, and TOKEN.

enum class Kind : uint64_t {
  POINTER = 0b000,
  INTEGER = 0b001,
  FLOAT = 0b010,
  SCALAR = 0b011,
  VECTOR_POINTER = 0b100,
  VECTOR_INTEGER = 0b101,
  VECTOR_FLOAT = 0b110,
  VECTOR_SCALAR = 0b111,
};
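The encoding above is laid out so that bit 2 marks a vector and the low two bits give the element kind (pointer, integer, float, or legacy scalar). A minimal sketch of the accessors this enables, assuming the enum is stored as-is in the LLT's kind bits; the helper names (`isVector`, `getElementKind`, `makeVectorKind`) are hypothetical and not the actual `LLT` API:

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the proposed 3-bit kind encoding.
enum class Kind : uint64_t {
  POINTER = 0b000,
  INTEGER = 0b001,
  FLOAT = 0b010,
  SCALAR = 0b011,
  VECTOR_POINTER = 0b100,
  VECTOR_INTEGER = 0b101,
  VECTOR_FLOAT = 0b110,
  VECTOR_SCALAR = 0b111,
};

// Hypothetical helpers: bit 2 is the vector flag, bits 0-1 the element kind.
constexpr uint64_t VectorBit = 0b100;
constexpr uint64_t ElementMask = 0b011;

constexpr bool isVector(Kind K) {
  return (static_cast<uint64_t>(K) & VectorBit) != 0;
}

// Element kind of a vector kind, or the kind itself for a non-vector kind.
constexpr Kind getElementKind(Kind K) {
  return static_cast<Kind>(static_cast<uint64_t>(K) & ElementMask);
}

// Builds the vector kind whose elements have kind Elt.
constexpr Kind makeVectorKind(Kind Elt) {
  return static_cast<Kind>(static_cast<uint64_t>(Elt) | VectorBit);
}

constexpr bool isFloat(Kind K) { return getElementKind(K) == Kind::FLOAT; }
constexpr bool isInteger(Kind K) { return getElementKind(K) == Kind::INTEGER; }
constexpr bool isPointer(Kind K) { return getElementKind(K) == Kind::POINTER; }
```

Because every scalar/vector pair differs only in bit 2, queries like "is this a float, scalar or vector" reduce to a mask and compare.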

We add a 2-bit field to LLTs with the VECTOR_FLOAT or FLOAT kinds which will be used to indicate the type of floating-point number. We do not aim to exactly represent floating-point semantics, which is why we decided to just use 2 bits to represent IEEE floats and 3 other floating-point variants. Each backend may choose how to map scalar sizes together with the floating-point info to actual floating-point types. This design could be simplified by making this mapping global at the expense of some flexibility / number of total FP types we can represent.

enum class FPInfo {
  IEEE_FLOAT = 0x0,
  VARIANT_FLOAT_1 = 0x1,
  VARIANT_FLOAT_2 = 0x2,
  VARIANT_FLOAT_3 = 0x3,
};
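Since the 2-bit field only distinguishes IEEE from three unnamed variants, each backend would need a table that resolves (scalar size, FPInfo) to a concrete format. A sketch of such a table for a hypothetical target; every assignment below (BF16 as variant 1 at 16 bits, E5M2/E4M3 at 8 bits, TF32 in a 32-bit container) is an illustrative assumption, not something the RFC or the patch specifies:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string_view>

enum class FPInfo : uint8_t {
  IEEE_FLOAT = 0x0,
  VARIANT_FLOAT_1 = 0x1,
  VARIANT_FLOAT_2 = 0x2,
  VARIANT_FLOAT_3 = 0x3,
};

// Hypothetical per-target mapping from (size, FPInfo) to a format name.
// Returns std::nullopt for combinations this target does not define.
std::optional<std::string_view> getTargetFPFormat(unsigned SizeInBits,
                                                  FPInfo Info) {
  switch (Info) {
  case FPInfo::IEEE_FLOAT:
    switch (SizeInBits) {
    case 16: return "half";
    case 32: return "float";
    case 64: return "double";
    case 128: return "fp128";
    default: return std::nullopt;
    }
  case FPInfo::VARIANT_FLOAT_1:
    if (SizeInBits == 16) return "BF16";
    if (SizeInBits == 8) return "E5M2";
    return std::nullopt;
  case FPInfo::VARIANT_FLOAT_2:
    if (SizeInBits == 8) return "E4M3";
    if (SizeInBits == 32) return "TF32"; // assumed 32-bit container
    return std::nullopt;
  case FPInfo::VARIANT_FLOAT_3:
    return std::nullopt; // unused by this hypothetical target
  }
  return std::nullopt;
}
```

The same (size, FPInfo) pair can mean different formats on different targets, which is exactly the flexibility, and the lack of a shared meaning, that the paragraph above trades off against a global mapping.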

We aim for incremental adoption of these new LLT kinds, which can be toggled by a command-line option at runtime. The SCALAR and VECTOR_SCALAR kinds remain for compatibility and are only to be used by backends/passes that have not yet enabled floating-point information. All other backends should use INTEGER or FLOAT instead of SCALAR and VECTOR_INTEGER or VECTOR_FLOAT instead of VECTOR_SCALAR.

To ease incremental adoption, we would first convert individual passes to use FPInfo in tests only. Later, we aim to enable FPInfo for a range of passes, beginning with the IRTranslator, by introducing a pass that drops integer / float kinds back to plain scalar kinds.
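At the level of the kind bits, the type-dropping pass is a simple per-register mapping that preserves vector-ness. A sketch assuming the proposed `Kind` encoding; `dropToScalarKind` is a hypothetical name, and a real pass would additionally rewrite every virtual register's LLT (discarding the FPInfo bits) in each MachineFunction:

```cpp
#include <cassert>
#include <cstdint>

enum class Kind : uint64_t {
  POINTER = 0b000,
  INTEGER = 0b001,
  FLOAT = 0b010,
  SCALAR = 0b011,
  VECTOR_POINTER = 0b100,
  VECTOR_INTEGER = 0b101,
  VECTOR_FLOAT = 0b110,
  VECTOR_SCALAR = 0b111,
};

// Hypothetical helper: maps INTEGER/FLOAT to SCALAR and
// VECTOR_INTEGER/VECTOR_FLOAT to VECTOR_SCALAR. Pointer kinds have no
// integer/float split and are returned unchanged.
constexpr Kind dropToScalarKind(Kind K) {
  const uint64_t Bits = static_cast<uint64_t>(K);
  if ((Bits & 0b011) == 0) // POINTER or VECTOR_POINTER
    return K;
  return static_cast<Kind>(Bits | 0b011); // keep bit 2, force element = SCALAR
}
```

Setting both low bits turns 0b001 and 0b010 into 0b011, so the mapping needs no lookup table, and it is idempotent on kinds that are already scalar.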

To ensure consistency in the IR and to allow register bank selection to easily determine where to insert G_COPY instructions between different register banks, we require G_BITCAST instructions whenever an integer LLT is used in a floating-point instruction or vice versa.
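With this rule, every place where an integer-typed value feeds a floating-point operation (or vice versa) is marked by an explicit G_BITCAST, which gives RegBankSelect a candidate point for a cross-bank copy without guessing. A minimal sketch of the check a verifier might perform, simplified to non-vector, non-pointer operands; `ValueKind`, `UseKind` and `needsBitcast` are hypothetical names, and the idea that some operands accept either kind is an assumption:

```cpp
#include <cassert>

// Kind of the value held in a virtual register (simplified).
enum class ValueKind { Integer, Float };

// What an operand of a generic instruction requires (hypothetical).
// Either: operands that only care about size, e.g. a stored value.
enum class UseKind { Integer, Float, Either };

// Hypothetical check: true when a value of kind V must be G_BITCAST
// before it may be used in an operand that requires U.
constexpr bool needsBitcast(ValueKind V, UseKind U) {
  if (U == UseKind::Either)
    return false;
  // Mismatch between "operand wants float" and "value is float".
  return (U == UseKind::Float) != (V == ValueKind::Float);
}
```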

Patch: GitHub gist 9902f652792ea26ce15aa46c6692fce7

The above patch details changes to LLT. If we can agree on a path forward in this RFC we will follow up with a PR for AMDGPU using the new LLT kinds.

Previous RFCs:

cc @bogner @arsenm @amara @qcolombet @jayfoad

What is the textual representation of the proposed float LLTs?

Do you want to support the LLVM-IR float types or something closer to APFloatBase::Semantics?

What is the textual representation of the proposed float LLTs?

For the textual representation in MIR I see 3 options:

  1. Global mapping
    • there may not be enough variants
    • easier to implement
  2. Target local mapping
    • more variants
  3. Enumerate FPInfo variants
    • might be confusing / hard to come up with a good prefix

A local mapping seems like the best option since a global mapping might require us to be more specific about the implied semantics which is hard due to the limited number of variants we can represent.
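To make the options concrete, a sketch of a printer that combines option 2 (a target-local name for known variants) with option 3 (an explicit variant index as a fallback so the text always round-trips). The spellings `s<N>`, `i<N>`, `f<N>` and `f<N>.v<K>` and the hook `targetVariantName` are all hypothetical; the thread has not settled on any syntax:

```cpp
#include <cassert>
#include <string>

enum class ScalarKind { Scalar, Integer, Float };
enum class FPInfo { IEEE_FLOAT, VARIANT_FLOAT_1, VARIANT_FLOAT_2, VARIANT_FLOAT_3 };

// Hypothetical target hook: returns the target's name for a non-IEEE
// variant, or an empty string if the target does not name it.
std::string targetVariantName(unsigned Size, FPInfo Info) {
  if (Size == 16 && Info == FPInfo::VARIANT_FLOAT_1)
    return "bf16"; // example assignment only
  return "";
}

// Hypothetical MIR spelling for a scalar LLT.
std::string printScalarLLT(ScalarKind K, unsigned Size, FPInfo Info) {
  const std::string Bits = std::to_string(Size);
  switch (K) {
  case ScalarKind::Scalar: return "s" + Bits;   // legacy kind
  case ScalarKind::Integer: return "i" + Bits;
  case ScalarKind::Float:
    if (Info == FPInfo::IEEE_FLOAT)
      return "f" + Bits;
    if (std::string Name = targetVariantName(Size, Info); !Name.empty())
      return Name;                                // option 2: target name
    return "f" + Bits + ".v" + std::to_string(static_cast<int>(Info)); // option 3
  }
  return "";
}
```

For example, a 16-bit float with VARIANT_FLOAT_1 would print as `bf16` on this hypothetical target, while an unnamed 8-bit VARIANT_FLOAT_2 would print as `f8.v2`.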

Do you want to support the LLVM-IR float types or something closer to APFloatBase::Semantics?

I am not sure whether we have already decided if it makes sense to have an LLVM IR type for each floating point format used in the backends. Assuming it does not, we probably want this mapping to represent something closer to APFloat semantics than to LLVM IR types.

The IRTranslator has to be able to translate all LLVM-IR types (ppc_fp128, bfloat) to gMIR.

If we want to add support for the new F8 types, who is going to generate MIR with them? It is not the IRTranslator.

Hi @tschuett ,

If we want to add support for the new F8 types, who is going to generate MIR with them? It is not the IRTranslator.

It is, but IIUC what Tim is saying is that how these are translated could be target specific. For instance, the default 16-bit float type could mean different things in different backends: I’ve seen backends that today use the 16-bit float type as bf16, not fp16.

To step back a little bit, the problem we have today is that the LLVM IR doesn’t even support the FP8 types, so I would rather not push the design too far at the Machine level before a solution is decided at the LLVM IR level.

Specifically, when the LLVM IR supports FP8 types, I can well see that we just wrap the LLVM IR type in LLT instead of coming up with a fancy way to represent it.

Cheers,
-Quentin

I totally agree. LLVM-IR can distinguish between i16, f16, and bf16. We should teach LLT the same.
https://llvm.org/doxygen/classllvm_1_1Type.html#a5e9e1c0dd93557be1b4ad72860f3cbda

There will be many benefits including in legalization and regbankselect.

At a high level my only concern is that I don’t want this to make it harder to handle operations like loads, stores, selects etc which only care about the size of a type and not its interpretation. This is a constant pain with SelectionDAG where lots of C++ and TableGen code has to remember to handle e.g. all 64-bit types instead of just i64 (… and did you remember v4f16???).

we will follow up with a PR for AMDGPU using the new LLT kinds

Cool! That should really help to see what it will look like in practice.

As for the low-level details…

Instead of a 2-bit FPInfo with target-specific interpretation for each bit width, why not have a (say) 8-bit field that can represent any of the FP types supported by IR? You could even use the values from Type::TypeID directly. (If I understood Quentin correctly, he is saying that there is no need to represent FP types that are not supported by IR.) There would be a constraint that the fp type field and the scalar size field must agree, e.g. if the type is bfloat then the size must be 16.
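A sketch of the suggested alternative: an fp-type field naming one of the IR floating-point types, plus the constraint that it agrees with the scalar size field. The enum below is a local stand-in whose enumerator names mirror the floating-point members of `llvm::Type::TypeID` (HalfTyID, BFloatTyID, and so on), but its numeric values are assumptions and are not LLVM's; `FloatLLT` and `makeFloatLLT` are hypothetical:

```cpp
#include <cassert>
#include <optional>

// Local stand-in for the FP members of llvm::Type::TypeID.
// Names mirror LLVM's; the numeric values here are NOT LLVM's.
enum class FPTypeID { Half, BFloat, Float, Double, X86_FP80, FP128, PPC_FP128 };

// Bit width implied by each IR floating-point type.
constexpr unsigned fpTypeSizeInBits(FPTypeID ID) {
  switch (ID) {
  case FPTypeID::Half:
  case FPTypeID::BFloat: return 16;
  case FPTypeID::Float: return 32;
  case FPTypeID::Double: return 64;
  case FPTypeID::X86_FP80: return 80;
  case FPTypeID::FP128:
  case FPTypeID::PPC_FP128: return 128;
  }
  return 0;
}

// Hypothetical scalar float LLT carrying both a size and an fp type.
struct FloatLLT {
  unsigned SizeInBits;
  FPTypeID Type;
};

// Rejects inconsistent combinations, e.g. bfloat with a 32-bit size.
constexpr std::optional<FloatLLT> makeFloatLLT(unsigned SizeInBits, FPTypeID ID) {
  if (fpTypeSizeInBits(ID) != SizeInBits)
    return std::nullopt;
  return FloatLLT{SizeInBits, ID};
}
```

Since the fp type determines the size, such an encoding could in principle derive the size from the type field, at the cost of making float LLTs laid out differently from integer ones.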

(You could even get more ambitious and set this type field to IntegerTyID or PointerTyID instead of putting that information in your new Kind field. But maybe that’s taking it too far.)

Incidentally there are other things that could be cleaned up in the bit layout of LLT. E.g. the scalar size field has four different positions depending on the type of LLT, instead of just one or two, and when it is 32-bit it straddles the low and high halves of the 64-bit LLT which is pretty ugly.

It is actually the opposite. For most CPUs, we have to regbankselect loads to either GPR or FPR. Without being able to distinguish between ints and floats, this becomes a hard problem and we end up with copies.

Selects are one of my favourite arguments for FP types. We cannot distinguish between i16, f16, and bf16: if you have a target where i16 is illegal, we have to legalize the selects blindly as s16.

If I understood Quentin correctly, he is saying that there is no need to represent FP types that are not supported by IR

What I’m saying is a bit different.

There may be some need to support types that are not yet supported by the IR, but I don’t expect that to be particularly useful. (These may appear through intrinsics, for instance, if a backend wants to go ahead and do fancy things with them.)

The important point was that I would rather not go too far in supporting types that are not yet supported by the IR, because we risk ending up with divergent paths. Ultimately, having the same underlying representation for LLVM IR and (G)MIR would be great (whether that is TypeID or something else).

I think this is orthogonal, in that we can always interact with these types on a size basis (e.g., a legalIf predicate in the legalizer can be purely size-based). We just need to provide some facilities for that.
By default, though, you would need to work with the full types.

Anyway, this is orthogonal IMHO.

INTEGER should probably be value 0 (and the scalar cases explicitly marked as deprecated?).

These should be explicit about what the type is (i.e. this must state bfloat, not “variant”). I agree we should not try to support types outside of the existing high-level IR types. I also envisioned the encoding slightly differently: instead of having explicit float and integer types, we would keep scalar as-is and add the FP info discriminator bits.

This is just wrong. The IR has no concept of choose your own type, there’s always a concrete interpretation. LLT::scalar(16) unambiguously means llvm half. Any target doing something different is broken.

The type is still a poor proxy for the regbank, that only mostly works for CPU targets. The problem is regbankselect was never actually implemented. It isn’t attempting to be intelligent.

You’re not wrong, but it makes me smile, because this particular hack comes from the AIE backend (AMD 🙂; see Xilinx/llvm-aie on GitHub, a fork of LLVM to support AMD AIEngine processors).

My worry with being opinionated is that, given the limited number of types we can represent, some backends will not be able to use what they want. (How many more FP8 types will still be created? :))

Maybe I’m overthinking it and we can just be opinionated.

If you don’t have a common definition for a type or operation, no generic code can usefully do anything with it. The infrastructure needs to provide the union of cases anybody would want to use. Otherwise you’ll just end up growing some side channel of information at every use point, and be restricted in how you can use it.

Good point.

I’m not sure what’s orthogonal to what. The proposal introduces a distinction between integer and floating point LLTs. I assume this means I will no longer be able to write:

getActionDefinitionsBuilder(G_SELECT)
    .legalFor({S32, S32});

… since there’s no longer a single S32 type. I’d like some idea of what I will have to replace it with.

Some more high level thoughts: I understand the impetus to do this, in order to fix the fundamental problem that GMIR cannot distinguish fp instructions on different types with the same bit width, like half and bfloat.

My worry is that it will be used (is already being used) as an excuse to overturn the GlobalISel design decision that handling separate integer/fp register banks is a job for RegBankSelect, and not for types in the GMIR. (Maybe we will want to overturn that decision, but I am pretty sure we have not reached consensus on that yet.)

More concretely, I am a bit worried about the GMIR becoming cluttered with things like bitcasts between integer and fp types, which complicate the IR and get in the way of pattern matching with zero benefit to targets that do not have separate integer/fp register banks (of course I am thinking of AMDGPU here).

Is there still time to examine the alternative approach of attaching fp types to the fp instructions in GMIR, instead of putting it on the registers?

Correct.

The reason I’m saying this is orthogonal is that this kind of thing is still possible with the new LLT; for instance, you could write something like:

.legalIf([=](const LegalityQuery &Query) { return Query.Types[0].getSizeInBits() == 32 && !Query.Types[0].isVector(); })

I’m guessing there are ways to make this easier (e.g., introducing a legalForScalar(32)), but again, I don’t think this changes how LLTs are designed.
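To illustrate how a size-only rule like the G_SELECT example could survive the int/float split, here is a sketch of the predicate a helper such as the `legalForScalar(32)` mentioned above might check. It uses a simplified stand-in for LLT; `MiniLLT`, `isAnyScalarOfSize` and `legalForScalar` itself are hypothetical, not existing LegalizerInfo API:

```cpp
#include <cassert>

// Simplified stand-in for the proposed LLT kinds.
enum class Kind {
  Pointer, Integer, Float, Scalar,
  VectorPointer, VectorInteger, VectorFloat, VectorScalar
};

struct MiniLLT {
  Kind K;
  unsigned SizeInBits;
};

constexpr bool isVectorKind(Kind K) {
  return K == Kind::VectorPointer || K == Kind::VectorInteger ||
         K == Kind::VectorFloat || K == Kind::VectorScalar;
}

// Hypothetical predicate behind a legalForScalar(N)-style rule: accepts any
// non-vector, non-pointer type of the given width, whether it is an integer,
// a float of any FP format, or a legacy scalar.
constexpr bool isAnyScalarOfSize(MiniLLT Ty, unsigned Size) {
  return !isVectorKind(Ty.K) && Ty.K != Kind::Pointer && Ty.SizeInBits == Size;
}
```

With such a helper, a rule that used to say "S32" would cover i32, f32 and any 32-bit FP variant at once, so size-only operations like selects, loads and stores would not need one entry per interpretation.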

I’m not seeing that, and I would argue that RegBankSelect’s job is much more than that (although today it is not).
RegBankSelect was supposed to do scalarization, for instance, based on the cost of the different instructions used after a cross-bank copy.
In any case, people use the infrastructure the way they see fit and if people want to use types in this way, this is fine by me.

This has been discussed already (see the RFCs that @tgymnich listed) and the consensus was that we would augment the LLT.
We have bit casts for integer to vector, why would floating points be different?

This is a rhetorical question; all the conversations we had on this FP problem invariably landed on bitcasts being the better solution.

Personally I would rather not rediscuss this.

OK.

Firstly, it is surprising how often these kinds of RFCs come up while nothing has happened yet.

If we have a richer type system with integers and f16/bf16, we make life easier for other passes down the pipeline (e.g. regbankselect). The difference is between a G_LOAD(s64), a G_LOAD(i64), and a G_LOAD(f64): the latter two should put less load on regbankselect, and maybe we get fewer copies.
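A sketch of why the typed loads help: on a CPU with separate GPR and FPR banks, the bank for a loaded value can be read directly off the kind, and only the legacy scalar kind still needs the use/def-walking heuristic described earlier for AArch64. The bank names, `pickLoadBank`, and the "Unknown" fallback are hypothetical simplifications:

```cpp
#include <cassert>

enum class Kind { Pointer, Integer, Float, Scalar };
enum class Bank { GPR, FPR, Unknown };

// Hypothetical initial bank choice for the result of a scalar G_LOAD.
constexpr Bank pickLoadBank(Kind K) {
  switch (K) {
  case Kind::Pointer:
  case Kind::Integer:
    return Bank::GPR;     // G_LOAD(i64), G_LOAD(p0)
  case Kind::Float:
    return Bank::FPR;     // G_LOAD(f64)
  case Kind::Scalar:
    return Bank::Unknown; // G_LOAD(s64): must inspect uses/defs to guess
  }
  return Bank::Unknown;
}
```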

I am in favour of adding the LLVM-IR fp types to LLT.