[RFC][GlobalISel] Encoding type information into FP operations

I’m bringing up an old idea I had because at the first GlobalISel sync up there was a question about whether this idea had been explored before, and my answer was that we seemed to converge on FP types ad-hoc. I’d like to revisit this one more time to hear thoughts before we invest time into a particular direction.

Background

GlobalISel was initially proposed circa 2015. It was a simpler time, when the phrase “could you pass me that 16 bit floating point value?” clearly meant it was IEEE-754 fp16. As a result, LLT didn’t make the distinction between integer and floating-point types. The type of the operation was inferred from the other properties of the value: size, vector/scalar and the opcode of the instruction, e.g. G_FMUL.

Why not extend LLT?

Extending LLT seems a natural thing to do, but doing so when we have years of development operating on plain scalars/vectors is going to be a long and error prone task. Not infeasible, but certainly arduous. The main issues I see are:

  • We need to be much stricter about how we deal with vreg values in general. No longer is it possible to look through copies or propagate values without checking each time that we don’t violate the type semantics. Of course we’ll have the verifier to do this, but I’m sure we’ll have lots of places that need fixing.
  • Related to the above, operations which previously didn’t need to care about types, such as merges, will have additional checks needed to propagate types to new values. This is additional cost for no benefit.
  • Transitioning is expensive. We can mitigate the cost by trying to work backwards from the selector and adding support for FP types incrementally, but it’s still a large amount of work.

Proposal: FP type operands

We can keep the current design of GISel mostly unchanged by making changes to a specifier set of operations that truly care about FP types. This shouldn’t be anywhere close to as risky of a change as FP types as it’s truly a more incremental extension rather than a complete overhaul of the MIR.

The simplest idea is to add an immediate value to all FP operations as the first input operand. This is effectively just an enum that specifies the particular FP format. Without some special handling in MIR printing this will look ugly but it should do the job. We’ll need to fix up places that deal with FP types, but these will be simple mechanical changes and, most importantly, errors will result in compile failures vs miscompiles. So FP operations will look like this:

%res:_(s16) = G_FMUL 5, %op1, %op2

where 5 is some encoding of the type format, or bf16 if we implement some pretty printing.

Another route may be a complete new MachineOperand kind that specially encodes this, which should be fairly cheap I think, but I haven’t looked into this option.

Finally, I’m assuming that adding a field to MachineInstr is out of the question due to memory impacts, but someone can correct me if I’m wrong.

Hi @amara,

Thanks for writing this up!

I hear the argument with transitions are expensive, but at the same time, I think it is important to think about the long term cost of not transitioning.

As it stands, I don’t have a good grasp on the implications of going with the FP type operands (I simply didn’t think about it), but there are a couple of things that, from the get go, look like potential pain points/hazard to me:

  1. We have a stacked way of checking the type: vreg type (LLT), opcode (integer vs. floating point), the FP type operands (actual FP type).
  2. Given the FP type operands are by design bringing “non-binding” contraints, it is possible to create implicit type conversions.

These two points are what worries me for the future health of the system.

I think we can see the complexity/inconsistence that #1 brings to the table right away (think about the implication on legalization, or any code that need to reconstruct a type). I’m just unclear if this is something we want to accept.

For #2, the problem is more subtle. We can add checks in a verifier to forbid non-bitcast implicit conversions, but this creates other problems:
A. these implicit conversions may need to be materialized somewhere if they actually need to be backed up by instructions (e.g., a regclass copy), so we’re effectively pushing the complexity elsewhere.
B. how should we verify target specific operations?

I like the separation between the size of what a register type hold and how this register is being used. I haven’t though about it much, but maybe this is something we could generalize?

Anyhow, not arguing for either solution at this point, just sharing my initial gut feeling about the approach. I’m sure you already thought about this, so looking forward for your clarifications :slight_smile: .

Cheers,
-Quentin

Essentially FP semantic is a property of operand, not instruction and passing it as an operand looks more like a workaround. It can create problems when operands may be of different types. Operations like conversions, multiply-and-accumulate, reductions may have different types for sources and results. In this case an instruction needs two semantic operands and in general it is not possible to deduce, which semantic describes a particular operand.

This way is also invasive, - it requires change of all existing FP instructions. It is not obvious that these changes are smaller than modification of LLT.

As we want to avoid large changes made at once, we need some solution that would allow gradual development of FP type support. A potential solution is to use three category of types instead existing scalar type:

  • “scalar type”, - a type that is characterized by size only, it is almost the same as the existing scalar type.
  • “integer type”, - a scalar type for integers,
  • “floating type”, - a scalar type for floating-point values. Such type keeps an associated value that represents FP semantic.

It is important that the last two categories are “scalar types”, they may be used where generic “scalar type” is expected.
Than we can add MIR transformations that take into account particular FP semantic of operands. For other parts of the selector that are unaware of FP semantic these FP values would be usual scalar values as they are in present. Such solution should allow incremental development of FP semantic support in GlobalISel.

Inability to represent different formats of same-size FP value is a serious drawback of GlobalISel. In DAG selector MVT can be extended with new types and such problem does not exist. In GlobalISel there is no solution.

I don’t think extending LLT would be as bad as it sounds. I think there should be a serious attempt at it before immediately giving up on it

I think the impact of a MachineOperand is pretty high

Right, this does worry me a bit. I thought one of the advantages of LLT was that if you write (say) a selection pattern for an s32 load, it works for both integer and floating point. You don’t need to repeat it for i32 and f32 like you do in SelectionDAG.

(Actually I guess this has never been completely true, since you do need to repeat the pattern for other LLTs with the same size like v2s16.)

If we do extend LLT with floating point types, what remaining advantages (or even differences) does it have over MVT?

Arm learned this year two 8-bit floats: E5M2 and E4M3. I still need to legalize G_BUILD_VECTOR for both of them. I could fill the lanes with G_FCONSTANT of both of them.

In RegBankSelect, I can distinguish between integers and floats for optimization. The kind of the 8-bit float is less important there.

I thought one of the advantages of LLT was that if you write (say) a selection pattern for an s32 load, it works for both integer and floating point.

Just to react on that.

It was always a goal for me that instructions that don’t care about the actual type wouldn’t need to specify them. More precisely, the idea was that the size should be enough for such instructions (e.g., load, store, merge, unmerge, bitwise-ops, etc.)

Now, in practice, I believe that we backed the actual type everywhere and this property actually doesn’t hold with today’s code base. For instance, I would have expected that the legality of AND would be expressed in terms of sizes, but in practice it is expressed with types.

Anyhow, just a digression!

+1 on extending LLT doesn’t sound that hard.

In terms of adoption, we can go gradually:

  • Introduce the floating point types
  • Backends start to adopt it
  • Opt-in for floating point types checks in the verifier
  • Transition over => verifier checks on floating point types mandatory

@bogner didn’t you have a sketch of a patch for introducing floating point types in LLT?
@TNorthover, I believe you looked at that at some point too.

(Could be wrong though, and in that case, sorry for the noise!)

Yes there is a patch attached to RFC: [GlobalISel] Representing fp types in LLT

The enum for floating point kinds was discussed before, but there was no success.

https://reviews.llvm.org/D150605

This was specifically the concern brought up at the sync up by @davemgreen. In particular that as new variants of FP types are added to the SDAG backend, sometimes there’s holes in functionality because patterns which need to be duplicated for each FP type have been missed.

I think this use case is an important one to get right with FP types. Having to duplicate patterns when the underlying semantics of the bits don’t matter would be a step backwards. That said, I don’t see why it would be particularly hard to support this use case for selection. We could have a new union type, like <8 x a16> where the a denotes “any” form of scalar. Or re-use the existing <4 x s16> syntax but we’d need to then introduce something for integers, like i32.

LLT still represents pointer types.

I don’t think that the situation would be any different to today. We already don’t verify target specific opcodes, and we simply don’t look into the semantics of an FP value at all. Adding an enum to FP operations wouldn’t impact the semantics of copies etc because as long as the operations which care about semantics know which one they are, things should just work.

Fundamentally, I understand that there were particular design philosophies and assumptions that motivated the current system. IMO if we’re going to abandon the current design, we should think about whether those original assumptions are no longer valid and if we’re ok with making the trade-offs.

My current stance is leaning towards FP types if we have a feasible solution to the issue of redundant patterns for bit-semantics agnostic operations.

Also, one other argument in favor of FP types is that RegBankSelect, at least for AArch64, will become much simpler than today. We’ll no longer have to jump through hoops to guess the provenance of a value.

I’m not especially familiar with FP types internals and I have no strong opinion on this topic, so I’m just giving my thoughts after reading this topic, as a contributor to the Combiner infrastructure.

(On adding the information to instructions) We don’t have to think of the FP semantic as an operand. It could just be an attached “FP Type Profile”. e.g. %y = G_FADD %a, %b :: fp(f32, f32, f32) (like the fp flags but fancier) to describe operands. If that’s a uniqued array of types it’s just an additional pointer in the MI’s ExtraInfo so it’s basically free I think?
We shouldn’t be afraid to be creative there.

(On adding the information to LLTs) There is a good case for doing things in the LLT directly - we already have pointers separated from other scalar types. It’d also make sense to have FP separated as it’s grown in complexity and bitwidth is no longer enough to describe a FP type (much like bitwidth isn’t enough to describe a pointer, you need its AS).

On a personal note, I’m generally a fan of keeping type systems simple in IRs and encoding more data into instructions. However I know reality is often disappointing so my opinion here is that if we choose to put things in LLT, we should document somewhere why we went in that direction (e.g. in a .rst doc or something)

In this particular example I think it’s “”“just”“” an infrastructure problem. A lot of GlobalISel stuff is piggybacking on top of DAG (+Target-specific MIR) infrastructure .
e.g. I think G_AND legality is determined by (ins type0:$src1, type0:$src2) because it’s re-using Instruction & all its attached logic.