GlobalISel: Ambiguous intrinsic semantics problem

Hi GlobalISel interested parties,

A recent bug report (https://bugs.llvm.org/show_bug.cgi?id=40968) on AArch64 exposed a problem with our modeling of intrinsic semantics when dealing with type overloaded calls. The crux of the matter is that because GlobalISel’s LowLevelTypes only carry size and vector layout information, and not any information about whether a type is integer or fp, we lose information during IR translation of type overloaded intrinsics.

Most of the time, we don’t run into this issue because the generic opcodes distinguish between fp and int operations, and for most intrinsics the regbank can normally disambiguate the operation (e.g. the source operands are assigned either gpr or fpr banks). However, this isn’t enough in the case of vectors on arm64, which can contain both fp and int elements, and if target intrinsics have been defined to use their type overloading to determine the semantics, then we can select the wrong instruction.

Some approaches to fix this:

  1. We can try to propagate the missing information through the G_INTRINSIC instruction. We could do this either through some additional storage in MachineInstr, or through an additional operand to store type information (a single bit per operand should suffice). These come with the drawbacks of either increasing the size of MachineInstr, or breaking existing code that deals with intrinsics by adding a new operand somewhere.
  2. Change the LLT types in GISel to distinguish between fp and integer types. This seems like the most natural fix in that we match the IR type system more closely, but it also has very significant impacts on the existing GISel codebase. An fp “flag” on a type that only affects types used by intrinsics sounds tempting, but it doesn’t work when we have to propagate those types through other users and copies. This means that if we decide to add an fp variant, we have to change the entire design of GISel to now handle both integer and fp types, whereas before we only cared about their size/vector structure.
  3. Somehow do target intrinsic translation so that any ambiguous operations get split into sub-intrinsics to ensure they’re distinct. This approach has the disadvantages of requiring manual handling of these intrinsics, and that means that we can’t handle any ambiguous intrinsics automatically through the current pattern importer or the future GISel-native pattern selector. We could potentially detect these by re-using the importer to do analysis and emit warnings when two different output instructions map to the same input pattern. This is more of a workaround rather than an actual fix of the underlying problem.

Any strong preferences for a route forward?

Thanks,
Amara

We could also just change the intrinsic at the IR level so it isn’t ambiguous. It’s confusing to have an intrinsic that does either an integer add or a floating-point add depending on the type, anyway. Is it just llvm.aarch64.neon.addp that has this issue?

-Eli

Although that would be my preference too, I wonder how well that scales.
E.g., let say we have an intrinsic with m operand and each operand has 2 variants. If all the variants are valid we need 2^m different opcodes.

I expect we don’t have that freedom most of the time (two operands are generally not independent type wise), but if a target ever has, we basically screw them.

We could just wait for this to happen though and I am happy with making the intrinsic non-ambiguous :P.

I’m pretty strongly opposed to #2. I think the more relaxed type system is one of the advantages over SelectionDAG, and want to avoid the mess of bitcasts we have to insert now to satisfy the artificial type constraints. The original intrinsic seems misspecified to me, and should be spilt into separate FP/int versions.

-Matt

Matt: that’s fair. We’re generally apprehensive of option 2 as well.

Eli: Yes, currently we believe that aarch64.neon.addp is the only arm64 one affected, but we don’t know how prevalent this is on other targets. Splitting it is certainly possible combined with the autoupgrader.

If disambiguating the intrinsics is the preferred solution, then I think we should also have the langref also specify that there’s no guarantee of correction codegen solely on the basis of int/fp type overloading for intrinsics. What do you think?

Amara

I think it’s fair to say that overloading int vs. fp shouldn’t change the semantics of an operation. Not sure where we would document it though… it’s sort of a meta-rule for constructing intrinsics, rather than a rule about any specific intrinsic. I guess we could stick it into the paragraph on intrinsic overloading.

That said, thinking about it a bit more, even if we fix the AArch64 intrinsic, you might run into a related issue eventually, though… specifically, on PowerPC, you’ll need to eventually figure out some way to distinguish fp128 intrinsics from ppc_fp128 intrinsics.

-Eli

I think it’s fair to say that overloading int vs. fp shouldn’t change the semantics of an operation. Not sure where we would document it though… it’s sort of a meta-rule for constructing intrinsics, rather than a rule about any specific intrinsic. I guess we could stick it into the paragraph on intrinsic overloading.

Yes, at least having some kind of documentation on what’s expected to work would be helpful.

That said, thinking about it a bit more, even if we fix the AArch64 intrinsic, you might run into a related issue eventually, though… specifically, on PowerPC, you’ll need to eventually figure out some way to distinguish fp128 intrinsics from ppc_fp128 intrinsics.

I think for those kinds of types, if they’re ever supported, we’d have to introduce distinct LLTs for them.

Amara