Complex types and FPMathOperator

I’ve got a problem with the way that FPMathOperator determines which call instructions should be considered this type of operator. Specifically, there are cases where it doesn’t recognize calls to the standard math library’s complex functions as FPMathOperator calls, and so fast-math flags can’t be attached to those calls.

Consider the following code:

double complex foo(double complex x) {
  double complex ex = cexp(x);
  return ex + x;
}

float complex bar(float complex x) {
  float complex ex = cexpf(x);
  return ex + x;
}

The LLVM IR representation of this varies with ABI, so it’s not consistent across targets. On x86-64 Linux, it’s relatively simple. If I compile with “-O2 -ffast-math” I get this:

define dso_local { double, double } @foo(double noundef nofpclass(nan inf) %x.coerce0, double noundef nofpclass(nan inf) %x.coerce1) local_unnamed_addr #0 {
entry:
  %call = tail call { double, double } @cexp(double noundef nofpclass(nan inf) %x.coerce0, double noundef nofpclass(nan inf) %x.coerce1) #3
  %0 = extractvalue { double, double } %call, 0
  %1 = extractvalue { double, double } %call, 1
  %add.r = fadd fast double %0, %x.coerce0
  %add.i = fadd fast double %1, %x.coerce1
  %.fca.0.insert = insertvalue { double, double } poison, double %add.r, 0
  %.fca.1.insert = insertvalue { double, double } %.fca.0.insert, double %add.i, 1
  ret { double, double } %.fca.1.insert
}

define dso_local nofpclass(nan inf) <2 x float> @bar(<2 x float> noundef nofpclass(nan inf) %x.coerce) local_unnamed_addr #2 {
entry:
  %call = tail call fast nofpclass(nan inf) <2 x float> @cexpf(<2 x float> noundef nofpclass(nan inf) %x.coerce) #3
  %0 = fadd fast <2 x float> %call, %x.coerce
  ret <2 x float> %0
}

Notice that in bar() the call to cexpf() is marked with fast-math flags, but in foo() the call to cexp() is not. (I’m particularly interested in the ‘afn’ flag here, BTW.) We recognize cexpf() as an FPMathOperator because it returns a vector of floats. However, we do not recognize cexp() as an FPMathOperator because it returns a structure with two doubles.

Obviously, it would be easy enough to update FPMathOperator to also accept calls which return structures with two floating point types or vectors of such structures as fp math operators. The problem is that on x86-64 Windows double complex is returned via an sret argument as a pointer to a structure with two doubles, and (much worse) complex float is returned as an i64.
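For illustration, here is a hand-written sketch (not verbatim clang output) of roughly what the Windows lowering described above looks like:

```llvm
; sketch only: x86-64 Windows, based on the ABI behavior described above

; _Complex double is returned indirectly through an sret pointer, so the
; call returns void and fast-math flags have nothing FP-typed to attach to
declare void @cexp(ptr sret({ double, double }), ptr noundef)

; _Complex float is coerced to a plain i64, which FPMathOperator has no
; way to recognize as floating-point data
declare i64 @cexpf(i64 noundef)
```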

So, I’m looking for a good way to handle this. The possibilities I have thought of so far are:

  1. Introduce new intrinsics for the standard math library complex calls and require front ends to use these when they want the call to have fast-math flags
  2. Introduce new attributes that the front end can use to indicate when a parameter or return type represents a complex type or some component thereof
  3. Make first class types for complex floating point

(Yes, I know the history behind the third option in that list, but it would solve this problem, so I put it there.)

Option 1 above is a bit of a problem, because the front end handles setting up the ABI for arguments and return types, so we’d either need variations of the intrinsic to handle all possible ABI representations, or we’d need to teach back ends to generate the ABI-compliant calls when lowering these intrinsics.

Option 2 is also not without difficulty, because in some cases the real and imaginary components of a complex value are passed as separate arguments. That’s not a deal-breaker, but it would require parameter attributes like ‘complex(float)’, ‘complex_real(float)’, ‘complex_imaginary(float)’, and so on.
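To make that concrete, a call using such attributes might look something like this (the attribute names are hypothetical, matching the spellings suggested above; nothing like this exists in LLVM today):

```llvm
; hypothetical syntax: complex_real/complex_imaginary mark the
; separately-passed components of a single _Complex double argument
%call = tail call fast { double, double } @cexp(
          double complex_real(double) %x.re,
          double complex_imaginary(double) %x.im)
```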

At this point, apart from making first class complex types (which I know a lot of people oppose), I’m not entirely happy with any of the options I’ve thought of.

Does anyone have any other ideas?

Well, I’m already on record as advocating for option 3, so take this with that grain of salt. :blush:

The root issue here, as I see it, is that the C ABI for complex types in functions is fundamentally cursed. Focusing on the return type in particular, depending on the ABI, it’s either returned in two FP registers, returned in a single vector FP register, or returned as a struct containing two floating-point types (which is cursed and is returned in an integer register in a few ABIs). Oh, and on pretty much every architecture, there’s one FP type whose corresponding complex type is handled specially. If you want to optimize complex operations, you’ll either have to handle the complexity of all the ABIs, or create some mechanism that allows complex types to be represented uniformly.

I don’t think there’s all that much distance between the different options as presented: the real question is whether you use a uniform representation of a complex type that pushes the ABI representation into LLVM codegen (options 1 and 3), or try to make things work with the existing ABI representation (option 2, or a variant of option 1).

Of those two considerations, I think it’s preferable to work towards a more uniform complex type representation. You don’t necessarily need a dedicated complex type for this representation; I’ve gotten a lot of mileage in my complex intrinsics proposal by using floating-point vector types. Having an LLVM attribute that indicates a vector parameter is actually a complex type for ABI purposes would allow the backends to perform whatever insanity they need to do. The biggest downside I see to this approach is that there are times when an LLVM IR function will gain a hidden sret parameter during codegen. I don’t think there’s much that can be done to avoid this situation, however, short of making all optimizations and things like FPMathOperator worry about sret returns of complex types.
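For example, complex double could be carried as a plain <2 x double> with an (again hypothetical) attribute telling codegen to apply the complex ABI rules:

```llvm
; hypothetical: <2 x double> carries {re, im}, and the made-up "complex"
; attribute tells codegen to apply the target's _Complex double ABI rules
define <2 x double> @double_it(<2 x double> complex %x) {
entry:
  ; ordinary FP instructions and FPMathOperator then work unmodified
  %sum = fadd fast <2 x double> %x, %x
  ret <2 x double> %sum
}
```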

Obviously, it would be easy enough to update FPMathOperator to also accept calls which return structures with two floating point types or vectors of such structures as fp math operators.

FWIW, structs are not valid element types for vectors.

Option 1 above is a bit of a problem, because the front end handles setting up the ABI for arguments and return types, so we’d either need variations of the intrinsic to handle all possible ABI representations, or we’d need to teach back ends to generate the ABI-compliant calls when lowering these intrinsics.

The backend is going to need to generate the ABI-compliant calls even in option 3 anyway, so I don’t think it’s a dealbreaker. Adding intrinsics for the standard complex math functions is, I think, the right long-term solution anyway (and I’m hoping to get back to pushing on my complex mul/div proposal in the next few months), so I guess this means that this option has my vote. Although it may also behoove us to have a complex attribute for ABI purposes (or to just outright create a new complex type) that allows frontends to uniformly lower C complex types to LLVM without having to duplicate the ABI logic between Clang and LLVM codegen.
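As a sketch of what the intrinsic route could look like for foo() from the original example (the intrinsic name and signature are made up for illustration):

```llvm
; hypothetical complex-exp intrinsic operating on a {re, im} vector; the
; backend would be responsible for lowering it to the ABI-correct cexp() call
declare <2 x double> @llvm.experimental.complex.exp.v2f64(<2 x double>)

define <2 x double> @foo(<2 x double> %x) {
entry:
  ; the intrinsic returns an FP vector, so 'fast' (including 'afn') attaches
  %ex = call fast <2 x double> @llvm.experimental.complex.exp.v2f64(<2 x double> %x)
  %sum = fadd fast <2 x double> %ex, %x
  ret <2 x double> %sum
}
```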

I honestly forgot what the blockers for a complex type were, though I would very much prefer it over (a partial set of) intrinsics. As has been mentioned, a complex type would help all the other frontends and middle-end passes that want to deal with complex values naturally. The ABI logic needs to be moved, but that is a doable task, IMHO.

In many architectures, yes; in a few, no. So the curse is (more or less) self-inflicted (often by history) by the creator of the ABI.

In my architecture, any struct less than 8 doublewords long is passed both ways in registers, and a register can contain any kind of data (int, pointer, float). Passing and returning complex falls out for free in this case.

IIRC, the bf16 type came in very fast with almost no discussion.


This may be a gross generalization, but my impression from having participated in multiple discussions about this is that most people (if not all) who focus on floating-point optimization supported having a first-class complex type, but some people who don’t often work with complex operations preferred not to have it as a first-class type. My memory is less clear on the reasoning, but I think it was mostly concerns around the pervasiveness of adding support for a new type and the impact that it might have on optimizations that don’t intend to do anything with it.

Here’s a link to what I think was the most recent mailing list discussion: RFC: Complex in LLVM

If I recall correctly, we had a round table discussion on this topic at one of the LLVM Dev Meetings and the conclusion was that if we could show an optimization that couldn’t be performed just as well with intrinsics we’d revisit the topic. Although I think that was the same conclusion we reached about matrix types, so I might be conflating them a bit.

What if we build complex types on top of the new target types ([RFC] Target type classes for extensibility of LLVM IR).
We’d still need to move the ABI-related code out of clang but that is (IMHO) a good thing anyway.
Overall, there would be no new first class type in IR, and support for that extension is necessary anyway.

Here’s a link to what I think was the most recent mailing list discussion: RFC: Complex in LLVM

Unfortunately, the conversion to discourse ate the entire proposal. The mailman archive has a non-eaten version: [llvm-dev] Complex proposal v3 + roundtable agenda

What if we build complex types on top of the new target types ([RFC] Target type classes for extensibility of LLVM IR).

My usual caveat about target extension types is that you lose the ability to support most existing instructions (this is why it’s not generally a good idea for custom floating-point types). However, for complex types, that might not be the worst thing in the world. Anything other than addition and subtraction is moderately divergent from existing floating-point code in semantics. You would need to add support for bitcast (so you can bitcast a complex type to a vector or array type), literals, and support in the LLVM Intrinsics.td infrastructure, but that’s not really a dealbreaker, I think.
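A sketch of what using such a type might look like (the “complex” type class name and the intrinsic are hypothetical, and as noted, support for bitcast, literals, and Intrinsics.td would all need to be added first):

```llvm
; hypothetical: target("complex", double) as a target extension type;
; the intrinsic name is made up for illustration
declare target("complex", double)
  @llvm.experimental.complex.mul.f64(target("complex", double),
                                     target("complex", double))

define target("complex", double) @square(target("complex", double) %z) {
  %r = call target("complex", double)
         @llvm.experimental.complex.mul.f64(target("complex", double) %z,
                                            target("complex", double) %z)
  ret target("complex", double) %r
}
```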

We’d still need to move the ABI-related code out of clang but that is (IMHO) a good thing anyway.

+1

How does this work with the LLVM IR Dialect in MLIR? Flang uses the LLVM IR Dialect, e.g. ⚙ D149546 [Flang] Change complex divide lowering.

Is this the way forward for new types: add a target class, when it settles down, add a real IR type? Tensors?

ABI stuff must be in llvm/FrontEnd/TargetInfo.

I’m not sure how to parse what you’re saying here. Currently the ABI handling is mostly in the clang front end. I say mostly because it involves a bit of a handshake with the target-specific backend: the front end does most of the work to set things up based on its expectations about what the code generator will do. For instance, the x86-64 ABI says “complex T” arguments should be passed as if they were a structure with two elements of type T, so the front end generates calls with two double arguments for complex double and a single <2 x float> argument for complex float. That’s not literally what the ABI says, but it gets the result that the ABI wants, and it matches what clang does with structures of the sort the ABI describes.

I’m not a big fan of this design. I kind of understand why it was done this way, because the front end is handling calling conventions and related semantics, but it is a bit of a headache when the final target is unknown in the front end (such as when you’re going through SPIR-V) or if the optimizer wants to generate vector calls. It’s also a problem when other front ends want to use the same ABI.

Johannes and colleagues are moving OpenMP codegen out of Clang into llvm/Frontend/OpenMP. It can now be shared between Clang and Flang.

If you store the complex layout for the different targets in llvm/Frontend/TargetInfo, then it can be shared between Clang and Flang.

I would favor complex being first class types in the LLVM IR.
Then, things like “%1 = sqrt(-1)” would just work and %1 would just be a complex type. (= 0+i)
It would certainly make the readability of LLVM IR a lot easier when using complex types.
One could then add some methods for translating between the different ABI representations of complex.
When using complex values, standard functions like sqrt() accept a much wider range of input values, so there are fewer pitfalls. I think the fact that complex has not been thought of as a first-class type up until now is partly what has resulted in the wide variety of ABIs for it on the various platforms.
I think we should still keep the option to pass a complex by value and also pass a complex by reference, depending on what the programmer wishes.
It’s always going to have many different ways of being passed, because there are also many different types of floating-point representations. We always need to keep in mind how best to pass an array/vector of complex numbers.
E.g.
fp8 data types (F8E4M3FN and F8E5M2FN)

I think that the fp16, fp32, fp64, fp80, and fp128 types should also follow the F8E4M3FN naming convention in MLIR IR, instead of bfloat16 etc.

Thanks! I was not aware of that. I wasn’t sure if you meant to represent a path with “llvm/Frontend/TargetInfo” or if you intended it as something else.

Are you suggesting that the ABI handling from clang/lib/CodeGen/TargetInfo.cpp should be moved to “llvm/FrontEnd/TargetInfo”? Or is there a patch somewhere that starts that?

If we make complex a first-class type, I’m not sure what there would be for the front end ABI handling to do beyond deciding if arguments and return values need to be pointers to the complex data. On the other hand, there are some other cases (like vector parameters) where I think having the ABI handling in the core LLVM library would be useful.

If we don’t make complex a first-class type, I’m still at a loss for how to recognize complex math library functions as FPMathOperators.

Long-term, it would be great if Flang, Clang, and other out-of-tree compilers could share ABI information. “llvm/Frontend/TargetInfo” could be a possible location.

BF16 went in really fast, matrices ended up as vectors, the new ml-F8 types did not happen, tensors will never happen. Is complex a good compromise?