RFC: Complex in LLVM

Hey all,

I volunteered to put together a proposal regarding complex in LLVM.
Consider the following to be a strawman meant to spark discussion. It's
based on real-world experience with complex but is not expected to cover
all use-cases.

Proposal to Support Complex Operations in LLVM

Hi David —

What do you intend the semantics of the fmul and fdiv operations to be for these types? Do they implement the C semantics (avoid spurious overflow/underflow)? The naive arithmetic (some Fortran implementations)? Is FMA licensed in their evaluation?

– Steve

Hi David,

IIUC your proposal preserves the current ABI?

Thanks,

JF


Proposal to Support Complex Operations in LLVM
----------------------------------------------

Abstract

Several vendors and individuals have proposed first-class complex
support in LLVM. Goals of this proposal include better optimization,
diagnostics and general user experience.

Introduction and Motivation

Recently the topic of complex numbers arose on llvm-dev with several
developers expressing a desire for first-class IR support for complex
[1] [2]. Interest in complex numbers in LLVM goes back much further
[3].

Currently clang chooses to represent standard types like "double
complex" and "std::complex<float>" as structure types containing two
scalar fields, for example {double, double}.

To supplement this, on some ABIs (e.g., PPC64 ELFv2), Clang represents
complex numbers as two-element arrays. This makes backend pattern
matching even more fragile because the representation is different for
different ABIs. Having an IR type would resolve this issue and make
writing IR-level transformations that operate on complex numbers more
robust.

Consequently, arrays of
complex type are represented as, for example, [8 x {double, double}].
This has consequences for how clang converts complex operations to LLVM
IR. In general, clang emits loads of the individual real and imaginary
parts and feeds them into arithmetic operations. Vectorization results
in many shufflevector operations to massage the data into sequences
suitable for vector arithmetic.

All of the real/imaginary data manipulation obscures the underlying
arithmetic. It makes it difficult to reason about the algebraic
properties of expressions. For expressiveness and optimization ability,
it would be nice to have a higher-level representation for complex in
LLVM IR. In general, it is desirable to defer lowering of complex until
the optimizer has had a reasonable chance to exploit its properties.

I think that it's really important that we're specific about the goals
here. Exactly what kinds of optimizations are we aiming to (more-easily)
enable? There certainly exists hardware with instructions that help
vectorize complex multiplication, for example, and having a builtin
complex type would make writing patterns for those instructions easier
(as opposed to trying to build matching into the SLP vectorizer or
elsewhere). This probably makes constant-folding calls to complex libm
functions easier.

Does this make loop vectorization easier or harder? Do you expect the
vectorizer to form vectors of these complex types?

First-class support for complex can also improve the user experience.
Diagnostics could express concepts in the complex domain instead of
referring to expressions containing shuffles and other low-level data
manipulation. Users that wish to examine IR directly will see much less
gobbledygook and can more easily reason about the IR.

Types

This proposal introduces new Single Value types to represent complex
numbers.

c32 - like float complex or std::complex<float>
c64 - like double complex or std::complex<double>

We defer a c128 type (like std::complex<long double>) for a future
RFC.

Why? I'd prefer we avoid introducing even more special cases. Is there
any reason why we should not define "complex <scalar type>", or to be
more restrictive, "complex <floating-point type>"? I really don't like
the idea of excluding 128-bit complex types, and I think that we can
have a generic facility.

-Hal

Why? I'd prefer we avoid introducing even more special cases. Is there
any reason why we should not define "complex <scalar type>", or to be
more restrictive, "complex <floating-point type>"? I really don't like
the idea of excluding 128-bit complex types, and I think that we can
have a generic facility.

Hal, we had 128-bit complex in an earlier draft of David's proposal, but thought it an unnecessary distraction from our main interest, which is complex composed of 32-bit and 64-bit FP. I'll take responsibility for having prodded him to remove it. :)

We're most interested in seeing complex supported in LLVM to help compile complex types in C and C++, where complex is generally understood to be a pair of floats or doubles. Once you go beyond 32-bit and 64-bit FP as the constituent type, though, you're in "complex long double" territory, and that introduces confusion. Many expect complex long double to have 128-bit parts, but long double can also mean 80-bit FP. Although we personally haven't seen much practical use of a complex type composed of 80-bit FP, we thought that proposing complex for 128-bit FP might require also covering the 80-bit case. What long double means is, after all, a Clang issue and not an LLVM issue, but we figured that proposing more types would have a harder path to success than proposing fewer.

Do you think it's best to have the full set (c32, c64, c80, c128) or just (c32, c64, c128)?

-Troy

If I understand the proposal correctly these new types aren't really
limited to floating point.

We probably want to use them for fixed-point types as well:

  c32 %res = smul.fix c32 %a, c32 %b, i16 31
  v4c32 %res = sadd.sat v4c32 %a, v4c32 %b

at least if that simplifies loop vectorization etc.

Then it would be neat to also have c16.

(btw, downstream we'd probably try to add c24 and c40 as well,
similar to what we do today for i24 and i40 types).

Regards,
Björn

What are your plans for the reverse? I assume we don't want the only
way to materialize a complex to be via memory so an insertvalue
equivalent (or maybe using insertvalue/extractvalue directly?) and a
literal value would probably be useful.

Cheers.

Tim.

That's the intent. I'm only really familiar with the X86 and AArch64
ABIs so feedback on whether this is sufficient for other targets is
welcome!

                      -David

JF Bastien <jfbastien@apple.com> writes:

I think they should probably implement the C standard semantics and
various pragmas or compiler flags could allow different algorithms. The
intent would be to preserve the current behavior of clang. That should
be fine for Fortran as well. I don't know about other languages like Go
that have LLVM frontends.

As for FMAs, I think that could also be tied to pragmas and compiler
flags.

                     -David

Stephen Canon <scanon@apple.com> writes:

"Finkel, Hal J." <hfinkel@anl.gov> writes:

I think that it's really important that we're specific about the goals
here. Exactly what kinds of optimizations are we aiming to (more-easily)
enable? There certainly exists hardware with instructions that help
vectorize complex multiplication, for example, and having a builtin
complex type would make writing patterns for those instructions easier
(as opposed to trying to build matching into the SLP vectorizer or
elsewhere). This probably makes constant-folding calls to complex libm
functions easier.

Yes, all of that. Plus things like instcombine, expression
rewrites/simplification and so on.

Does this make loop vectorization easier or harder? Do you expect the
vectorizer to form vectors of these complex types?

I expect the vectorizer to form vectors of complex types, yes. I
suspect having a first-class complex type will make vectorization
easier.

We defer a c128 type (like std::complex<long double>) for a future
RFC.

Why? I'd prefer we avoid introducing even more special cases. Is there
any reason why we should not define "complex <scalar type>", or to be
more restrictive, "complex <floating-point type>"? I really don't like
the idea of excluding 128-bit complex types, and I think that we can
have a generic facility.

Troy already addressed this but I'm very happy to re-add c128. I think
complex <floating-point-type> would be fine. Of course some
floating-point-types only make sense on certain targets. How would we
legalize a c24 on a target that doesn't support it natively? Calls into
compiler-rt?

We had quite a bit of discussion around naming here. If we're expanding
this to allow general floating point types, is c<bit-width> still a good
name?

                        -David

Björn Pettersson A <bjorn.a.pettersson@ericsson.com> writes:

If I understand the proposal correctly these new types aren't really
limited to floating point.

Currently they are but we could relax that.

We probably want to use them for fixed-point types as well:

  c32 %res = smul.fix c32 %a, c32 %b, i16 31
  v4c32 %res = sadd.sat v4c32 %a, v4c32 %b

Wouldn't we want a different type for such a thing?

I'm trying to come up with a good name for complex<fixed-point>.
Originally I thought maybe ci32 but that might be interpreted as a
Gaussian integer. Is a Gaussian integer type useful? I haven't ever
seen it used in my time here (though I can't claim to have super-deep
knowledge of full applications) but maybe it's more prominent in other
domains.

at least if that simplifies loop vectorization etc.

It probably would.

Then it would be neat to also have c16.

Agreed.

(btw, downstream we'd probably try to add c24 and c40 as well,
similar to what we do today for i24 and i40 types).

As I said in my reply to Hal, a general complex<floating-or-fixed-type>
may be the way to go.

                         -David

Tim Northover <t.p.northover@gmail.com> writes:

llvm.creal.* - Overloaded intrinsic to extract the real part of a
               complex value
declare float @llvm.creal.c32(c32 %Val)
declare double @llvm.creal.c64(c64 %Val)

What are your plans for the reverse? I assume we don't want the only
way to materialize a complex to be via memory so an insertvalue
equivalent (or maybe using insertvalue/extractvalue directly?) and a
literal value would probably be useful.

Good points. Originally I put the creal/cimag intrinsics in the
proposal when the layout of the complex type was left unspecified.
After internal feedback, I changed it to an explicitly-specified layout
(real followed by imaginary). Maybe creal/cimag should go away. Then
we wouldn't have to teach the optimizer about them and it already
understands insertvalue/extractvalue. Of course it might have to be
taught about insertvalue/extractvalue on a complex type anyway. So I
dunno, is there a strong preference one way or the other?

                            -David

I expect the vectorizer to form vectors of complex types, yes. I suspect having a first-class complex type will make vectorization easier.

Yes. In ICC, complex is a first-class data type, and that makes vectorization easier: it lets the code generator avoid gathers by using load-shuffle sequences instead, for better performance.

I agree with Hal and would suggest adding c128 as well, since LLVM is also being used to develop a Fortran compiler.

Thanks,
Xinmin

This is the important part, and there is nothing in this RFC that helps alleviate it.

Vectorization must know the data layout: whether we have vectors (r1, i1, r2, i2...) or (r1, r2, ...), (i1, i2, ...). These two approaches are not compatible. If you have vector registers that can hold 8 floats, with the first approach you can load 4 complex numbers in a single instruction, then multiply by another 4 numbers, and store. With the second approach, the minimum unit of work is 8 numbers, and each input to the multiplication has to be loaded in two instructions, loading real and imaginary parts from two separate locations. On most architectures the second approach would be vastly superior, but the v4c32 type mentioned in the RFC suggests the first one.

In addition to that, we shouldn't limit complex types to floating point only. What we care about is keeping the "ac-bd" together, not what type a,b,c,d are.

-Krzysztof

I think they should probably implement the C standard semantics and
various pragmas or compiler flags could allow different algorithms. The
intent would be to preserve the current behavior of clang. That should
be fine for Fortran as well. I don't know about other languages like Go
that have LLVM frontends.

IIRC, a Fortran compiler must use a numerically-stable division
algorithm or else the LAPACK regression tests won't pass. I think that
we'll want a non-naive division lowering (e.g., Smith's algorithm,
something along the lines of the one in
https://github.com/llvm-flang/flang-old/blob/master/lib/CodeGen/CGExprComplex.cpp).

As for FMAs, I think that could also be tied to pragmas and compiler
flags.

I agree. This should be tied to the existing fast-math flags (contract
for FMAs, for example).

-Hal

Tim Northover <t.p.northover@gmail.com> writes:

llvm.creal.* - Overloaded intrinsic to extract the real part of a
                complex value
declare float @llvm.creal.c32(c32 %Val)
declare double @llvm.creal.c64(c64 %Val)

What are your plans for the reverse? I assume we don't want the only
way to materialize a complex to be via memory so an insertvalue
equivalent (or maybe using insertvalue/extractvalue directly?) and a
literal value would probably be useful.

Good points. Originally I put the creal/cimag intrinsics in the
proposal when the layout of the complex type was left unspecified.
After internal feedback, I changed it to an explicitly-specified layout
(real followed by imaginary). Maybe creal/cimag should go away. Then
we wouldn't have to teach the optimizer about them and it already
understands insertvalue/extractvalue. Of course it might have to be
taught about insertvalue/extractvalue on a complex type anyway. So I
dunno, is there a strong preference one way or the other?

One option is to make the complex type a special kind of vector, or a
special kind of aggregate (I have a slight preference for the latter).
That gives us an existing set of accessors.

-Hal

I agree: non-vector. If nothing else, a vector of complexes seems like a
sensible concept, which would be harder if a complex were itself
vectorial.

Myself, I think I do favour insertvalue/extractvalue over intrinsics
(which is probably a +1 for isAggregate too by extension).

If the notation for the literal is especially re/im rather than 0/1 we
might think about adding "extractvalue %v, real" as sugar (that last
arg is already weird). All just things to think about right now
though.

Cheers.

Tim.

"Finkel, Hal J." <hfinkel@anl.gov> writes:

I think that it's really important that we're specific about the goals
here. Exactly what kinds of optimizations are we aiming to (more-easily)
enable? There certainly exists hardware with instructions that help
vectorize complex multiplication, for example, and having a builtin
complex type would make writing patterns for those instructions easier
(as opposed to trying to build matching into the SLP vectorizer or
elsewhere). This probably makes constant-folding calls to complex libm
functions easier.

Yes, all of that. Plus things like instcombine, expression
rewrites/simplification and so on.

Does this make loop vectorization easier or harder? Do you expect the
vectorizer to form vectors of these complex types?

I expect the vectorizer to form vectors of complex types, yes. I
suspect having a first-class complex type will make vectorization
easier.

We defer a c128 type (like std::complex<long double>) for a future
RFC.

Why? I'd prefer we avoid introducing even more special cases. Is there
any reason why we should not define "complex <scalar type>", or to be
more restrictive, "complex <floating-point type>"? I really don't like
the idea of excluding 128-bit complex types, and I think that we can
have a generic facility.

Troy already addressed this but I'm very happy to re-add c128. I think
complex <floating-point-type> would be fine.

Great.

Of course some
floating-point-types only make sense on certain targets. How would we
legalize a c24 on a target that doesn't support it natively? Calls into
compiler-rt?

I think that we can have a sensible system for all current types: For
the various types, we have:

- x86_fp80 (and perhaps x86_mmx): Lowering for these is only
supported in relevant x86 configurations, and lowering the complex
variants will likewise work only in the relevant target configurations.

- float, double: Lowering for these has an ABI, and we can use that
ABI (runtime calls, expansions).

- fp128 and ppc_fp128: On systems which support these, they're used
for _Complex long double, and so there is an ABI to follow.

- half: half may have an ABI on some systems, in which case we can
follow it, and on other systems it is treated as a "storage only" type,
with operations being promoted to single precision, and we can do the
same for the default lowering.

-Hal

"Finkel, Hal J. via llvm-dev" <llvm-dev@lists.llvm.org> writes:

Of course some
floating-point-types only make sense on certain targets. How would we
legalize a c24 on a target that doesn't support it natively? Calls into
compiler-rt?

I think that we can have a sensible system for all current types: For
the various types, we have:

- x86_fp80 (and perhaps x86_mmx): Lowering for these is only
supported in relevant x86 configurations, and lowering the complex
variants will likewise work only in the relevant target configurations.

- float, double: Lowering for these has an ABI, and we can use that
ABI (runtime calls, expansions).

- fp128 and ppc_fp128: On systems which support these, they're used
for _Complex long double, and so there is an ABI to follow.

- half: half may have an ABI on some systems, in which case we can
follow it, and on other systems it is treated as a "storage only" type,
with operations being promoted to single precision, and we can do the
same for the default lowering.

Sounds reasonable.

                      -David

Tim Northover via llvm-dev <llvm-dev@lists.llvm.org> writes: