Fixed Point Support in LLVM

We would like to discuss the possibility of adding support for fixed
point operations, either through new types, instructions, or
intrinsics. This is an extension to a previous proposal
(http://lists.llvm.org/pipermail/cfe-dev/2018-April/057756.html) for
implementing fixed point arithmetic entirely in clang, but John McCall
brings up good points in the comments thread of
https://reviews.llvm.org/D50616 for adding LLVM support.

Just to summarize the proposal, Embedded-C lays out an implementation
for fixed point data types and operations on them. A fixed point
number is a number that contains a fractional and an integral part. These
types can essentially be represented as scaled integers, that is, a
radix point exists somewhere in the integer that divides it into
integral and fractional bits.
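
For concreteness, a small worked example (format chosen arbitrarily):
in a signed 32-bit type with 16 fractional bits, the value 2.5 is
stored as 2.5 * 2^16 = 163840 (0x00028000). Addition and subtraction
are then ordinary integer operations, while multiplication must shift
the double-width product right by 16 to restore the scale.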

Adding a new type seems unnecessary since all fixed point types in the
spec can be represented as integers. For operations involving these
integers, the scale could ideally be passed as an argument to whatever
instruction or intrinsic is performing a fixed point operation. I’m
not sure of the difference in work between adding a new instruction and
adding a new intrinsic, but either would ideally be done only for the
complex operations.

Namely, intrinsics/instructions would be added for these operations:
- signed multiplication
- signed saturating multiplication
- signed division
- unsigned division
- signed saturating division
- unsigned saturating division
- saturation
- saturating addition
- saturating subtraction
- floating-point to fixed-point conversion
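
As a rough sketch (the names and signatures below are placeholders,
not a settled design), i32 instances of such overloaded intrinsics
might be declared as:

  declare i32 @llvm.fixsmul.i32(i32, i32, i32)      ; signed multiply: a, b, scale
  declare i32 @llvm.fixsmul.sat.i32(i32, i32, i32)  ; signed saturating multiply
  declare i32 @llvm.fixsdiv.i32(i32, i32, i32)      ; signed divide: a, b, scale
  declare i32 @llvm.fptofix.i32.f32(float, i32)     ; float to fixed: value, scale

The scale operands here would presumably be required to be constants.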

Bevin Hansson has implemented these in his downstream version of
clang/llvm (http://lists.llvm.org/pipermail/cfe-dev/2018-May/058019.html),
and I imagine this is all we may need for intrinsics. We would like to
offload complicated operations to LLVM for targets that provide native
support for them, while still handling most of the semantics in the
frontend, since many simple operations can be done using existing
instructions. For architectures without native fixed-point support,
these operations will fall back to generated IR.

Does anyone have any more thoughts on how to correctly approach this?

Thanks,
Leo

A couple of thoughts here. None particularly well thought out, so please take this more as brainstorming than specific recommendations.

In terms of typing, have you considered representing your fixed points as structs in IR as opposed to packed integers? I suspect - but am by no means sure - that exposing the parts to the optimizer separately may lead to better overall optimization quality. SROA is quite good at destroying small structs.

In terms of ABI, how are fixed points passed as arguments and return values? This may impact your choice of IR representation. If you can get away with something simple (e.g. passed as iN, bitcast to desired types) you'll be much better off than if you have a complicated ABI.

If I understand your operations, you're going to need to have the scale as an explicit parameter to your intrinsics, right? If so, you're essentially going to end up needing a family of intrinsics for each operation. You have a couple of approaches for modeling that: either constant parameters or name mangling might work.
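
For example (hypothetical names either way), the two styles would look
roughly like:

  %r = call i32 @llvm.fixmul.i32(i32 %a, i32 %b, i32 15)  ; scale as a constant parameter
  %r = call i32 @llvm.fixmul.s15.i32(i32 %a, i32 %b)      ; scale mangled into the name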

Have you considered phrasing this as builtin functions as opposed to intrinsics? We have fairly decent support in TLI for conditionally available builtins, whereas intrinsics are generally assumed to be always supported. Phrasing your operations as a set of builtins implemented by a runtime library may allow both easier prototyping and easier per-target integration. In particular, pattern matching IR to the operations would be much more natural if framed as TLI functions, since the concept of an unsupported builtin already exists.

Philip

We would like to discuss the possibility of adding support for fixed
point operations, either through new types, instructions, or
intrinsics. This is an extension to a previous proposal
(http://lists.llvm.org/pipermail/cfe-dev/2018-April/057756.html) for
implementing fixed point arithmetic entirely in clang, but John McCall
brings up good points in the comments thread of
https://reviews.llvm.org/D50616 for adding LLVM support.

Just to summarize the proposal, Embedded-C lays out an implementation
for fixed point data types and operations on them. A fixed point
number is a number that contains a fractional and an integral part. These
types can essentially be represented as scaled integers, that is, a
radix point exists somewhere in the integer that divides it into
integral and fractional bits.

Adding a new type seems unnecessary since all fixed point types in the
spec can be represented as integers. For operations involving these
integers, the scale could ideally be passed as an argument to whatever
instruction or intrinsic is performing a fixed point operation. I’m
not sure of the difference in work between adding a new instruction and
adding a new intrinsic, but either would ideally be done only for the
complex operations.

At the risk of distracting from your narrower proposal, I'd like to pitch the idea of adding
fixed-point values as a new fundamental type in IR. I think it's really easy to assume
that this would be a dauntingly complicated task just because it's not something that
most people have experience doing. Most of the complexity would be in targets, and
we could easily eliminate these types in a generic legalization pass in the backend to
reduce that impact.

And "all fixed point types in the spec can be represented as integers" is true
of literally everything in LLVM. Every type we've got (except for opaque structs,
I guess) is a finite, fixed-size sequence of bits; nonetheless, we have direct support
for representing pointers, vectors, FP, and so on, when we could absolutely instead
use an appropriately-sized integer type, a large collection of intrinsics, and some sort
of "fpreg" attribute on call arguments to mark the difference in CCs.

In general, I think LLVM has gotten way too addicted to the pretense that adding things
as intrinsics or funky instruction attributes isn't adding just as much complexity as adding
new types and instructions. LLVM should clearly provide portable support for fixed-point
operations. We can do that with weird integer constants and a bunch of intrinsics, but
that seems to me like it's just hiding the complexity and making things harder for everyone
involved.

Fixed-point types would be quite similar to floating-point types in terms of their
overall impact on the complexity of IR — which is to say, not very big — except
predictably smaller because there's no analogue to the complexity around the
floating-point special cases (i.e. NaNs, negative zero, and infinities) and the various
features relating to them (e.g. the fast-math flags). Type layout would just default to
using the rules for an integer of the same width. There just isn't that much code in
LLVM that tries to do something different for every possible type.

If we did this, I would suggest separating types by representation differences, not the
semantics of the operations on them. For example, we'd have different operations for
saturating and non-saturating arithmetic, but saturating and non-saturating types would
get lowered to the same IR type. Unlike integers, though, I think maybe we wouldn't
want to unify signed and unsigned types because of the padded-representation issue;
or maybe we'd only unify types with the same internal layout, so that a padded unsigned
type would be different from an unpadded one of the same overall width.

Note that having saturating-arithmetic instructions would also be useful for integers
and integer vectors, and could similarly be legalized generically in the backend for
targets that don't have direct ISA support for them.

John.

A couple of thoughts here. None particularly well thought out, so please take this more as brainstorming than specific recommendations.

In terms of typing, have you considered representing your fixed points as structs in IR as opposed to packed integers? I suspect - but am by no means sure - that exposing the parts to the optimizer separately may lead to better overall optimization quality. SROA is quite good at destroying small structs.

This might be an interesting idea if you're working on a C level (say, you could have structs with bitfields containing the parts of the fixed-point number), but I don't know if there's a benefit to it on IR level, since most of the operations you'd want to do on fixed-point numbers really are regular integer operations. In some basic experiments we've done with _Complex (which is represented with a struct), we find that the optimizer doesn't always do as good a job with structs as it does with just integers, even when SROA unpacks things.

In terms of ABI, how are fixed points passed as arguments and return values? This may impact your choice of IR representation. If you can get away with something simple (e.g. passed as iN, bitcast to desired types) you'll be much better off than if you have a complicated ABI.

I think the discussion of ABI came up in review of one of the Clang patches; unsure what the verdict was. Most targets likely don't have a well defined ABI for fixed-point numbers, so it would be up for discussion, but I think the easiest way is just to say that fixed-point types have the same ABI as their same-sized integer counterparts. If their representation is also in the form of integers, then this is trivial.

If I understand your operations, you're going to need to have the scale as an explicit parameter to your intrinsics, right? If so, you're essentially going to end up needing a family of intrinsics for each operation. You have a couple of approaches for modeling that: either constant parameters or name mangling might work.

In our implementation, the intrinsics that require a scale have one as an extra constant parameter. These are multiplication and division, and their saturating counterparts.

I think that our intrinsic set would have to be extended a bit to work with the design that Leonard is working on, since we are implementing DSP-C rather than Embedded-C. Some intrinsics would have to be added (unsigned multiplication; possibly differentiating between signed and unsigned saturating addition/subtraction?) and the saturating addition/subtraction would need a scale parameter.

Have you considered phrasing this as builtin functions as opposed to intrinsics? We have fairly decent support in TLI for conditionally available builtins, whereas intrinsics are generally assumed to be always supported. Phrasing your operations as a set of builtins implemented by a runtime library may allow both easier prototyping and easier per-target integration. In particular, pattern matching IR to the operations would be much more natural if framed as TLI functions, since the concept of an unsupported builtin already exists.

Builtins for unsupported combinations of width/scale would be interesting, but most of these operations can be expanded fairly easily, and they can be implemented in terms of integer operations anyway, which already have builtins for unsupported widths/operations.

Furthermore, it needs to be possible to instruction select them easily for targets with native support, so intrinsics are the simplest approach there.

/ Bevin

If we were to create a new type down the line, I think the main
features that would distinguish them from other types are the
arbitrary width and scale. Saturation can be handled through
instructions since saturation really only takes effect after an
operation and doesn’t really describe anything about the bits in the
resulting type. Signedness can similarly be managed through operations
and would be consistent with the separation in signed and unsigned int
operations.

The unsigned padding is a result of the data bits not taking the whole
width of the underlying llvm integer in the frontend for optimization
purposes, so (I don’t think) this would be a problem if fixed points
were represented as a native type of arbitrary width and scale
(similar to how llvm represents arbitrary width integers like i33).

I’m unsure if I should stop what I’m working on now though to
implement this type. Although it seems correct, there also doesn’t
seem to be a very high demand for a new llvm type. I imagine another
reason one would add a new type, in addition to your reasons, is that
it represents a common type that could be used in multiple frontends,
but it doesn’t seem like other llvm frontends actively demand this.
For now I imagine intrinsics would be a nice middle ground and down
the line once fixed point types are fully fleshed out, we can explore
adding a new llvm type.

I would strongly advise getting to a working end-to-end implementation - ideally entirely upstream - before proposing any new types if you can. I disagree with John about the impact of a new type, not so much from a LOC perspective, but from a review burden/conceptual burden perspective. Starting with something which works (say integers), implementing some of the basic missing optimizations, and having a working end-to-end system before proposing any major IR extensions is strongly advised.

OT: I noticed saturation coming up a couple of times. For integers, we handle saturation by using an op.with.overflow/select idiom. Does the same basic notion work for the fixed point operations? If it does, that reduces the number of required operations by half.
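
For reference, a minimal sketch of that idiom for a saturating signed
i32 addition (hand-written, not output from any actual frontend):

  %pair = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %a, i32 %b)
  %sum  = extractvalue { i32, i1 } %pair, 0
  %ovf  = extractvalue { i32, i1 } %pair, 1
  ; on signed overflow both operands have the same sign, so clamp toward it
  %neg  = icmp slt i32 %a, 0
  %clmp = select i1 %neg, i32 -2147483648, i32 2147483647
  %res  = select i1 %ovf, i32 %clmp, i32 %sum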

Philip

If we were to create a new type down the line, I think the main
features that would distinguish them from other types are the
arbitrary width and scale. Saturation can be handled through
instructions since saturation really only takes effect after an
operation and doesn’t really describe anything about the bits in the
resulting type. Signedness can similarly be managed through operations
and would be consistent with the separation in signed and unsigned int
operations.

The unsigned padding is a result of the data bits not taking the whole
width of the underlying llvm integer in the frontend for optimization
purposes, so (I don’t think) this would be a problem if fixed points
were represented as a native type of arbitrary width and scale
(similar to how llvm represents arbitrary width integers like i33).

I agree with all of this, with the caveat that we do have to make a decision
about the definedness of the padding bit. But I think it's okay to just assume
that all targets will follow whatever decision we make; if someone blinks and
wants the opposite rule, they get to go add an extra argument to all the intrinsics.

I’m unsure if I should stop what I’m working on now though to
implement this type. Although it seems correct, there also doesn’t
seem to be a very high demand for a new llvm type. I imagine another
reason one would add a new type, in addition to your reasons, is that
it represents a common type that could be used in multiple frontends,
but it doesn’t seem like other llvm frontends actively demand this.
For now I imagine intrinsics would be a nice middle ground and down
the line once fixed point types are fully fleshed out, we can explore
adding a new llvm type.

It's fine to start with intrinsics, I think. I do think it would be better to add a new
type, but I don't want to derail your project with a political argument over it.

I *will* derail your project if necessary with an argument that these ought to be
portable intrinsics, though. :)

As for other frontends, I can only speak for Swift. Fixed-point types are not a high
priority for Swift, just like they haven't been a high priority for Clang — it's not like
Embedded C is a brand-new specification. But if we had a reason to add them to
Swift, I would be pretty upset as a frontend author to discover that LLVM's support
was scattered and target-specific and that my best implementation option was to
copy a ton of code from Clang.

The main downsides of not having a type are:

  - Every operation that would've been overloaded by operand type instead has
    to be parameterized. That is, your intrinsics all have to take width and scale in
    addition to signed-ness and saturating-ness; that's a lot of parameters, which
    tends to make testing and debugging harder.

  - Constants have to be written as decimal integers, which tends to make testing
    and debugging harder.

  - Targets that want to pass fixed-point values differently from integers have to
    invent some extra way of specifying that a value is fixed-point.

So it's annoying not to have types, especially if you want special CC treatment,
but it's not the end of the world.

John.

I would strongly advise getting to a working end-to-end implementation - ideally entirely upstream - before proposing any new types if you can. I disagree with John about the impact of a new type, not so much from a LOC perspective, but from a review burden/conceptual burden perspective.

On the "conceptual burden" point, can you elaborate on why you think you'd be
forced to think about fixed-point types if they were added to IR?

There are a number of things in the current design of IR that are real burdens
because you have to remember them every time you work with certain operations,
like how a load can be volatile or atomic. But those are flaws in the design of those
operations — they should really be different instructions — rather than inherent
problems that would have to be duplicated for any new feature. I imagine fixed-point
types would have their own set of operations, like floating-point types do, as opposed
to (say) overloading the existing add, mul, icmp, etc. instructions.

Starting with something which works (say integers), implementing some of the basic missing optimizations, and having a working end-to-end system before proposing any major IR extensions is strongly advised.

In my experience, that's not a good approach to designing an IR. Pretty much
any representation can be made to work with enough hacks in the right places.
And nobody wants to rewrite a working end-to-end system; you just keep extending
the hacks.

OT: I noticed saturation coming up a couple of times. For integers, we handle saturation by using an op.with.overflow/select idiom. Does the same basic notion work for the fixed point operations? If it does, that reduces the number of required operations by half.

Do you find that you can reliably select a saturating instruction from such a pattern?
I imagine that the tree of selects would get pretty complicated for, say, a saturating
signed multiplication.

John.

If we were to create a new type down the line, I think the main
features that would distinguish them from other types are the
arbitrary width and scale. Saturation can be handled through
instructions since saturation really only takes effect after an
operation and doesn’t really describe anything about the bits in the
resulting type. Signedness can similarly be managed through operations
and would be consistent with the separation in signed and unsigned int
operations.

The unsigned padding is a result of the data bits not taking the whole
width of the underlying llvm integer in the frontend for optimization
purposes, so (I don’t think) this would be a problem if fixed points
were represented as a native type of arbitrary width and scale
(similar to how llvm represents arbitrary width integers like i33).

I agree with all of this, with the caveat that we do have to make a decision
about the definedness of the padding bit. But I think it's okay to just assume
that all targets will follow whatever decision we make; if someone blinks and
wants the opposite rule, they get to go add an extra argument to all the intrinsics.

Personally I would prefer to see the bit defined as zero, as that is how our implementation works. But I don't know the precise implications of leaving it undefined other than that overflow on unsigned fixed-point numbers could produce undefined results.

I’m unsure if I should stop what I’m working on now though to
implement this type. Although it seems correct, there also doesn’t
seem to be a very high demand for a new llvm type. I imagine another
reason one would add a new type, in addition to your reasons, is that
it represents a common type that could be used in multiple frontends,
but it doesn’t seem like other llvm frontends actively demand this.
For now I imagine intrinsics would be a nice middle ground and down
the line once fixed point types are fully fleshed out, we can explore
adding a new llvm type.

It's fine to start with intrinsics, I think. I do think it would be better to add a new
type, but I don't want to derail your project with a political argument over it.

I *will* derail your project if necessary with an argument that these ought to be
portable intrinsics, though. :)

Can you clarify what you mean by portable? Portable from a target standpoint or from a language standpoint, or both? Either of these goals could be pretty tricky if the semantics of the intrinsics must be well defined ("fixsmul is equivalent to (trunc (ashr (mul (sext a), (sext b))))") rather than "fixsmul does a signed fixed-point multiplication". If a target or language has different semantics for their fixed-point operations, the intrinsics are useless to them.

I would rather see them well defined than not, though. I also agree that they should be portable and generic enough to support any language/target implementation, but unless you add lots of intrinsics and parameterization, this could result in a bit of 'mismatch' between what the intrinsics can do and what the frontend wants to do. At some point you might end up having to emit a bit of extra code in the frontend to cover for the deficiencies of the generic implementation.

As for other frontends, I can only speak for Swift. Fixed-point types are not a high
priority for Swift, just like they haven't been a high priority for Clang — it's not like
Embedded C is a brand-new specification. But if we had a reason to add them to
Swift, I would be pretty upset as a frontend author to discover that LLVM's support
was scattered and target-specific and that my best implementation option was to
copy a ton of code from Clang.

It might not be a ton, but at some level you'd have to copy a bit of code. There are several fixed-point operations that probably don't deserve their own intrinsics, like non-saturating fixed-to-fixed and fixed-to-int conversion.

There's always the possibility of adding them to IRBuilder if we think they might need to be reused.

The main downsides of not having a type are:

   - Every operation that would've been overloaded by operand type instead has
     to be parameterized. That is, your intrinsics all have to take width and scale in
     addition to signed-ness and saturating-ness; that's a lot of parameters, which
     tends to make testing and debugging harder.

The width would be implied by the width of the integer type, and signedness and saturation should simply have their own intrinsics rather than being a parameter. Scale would have to be a constant parameter for some of the intrinsics, though.

It means more intrinsics, but I think it's a better design than having a single intrinsic with flags for different cases.

   - Constants have to be written as decimal integers, which tends to make testing
     and debugging harder.

It would be possible to add a decimal fixed-point format to the possible integer constant representations in textual IR, but this doesn't help when you're printing.

   - Targets that want to pass fixed-point values differently from integers have to
     invent some extra way of specifying that a value is fixed-point.

Another function attribute would probably be fine for that.

/ Bevin

If we were to create a new type down the line, I think the main
features that would distinguish them from other types are the
arbitrary width and scale. Saturation can be handled through
instructions since saturation really only takes effect after an
operation and doesn’t really describe anything about the bits in the
resulting type. Signedness can similarly be managed through operations
and would be consistent with the separation in signed and unsigned int
operations.

The unsigned padding is a result of the data bits not taking the whole
width of the underlying llvm integer in the frontend for optimization
purposes, so (I don’t think) this would be a problem if fixed points
were represented as a native type of arbitrary width and scale
(similar to how llvm represents arbitrary width integers like i33).

I agree with all of this, with the caveat that we do have to make a decision
about the definedness of the padding bit. But I think it's okay to just assume
that all targets will follow whatever decision we make; if someone blinks and
wants the opposite rule, they get to go add an extra argument to all the intrinsics.

Personally I would prefer to see the bit defined as zero, as that is how our implementation works. But I don't know the precise implications of leaving it undefined other than that overflow on unsigned fixed-point numbers could produce undefined results.

As I understand things, leaving it undefined would allow non-saturating addition/subtraction to just freely overflow into the bit. I don't know if other arithmetic would equally benefit, and defining it to be zero will probably simplify most other operations even if it does sometimes require some amount of extra masking.

I don't know what the dynamic balance of fixed-point operations is in a real program. It's certainly plausible that optimizing non-saturating arithmetic is worth penalizing other operations.

I’m unsure if I should stop what I’m working on now though to
implement this type. Although it seems correct, there also doesn’t
seem to be a very high demand for a new llvm type. I imagine another
reason one would add a new type, in addition to your reasons, is that
it represents a common type that could be used in multiple frontends,
but it doesn’t seem like other llvm frontends actively demand this.
For now I imagine intrinsics would be a nice middle ground and down
the line once fixed point types are fully fleshed out, we can explore
adding a new llvm type.

It's fine to start with intrinsics, I think. I do think it would be better to add a new
type, but I don't want to derail your project with a political argument over it.

I *will* derail your project if necessary with an argument that these ought to be
portable intrinsics, though. :)

Can you clarify what you mean by portable? Portable from a target standpoint or from a language standpoint, or both?

I just mean that I want there to be intrinsics called llvm.fixadd or llvm.fixmul or whatever with well-specified, target-independent semantics instead of a million target-specific intrinsics called something like llvm.supermips37.addfixvlq_16.

Either of these goals could be pretty tricky if the semantics of the intrinsics must be well defined ("fixsmul is equivalent to (trunc (ashr (mul (sext a), (sext b))))") rather than "fixsmul does a signed fixed-point multiplication". If a target or language has different semantics for their fixed-point operations, the intrinsics are useless to them.

I would rather see them well defined than not, though. I also agree that they should be portable and generic enough to support any language/target implementation, but unless you add lots of intrinsics and parameterization, this could result in a bit of 'mismatch' between what the intrinsics can do and what the frontend wants to do. At some point you might end up having to emit a bit of extra code in the frontend to cover for the deficiencies of the generic implementation.

"Lots of parameterization" sounds about right. There should just be a pass in the backend that legalizes intrinsics that aren't directly supported by the target. The analogy is to something like llvm.sadd_with_overflow: a frontend can use that intrinsic on i19 if it wants, and that obviously won't map directly to a single instruction, so LLVM legalizes it to operations that *are* directly supported. If your target has direct support for saturating signed additions on a specific format, the legalization pass can let those through, but otherwise it should lower them into basic operations.

If the frontend generates a ton of requests for operations that have to be carried out completely in software, that's its problem.

As for other frontends, I can only speak for Swift. Fixed-point types are not a high
priority for Swift, just like they haven't been a high priority for Clang — it's not like
Embedded C is a brand-new specification. But if we had a reason to add them to
Swift, I would be pretty upset as a frontend author to discover that LLVM's support
was scattered and target-specific and that my best implementation option was to
copy a ton of code from Clang.

It might not be a ton, but at some level you'd have to copy a bit of code. There are several fixed-point operations that probably don't deserve their own intrinsics, like non-saturating fixed-to-fixed and fixed-to-int conversion.

Why not? I mean, sure, they're easy to define with extends and shifts, but it doesn't really seem *bad* to have intrinsics for them. I guess you'd lose some small amount of free integer optimization from the middle-end, but the important cases will probably all still get done for free when they get legalized.

There's always the possibility of adding them to IRBuilder if we think they might need to be reused.

Please at least do this, yes.

The main downsides of not having a type are:

  - Every operation that would've been overloaded by operand type instead has
    to be parameterized. That is, your intrinsics all have to take width and scale in
    addition to signed-ness and saturating-ness; that's a lot of parameters, which
    tends to make testing and debugging harder.

The width would be implied by the width of the integer type, and signedness and saturation should simply have their own intrinsics rather than being a parameter. Scale would have to be a constant parameter for some of the intrinsics, though.

Well, width can differ from the width of the integer type for these padded types, right? I mean, you can represent those differently if you want, but I would hope it's represented explicitly and not just as a target difference.

It means more intrinsics, but I think it's a better design than having a single intrinsic with flags for different cases.

In my experience, using different intrinsics does make it awkward to do a lot of things that would otherwise have parallel structure. It's particularly annoying when generating code, which also affects middle-end optimizers. And in the end you're going to be pattern-matching the constant arguments anyway.

But ultimately, I'm not trying to dictate the design to the people actually doing the work. I just wanted to ward off the implementation that seemed to be evolving, which was a bunch of hand-lowering in Clang's IR-generation, modified by target hooks to emit target-specific intrinsic calls.

  - Constants have to be written as decimal integers, which tends to make testing
    and debugging harder.

It would be possible to add a decimal fixed-point format to the possible integer constant representations in textual IR, but this doesn't help when you're printing.

Right.

  - Targets that want to pass fixed-point values differently from integers have to
    invent some extra way of specifying that a value is fixed-point.

Another function attribute would probably be fine for that.

It depends. It wouldn't work well with compound results, at least, but people often tend not to care about those because they're awkward to work with in C.

John.

If we were to create a new type down the line, I think the main
features that would distinguish them from other types are the
arbitrary width and scale. Saturation can be handled through
instructions since saturation really only takes effect after an
operation and doesn’t really describe anything about the bits in the
resulting type. Signedness can similarly be managed through operations
and would be consistent with the separation in signed and unsigned int
operations.

The unsigned padding is a result of the data bits not taking the whole
width of the underlying llvm integer in the frontend for optimization
purposes, so (I don’t think) this would be a problem if fixed points
were represented as a native type of arbitrary width and scale
(similar to how llvm represents arbitrary width integers like i33).

I agree with all of this, with the caveat that we do have to make a decision
about the definedness of the padding bit. But I think it's okay to just assume
that all targets will follow whatever decision we make; if someone blinks and
wants the opposite rule, they get to go add an extra argument to all the intrinsics.

Personally I would prefer to see the bit defined as zero, as that is how our implementation works. But I don't know the precise implications of leaving it undefined other than that overflow on unsigned fixed-point numbers could produce undefined results.

As I understand things, leaving it undefined would allow non-saturating addition/subtraction to just freely overflow into the bit. I don't know if other arithmetic would equally benefit, and defining it to be zero will probably simplify most other operations even if it does sometimes require some amount of extra masking.

I don't know what the dynamic balance of fixed-point operations is in a real program. It's certainly plausible that optimizing non-saturating arithmetic is worth penalizing other operations.

Yes, if we don't clear the unsigned padding bit after every operation (C operation, not IR operation) we can get garbage in the bit that could affect later operations (multiplication, comparison). Locking it to always be zero would ensure that you always produce a valid, representable value.
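
(The clearing itself is cheap - for a padded unsigned type stored in i16 it is a single mask after each operation:

  %r = and i16 %s, 32767   ; keep the 15 value bits, zero the padding bit

- but it would have to be emitted after every C-level operation.)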

I think we already discussed this in review and settled on keeping it undefined, but from my side it's only because I don't really have a good argument for making it zeroed; technically this is overflow and therefore undefined anyway. I think that ensuring that we don't produce unrepresentable values is good, though.

I’m unsure if I should stop what I’m working on now though to
implement this type. Although it seems correct, there also doesn’t
seem to be a very high demand for a new llvm type. I imagine another
reason one would add a new type, in addition to your reasons, is that
it represents a common type that could be used in multiple frontends,
but it doesn’t seem like other llvm frontends actively demand this.
For now I imagine intrinsics would be a nice middle ground and down
the line once fixed point types are fully fleshed out, we can explore
adding a new llvm type.

It's fine to start with intrinsics, I think. I do think it would be better to add a new
type, but I don't want to derail your project with a political argument over it.

I *will* derail your project if necessary with an argument that these ought to be
portable intrinsics, though. :)

Can you clarify what you mean by portable? Portable from a target standpoint or from a language standpoint, or both?

I just mean that I want there to be intrinsics called llvm.fixadd or llvm.fixmul or whatever with well-specified, target-independent semantics instead of a million target-specific intrinsics called something like llvm.supermips37.addfixvlq_16.

Okay, then we're pretty much on the same page. In our implementation we have llvm.sat, llvm.fixsmul, llvm.add.sat, llvm.fixsmul.sat, etc. as described in Leonard's first mail in the thread. We have omitted many intrinsics that we don't need for our particular target/language (like unsigned multiplication, unsigned variations of saturating ops, and plain addition) but the basic design is not hard to extend with this.

Either of these goals could be pretty tricky if the semantics of the intrinsics must be well defined ("fixsmul is equivalent to (trunc (ashr (mul (sext a), (sext b))))") rather than "fixsmul does a signed fixed-point multiplication". If a target or language has different semantics for their fixed-point operations, the intrinsics are useless to them.
I would rather see them well defined than not, though. I also agree that they should be portable and generic enough to support any language/target implementation, but unless you add lots of intrinsics and parameterization, this could result in a bit of 'mismatch' between what the intrinsics can do and what the frontend wants to do. At some point you might end up having to emit a bit of extra code in the frontend to cover for the deficiencies of the generic implementation.

"Lots of parameterization" sounds about right. There should just be a pass in the backend that legalizes intrinsics that aren't directly supported by the target. The analogy is to something like llvm.sadd_with_overflow: a frontend can use that intrinsic on i19 if it wants, and that obviously won't map directly to a single instruction, so LLVM legalizes it to operations that *are* directly supported. If your target has direct support for saturating signed additions on a specific format, the legalization pass can let those through, but otherwise it should lower them into basic operations.

Sure; if a target does not support an instance of llvm.fixsmul.i32 with a scale parameter of 31, a pass should convert this into straight-line integer IR that implements the defined semantics of the operation.

I'm just not really a fan of something like 'llvm.fixmul.iN(iN LHS, iN RHS, i32 Scale, i1 Signed, i1 Saturating, i2 RoundingDirection, i1 HasPadding)'. I don't think this kind of intrinsic is nice to look at or implement. Defining the semantics of an operation like this is quite hard since there are so many knobs, and pretty much any target would only be able to handle a handful of value combinations to this anyway.

IMHO it should be a number of separate intrinsics with well-defined semantics for each one, preferably semantics that can be described in a single line. That makes it easy to implement and understand. Like so:

llvm.ssat.iN(iN A, i32 SatWidth) = saturate anything above SatWidth bits as a signed value
llvm.usat.iN(iN A, i32 SatWidth) = saturate anything above SatWidth bits as an unsigned value
llvm.fixsmul.iN(iN A, iN B, i32 Scale) = sext->mul->ashr->trunc
llvm.fixumul.iN(iN A, iN B, i32 Scale) = zext->mul->lshr->trunc
llvm.fixsmul.sat.iN(iN A, iN B, i32 Scale) = sext->mul->ashr->ssat->trunc
llvm.fixumul.sat.iN(iN A, iN B, i32 Scale) = zext->mul->lshr->usat->trunc
llvm.sadd.sat.iN(iN A, iN B) = saddo->check sign&ovf->selects
llvm.uadd.sat.iN(iN A, iN B) = uaddo->check ??&ovf->selects
etc.

The operations that the intrinsics are expressed in are strictly different anyway (sext vs zext, saddo vs uaddo etc), so there isn't any room for generalization on constant parameter values in the first place. You'll need to disambiguate manually regardless. The semantics for the saturation intrinsics aren't that straightforward... but it's probably possible to come up with a good description with a bit of thinking.
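
To make the fixsmul line concrete, a sketch of its expansion for i32 with a scale of 15 (extending to double width keeps all product bits):

  %a64 = sext i32 %a to i64
  %b64 = sext i32 %b to i64
  %p   = mul i64 %a64, %b64
  %sh  = ashr i64 %p, 15
  %r   = trunc i64 %sh to i32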

If the frontend generates a ton of requests for operations that have to be carried out completely in software, that's its problem.

An example of what I mean by extra code is if you look at the intrinsics I described. If you have an unsigned padding bit, there is no way to perform an unsigned saturating addition with those intrinsics. You need to emit an llvm.sadd.sat, add an extra check for a set sign bit (because that means that we went below 0) and clamp to zero in that case.
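
Roughly, for a padded unsigned type stored in i16 (a sketch of the extra check described above):

  %s   = call i16 @llvm.sadd.sat.i16(i16 %a, i16 %b)
  %neg = icmp slt i16 %s, 0
  %r   = select i1 %neg, i16 0, i16 %s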

As for other frontends, I can only speak for Swift. Fixed-point types are not a high
priority for Swift, just like they haven't been a high priority for Clang — it's not like
Embedded C is a brand-new specification. But if we had a reason to add them to
Swift, I would be pretty upset as a frontend author to discover that LLVM's support
was scattered and target-specific and that my best implementation option was to
copy a ton of code from Clang.

It might not be a ton, but at some level you'd have to copy a bit of code. There are several fixed-point operations that probably don't deserve their own intrinsics, like non-saturating fixed-to-fixed and fixed-to-int conversion.

Why not? I mean, sure, they're easy to define with extends and shifts, but it doesn't really seem *bad* to have intrinsics for them. I guess you'd lose some small amount of free integer optimization from the middle-end, but the important cases will probably all still get done for free when they get legalized.

Perhaps. The odd thing is though, if a target just wants the shifts and extends, they would have to say that these operations/intrinsics are illegal in order to get legalization, which is a bit strange to me.

If a target has a more efficient way of doing shift+trunc or shift+ext, then it should just have patterns for that in the first place.

There's always the possibility of adding them to IRBuilder if we think they might need to be reused.

Please at least do this, yes.

The main downsides of not having a type are:

   - Every operation that would've been overloaded by operand type instead has
     to be parameterized. That is, your intrinsics all have to take width and scale in
     addition to signed-ness and saturating-ness; that's a lot of parameters, which
     tends to make testing and debugging harder.

The width would be implied by the width of the integer type, and signedness and saturation should simply have their own intrinsics rather than being a parameter. Scale would have to be a constant parameter for some of the intrinsics, though.

Well, width can differ from the width of the integer type for these padded types, right? I mean, you can represent those differently if you want, but I would hope it's represented explicitly and not just as a target difference.

Well, only by a bit. There's no reason to have more bits in your representation than you have value bits. The 'store size' of an unsigned fixed type is the same as the signed counterpart, so if the width of the signed one is i16, the unsigned one must also be i16. You could trunc to i15 before every operation and then extend afterwards, but that seems a bit clunky.

If you keep them the same width, you can reuse the signed multiplication intrinsic for unsigned, since the representation in all of the lower bits is the same.

It means more intrinsics, but I think it's a better design than having a single intrinsic with flags for different cases.

In my experience, using different intrinsics does make it awkward to do a lot of things that would otherwise have parallel structure. It's particularly annoying when generating code, which also affects middle-end optimizers. And in the end you're going to be pattern-matching the constant arguments anyway.

I guess our opinions differ there, then. I think the code savings/generalization/simplification you get from expressing the parameterization as constant parameters compared to having separate intrinsics is not huge.

if (Ty->isSigned())
  IID = fixsmul
else
  IID = fixumul

if (Ty->isSigned())
  SignedParam = ConstantInt(1)
else
  SignedParam = ConstantInt(0)

The same goes for legalization/expansion. The semantics would be described in specific operations, not generalized ones ('ext') so there isn't much you can do there anyway.

But ultimately, I'm not trying to dictate the design to the people actually doing the work. I just wanted to ward off the implementation that seemed to be evolving, which was a bunch of hand-lowering in Clang's IR-generation, modified by target hooks to emit target-specific intrinsic calls.

   - Constants have to be written as decimal integers, which tends to make testing
     and debugging harder.

It would be possible to add a decimal fixed-point format to the possible integer constant representations in textual IR, but this doesn't help when you're printing.

Right.

   - Targets that want to pass fixed-point values differently from integers have to
     invent some extra way of specifying that a value is fixed-point.

Another function attribute would probably be fine for that.

It depends. It wouldn't work well with compound results, at least, but people often tend not to care about those because they're awkward to work with in C.

Oh, I didn't think of those.

Personally I think most of the CC handling in Lowering is a huge pain to do anything with, even for today's types.

/ Bevin

As I understand things, leaving it undefined would allow non-saturating addition/subtraction to just freely overflow into the bit. I don't know if other arithmetic would equally benefit, and defining it to be zero will probably simplify most other operations even if it does sometimes require some amount of extra masking.

I don't know what the dynamic balance of fixed-point operations is in a real program. It's certainly plausible that optimizing non-saturating arithmetic is worth penalizing other operations.

Yes, if we don't clear the unsigned padding bit after every operation (C operation, not IR operation) we can get garbage in the bit that could affect later operations (multiplication, comparison). Locking it to always be zero would ensure that you always produce a valid, representable value.

I think we already discussed this in review and settled on keeping it undefined, but from my side it's only because I don't really have a good argument for making it zeroed; technically this is overflow and therefore undefined anyway. I think that ensuring that we don't produce unrepresentable values is good, though.

Oh, sorry, I didn't realize that the semantics were that overflow is UB when it isn't saturating. Yes, in that case, assuming that the bit is zero, but not doing anything to ensure that it's zero in non-saturating operations, seems like the right policy.

Note that that strengthens the arguments for using intrinsics over frontend expansions, because there are stronger preconditions on these operations than the integer optimizer understands.

I just mean that I want there to be intrinsics called llvm.fixadd or llvm.fixmul or whatever with well-specified, target-independent semantics instead of a million target-specific intrinsics called something like llvm.supermips37.addfixvlq_16.

Okay, then we're pretty much on the same page. In our implementation we have llvm.sat, llvm.fixsmul, llvm.add.sat, llvm.fixsmul.sat, etc. as described in Leonard's first mail in the thread. We have omitted many intrinsics that we don't need for our particular target/language (like unsigned multiplication, unsigned variations of saturating ops, and plain addition) but the basic design is not hard to extend with this.

Okay.

Either of these goals could be pretty tricky if the semantics of the intrinsics must be well defined ("fixsmul is equivalent to (trunc (ashr (mul (sext a), (sext b))))") rather than "fixsmul does a signed fixed-point multiplication". If a target or language has different semantics for their fixed-point operations, the intrinsics are useless to them.
I would rather see them well defined than not, though. I also agree that they should be portable and generic enough to support any language/target implementation, but unless you add lots of intrinsics and parameterization, this could result in a bit of 'mismatch' between what the intrinsics can do and what the frontend wants to do. At some point you might end up having to emit a bit of extra code in the frontend to cover for the deficiencies of the generic implementation.

"Lots of parameterization" sounds about right. There should just be a pass in the backend that legalizes intrinsics that aren't directly supported by the target. The analogy is to something like llvm.sadd_with_overflow: a frontend can use that intrinsic on i19 if it wants, and that obviously won't map directly to a single instruction, so LLVM legalizes it to operations that *are* directly supported. If your target has direct support for saturating signed additions on a specific format, the legalization pass can let those through, but otherwise it should lower them into basic operations.

Sure; if a target does not support an instance of llvm.fixsmul.i32 with a scale parameter of 31, a pass should convert this into straight-line integer IR that implements the defined semantics of the operation.

I'm just not really a fan of something like 'llvm.fixmul.iN(iN LHS, iN RHS, i32 Scale, i1 Signed, i1 Saturating, i2 RoundingDirection, i1 HasPadding)'. I don't think this kind of intrinsic is nice to look at or implement. Defining the semantics of an operation like this is quite hard since there are so many knobs, and pretty much any target would only be able to handle a handful of value combinations to this anyway.

Well, yes, the readability problem here is part of why I've been encouraging the use of a type + instructions. Something like:
  %1 = fixumul saturate fix32_16p %0, 7.5
is much, much more approachable than:
  %1 = call @llvm.fixmul.i32(i32 %0, i32 491520, i32 15, i1 0, i1 1, i2 0, i1 1)

But sure, we can do finer-grained intrinsics like this:
  %1 = call @llvm.fixumul.sat.i32(i32 %0, i32 491520, i32 16, i2 0, i1 1)
That still seems like a lot of parameters, though, and it's not hard to imagine it getting worse over time (if, say, people want a non-UB but still non-saturating semantics).

IMHO it should be a number of separate intrinsics with well-defined semantics for each one, preferably semantics that can be described in a single line. That makes it easy to implement and understand. Like so:

llvm.ssat.iN(iN A, i32 SatWidth) = saturate anything above SatWidth bits as a signed value
llvm.usat.iN(iN A, i32 SatWidth) = saturate anything above SatWidth bits as an unsigned value
llvm.fixsmul.iN(iN A, iN B, i32 Scale) = sext->mul->ashr->trunc
llvm.fixumul.iN(iN A, iN B, i32 Scale) = zext->mul->lshr->trunc
llvm.fixsmul.sat.iN(iN A, iN B, i32 Scale) = sext->mul->ashr->ssat->trunc
llvm.fixumul.sat.iN(iN A, iN B, i32 Scale) = zext->mul->lshr->usat->trunc
llvm.sadd.sat.iN(iN A, iN B) = saddo->check sign&ovf->selects
llvm.uadd.sat.iN(iN A, iN B) = uaddo->check ??&ovf->selects
etc.

The operations that the intrinsics are expressed in are strictly different anyway (sext vs zext, saddo vs uaddo etc), so there isn't any room for generalization on constant parameter values in the first place. You'll need to disambiguate manually regardless. The semantics for the saturation intrinsics aren't that straightforward... but it's probably possible to come up with a good description with a bit of thinking.

Well, the semantics are the high-level arithmetic semantics, which are stronger than any particular lowering sequence.

If the frontend generates a ton of requests for operations that have to be carried out completely in software, that's its problem.

An example of what I mean by extra code is if you look at the intrinsics I described. If you have an unsigned padding bit, there is no way to perform an unsigned saturating addition with those intrinsics. You need to emit an llvm.sadd.sat, add an extra check for a set sign bit (because that means that we went below 0) and clamp to zero in that case.

I don't see why this wouldn't be part of the intrinsic.

As for other frontends, I can only speak for Swift. Fixed-point types are not a high
priority for Swift, just like they haven't been a high priority for Clang — it's not like
Embedded C is a brand-new specification. But if we had a reason to add them to
Swift, I would be pretty upset as a frontend author to discover that LLVM's support
was scattered and target-specific and that my best implementation option was to
copy a ton of code from Clang.

It might not be a ton, but at some level you'd have to copy a bit of code. There are several fixed-point operations that probably don't deserve their own intrinsics, like non-saturating fixed-to-fixed and fixed-to-int conversion.

Why not? I mean, sure, they're easy to define with extends and shifts, but it doesn't really seem *bad* to have intrinsics for them. I guess you'd lose some small amount of free integer optimization from the middle-end, but the important cases will probably all still get done for free when they get legalized.

Perhaps. The odd thing is though, if a target just wants the shifts and extends, they would have to say that these operations/intrinsics are illegal in order to get legalization, which is a bit strange to me.

Agreed, it's a little strange, but I think it's not hard to understand.

If a target has a more efficient way of doing shift+trunc or shift+ext, then it should just have patterns for that in the first place.

True.

There's always the possibility of adding them to IRBuilder if we think they might need to be reused.

Please at least do this, yes.

The main downsides of not having a type are:

  - Every operation that would've been overloaded by operand type instead has
    to be parameterized. That is, your intrinsics all have to take width and scale in
    addition to signed-ness and saturating-ness; that's a lot of parameters, which
    tends to make testing and debugging harder.

The width would be implied by the width of the integer type, and signedness and saturation should simply have their own intrinsics rather than being a parameter. Scale would have to be a constant parameter for some of the intrinsics, though.

Well, width can differ from the width of the integer type for these padded types, right? I mean, you can represent those differently if you want, but I would hope it's represented explicitly and not just as a target difference.

Well, only by a bit. There's no reason to have more bits in your representation than you have value bits. The 'store size' of an unsigned fixed type is the same as the signed counterpart, so if the width of the signed one is i16, the unsigned one must also be i16. You could trunc to i15 before every operation and then extend afterwards, but that seems a bit clunky.

Well, there are similar restrictions on the scale, too, right? In theory the scale could be an arbitrary positive or negative number, but I assume we would actually constrain it to (0,width].

If you keep them the same width, you can reuse the signed multiplication intrinsic for unsigned, since the representation in all of the lower bits is the same.

For targets that want to use that representation, yes.

It means more intrinsics, but I think it's a better design than having a single intrinsic with flags for different cases.

In my experience, using different intrinsics does make it awkward to do a lot of things that would otherwise have parallel structure. It's particularly annoying when generating code, which also affects middle-end optimizers. And in the end you're going to be pattern-matching the constant arguments anyway.

I guess our opinions differ there, then. I think the code savings/generalization/simplification you get from expressing the parameterization as constant parameters compared to having separate intrinsics is not huge.

if (Ty->isSigned())
  IID = fixsmul
else
  IID = fixumul

if (Ty->isSigned())
  SignedParam = ConstantInt(1)
else
  SignedParam = ConstantInt(0)

Well, it's more like:
  IID = E->isSaturating()
             ? (Ty->isSigned() ? fixsmul_sat : fixumul_sat)
              : (Ty->isSigned() ? fixsmul : fixumul);
vs.
  ConstantInt(Ty->isSigned()), ConstantInt(E->isSaturating())
and that's assuming just the two dimensions of variation encoded in the intrinsic name, not all the stuff with scale and padding bits and rounding mode.

The same goes for legalization/expansion. The semantics would be described in specific operations, not generalized ones ('ext') so there isn't much you can do there anyway.

But ultimately, I'm not trying to dictate the design to the people actually doing the work. I just wanted to ward off the implementation that seemed to be evolving, which was a bunch of hand-lowering in Clang's IR-generation, modified by target hooks to emit target-specific intrinsic calls.

  - Constants have to be written as decimal integers, which tends to make testing
    and debugging harder.

It would be possible to add a decimal fixed-point format to the possible integer constant representations in textual IR, but this doesn't help when you're printing.

Right.

  - Targets that want to pass fixed-point values differently from integers have to
    invent some extra way of specifying that a value is fixed-point.

Another function attribute would probably be fine for that.

It depends. It wouldn't work well with compound results, at least, but people often tend not to care about those because they're awkward to work with in C.

Oh, I didn't think of those.

Personally I think most of the CC handling in Lowering is a huge pain to do anything with, even for today's types.

Definitely.

John.

As I understand things, leaving it undefined would allow non-saturating addition/subtraction to just freely overflow into the padding bit. I don't know if other arithmetic would benefit equally, and defining it to be zero will probably simplify most other operations even if it does sometimes require some amount of extra masking.

I don't know what the dynamic balance of fixed-point operations is in a real program. It's certainly plausible that optimizing non-saturating arithmetic is worth penalizing other operations.

Yes, if we don't clear the unsigned padding bit after every operation (C operation, not IR operation) we can get garbage in the bit that could affect later operations (multiplication, comparison). Locking it to always be zero would ensure that you always produce a valid, representable value.
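
To make the zero-clearing alternative concrete, here is a minimal C++ reference sketch, assuming a 16-bit unsigned fixed type with a single padding bit in the MSB; the names and widths are illustrative only, not from any proposed patch:

  #include <cstdint>

  constexpr unsigned Width = 16;
  constexpr uint16_t PaddingMask =
      static_cast<uint16_t>(~(1u << (Width - 1)));  // 0x7fff: clears bit 15

  uint16_t uadd_clearing_padding(uint16_t a, uint16_t b) {
    // Plain wrapping add; any overflow spills into the padding bit.
    uint16_t r = static_cast<uint16_t>(a + b);
    // Zeroing the padding bit keeps the result representable, at the cost
    // of an extra mask after every (C-level) operation.
    return r & PaddingMask;
  }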

I think we already discussed this in review and settled on keeping it undefined, but from my side it's only because I don't really have a good argument for making it zeroed; technically this is overflow and therefore undefined anyway. I think that ensuring that we don't produce unrepresentable values is good, though.

Oh, sorry, I didn't realize that the semantics were that overflow is UB when it isn't saturating. Yes, in that case, assuming that the bit is zero, but not doing anything to ensure that it's zero in non-saturating operations, seems like the right policy.

Alright. It is the pragmatic choice.

Note that that strengthens the arguments for using intrinsics over frontend expansions, because there are stronger preconditions on these operations than the integer optimizer understands.

I just mean that I want there to be intrinsics called llvm.fixadd or llvm.fixmul or whatever with well-specified, target-independent semantics instead of a million target-specific intrinsics called something like llvm.supermips37.addfixvlq_16.

Okay, then we're pretty much on the same page. In our implementation we have llvm.sat, llvm.fixsmul, llvm.add.sat, llvm.fixsmul.sat, etc. as described in Leonard's first mail in the thread. We have omitted many intrinsics that we don't need for our particular target/language (like unsigned multiplication, unsigned variations of saturating ops, and plain addition) but the basic design is not hard to extend with this.

Okay.

Either of these goals could be pretty tricky if the semantics of the intrinsics must be well defined ("fixsmul is equivalent to (trunc (lshr (mul (sext a), (sext b))))") rather than "fixsmul does a signed fixed-point multiplication". If a target or language has different semantics for their fixed-point operations, the intrinsics are useless to them.
I would rather see them well defined than not, though. I also agree that they should be portable and generic enough to support any language/target implementation, but unless you add lots of intrinsics and parameterization, this could result in a bit of 'mismatch' between what the intrinsics can do and what the frontend wants to do. At some point you might end up having to emit a bit of extra code in the frontend to cover for the deficiencies of the generic implementation.

"Lots of parameterization" sounds about right. There should just be a pass in the backend that legalizes intrinsics that aren't directly supported by the target. The analogy is to something like llvm.sadd_with_overflow: a frontend can use that intrinsic on i19 if it wants, and that obviously won't map directly to a single instruction, so LLVM legalizes it to operations that *are* directly supported. If your target has direct support for saturating signed additions on a specific format, the legalization pass can let those through, but otherwise it should lower them into basic operations.

Sure; if a target does not support an instance of llvm.fixsmul.i32 with a scale parameter of 31, a pass should convert this into straight-line integer IR that implements the defined semantics of the operation.
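
As a sketch of what that pass's expansion step might look like (this is illustrative C++ against the LLVM API, not code from any actual patch; the function name is made up), expanding llvm.fixsmul.i32 with the sext->mul->ashr->trunc semantics discussed in this thread:

  #include "llvm/IR/Constants.h"
  #include "llvm/IR/IRBuilder.h"
  #include "llvm/IR/Instructions.h"
  using namespace llvm;

  Value *expandFixSMul(CallInst *CI, IRBuilder<> &B) {
    Value *L = CI->getArgOperand(0);
    Value *R = CI->getArgOperand(1);
    uint64_t Scale = cast<ConstantInt>(CI->getArgOperand(2))->getZExtValue();

    Type *Ty = L->getType();
    Type *WideTy = B.getIntNTy(Ty->getIntegerBitWidth() * 2);

    // Widen so the full product fits, multiply, move the radix point back
    // down, then narrow to the original width.
    Value *LExt = B.CreateSExt(L, WideTy);
    Value *RExt = B.CreateSExt(R, WideTy);
    Value *Mul = B.CreateMul(LExt, RExt);
    Value *Shifted = B.CreateAShr(Mul, Scale);
    return B.CreateTrunc(Shifted, Ty);
  }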

I'm just not really a fan of something like 'llvm.fixmul.iN(iN LHS, iN RHS, i32 Scale, i1 Signed, i1 Saturating, i2 RoundingDirection, i1 HasPadding)'. I don't think this kind of intrinsic is nice to look at or implement. Defining the semantics of an operation like this is quite hard since there are so many knobs, and pretty much any target would only be able to handle a handful of value combinations to this anyway.

Well, yes, the readability problem here is part of why I've been encouraging the use of a type + instructions. Something like:
   %1 = fixumul saturate fix32_16p %0, 7.5
is much, much more approachable than:
   %1 = call @llvm.fixmul.i32(i32 %0, i32 491520, i32 15, i1 0, i1 1, i2 0, i1 1)

True, but if we go the intrinsic route then we should strive to keep the intrinsic as simple as possible while still encapsulating the functionality we need for our purpose here and now.

But sure, we can do finer-grained intrinsics like this:
   %1 = call @llvm.fixumul.sat.i32(i32 %0, i32 491520, i32 16, i2 0, i1 1)
That still seems like a lot of parameters, though, and it's not hard to imagine it getting worse over time (if, say, people want a non-UB but still non-saturating semantics).

The example I gave with the rounding and padding argument might have been more of a 'slippery slope' example. I think that if we settle on a specific implementation semantic for rounding and padding on the (Embedded-)C source language level, we should make intrinsics that reflect that and not too much more.

We could add lots of knobs to the intrinsics to make them more 'portable' but I don't think the portability benefit outweighs the complexity cost. If a language decides they want to add fixed-point support but want their rounding to be towards zero instead of down, then I think they should just emit more to handle that on top of the existing intrinsics, or just go with the existing semantics. (The former might not even be reasonably possible depending on the design; I haven't considered the example much.)

IMHO it should be a number of separate intrinsics with well-defined semantics for each one, preferably semantics that can be described in a single line. That makes it easy to implement and understand. Like so:

llvm.ssat.iN(iN A, i32 SatWidth) = saturate anything above SatWidth bits as a signed value
llvm.usat.iN(iN B, i32 SatWidth) = saturate anything above SatWidth bits as an unsigned value
llvm.fixsmul.iN(iN A, iN B, i32 Scale) = sext->mul->ashr->trunc
llvm.fixumul.iN(iN A, iN B, i32 Scale) = zext->mul->lshr->trunc
llvm.fixsmul.sat.iN(iN A, iN B, i32 Scale) = sext->mul->ashr->ssat->trunc
llvm.fixumul.sat.iN(iN A, iN B, i32 Scale) = zext->mul->lshr->usat->trunc
llvm.sadd.sat.iN(iN A, iN B) = saddo->check sign&ovf->selects
llvm.uadd.sat.iN(iN A, iN B) = uaddo->check ??&ovf->selects
etc.
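
Read literally, those one-liners translate into very small reference models. As an illustration only, assuming N = 32 (and assuming >> on a signed value is an arithmetic shift, which ashr guarantees in IR even though C++ only pinned that down in C++20):

  #include <cstdint>

  // llvm.fixsmul.i32: sext -> mul -> ashr -> trunc
  int32_t fixsmul32(int32_t a, int32_t b, unsigned scale) {
    return static_cast<int32_t>(
        (static_cast<int64_t>(a) * static_cast<int64_t>(b)) >> scale);
  }

  // llvm.fixumul.i32: zext -> mul -> lshr -> trunc
  uint32_t fixumul32(uint32_t a, uint32_t b, unsigned scale) {
    return static_cast<uint32_t>(
        (static_cast<uint64_t>(a) * static_cast<uint64_t>(b)) >> scale);
  }

  // llvm.ssat.i32: clamp to the range of a SatWidth-bit signed value
  int32_t ssat32(int32_t v, unsigned satWidth) {
    int64_t hi = (int64_t(1) << (satWidth - 1)) - 1;
    int64_t lo = -hi - 1;
    return static_cast<int32_t>(v > hi ? hi : (v < lo ? lo : v));
  }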

The operations that the intrinsics are expressed in are strictly different anyway (sext vs zext, saddo vs uaddo etc), so there isn't any room for generalization on constant parameter values in the first place. You'll need to disambiguate manually regardless. The semantics for the saturation intrinsics aren't that straightforward... but it's probably possible to come up with a good description with a bit of thinking.

Well, the semantics are the high-level arithmetic semantics, which are stronger than any particular lowering sequence.

Hm, I suppose it would be that way. I just naturally think of fixed-point operations as integer operations, so the lowering sequence makes more sense to me mentally (at least for multiplication and division).

If the frontend generates a ton of requests for operations that have to be carried out completely in software, that's its problem.

An example of what I mean by extra code is if you look at the intrinsics I described. If you have an unsigned padding bit, there is no way to perform an unsigned saturating addition with those intrinsics. You need to emit an llvm.sadd.sat, add an extra check for a set sign bit (because that means that we went below 0) and clamp to zero in that case.

I don't see why this wouldn't be part of the intrinsic.
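
For concreteness, here is roughly what that extra frontend code amounts to, sketched in C++ for a 16-bit type with one padding bit (15 value bits). sadd_sat16 stands in for llvm.sadd.sat.i16; all names here are made up for illustration:

  #include <cstdint>

  int16_t sadd_sat16(int16_t a, int16_t b) {
    int32_t r = static_cast<int32_t>(a) + b;
    if (r > INT16_MAX) return INT16_MAX;
    if (r < INT16_MIN) return INT16_MIN;
    return static_cast<int16_t>(r);
  }

  uint16_t uadd_sat_padded16(uint16_t a, uint16_t b) {
    int16_t r = sadd_sat16(static_cast<int16_t>(a), static_cast<int16_t>(b));
    // A set sign bit means the result fell below the unsigned range, so
    // clamp to zero; otherwise the signed saturation bound (0x7fff)
    // already matches the padded unsigned maximum.
    return r < 0 ? 0 : static_cast<uint16_t>(r);
  }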

As for other frontends, I can only speak for Swift. Fixed-point types are not a high
priority for Swift, just like they haven't been a high priority for Clang — it's not like
Embedded C is a brand-new specification. But if we had a reason to add them to
Swift, I would be pretty upset as a frontend author to discover that LLVM's support
was scattered and target-specific and that my best implementation option was to
copy a ton of code from Clang.

It might not be a ton, but at some level you'd have to copy a bit of code. There are several fixed-point operations that probably don't deserve their own intrinsics, like nonsaturating fixed-fixed and fixed-int conversion.

Why not? I mean, sure, they're easy to define with extends and shifts, but it doesn't really seem *bad* to have intrinsics for them. I guess you'd lose some small amount of free integer optimization from the middle-end, but the important cases will probably all still get done for free when they get legalized.

Perhaps. The odd thing is though, if a target just wants the shifts and extends, they would have to say that these operations/intrinsics are illegal in order to get legalization, which is a bit strange to me.

Agreed, it's a little strange, but I think it's not hard to understand.

If a target has a more efficient way of doing shift+trunc or shift+ext, then it should just have patterns for that in the first place.

True.
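
For reference, the kind of extend-and-shift expansion being discussed for a nonsaturating fixed-fixed conversion is tiny; a minimal C++ sketch with illustrative types and names:

  #include <cstdint>

  // Convert a signed 16-bit value with fromScale fractional bits into a
  // signed 32-bit value with toScale fractional bits.
  int32_t convert_fixed(int16_t v, unsigned fromScale, unsigned toScale) {
    int32_t wide = v;  // sext
    if (toScale >= fromScale)
      // shl; done in unsigned to sidestep C++'s signed-shift rules
      return static_cast<int32_t>(
          static_cast<uint32_t>(wide) << (toScale - fromScale));
    return wide >> (fromScale - toScale);  // ashr, rounding down
  }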

There's always the possibility of adding them to IRBuilder if we think they might need to be reused.

Please at least do this, yes.

The main downsides of not having a type are:

   - Every operation that would've been overloaded by operand type instead has
     to be parameterized. That is, your intrinsics all have to take width and scale in
     addition to signed-ness and saturating-ness; that's a lot of parameters, which
     tends to make testing and debugging harder.

The width would be implied by the width of the integer type, and signedness and saturation should simply have their own intrinsics rather than being parameters. Scale would have to be a constant parameter for some of the intrinsics, though.

Well, width can differ from the width of the integer type for these padded types, right? I mean, you can represent those differently if you want, but I would hope it's represented explicitly and not just as a target difference.

Well, only by a bit. There's no reason to have more bits in your representation than you have value bits. The 'store size' of an unsigned fixed type is the same as the signed counterpart, so if the width of the signed one is i16, the unsigned one must also be i16. You could trunc to i15 before every operation and then extend afterwards, but that seems a bit clunky.

Well, there are similar restrictions on the scale, too, right? In theory the scale could be an arbitrary positive or negative number, but I assume we would actually constrain it to (0,width].

Yes, that would be the restriction on the scale parameter. Technically it can be 0 as well, but then you just have an integer anyway?

My point is basically, if you have a padded type and you want to perform a fixed-point signed multiplication (for example) on fewer bits than the width, you have to emit code to do that specially, such as trunc first, then operate, then extend back. All other operations operate on all bits; there's no integer 'mul with padding', for example.

If you keep them the same width, you can reuse the signed multiplication intrinsic for unsigned, since the representation in all of the lower bits is the same.

For targets that want to use that representation, yes.

Yes, true.

I guess I mostly meant the general C level representation here with its restriction on a single padding bit. Any language with a different representation (such as more padding bits) would have to figure out some way of using the provided intrinsics to work on the padded types; likely by truncing before performing the operation, as I mentioned.

It means more intrinsics, but I think it's a better design than having a single intrinsic with flags for different cases.

In my experience, using different intrinsics does make it awkward to do a lot of things that would otherwise have parallel structure. It's particularly annoying when generating code, which also affects middle-end optimizers. And in the end you're going to be pattern-matching the constant arguments anyway.

I guess our opinions differ there, then. I think the code savings/generalization/simplification you get from expressing the parameterization as constant parameters compared to having separate intrinsics is not huge.

if (Ty->isSigned())
   IID = fixsmul
else
   IID = fixumul

if (Ty->isSigned())
   SignedParam = ConstantInt(1)
else
   SignedParam = ConstantInt(0)

Well, it's more like:
   IID = E->isSaturating()
              ? (Ty->isSigned() ? fixsmul_sat : fixumul_sat)
               : (Ty->isSigned() ? fixsmul : fixumul);
vs.
   ConstantInt(Ty->isSigned()), ConstantInt(E->isSaturating())
and that's assuming just the two dimensions of variation encoded in the intrinsic name, not all the stuff with scale and padding bits and rounding mode.

True, I omitted the saturating variants...

I still prefer multiple intrinsics but I don't really have a good counterargument, except for the Clean Code boolean function argument recommendation :)

Either of these goals could be pretty tricky if the semantics of the intrinsics must be well defined (“fixsmul is equivalent to (trunc (lshr (mul (sext a), (sext b))))”) rather than “fixsmul does a signed fixed-point multiplication”. If a target or language has different semantics for their fixed-point operations, the intrinsics are useless to them.
I would rather see them well defined than not, though. I also agree that they should be portable and generic enough to support any language/target implementation, but unless you add lots of intrinsics and parameterization, this could result in a bit of ‘mismatch’ between what the intrinsics can do and what the frontend wants to do. At some point you might end up having to emit a bit of extra code in the frontend to cover for the deficiencies of the generic implementation.

“Lots of parameterization” sounds about right. There should just be a pass in the backend that legalizes intrinsics that aren’t directly supported by the target. The analogy is to something like llvm.sadd_with_overflow: a frontend can use that intrinsic on i19 if it wants, and that obviously won’t map directly to a single instruction, so LLVM legalizes it to operations that are directly supported. If your target has direct support for saturating signed additions on a specific format, the legalization pass can let those through, but otherwise it should lower them into basic operations.

Sure; if a target does not support an instance of llvm.fixsmul.i32 with a scale parameter of 31, a pass should convert this into straight-line integer IR that implements the defined semantics of the operation.

I’m just not really a fan of something like ‘llvm.fixmul.iN(iN LHS, iN RHS, i32 Scale, i1 Signed, i1 Saturating, i2 RoundingDirection, i1 HasPadding)’. I don’t think this kind of intrinsic is nice to look at or implement. Defining the semantics of an operation like this is quite hard since there are so many knobs, and pretty much any target would only be able to handle a handful of value combinations to this anyway.

Well, yes, the readability problem here is part of why I’ve been encouraging the use of a type + instructions. Something like:
   %1 = fixumul saturate fix32_16p %0, 7.5
is much, much more approachable than:
   %1 = call @llvm.fixmul.i32(i32 %0, i32 491520, i32 15, i1 0, i1 1, i2 0, i1 1)

True, but if we go the intrinsic route then we should strive to keep the intrinsic as simple as possible while still encapsulating the functionality we need for our purpose here and now.

Sure.

But sure, we can do finer-grained intrinsics like this:
   %1 = call @llvm.fixumul.sat.i32(i32 %0, i32 491520, i32 16, i2 0, i1 1)
That still seems like a lot of parameters, though, and it’s not hard to imagine it getting worse over time (if, say, people want a non-UB but still non-saturating semantics).

The example I gave with the rounding and padding argument might have been more of a ‘slippery slope’ example. I think that if we settle on a specific implementation semantic for rounding and padding on the (Embedded-)C source language level, we should make intrinsics that reflect that and not too much more.

I’m not trying to argue for pre-emptive generalization, but I’m not sure we’re going to reach universal agreement on at least the padding question. Targets with ISA support probably want to use whatever format lets them best exploit the hardware, but targets without it probably want to use an un-padded format because it generally means they get to use ordinary overflow detection. (There’s a nice coincidence where padding is an issue only for unsigned arithmetic, where saturation is generally easier because any given operation can only overflow in one direction. Although maybe you get the same advantages with signed overflow when the value just happens to be known non-negative — I haven’t really thought this through.)

We could add lots of knobs to the intrinsics to make them more ‘portable’ but I don’t think the portability benefit outweighs the complexity cost. If a language decides they want to add fixed-point support but want their rounding to be towards zero instead of down, then I think they should just emit more to handle that on top of the existing intrinsics, or just go with the existing semantics. (The former might not even be reasonably possible depending on the design; I haven’t considered the example much.)

I just want the semantics to not include an implicit reference to the current target. We don’t have to pre-emptively generalize beyond what’s necessary to implement the spec.

The spec says that rounding is basically unspecified:
   "Whether rounding is up or down is implementation-defined and may differ for different values and different situations; an implementation may specify that the rounding is indeterminable."

I’m fine with using those semantics at a high level even if it means that e.g. constant-folding might differ from what would actually be done by the processor. But I think that kind of difference has been a problem for optimizations in the past.

Well, only by a bit. There’s no reason to have more bits in your representation than you have value bits. The ‘store size’ of an unsigned fixed type is the same as the signed counterpart, so if the width of the signed one is i16, the unsigned one must also be i16. You could trunc to i15 before every operation and then extend afterwards, but that seems a bit clunky.

Well, there are similar restrictions on the scale, too, right? In theory the scale could be an arbitrary positive or negative number, but I assume we would actually constrain it to (0,width].

Yes, that would be the restriction on the scale parameter. Technically it can be 0 as well, but then you just have an integer anyway?

Right.

My point is basically, if you have a padded type and you want to perform a fixed-point signed multiplication (for example) on fewer bits than the width, you have to emit code to do that specially, such as trunc first, then operate, then extend back. All other operations operate on all bits; there’s no integer ‘mul with padding’, for example.

Okay. So a target that wants to do padded unsigned operations on a 32-bit type should be truncating to i31 before invoking the intrinsic? I guess I’m a little worried about the code-generation consequences of that, but it’s definitely cleaner, so if you’re okay with it, it’s fine with me.
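
A sketch of that trunc-operate-extend sequence, assuming a 32-bit unsigned type with one padding bit, so the arithmetic happens on 31 value bits; constants and names are illustrative:

  #include <cstdint>

  uint32_t padded_fixumul32(uint32_t a, uint32_t b, unsigned scale) {
    // "trunc to i31": discard the padding bit.
    uint64_t aw = a & 0x7fffffffu;
    uint64_t bw = b & 0x7fffffffu;
    // Operate at double width and move the radix point back down.
    uint64_t r = (aw * bw) >> scale;
    // "extend" back to the 32-bit storage type with a zero padding bit.
    return static_cast<uint32_t>(r & 0x7fffffffu);
  }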

If you keep them the same width, you can reuse the signed multiplication intrinsic for unsigned, since the representation in all of the lower bits is the same.

For targets that want to use that representation, yes.

Yes, true.

I guess I mostly meant the general C level representation here with its restriction on a single padding bit. Any language with a different representation (such as more padding bits) would have to figure out some way of using the provided intrinsics to work on the padded types; likely by truncing before performing the operation, as I mentioned.

Yeah, I don’t think we need to generalize to arbitrary formats here.

It means more intrinsics, but I think it’s a better design than having a single intrinsic with flags for different cases.

In my experience, using different intrinsics does make it awkward to do a lot of things that would otherwise have parallel structure. It’s particularly annoying when generating code, which also affects middle-end optimizers. And in the end you’re going to be pattern-matching the constant arguments anyway.

I guess our opinions differ there, then. I think the code savings/generalization/simplification you get from expressing the parameterization as constant parameters compared to having separate intrinsics is not huge.

if (Ty->isSigned())
  IID = fixsmul
else
  IID = fixumul

if (Ty->isSigned())
  SignedParam = ConstantInt(1)
else
  SignedParam = ConstantInt(0)

Well, it’s more like:
  IID = E->isSaturating()
            ? (Ty->isSigned() ? fixsmul_sat : fixumul_sat)
            : (Ty->isSigned() ? fixsmul : fixumul);
vs.
  ConstantInt(Ty->isSigned()), ConstantInt(E->isSaturating())
and that’s assuming just the two dimensions of variation encoded in the intrinsic name, not all the stuff with scale and padding bits and rounding mode.

True, I omitted the saturating variants…

I still prefer multiple intrinsics but I don’t really have a good counterargument, except for the Clean Code boolean function argument recommendation :)

I think that one went past me, sorry.

John.

Hi Leonard,

I’m sorry for the delay responding to this. Please let me +1 a few downstream comments:

+1 for adding a first class type. I agree with John that this is not as bad as it might seem. This allows the core LLVM IR types (e.g. add sub etc) to be defined on these types, and for legalize to do the right thing. This makes it easier for producers of IR (e.g. clang) to handle them in a uniform way. I don’t see a downside to this.

+1 to Philip’s point about getting much of an implementation in place before starting integration. This is a big enough piece of work that we should be confident in the design direction, and I don’t want to get another partial transition into the codebase that may or may not get finished.

+1 to the point about figuring out the saturation situation separately from the fixed point situation. This has come up before and gets one-off solutions (e.g. in the vector space) and it would be better to have a holistic solution to this that applies everywhere - we can migrate the existing intrinsics over to a better builtin answer once it exists.

In short, I’m very strongly in the camp of supporting a new builtin type. I don’t see any downside at all to it.

-Chris

Hi,

Hi Leonard,

I’m sorry for the delay responding to this. Please let me +1 a few downstream comments:

+1 for adding a first class type. I agree with John that this is not as bad as it might seem. This allows the core LLVM IR types (e.g. add sub etc) to be defined on these types, and for legalize to do the right thing. This makes it easier for producers of IR (e.g. clang) to handle them in a uniform way. I don’t see a downside to this.

Assuming you meant 'operators' rather than 'types', there's an issue with that. Every line of code in LLVM today assumes that the add, sub et al. operate on integer types. Fixing that assumption sounds like quite a bit of work to me, so I think we would likely have to add new operators for all of the fixed point types. Some of those operators (like non-saturating add and sub) would essentially do the exact same thing as regular add and sub anyway, so we'd get needless duplication.

Regarding legalization: depending on where we do legalization of fixed-point types/operations, it could get hairy. The type system during lowering is pretty simplistic. Fixed-point types would add multiple dimensions to the system (width, scale, possibly padding if we include that), which I think might be hard to represent efficiently in the MVT/EVT system.

+1 to Philip’s point about getting much of an implementation in place before starting integration. This is a big enough piece of work that we should be confident in the design direction, and I don’t want to get another partial transition into the codebase that may or may not get finished.

So does this mean it would be implemented using integers and intrinsics first, and then moved to a new type solution?

If the first solution ends up working well, would it be an unreasonable option to use it instead?

/ Bevin

I would like to add that this would be very useful for supporting Ada.

Luke.

Hi all,

We will be adding intrinsics for now to get a working version of fixed
point arithmetic implemented in clang faster, but will still consider
either adding a new fixed point type in LLVM or new operations that
perform these fixed point operations on integers. This is a list of
all the intrinsics that I plan to implement and wanted to see if
anyone had any feedback on their signatures/semantics.

; Signed and unsigned saturation
; Unsigned saturation would primarily just saturate to W_MAX since you
; can't really saturate towards 0 for an unsigned number
declare iN @llvm.ssaturate(iN %Val, i32 %W)
declare iN @llvm.usaturate(iN %Val, i32 %W)

; Saturated addition and subtraction.
; Scale is not necessary since fixed addition/subtraction does not
; care about the scale as long as the operands have the same scale. This
; will require a shift beforehand regardless.
; We cannot use the existing overflow add/sub intrinsics since they
; only indicate overflow but not the direction to saturate in.
declare iN @llvm.satsadd(iN %L, iN %R)
declare iN @llvm.satuadd(iN %L, iN %R)
declare iN @llvm.satssub(iN %L, iN %R)
declare iN @llvm.satusub(iN %L, iN %R)

; Multiplication and division
declare iN @llvm.fixsmul(iN %L, iN %R, i32 %Scale)
declare iN @llvm.fixumul(iN %L, iN %R, i32 %Scale)
declare iN @llvm.fixsdiv(iN %L, iN %R, i32 %Scale)
declare iN @llvm.fixudiv(iN %L, iN %R, i32 %Scale)
declare iN @llvm.fixsmul.sat(iN %L, iN %R, i32 %Scale)
declare iN @llvm.fixumul.sat(iN %L, iN %R, i32 %Scale)
declare iN @llvm.fixsdiv.sat(iN %L, iN %R, i32 %Scale)
declare iN @llvm.fixudiv.sat(iN %L, iN %R, i32 %Scale)
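
As a hedged illustration of the comment above (an overflow flag alone doesn't say which bound to clamp to, so the direction has to be recovered from an operand's sign), a C++ reference model for @llvm.satsadd on i32 might look like this; the function name is made up, it relies on the GCC/Clang __builtin_add_overflow, and it is a sketch rather than the proposed lowering:

  #include <cstdint>
  #include <limits>

  int32_t satsadd32(int32_t l, int32_t r) {
    int32_t sum;
    if (!__builtin_add_overflow(l, r, &sum))
      return sum;
    // Signed addition can only overflow when both operands have the same
    // sign, so either operand's sign selects the saturation bound.
    return l < 0 ? std::numeric_limits<int32_t>::min()
                 : std::numeric_limits<int32_t>::max();
  }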

- Leo