[RFC] Add a simple soft-float class

I'm currently working on stripping usage of `UnsignedFloat` out of
`-block-freq`. Before I finish...

I propose adding a soft-float class to llvm/Support/ or llvm/Analysis/.

  - Portable (unlike hard-floats).
      - Well-defined for all platforms and safe to use in code.

  - Easy to use and reason about (unlike `APFloat`).
      - Uses operators.
      - No special numbers.
      - Every operation well-defined (even divide-by-zero).
      - No rounding modes.
      - Digits represented simply as a 32-bit or 64-bit integer.

  - Lowers barrier to entry for writing passes with weight-like logic
    (and other uses of numbers).
      - Mapping to `uint64_t` is often a hard problem in itself.
      - First iteration can focus on the pass logic; second iteration
        can remove the floats.

  - Already written and (mostly) tested.
      - There's currently one in `-block-freq` called `UnsignedFloat`.
        Tests are out of tree, though.

IIUC, the consensus here is: hard-floats are worse than soft-floats,
and both are worse than integers. However, we have passes in the tree (e.g.,
spill placement) that use hard-floats anyway.

With a small amount of effort, I can finish testing `UnsignedFloat`
(and/or `SignedFloat`) and we can remove hard-floats entirely by using
these classes as drop-in replacements. The long-term answer (for most
passes) is mapping to `uint64_t`, but this gets rid of undefined
behaviour *now*, and provides a simple and practical alternative to
hard-floats going forward.

Thoughts?

-- dpnes

I would just like to preemptively reject any argument that having a convenient soft-float is bad because it discourages moving to a more efficient fixed-point representation. The evidence shows otherwise. It took two years to get a working fixed-point representation for block frequency, and we still don't have one for spill weight. We should have been using a convenient and relatively efficient soft-float, as you have implemented, all this time.

In the case of block frequency, we would have been able to focus on design and fixing the algorithm instead of on representational problems from the beginning. In the case of spill weight, I believe it would have made it more obvious that we still need to develop a better representation, and we wouldn't have wasted time debugging buildbot unpredictability.

I think it should be easy to develop new passes incrementally without setting up barriers. Your soft-float will help. Beyond that I have absolutely no opinion on its naming or location.

-Andy

If we require the host to have sse2 or other IEEE 754 conformant implementation, would it be possible to use hardware float?

Hi Duncan,

Some of these don’t make a lot of sense:

  • Easy to use and reason about (unlike APFloat).
  • Uses operators.
  • No special numbers.

What semantics do you propose to use instead? At some point, people will hit the boundaries of their range, and you need to do something sensible there.

  • Every operation well-defined (even divide-by-zero).

Divide-by-zero is actually well-defined in IEEE 754:
x / +0.0 == +Inf
x / -0.0 == -Inf
(-)0.0 / (-)0.0 == NaN

  • No rounding modes.

You can’t implement a finite precision floating point representation without any rounding. I assume what you mean here is only one rounding mode, i.e. round-nearest-ties-to-even.

  • Digits represented simply as a 32-bit or 64-bit integer.

Isn’t this the same as the significand of an IEEE float? If you go with 64-bit, it sounds like you’re defining something very close to Intel’s FP80.

—Owen

As I see it:

  • float is wanted because often it’s too hard to correctly guess the range of values in advance

  • the uses are generally diagnostic or heuristic and the exact characteristics of rounding in the LSB are unimportant

  • it is valuable to have the same bit-accurate results (and subsequent control flow based on them) on different platforms, regardless of what FP hardware (if any) they have.

  • higher performance is better, but if you need software FP anywhere then it’s useful to make that software FP as fast as possible.

  • if the software FP is fast enough (even if not fully IEEE754 conformant) then the penalty for not using hardware FP is not all that large.

Therefore: if you’re going to transparently use IEEE754 hardware where available, then you also have to make the software implementation bit-accurate to IEEE754 standards.

That puts a (perhaps unnecessary) performance penalty on the software FP.

It would be useful to have numbers for how much penalty.

In the presence of fast full result integer multiply, fast shifts by variable amounts, and (possibly) count-leading-zeroes you can do a non-IEEE754 software FP add in maybe a dozen clock cycles.
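
Roughly this shape, say -- a sketch only, with made-up names, truncation
instead of proper rounding, and no attempt at special cases:

    #include <cstdint>
    #include <utility>

    struct SoftFloat {
      uint64_t Digits; // Value is Digits * 2^Exponent.
      int32_t Exponent;
    };

    static SoftFloat add(SoftFloat A, SoftFloat B) {
      if (A.Digits == 0) return B;
      if (B.Digits == 0) return A;
      if (A.Exponent < B.Exponent)
        std::swap(A, B);                      // A has the larger exponent.
      int32_t Shift = A.Exponent - B.Exponent;
      uint64_t Aligned = Shift >= 64 ? 0 : B.Digits >> Shift;
      uint64_t Sum = A.Digits + Aligned;
      if (Sum < A.Digits)                     // Carried out of 64 bits:
        return {(Sum >> 1) | (1ULL << 63), A.Exponent + 1}; // renormalize.
      return {Sum, A.Exponent};
    }

(Count-leading-zeroes comes into play when a subtract needs to renormalize
the result upward.)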

Yes, that would be very similar in practical terms to x86/68k FP80, but
just without the necessity of spending time guaranteeing precisely the same
NaN or norm/denorm or LSB results as x87 hardware.

You'd have 9 or 11 more bits of mantissa than in standard IEEE too, which
is probably more useful than worrying about slight fuzziness in the LSB —
especially as the results would be precisely the same everywhere.

If we require the host to have sse2 or other IEEE 754 conformant
implementation, would it be possible to use hardware float?

I'm not sure what's stopping us -- I just know they don't work well.

See responses to Andy's suggestion in March [1] and the ugly patch [2] I committed in r206765 for some other context.

[1]: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140317/209336.html
[2]: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140421/213726.html

Hi Duncan,

Some of these don’t make a lot of sense:

Sorry -- I think I was assuming too much knowledge about what I committed as
part of the BlockFrequencyInfo rewrite.

What's committed there is a class called UnsignedFloat that wraps the
following with a bunch of API:

    template <class UIntT> struct UnsignedFloat {
        UIntT Digits;      // Significand: a plain 32-bit or 64-bit integer.
        uint16_t Exponent; // Represented value is Digits scaled by 2^Exponent.
    };

There are some static asserts to restrict UIntT to either `uint32_t` or
`uint64_t`. I have tests that are out of tree. The `uint32_t` version uses
64-bit math for divide and multiply while the `uint64_t` version uses long
division etc. -- otherwise they share implementation. They both defer to
APFloat for non-trivial string conversion.
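
For example, the idea behind the `uint32_t` multiply is the full 64-bit
product renormalized back into 32 bits of digits -- roughly this sketch
(made-up helper name, truncation only):

    #include <cstdint>
    #include <utility>

    // Returns the new digits plus the amount to add to the exponent.
    static std::pair<uint32_t, int> multiplyDigits(uint32_t A, uint32_t B) {
      uint64_t P = (uint64_t)A * B; // Full 64-bit product.
      int Shift = 0;
      while (P > UINT32_MAX) {      // Renormalize into 32 bits (truncating).
        P >>= 1;
        ++Shift;
      }
      return {(uint32_t)P, Shift};
    }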

I don't think it will be much work to clean this up and create a SignedFloat
variant that adds a `bool` for the sign (and shares most of the impl).
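
Something like this, roughly -- just a sketch of the layout, not a final
design:

    template <class UIntT> struct SignedFloat {
      UnsignedFloat<UIntT> Magnitude; // Reuse the unsigned implementation.
      bool IsNegative;                // The added sign.
    };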

I'll respond to your specific points inline.

- Easy to use and reason about (unlike `APFloat`).
     - Uses operators.
     - No special numbers.

What semantics do you propose to use instead? At some point, people will hit the boundaries of their range, and you need to do something sensible there.

Simple saturation.

     - Every operation well-defined (even divide-by-zero).

Divide-by-zero is actually well-defined in IEEE 754:
  x / +0.0 == +Inf
  x / -0.0 == -Inf
  (-)0.0 / (-)0.0 == NaN

Point taken!

     - No rounding modes.

You can’t implement a finite precision floating point representation without any rounding. I assume what you mean here is only one rounding mode, i.e. round-nearest-ties-to-even.

Yes. I was emphasizing that rounding modes aren't part of the API.

     - Digits represented simply as a 32-bit or 64-bit integer.

Isn’t this the same as the significand of an IEEE float? If you go with 64-bit, it sounds like you’re defining something very close to Intel’s FP80.

Yup. The `uint64_t` version is similar to a non-conforming and slow FP80
that's always in denormal mode, but with the *same* non-conformance on every
platform.

As I see it:

- float is wanted because often it's too hard to correctly guess the range of values in advance

- the uses are generally diagnostic or heuristic and the exact characteristics of rounding in the LSB are unimportant

- it *is* valuable to have the same bit-accurate results (and subsequent control flow based on them) on different platforms, regardless of what FP hardware (if any) they have.

- higher performance is better, but if you need software FP anywhere then it's useful to make that software FP as fast as possible.

Right.

- if the software FP is fast enough (even if not fully IEEE754 conformant) then the penalty for not using hardware FP is not all that large.

Also, given the assumption that hardware FP is off-limits, when the penalty
is too high someone should work out the range of values and use integers.

Therefore: if you're going to transparently use IEEE754 hardware where available, then you also have to make the software implementation bit-accurate to IEEE754 standards.

That puts a (perhaps unnecessary) performance penalty on the software FP.

It would be useful to have numbers for how much penalty.

In the presence of fast full result integer multiply, fast shifts by variable amounts, and (possibly) count-leading-zeroes you can do a non-IEEE754 software FP add in maybe a dozen clock cycles.

FYI, APFloat is bit-accurate. Someone could wrap APFloat to give it the
same API, and compare them to see the speed difference. But I think the
simplicity of UnsignedFloat has other benefits.

This may be a stupid question, but I didn’t see it addressed anywhere.

Why not just use APFloat? (maybe even APInt would be sufficient for the desired semantics?)

Is it purely a performance concern? Because it seems like you (and Andy) are supporting this as a suboptimal but convenient replacement for a proper integer-based representation. On the other hand, if it’s just a performance optimization over APFloat/APInt, then this soft-float thing becomes quick hack + optimization which is a combination that gives me a pretty bad gut feeling (but may still be warranted, of course).

– Sean Silva

If we require the host to have sse2 or other IEEE 754 conformant
implementation, would it be possible to use hardware float?

I don't think that IEEE 754 actually guarantees bit-exact results in all
cases.

-- Sean Silva

In concept, I have no problems with having a soft-float implementation in tree. I agree with your points regarding platform independence and incrementalism. If you wanted to use an IEEE-compliant implementation (possibly with a few restrictions, e.g. rounding mode, trap on bounds violations, etc.), I'd be fine with that.

I'm concerned by the proposed semantics mentioned so far. Without very strong justification and a very carefully written specification, I would resist any alternate semantics. My experience has been that floating point is already confusing enough and that getting any floating point implementation "right" (both implementation *and* usable semantics) is a very hard problem. I don't believe that we have either the resources or the interest in solving that problem, and that a partial solution is worse than nothing at all.

Purely out of curiosity, for prototyping do we actually want floating point? Or would a rational implementation be better? I'm not familiar with these areas, so I can't really judge.

Philip

+1

I strongly question the wisdom of trying to invent our own floating point representation. IEEE 754 is, honestly, pretty damn well designed. Unless we all go spend a few years studying numerical analysis, we’re not going to design something here that is better, and are likely to get caught in numerical traps that using IEEE floating point would have protected us from.

That said, I’m sympathetic to the desire for a good, convenient soft-float library. I think the correct solution is to *make APFloat not suck*, rather than inventing something new.

—Owen

IEEE 754 accuracy guarantees are expressed in terms of ULPs: Units-in-the-Last-Place. Most of the primitive operations are accurate to +/- 0.5 ULPs, which is equivalent to saying that there is a single bit-accurate result (for a given rounding mode). Once you get into transcendentals, you start getting operations with much wider ULP bounds.

—Owen

If we require the host to have sse2 or other IEEE 754 conformant
implementation, would it be possible to use hardware float?

I don't think that IEEE 754 actually guarantees bit-exact results in all
cases.

IEEE 754 accuracy guarantees are expressed in terms of ULPs:
Units-in-the-Last-Place.

Yeah, that's what I was thinking of, but...

Most of the primitive operations are accurate to +/- 0.5 ULPs, which is
equivalent to saying that there is a single bit-accurate result (for a
given rounding mode).

... I wasn't aware that the primitives were spec'd with sufficiently tight
ULP bounds for bit-exact results. Good to know!

Once you get into transcendentals, you start getting operations with much
wider ULP bounds.

Yeah, this was the specific counter-example that I was aware of.

Why not just use APFloat? (maybe even APInt would be sufficient for the desired semantics?)

APFloat's API is awkward to use when doing math -- it's designed for
compiling and optimizing code that uses floats, not for use as a simple
number representation.

It is possible to wrap APFloat though.

Is it purely a performance concern?

Largely (assuming the other option is wrapping APFloat).

Also, wrapping APFloat to give similar semantics is a fair bit of work.

UnsignedFloat interoperates well with integers -- it's easy to compose out
of (and decompose into) exponent and digit parts. Again, this is API that
could be added as a wrapper around APFloat.
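
Something in this vein, roughly (hypothetical member names for illustration;
not the committed API):

    uint64_t Count = getBlockCount();     // Some integer input (hypothetical).
    UnsignedFloat<uint64_t> F(Count, 0);  // Compose: Digits = Count, Exponent = 0.
    F *= UnsignedFloat<uint64_t>(3, 1);   // Scale by 3 * 2^1 == 6.
    uint64_t Digits = F.getDigits();      // Decompose back into digit and
    uint16_t Exponent = F.getExponent();  // exponent parts.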

Because it seems like you (and Andy) are supporting this as a suboptimal but convenient replacement for a proper integer-based representation.

In a way, but I think of it as a suboptimal but safe replacement for
hardware floats. We currently use hardware floats as a suboptimal but
convenient replacement for proper integer-based representations -- I'd like
to swap in a soft-float instead.

For some problems, proper integer-based representations are really hard,
and can resemble a poor man's soft-float even when done right.

On the other hand, if it's just a performance optimization over APFloat/APInt, then this soft-float thing becomes quick hack + optimization which is a combination that gives me a pretty bad gut feeling (but may still be warranted, of course).

It's a performance optimization over: a heavily wrapped APFloat to give it
simple semantics. Not sure if that gives you a better feeling...

In concept, I have no problems with having a soft-float implementation in tree. I agree with your points regarding platform independence and incrementalism. If you wanted to use an IEEE-compliant implementation (possibly with a few restrictions, e.g. rounding mode, trap on bounds violations, etc.), I'd be fine with that.

Something IEEE-compliant would be doable by wrapping APFloat. I prefer
something simpler.

I'm concerned by the proposed semantics mentioned so far. Without very strong justification and a very carefully written specification, I would resist any alternate semantics. My experience has been that floating point is already confusing enough and that getting any floating point implementation "right" (both implementation *and* usable semantics) is a very hard problem. I don't believe that we have either the resources or the interest in solving that problem, and that a partial solution is worse than nothing at all.

The specification is simple, though. For UnsignedFloat, you have digits and
an exponent that represent digits*2^exponent. Exceeding bounds saturates at
min or max. Divide-by-zero gives you the max. Rounding is round-half-away-
from-zero. Overloaded operators +-*/ do what you expect. Overloaded
comparison operators do the right thing. Conversion to/from integers works
as expected. That's it.
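
As a toy illustration of those rules (made-up names, 32-bit digits, and
truncation instead of round-half-away-from-zero; only division shown):

    #include <cstdint>

    struct Toy {
      uint32_t Digits;  // Value is Digits * 2^Exponent.
      int32_t Exponent; // Kept within int16_t range by saturation.

      static Toy largest() { return {UINT32_MAX, INT16_MAX}; }
      static Toy zero() { return {0, 0}; }

      friend Toy operator/(Toy L, Toy R) {
        if (!R.Digits)
          return largest(); // Divide-by-zero gives the max.
        if (!L.Digits)
          return zero();
        // Pre-scale the numerator so the quotient keeps ~32 bits of precision.
        uint64_t Q = ((uint64_t)L.Digits << 32) / R.Digits;
        int32_t E = L.Exponent - R.Exponent - 32;
        while (Q > UINT32_MAX) { Q >>= 1; ++E; } // Renormalize (truncating).
        if (E > INT16_MAX) return largest();     // Saturate at max...
        if (E < INT16_MIN) return zero();        // ...and at min (zero).
        return {(uint32_t)Q, E};
      }
    };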

Justification is that it's better for compilers than a hard float, it has a
large dynamic range, it's simple to use, and its implementation is
straightforward.

Encouraging floating point-based code in the compiler is a non-goal. But
given the use of hard-floats already, having a simple way to talk about
numbers with a large dynamic range appears to be valuable.

Purely out of curiosity, for prototyping do we actually want floating point? Or would a rational implementation be better? I'm not familiar with these areas, so I can't really judge.

Probably depends on the problem. I wrote UnsignedFloat since it was the
simplest way to encapsulate the dynamic range in block frequency info. In
particular, the order of operations in calculating block frequency can
exhaust precision of a `uint64_t`-based rational at either end before
returning back.

Incidentally, we already have a (limited) version of a rational in
BranchProbability -- also used in block frequency info. The rational part
could be extracted out, I'm sure.

> Because it seems like you (and Andy) are supporting this as a
> suboptimal but convenient replacement for a proper integer-based
> representation.

In a way, but I think of it as a suboptimal but safe replacement for
hardware floats. We currently use hardware floats as a suboptimal but
convenient replacement for proper integer-based representations -- I'd
like to swap in a soft-float instead.

For some problems, proper integer-based representations are really hard,
and can resemble a poor man's soft-float even when done right.

In the fixed-point package I used to own, doing anything complicated
meant converting to soft-float and back. ("Complicated" includes
division.) Soft-float behaves more intuitively and doesn't require
the client to manage scale, which is a big plus in my book.
--paulr

> Because it seems like you (and Andy) are supporting this as a suboptimal
> but convenient replacement for a proper integer-based representation.

In a way, but I think of it as a suboptimal but safe replacement for
hardware floats. We currently use hardware floats as a suboptimal but
convenient replacement for proper integer-based representations -- I'd like
to swap in a soft-float instead.

Cool, this makes sense to me.

-- Sean Silva

I'm certainly not suggesting this would be better in general than IEEE 754.

But I think it's suitable for the sorts of places we currently use
hard-floats. I guess you (and Philip) are saying there are dragons here?

Although "making APFloat not suck" might be ideal, we don't have the code
written for that.