Representing -ffast-math at the IR level

> > Link-time optimization will sometimes result in "fast-math"
> > functions being inlined into non-fast math functions and
> > vice-versa. This pretty much inevitably means that per-instruction
> > fpmath options are required.
>
> I guess it would be user error if a strict function used the results
> of a non-strict function (explicitly compiled with -ffast-math) and
> complained about loss of precision. In that case, the inlining
> keeping the option per-line makes total sense.
>

It's not a user error. The user knows his code, and the accuracy of
his code, much better than any compiler possibly could, and may have
strong reasons to specify fast-math for one function and not for
another.

I strongly agree. Scientific users often compile different functions
with different floating-point accuracy flags, and it is important to
respect that. Furthermore, a function that requires high accuracy can
use a low accuracy result as part of the computation. Doing so is
commonplace. As a quick example, a result could be:

result = (high accuracy result near 1) + 1e-10 * (low accuracy result
near 1).

Here the 1e-10 factor suppresses whatever error the low accuracy part
carries, so the overall result is still highly accurate.

-Hal

> Hi Dmitry,
>
>
> > > The kinds of transforms I think can reasonably be done with the
> > > current information are things like: x + 0.0 -> x; x / constant
> > > -> x * (1 / constant) if constant and 1 / constant are normal
> > > (and not denormal) numbers.
> >
> > The particular definition is not as important as the fact that this
> > definition exists :) I.e. I think we need a set of transformations
> > to be defined (as an enum, most likely, as Renato pointed out) and
> > an interface which accepts an "fp-model" (which is "fast", "strict"
> > or whatever keyword we may end up with) and the particular
> > transformation, and returns true or false depending on whether the
> > definition of the fp-model allows this transformation or not. So a
> > transformation would request, for example, whether reassociation is
> > allowed or not.
>>
>
> at some point each optimization will have to decide if it is going
> to be applied
> or not, so that's not really the point. It seems to me that there
> are many many
> possible optimizations, and putting them all as flags in the
> metadata is out of
> the question. What seems reasonable to me is dividing transforms
> up into a few
> major (and orthogonal) classes and putting flags for them in the
> metadata.
>
Optimization decisions to apply or not should be based on a strict
definition of what is allowed or not, and not on an optimization's
interpretation of the "fast" fp-model (for example). Say, after widely
adopting the "fast" fp-model in the compiler, you suddenly realize
that the definition is wrong and allowing some type of transformation
is a bad idea (for any reason - being incompatible with some compiler,
or not taking into account some corner cases, or for whatever other
reason); then you'll have to go and fix one million places where the
decision is made.

Alternatively, by defining classes of transformations and making
optimizations query for particular types of transformation, you keep
it under control.
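
As a rough sketch of the shape this could take (all names here are
illustrative, not from any existing patch):

enum FPTransformKind {
  FPT_Reassociate,   // (a+b)+c => a+(b+c)
  FPT_AddZero,       // x+0.0 => x
  FPT_RecipDiv,      // a/b => a*(1/b)
  FPT_FormFMA        // a*b+c => fma(a,b,c)
};

enum FPModel { FPM_Strict, FPM_Precise, FPM_Fast };

// The definition of each model lives in exactly one place; a transform
// asks whether it is allowed instead of interpreting the model itself.
bool isFPTransformAllowed(FPModel Model, FPTransformKind Kind) {
  switch (Model) {
  case FPM_Strict:  return false;
  case FPM_Precise: return Kind == FPT_FormFMA;  // policy choice, for example
  case FPM_Fast:    return true;
  }
  return false;
}

If the definition of, say, "fast" later turns out to be wrong, only
isFPTransformAllowed needs fixing, not every optimization that queried
it.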

> > Another point, important from a practical point of view, is that
> > the fp-model is almost always the same for all instructions in a
> > function (or even module), and tagging every instruction with
> > fp-model metadata is quite a substantial waste of resources.
>
> I measured the resource waste and it seems fairly small.
>
> > So it makes sense to me to have a default fp-model defined for the
> > function or module, which can be overwritten with instruction
> > metadata.
>
> That's possible (I already discussed this with Chandler), but in my
> opinion it is only worth doing if we see unreasonable increases in
> bitcode size in real code.

What is reasonable or not is defined not only by absolute numbers
(0.8% or any other number). Does it make sense to increase bitcode
size by 1% if it's used only by math library writers and a couple of
other people who reeeeally care about precision *and* performance at
the same time and are knowledgeable enough to restrict precision on
particular instructions only? In my experience it's an extremely rare
case when people want more than compiler flags to control fp accuracy
and are ready to deal with pragmas (when they are available).

>
>
> > I also understand that clang generally derives its switches from
> > GCC, and fp precision switches are not an exception, but I'd like
> > to point out that there's a far more orderly way of defining an fp
> > precision model (IMHO, of course :) ), adopted by MS and the Intel
> > Compiler (-fp-model [strict|precise|fast]). It would be nice to
> > have it adopted in clang.
> >
> > But while adding MS-style fp-model switches is a different topic
> > (and I guess quite an arguable one), I'm mentioning it to show the
> > importance of the idea of abstracting the internal compiler
> > fp-model from external switches
>
> The info in the meta-data is essentially a bunch of external
> switches which will then be used to determine which transforms are
> run.
>
> > and exposing a querying interface to transformations.
> > Transformations shouldn't care about the particular model; they
> > only need to know whether a particular type of transformation is
> > allowed.
>
> Do you have a concrete suggestion for what should be in the
> metadata?
>

I would define the set of transformations, such as (I can help with a
more complete list if you prefer):

   - reassociation
   - x + 0.0 => x
   - x * 0.0 => 0.0
   - x * 1.0 => x
   - a/b => a * (1/b)
   - a*b + c => fma(a,b,c)
   - ignoring NaNs in compares, i.e. (a<b) => !(a>=b)
   - value unsafe transformations (for aggressive fp optimizations,
     like a*b + a*c => a*(b+c)) and others of the kind.

and several aliases for "strict", "precise" and "fast" models (which
are effectively combinations of the flags above).
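
One illustrative way to encode that list and its aliases (made-up
names, purely to make the idea concrete):

enum FPTransformFlags {
  FPF_Reassociate  = 1 << 0,  // reassociation
  FPF_AddZero      = 1 << 1,  // x + 0.0 => x
  FPF_MulZero      = 1 << 2,  // x * 0.0 => 0.0
  FPF_MulOne       = 1 << 3,  // x * 1.0 => x
  FPF_RecipDiv     = 1 << 4,  // a/b => a * (1/b)
  FPF_FormFMA      = 1 << 5,  // a*b + c => fma(a,b,c)
  FPF_NoNaNCompare = 1 << 6,  // (a<b) => !(a>=b)
  FPF_ValueUnsafe  = 1 << 7   // e.g. a*b + a*c => a*(b+c)
};

// The named models are then just combinations of the flags above;
// exactly which flags go into which model is the policy decision.
const unsigned FPModelStrict  = 0;
const unsigned FPModelPrecise = FPF_FormFMA;
const unsigned FPModelFast    = 0xFF;  // all of the above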

From a user's perspective, I think that it is important to have
categories defining:

- finite math (as precise as normal, but might do odd things for NaNs
   or Infty, etc.) - I'd suppose this is the strictest "fast" option.
- algebraic-equivalence - The compiler might do anything that is
   algebraically the same (even if the numerics could be quite
   different) - This is probably the loosest "fast" option.

-Hal

> I would define the set of transformations, such as (I can help with
> a more complete list if you prefer):
>
>    - reassociation
>    - x + 0.0 => x
>    - x * 0.0 => 0.0
>    - x * 1.0 => x
>    - a/b => a * (1/b)
>    - a*b + c => fma(a,b,c)
>    - ignoring NaNs in compares, i.e. (a<b) => !(a>=b)
>    - value unsafe transformations (for aggressive fp optimizations,
>      like a*b + a*c => a*(b+c)) and others of the kind.
>
> and several aliases for "strict", "precise" and "fast" models (which
> are effectively combinations of the flags above).
>
> From a user's perspective, I think that it is important to have
> categories defining:
>
> - finite math (as precise as normal, but might do odd things for NaNs
>    or Infty, etc.) - I'd suppose this is the strictest "fast" option.

That's exactly "/fp:precise" or "-fp-model precise" in terms of MS and
Intel Compiler options.

> - algebraic-equivalence - The compiler might do anything that is
>    algebraically the same (even if the numerics could be quite
>    different) - This is probably the loosest "fast" option.

That's "/fp:fast" or "-fp-model fast" in terms of MS and Intel
Compiler options.

Intel also supports "-fp-model fast=2" for the most aggressive
optimizations.

Dmitry.

Here's a revised patch, plus patches showing how fpmath metadata could be
turned on in clang and dragonegg (it seemed safest for the moment to
condition on -ffast-math rather than on one of the flags implied by
-ffast-math).

Major changes:

- The FPMathOperator class can no longer be used to change math settings,
only to read them. Currently it can be queried for accuracy info. I split
the accuracy methods into two: one for 'fast' accuracy, one for a numerical
accuracy (which returns +infty when the accuracy is 'fast').

- MDBuilder got support for creating fpmath metadata; in particular there is
a function that returns the appropriate settings for -ffast-math.

- A default fpmath setting can be supplied to IRBuilder, which will then apply
it to all floating point operations. It is also possible to specify specific
fpmath metadata when creating an operation.
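
In terms of use, the description above suggests something like the
following (a sketch only - Context, BB, X and Y are placeholders, and
the authoritative names and signatures are those in the attached
patch):

MDBuilder MDB(Context);

// Metadata carrying the settings implied by -ffast-math.
MDNode *FastFPMath = MDB.CreateFastFPMath();

IRBuilder<> Builder(BB);
// Tag every floating point operation this builder creates...
Builder.SetDefaultFPMathTag(FastFPMath);
Value *Sum = Builder.CreateFAdd(X, Y);  // gets !fpmath automatically

// ...or supply specific metadata for one particular operation.
Value *Prod = Builder.CreateFMul(X, Y, "prod", MDB.CreateFPMath(2.5f));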

Ciao, Duncan.

fastm.diff (18.3 KB)

fastm-clang.diff (1.46 KB)

fastm-dragonegg.diff (563 Bytes)

Hi Duncan,

I like the changes to IRBuilder and how the operator can't change it.
Looks a lot safer (mistake-wise) and more convenient.

This function won't remove a previously set tag, which is something
optimisations or inlining might want to do.

+  Instruction *AddFPMathTag(Instruction *I, MDNode *FPMathTag) const {
+    if (!FPMathTag)
+      FPMathTag = DefaultFPMathTag;
+    if (FPMathTag)
+      I->setMetadata(LLVMContext::MD_fpmath, FPMathTag);
+    return I;
+  }

If you want to keep it as only Add, then give FPMathTag a default
value of 0, so that you can easily add the default tag by just calling
AddFPMathTag(instr);

But I'd add a ClearFPMathTag function for optimisations/inlining. Maybe later.

Also, would be good to make sure the instruction is, in fact, a
floating point operation. Either via restricting the type or asserting
on it.

Hi Renato,

I like the changes to IRBuilder and how the operator can't change it.
Looks a lot safer (mistake-wise) and more convenient.

thanks!

This function won't remove a previously set tag, which is something
optimisations or inlining might want to do.

This is private to IRBuilder and is only applied to newly created instructions,
thus removing tags is not useful. I guess it could be exposed for general use -
is that what you are suggesting? If so, I think it would be better if MDBuilder
got methods for applying metadata such as:
   SetTBAAMetadata(Instruction *I, MDNode *MD);
   SetFPMathMetadata(Instruction *I, MDNode *MD);
along with getters. I will add that.

+  Instruction *AddFPMathTag(Instruction *I, MDNode *FPMathTag) const {
+    if (!FPMathTag)
+      FPMathTag = DefaultFPMathTag;
+    if (FPMathTag)
+      I->setMetadata(LLVMContext::MD_fpmath, FPMathTag);
+    return I;
+  }

If you want to keep it as only Add, then give FPMathTag a default
value of 0, so that you can easily add the default tag by just calling
AddFPMathTag(instr);

But I'd add a ClearFPMathTag function for optimisations/inlining. Maybe later.

See above.

Also, would be good to make sure the instruction is, in fact, a
floating point operation. Either via restricting the type or asserting
on it.

That would be appropriate for MDBuilder - I will do it.
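
(Presumably something along these lines, using Type's
isFPOrFPVectorTy:)

assert(I->getType()->isFPOrFPVectorTy() &&
       "fpmath metadata requires a floating point operation!");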

Ciao, Duncan.

Btw, the assert in MDNode *CreateFPMath(float Accuracy) is
unnecessary. Trichotomy guarantees it will be right. ;)

Hi Renato,

Btw, the assert in MDNode *CreateFPMath(float Accuracy) is
unnecessary. Trichotomy guarantees it will be right. ;)

what if Accuracy is NaN?

Ciao, Duncan.
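
(For reference: every ordered comparison involving NaN is false, so
trichotomy does not hold for floating point values. A minimal
standalone illustration:)

#include <cassert>
#include <limits>

int main() {
  float NaN = std::numeric_limits<float>::quiet_NaN();
  // Neither <, ==, nor > holds, so a trichotomy argument cannot
  // justify dropping the range check on Accuracy.
  assert(!(NaN < 0.0f) && !(NaN == 0.0f) && !(NaN > 0.0f));
  return 0;
}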

D'oh. ;)

Thanks for the updates!

Minor comments:

+  if (!Accuracy)
+    // If it's not a floating point number then it must be 'fast'.
+    return HUGE_VALF;

Can we add an assert instead of a comment? It's just as documenting and will
catch any goofs.

+  // If it's not a floating point number then it must be 'fast'.
+  return !isa<ConstantFP>(MD->getOperand(0));

Here as well.

+  if (ConstantFP *CFP0 = dyn_cast_or_null<ConstantFP>(Op0)) {
+    APFloat Accuracy = CFP0->getValueAPF();
+    Assert1(Accuracy.isNormal() && !Accuracy.isNegative(),
+            "fpmath accuracy not a positive number!", &I);

To be pedantic for a moment, zero is not a positive number. What about
asserting these individually to give us more clear asserts if they fire? That
also makes the string easier to write: "fpmath accuracy is a negative number!".

+  /// SetDefaultFPMathTag - Set the floating point math metadata to be used.
+  void SetDefaultFPMathTag(MDNode *FPMathTag) { DefaultFPMathTag = FPMathTag; }

This should be 'setDefault...' much like 'getDefault...' above.

+  Instruction *AddFPMathTag(Instruction *I, MDNode *FPMathTag) const {

Another bad case, but I think this instruction is gone...

+  MDString *GetFastString() const {
+    return CreateString("fast");
+  }

'getFastString'.

+  /// CreateFastFPMath - Return metadata with appropriate settings for 'fast
+  /// math'.

I would prefer the more modern doxygen style:

/// \brief Return metadata ...

+  MDNode *CreateFastFPMath() {

Capitalization.

The capitalization and doxygen style comments apply to the next function as well.

Both the Clang and DragonEgg patches look good, but both need test cases. =]

[Resend as I forgot this list doesn't set reply-to to list. Oops]

Link-time optimization will sometimes result in "fast-math" functions being
inlined into non-fast math functions and vice-versa. This pretty much
inevitably means that per-instruction fpmath options are required.

I guess it would be user error if a strict function used the results
of a non-strict function (explicitly compiled with -ffast-math) and
complained about loss of precision. In that case, the inlining keeping
the option per-line makes total sense.

As a writer of numerical code, the perspective that's being taken
makes things seem bizarre. I would never write code/use optimizations
that I expect to produce inaccurate results. What I would do is write
code which, _for the input data that it is going to use_, is not going
to be (to any noticeable degree) any less accurate if some
optimizations are being used. (Clearly it's well known that for most
optimizations there are some sets of input data that cause big changes
in accuracy; however there seems no neat way of telling the compiler
that these aren't going to occur other than by specifying
modes/allowed transformations.) As such, when code that uses more
optimizations ("fast-math flagged code") is inlined into more
sensitive code, the instructions that expect "strict math" need to
keep it in order to retain the accuracy through to the result.

My personal interest is in automatic differentiation, where there's
two kinds of "variable entities" in the
code-after-auto-differentiation: original variables and derivatives,
and it is desirable to have different fp optimizations used on the two
kinds of element. (It's quite important that 0*x -> 0 is used to shrink
down the amount of "pointless" instructions generated for
derivatives.) However, I have to admit I can't think of any other
problem where I'd want control over the fp-optimizations used on a
per-instruction level, so I don't know if it's worth it for the LLVM
codebase in general.

Finally, a minor aside: I was talking to Duncan Sands at EuroLLVM and
discussing whether the FP optimizations would apply to vector op as
well as scalar ops, and he mentioned that the plan was to mirror the
integer case where vector code should be optimized as well as scalar
code.

Since there are no FP optimizations yet, I looked at what LLVM produces
for integer code for

t0 := a * b
t1 := c * d
t2 := t0 + t1
t3 := t2 + e
return t3

in the 16 cases where both a and c are from {variable, -1, 0, +1} in
the scalar and vector cases. The good news is that in each case both
scalar and vector code gets fully optimized; interestingly, however,
different choices get made in a couple of cases between vector and
scalar. (Basically, given an expression like w+x+y-z there are various
ways to build this from binary instructions, and different choices
seem to be made.)
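
For instance, the a = -1, c = 0 instantiation of the pattern, written
out here in C++ for concreteness, reduces completely:

// One of the 16 scalar cases: a = -1, c = 0.
int f(int b, int d, int e) {
  int t0 = -1 * b;   // a * b
  int t1 = 0 * d;    // c * d, folds to 0
  int t2 = t0 + t1;  // -b
  int t3 = t2 + e;
  return t3;         // the whole body simplifies to e - b
}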

Anyway, I'll rerun this test code for FP mode once there are some FP
optimizations implemented.

HTH,
Dave Tweed

Hi Chandler,

Minor comments:
+  if (!Accuracy)
+    // If it's not a floating point number then it must be 'fast'.
+    return HUGE_VALF;

Can we add an assert instead of a comment? It's just as documenting and will
catch any goofs.

Done.

+  // If it's not a floating point number then it must be 'fast'.
+  return !isa<ConstantFP>(MD->getOperand(0));

Here as well.

+  if (ConstantFP *CFP0 = dyn_cast_or_null<ConstantFP>(Op0)) {
+    APFloat Accuracy = CFP0->getValueAPF();
+    Assert1(Accuracy.isNormal() && !Accuracy.isNegative(),
+            "fpmath accuracy not a positive number!", &I);

To be pedantic for a moment, zero is not a positive number.

Zero is not allowed. The isNormal call will return false for zero.

What about asserting these individually to give us more clear asserts
if they fire? That also makes the string easier to write: "fpmath
accuracy is a negative number!".

It will fire on: zero, negative numbers, NaN, +-infinity. Personally I reckon
"fpmath accuracy not a positive number!" is reasonable for all of these.

+  /// SetDefaultFPMathTag - Set the floating point math metadata to be used.
+  void SetDefaultFPMathTag(MDNode *FPMathTag) { DefaultFPMathTag = FPMathTag; }

This should be 'setDefault...' much like 'getDefault...' above.

The rest of IRBuilder uses a capital S in its setters, so I was just trying to
be consistent here.

+  Instruction *AddFPMathTag(Instruction *I, MDNode *FPMathTag) const {

Another bad case, but I think this instruction is gone...

It still exists, and is also capitalized like that for consistency with the rest
of IRBuilder.

+  MDString *GetFastString() const {
+    return CreateString("fast");
+  }

'getFastString'.

OK, done - same for the others that are not in IRBuilder.

+  /// CreateFastFPMath - Return metadata with appropriate settings for 'fast
+  /// math'.

I would prefer the more modern doxygen style:

/// \brief Return metadata ...

+  MDNode *CreateFastFPMath() {

Capitalization.

The capitalization and doxygen style comments apply to the next function as well.

Both the Clang and DragonEgg patches look good, but both need test cases. =]

Yes, I'm working on those as well. See attached patch for the other changes.
It now also includes a unit test.

Ciao, Duncan.

fastm.diff (26.1 KB)

The LLVM patch looks good, and feel free to submit the clang one whenever you have a test case. ;] I figure you’re good to submit the dragonegg one, well, whenever you feel like it.

Duncan,

I have some issues with representing this as a single "fast" mode flag, which mostly boil down to the fact that this is a very C-centric view of the world. And, since C compilers are not generally known for their awesomeness on issues of numerics, I'm not sure that's a good idea.

Having something called a "fast" or "relaxed" mode implies that it is less precise than whatever the standard mode is. However, C is notably sparse in specifying what exactly the standard mode is. The typical assumption is that it is the strict one-to-one translation to IEEE754 semantics, but no optimizing C compiler actually implements that.

Other languages are more interesting in this regard. Fortran, for instance, allows reassociation within parentheses. (Can that even be represented with instruction metadata?) OpenCL has a fairly strict baseline mode, but specifies a number of specific options the user can enable to relax it (-cl-mad-enable, -cl-no-signed-zeros, -cl-unsafe-math-optimizations (implies the previous two), -cl-finite-math-only, -cl-fast-relaxed-math (implies all prior)). GLSL has distinct desktop and embedded specifications that place different levels of constraint on implementations.

If we define the baseline behavior to be strict IEEE conformance, and then don't provide a more nuanced method of relaxing it, we're not going to be in a significantly better world than we are today. No reasonable implementation of these languages wants strict conformance (except maybe desktop-profile OpenCL) as their default mode, nor is there any way a universal definition of "fast" math can work for all of them.

--Owen

Hi Owen,

I have some issues with representing this as a single "fast" mode flag,

it isn't a single flag, that's the whole point of using metadata. OK, right
now there is only one option (the "accuracy"), true, but the intent is that
others will be added, and the meaning of accuracy tightened, later. MDBuilder
has a createFastFPMath method which is intended to produce settings that match
GCC's -ffast-math, however frontends will be able to specify whatever settings
they like if that doesn't suit them (i.e. createFPMath will get more arguments
as more settings become available).

Note that as the current option isn't actually connected to any optimizations,
there is nothing much to argue about for the moment.

My plan is to introduce a few simple optimizations (x + 0.0 -> x for example)
that introduce a finite number of ULPs of error, and hook them up. Thus this
does not include things like x * 0.0 -> 0.0 (infinite ULPs of error),
reassociation (infinite ULPs of error) or any other scary things.
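
(To make the classification concrete, assuming IEEE semantics;
illustration only:)

#include <limits>

// Why x * 0.0 -> 0.0 has unbounded error while x + 0.0 -> x does not:
double inf = std::numeric_limits<double>::infinity();
double a = inf * 0.0;   // NaN, not 0.0 - arbitrarily wrong
double b = -1.0 * 0.0;  // -0.0, not +0.0 - only the sign differs
double c = 2.5 + 0.0;   // exactly 2.5 - the fold is value-exact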

  which mostly boil down to the fact that this is a very C-centric view of the world. And, since C compilers are not generally known for their awesomeness on issues of numerics, I'm not sure that's a good idea.

Having something called a "fast" or "relaxed" mode implies that it is less precise than whatever the standard mode is. However, C is notably sparse in specifying what exactly the standard mode is. The typical assumption is that it is the strict one-to-one translation to IEEE754 semantics, but no optimizing C compiler actually implements that.

I think this is a misunderstanding of where I'm going, see above.

Other languages are more interesting in this regard. Fortran, for instance, allows reassociation within parentheses. (Can that even be represented with instruction metadata?)

I'm aware of Fortran parentheses (PAREN_EXPR in gcc). If it can't be expressed
well then too bad: reassociation can just be turned off and we won't optimize
Fortran as well as we could. (As mentioned above I have no intention of turning
on reassociation based on the current flag since it can introduce an unbounded
number of ULPs of error).

   OpenCL has a fairly strict baseline mode, but specifies a number of specific options the user can enable to relax it (-cl-mad-enable, -cl-no-signed-zeros, -cl-unsafe-math-optimizations (implies the previous two), -cl-finite-math-only, -cl-fast-relaxed-math (implies all prior)). GLSL has distinct desktop and embedded specifications that place different levels of constraint on implementations.

Yup.

If we define the baseline behavior to be strict IEEE conformance,

Which we do.

  and then don't provide a more nuanced method of relaxing it,

Allowing more nuanced ways is the reason for using metadata as explained above.

  we're not going to be in a significantly better world than we are today. No reasonable implementation of these languages wants strict conformance (except maybe desktop-profile OpenCL) as their default mode,

Strict conformance is what they get right now.

  nor is there any way a universal definition of "fast" math can work for all of them.

I agree, and I'm not trying to provide one.

Ciao, Duncan.

Hi Owen,

> I have some issues with representing this as a single "fast" mode
> flag,

it isn't a single flag, that's the whole point of using metadata.
OK, right now there is only one option (the "accuracy"), true, but
the intent is that others will be added, and the meaning of accuracy
tightened, later. MDBuilder has a createFastFPMath method which is
intended to produce settings that match GCC's -ffast-math, however
frontends will be able to specify whatever settings they like if that
doesn't suit them (i.e. createFPMath will get more arguments as more
settings become available).

Note that as the current option isn't actually connected to any
optimizations, there is nothing much to argue about for the moment.

My plan is to introduce a few simple optimizations (x + 0.0 -> x for
example) that introduce a finite number of ULPs of error, and hook
them up. Thus this does not include things like x * 0.0 -> 0.0
(infinite ULPs of error), reassociation (infinite ULPs of error) or
any other scary things.

If I understand what you're saying, I think that saying "infinite ULPs
of error" for x + 0, x*0, etc. is an unhelpful way of classifying
these. These are finite math assumptions, and ULPs of error should only
be computed assuming finite inputs. Accordingly, these are not at all
scary, but can be safely enabled when a finite math assumption is
allowed regardless of other user-required accuracy constraints.
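
A quick check of that classification (a sketch, assuming IEEE
doubles):

#include <cassert>
#include <cmath>

// For finite inputs the folds are value-exact - only the sign of a
// zero can differ, which == does not observe. The unbounded error
// cases all require Inf or NaN inputs, which a finite math
// assumption rules out (e.g. inf * 0.0 is NaN).
void checkFinite(double x) {
  assert(std::isfinite(x));
  assert(x * 0.0 == 0.0);  // holds for every finite x
  assert(x + 0.0 == x);    // likewise
}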

-Hal