Representing -ffast-math at the IR level

The attached patch is a first attempt at representing "-ffast-math" at the IR
level, in fact on individual floating point instructions (fadd, fsub, etc.). It
is done using metadata. We already have an "fpmath" metadata type which can be
used to signal that reduced precision is OK for a floating point operation, e.g.

     %z = fmul float %x, %y, !fpmath !0
   ...
   !0 = metadata !{double 2.5}

indicates that the multiplication can be done in any way that doesn't introduce
more than 2.5 ULPs of error.

The first observation is that !fpmath can be extended with additional operands
in the future: operands that say things like whether it is OK to assume that
there are no NaNs and so forth.

This patch doesn't add additional operands though. It just allows the existing
accuracy operand to be the special keyword "fast" instead of a number:

     %z = fmul float %x, %y, !fpmath !0
   ...
   !0 = metadata !{metadata !"fast"}

This indicates that accuracy loss is acceptable (just how much is unspecified)
for the sake of speed. Thanks to Chandler for pushing me to do it this way!

It also creates a simple way of getting and setting this information: the
FPMathOperator class. You can cast appropriate instructions to this class
and then use its querying/mutating methods to get/set the accuracy, whether
2.5 or "fast". The attached clang patch uses this to set the OpenCL 2.5 ULP
accuracy rather than doing it by hand, for example.
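
For concreteness, usage is meant to look roughly like this (a hedged sketch:
the method names, e.g. setFPAccuracy(), follow the attached patch and may well
change; header paths are the pre-3.3 style of the time):

  #include "llvm/Instruction.h"
  #include "llvm/Operator.h"   // FPMathOperator comes from the attached patch

  using namespace llvm;

  // Tag one floating point operation with the OpenCL-style 2.5 ULP bound.
  // Only fadd/fsub/fmul/fdiv/frem (and fp-valued calls) cast successfully.
  void tagWithOpenCLAccuracy(Instruction &I) {
    if (FPMathOperator *FPOp = dyn_cast<FPMathOperator>(&I))
      FPOp->setFPAccuracy(2.5);  // attaches !fpmath limiting error to 2.5 ULPs
  }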

In addition it changes IRBuilder so that you can provide an accuracy when
creating floating point operations. I don't like this so much. It would
be more efficient to just create the metadata once and then splat it onto
each instruction. Also, if fpmath gets a bunch more options/operands in
the future then this interface will become more and more awkward. Opinions
welcome!
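
For reference, the "create the metadata once and splat it" alternative would
look something like this (a rough sketch, not part of the patch; metadata APIs
and header paths differ a little between LLVM versions):

  #include "llvm/BasicBlock.h"
  #include "llvm/Instruction.h"
  #include "llvm/LLVMContext.h"
  #include "llvm/Metadata.h"
  #include "llvm/Operator.h"   // FPMathOperator from the attached patch

  using namespace llvm;

  // Build the uniqued !{metadata !"fast"} node once, then attach it to every
  // floating point operation in the block.
  void splatFastFPMath(BasicBlock &BB) {
    LLVMContext &Ctx = BB.getContext();
    Value *Ops[] = { MDString::get(Ctx, "fast") };
    MDNode *Fast = MDNode::get(Ctx, Ops);
    for (BasicBlock::iterator It = BB.begin(), E = BB.end(); It != E; ++It) {
      Instruction *I = &*It;
      if (isa<FPMathOperator>(I))
        I->setMetadata("fpmath", Fast);
    }
  }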

I didn't actually implement any optimizations that use this yet.

I took a look at the impact on aermod.f90, a reasonably floating point heavy
Fortran benchmark (4% of the human readable IR consists of floating point
operations). At -O3 (the worst), the size of the bitcode increases by 0.8%.
No idea if that's acceptable - hopefully it is!

Enjoy!

Duncan.

fastm-llvm.diff (13.9 KB)

fastm-clang.diff (2.19 KB)

Hi Duncan,

I'm not sure about this:

+ if (!Accuracy)
+ // If it's not a floating point number then it must be 'fast'.
+ return getFastAccuracy();

Since you allow accuracies bigger than 1 in setFPAccuracy(), integers
should be treated as float. Or at least assert.

Also, I'm thinking you should carry the annotation forward on all uses
of an annotated result, or make sure the floating point library
searches recursively for annotations on any dependency of the value
being analysed.

About creating annotations every time, I think this could be a nice
idea for a metadata factory functionality. Something that would cache
metadata, and in case of repetition, point to the same metadata. This
could be used for other optimisations (if I recall correctly, the
debug metadata does that already).

The problem with this is that, if an optimisation pass changes one,
you must make sure the other can also be changed, or split-on-write,
and that can cause some bloated code in the optimiser, which is not
ideal.

I think, for now, it's acceptable, but it should be on a request basis
(i.e. only present if -fmath options are explicitly specified).

The rest of the patch looks sane, though. I like the idea of using
metadata, since the target code can easily ignore it if it doesn't
support FP optimisations or IEEE strictness.

cheers,
--renato

Hi Renato,

I'm not sure about this:

+ if (!Accuracy)
+ // If it's not a floating point number then it must be 'fast'.
+ return getFastAccuracy();

Since you allow accuracies bigger than 1 in setFPAccuracy(), integers
should be treated as float. Or at least assert.

the verifier checks that the accuracy operand is either a floating point
number (ConstantFP) or the keyword "fast". If "Accuracy" is zero here
then that means it wasn't ConstantFP. Thus it must have been the keyword
"fast".

Also, I'm thinking you should carry the annotation forward on all uses
of an annotated result, or make sure the floating point library
searches recursively for annotations on any dependency of the value
being analysed.

Yes, this is a possible optimization (especially useful if functions from a
-ffast-math compiled module are inlined into functions from a non -ffast-math
compiled module or vice versa) but it is not needed for correctness. I plan to
implement optimizations using the metadata later.

About creating annotations every time, I think this could be a nice
idea for a metadata factory functionality. Something that would cache
metadata, and in case of repetition, point to the same metadata. This
could be used for other optimisations (if I recall correctly, the
debug metadata does that already).

Yes, Chandler suggested it already, and I think it is a good idea.

The problem with this is that, if an optimisation pass changes one,
you must make sure the other can also be changed, or split-on-write,
and that can cause some bloated code in the optimiser, which is not
ideal.

Optimizers don't (or shouldn't) change metadata because metadata is
uniqued: if you change it you change it for all users. Instead new
metadata has to be created. So I doubt that this is a problem in
practice. Also, I think metadata is intrinsically a weak value handle,
so if someone changes the metadata underneath the builder then its
copy will become null. When it sees that the cached metadata is null
then it can create it anew. So I think it should be possible to ensure
that this works well.

I think, for now, it's acceptable, but it should be on a request basis
(i.e. only present if -fmath options are explicitly specified).

The rest of the patch looks sane, though. I like the idea of using
metadata, since the target code can easily ignore it if it doesn't
support FP optimisations or IEEE strictness.

This kind of metadata must only relax IEEE strictness (and never tighten
it) because *metadata can always be discarded*. Discarding it must never
result in wrong IR/transforms, thus metadata can only give additional
permissions.

Ciao, Duncan.

Hi Duncan,

I’m not an expert in fp accuracy questions, but I have had quite a bit of experience dealing with fp accuracy problems during compiler transformations.

I think you have taken a step in the right direction by walking away from ULPs, which are pretty useless for the purpose of describing allowed fp optimizations IMHO. But using just the “fast” keyword (or whatever else will be added in the future) is not enough without a strict definition of this keyword in terms of IR transformations. For example, a particular transformation may want to know whether reassociation is allowed ((a+b)+c => a+(b+c)), whether fp contraction is allowed (a*b+c => fma(a,b,c)), whether addition of zero may be canceled (x+0 => x), and so on. If this definition is not given at the infrastructure level, it may lead to disaster, with each transformation interpreting “fast” in its own way.

Dmitry.

Hi Dmitry,

I'm not an expert in fp accuracy questions, but I have had quite a bit of
experience dealing with fp accuracy problems during compiler transformations.

I agree that it's a minefield which is why I intend to proceed conservatively.

I think you have taken a step in the right direction by walking away from ULPs,
which are pretty useless for the purpose of describing allowed fp optimizations
IMHO. But using just the "fast" keyword (or whatever else will be added in the
future) is not enough without a strict definition of this keyword in terms of IR
transformations. For example, a particular transformation may want to know
whether reassociation is allowed ((a+b)+c => a+(b+c)), whether fp contraction is
allowed (a*b+c => fma(a,b,c)), whether addition of zero may be canceled
(x+0 => x), and so on. If this definition is not given at the infrastructure
level, it may lead to disaster, with each transformation interpreting "fast" in
its own way.

This is actually the main reason for using metadata rather than a flag like the
"nsw" flag on integer operations: it is easily extendible with more info to say
whether reassociation is OK and so forth.

The kinds of transforms I think can reasonably be done with the current
information are things like: x + 0.0 -> x; x / constant -> x * (1 / constant) if
constant and 1 / constant are normal (and not denormal) numbers.
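
For concreteness, the legality check for the reciprocal transform could look
something like this (a rough sketch, not code from the patch; the helper name
is made up and exact APFloat/ConstantFP signatures vary a little between LLVM
versions):

  #include "llvm/ADT/APFloat.h"
  #include "llvm/Constants.h"

  using namespace llvm;

  // For  x / C  ->  x * (1 / C):  return the multiplier 1/C if the rewrite
  // is allowed, or null if it is not.
  static Constant *getRecipIfSafe(ConstantFP *C) {
    const APFloat &V = C->getValueAPF();
    APFloat Recip(V.getSemantics(), "1.0");
    Recip.divide(V, APFloat::rmNearestTiesToEven);
    // Only rewrite when both C and 1/C are normal numbers (not denormal,
    // zero, infinite or NaN), as described above.
    if (!V.isNormal() || !Recip.isNormal())
      return 0;
    return ConstantFP::get(C->getContext(), Recip);
  }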

Ciao, Duncan.

the verifier checks that the accuracy operand is either a floating point
number (ConstantFP) or the keyword "fast". If "Accuracy" is zero here
then that means it wasn't ConstantFP. Thus it must have been the keyword
"fast".

I think it's assuming too much. If I write "foobar", it'd also work as
"fast", or even worse, if I write "strict"...

I'm not an expert in FP transformations, but as Dmitry said, there
could be more than one "fast" transformation. Maybe that should be an
enum somewhere, rather than an accuracy.

Can you accurately propagate accuracy ratios across multiple
instructions? Through multiple paths and PHI nodes? Not to mention
that the "Accuracy" is also FP, which has its own accuracy problems...
sigh...

This kind of metadata must only relax IEEE strictness (and never tighten
it) because *metadata can always be discarded*. Discarding it must never
result in wrong IR/transforms, thus metadata can only give additional
permissions.

Makes sense.

I think you have taken a step in the right direction by walking away from ULPs,
which are pretty useless for the purpose of describing allowed fp optimizations
IMHO. But using just the “fast” keyword (or whatever else will be added in the
future) is not enough without a strict definition of this keyword in terms of IR
transformations. For example, a particular transformation may want to know
whether reassociation is allowed ((a+b)+c => a+(b+c)), whether fp contraction is
allowed (a*b+c => fma(a,b,c)), whether addition of zero may be canceled
(x+0 => x), and so on. If this definition is not given at the infrastructure
level, it may lead to disaster, with each transformation interpreting “fast” in
its own way.

This is actually the main reason for using metadata rather than a flag like the
“nsw” flag on integer operations: it is easily extendible with more info to say
whether reassociation is OK and so forth.

The kinds of transforms I think can reasonably be done with the current
information are things like: x + 0.0 → x; x / constant → x * (1 / constant) if
constant and 1 / constant are normal (and not denormal) numbers.

The particular definition is not as important as the fact that this definition exists :-) I.e. I think we need a set of transformations to be defined (most likely as an enum, as Renato pointed out) and an interface which accepts an “fp-model” (which is “fast”, “strict” or whatever keywords we end up with) and the particular transformation, and returns true or false depending on whether the definition of the fp-model allows this transformation or not. So the transformation would ask, for example, whether reassociation is allowed.

Another point, important from a practical point of view, is that the fp-model is almost always the same for all instructions in a function (or even module), and tagging every instruction with fp-model metadata is quite a substantial waste of resources. So it makes sense to me to have a default fp-model defined for the function or module, which can be overridden with instruction metadata.

I also understand that clang generally derives its switches from GCC, and fp precision switches are not an exception, but I’d like to point out that there’s a far more orderly way of defining an fp precision model (IMHO, of course :-) ), adopted by MS and the Intel Compiler (-fp-model [strict|precise|fast]). It would be nice to have it adopted in clang.

But while adding MS-style fp-model switches is a different topic (and I guess quite an arguable one), I’m mentioning it to show the importance of abstracting the internal compiler fp-model from external switches and exposing a querying interface to transformations. Transformations shouldn’t care about the particular model; they only need to know whether a particular type of transformation is allowed.

Dmitry.

Hi Renato,

the verifier checks that the accuracy operand is either a floating point
number (ConstantFP) or the keyword "fast". If "Accuracy" is zero here
then that means it wasn't ConstantFP. Thus it must have been the keyword
"fast".

I think it's assuming too much. If I write "foobar", it'd also work as
"fast", or even worse, if I write "strict"...

if you use "foobar" the verifier will reject your IR as invalid. That said,
I'm not in love with the word "fast" here. Maybe "finite" would be better.

I'm not an expert in FP transformations, but as Dmitry said, there
could be more than one "fast" transformation.

There's a difference between transformations that interact properly with
NaNs and infinities (e.g. x + 0.0 -> x) and those that don't (e.g. x * 0.0
-> 0.0). There is also a difference between those that introduce a uniformly
bounded (in the inputs) relative error, and those for which the relative
error introduced can be arbitrarily large depending on the inputs. I have
in mind the uniformly bounded relative error ones, preserving NaNs and
infinities. I guess I should say that explicitly in the LangRef changes.

Maybe that should be an enum somewhere, rather than an accuracy.

I'd rather introduce additional operands in the fpmath metadata.

Can you accurately propagate accuracy ratios across multiple
instructions? Through multiple paths and PHI nodes? Not to mention
that the "Accuracy" is also FP, which has its own accuracy problems...
sigh...

I don't understand the question. The metadata applies to one instruction,
the accuracy loss is per instruction. A transform that introduces a relative
error of 2.5 ULPs per instruction can of course result in a huge accuracy loss
after a huge number of instructions.

Ciao, Duncan.

Hi Dmitry,

    The kinds of transforms I think can reasonably be done with the current
    information are things like: x + 0.0 -> x; x / constant -> x * (1 / constant) if
    constant and 1 / constant are normal (and not denormal) numbers.

The particular definition is not as important as the fact that this definition
exists :-) I.e. I think we need a set of transformations to be defined (most
likely as an enum, as Renato pointed out) and an interface which accepts an
"fp-model" (which is "fast", "strict" or whatever keywords we end up with) and
the particular transformation, and returns true or false depending on whether
the definition of the fp-model allows this transformation or not. So the
transformation would ask, for example, whether reassociation is allowed.

at some point each optimization will have to decide if it is going to be applied
or not, so that's not really the point. It seems to me that there are many many
possible optimizations, and putting them all as flags in the metadata is out of
the question. What seems reasonable to me is dividing transforms up into a few
major (and orthogonal) classes and putting flags for them in the metadata.

Another point, important from a practical point of view, is that the fp-model is
almost always the same for all instructions in a function (or even module), and
tagging every instruction with fp-model metadata is quite a substantial waste of
resources.

I measured the resource waste and it seems fairly small.

So it makes sense to me to have a default fp-model defined for the
function or module, which can be overridden with instruction metadata.

That's possible (I already discussed this with Chandler), but in my opinion it
is only worth doing if we see unreasonable increases in bitcode size in real code.

I also understand that clang generally derives its switches from GCC, and fp
precision switches are not an exception, but I'd like to point out that there's
a far more orderly way of defining an fp precision model (IMHO, of course :-) ),
adopted by MS and the Intel Compiler (-fp-model [strict|precise|fast]). It would
be nice to have it adopted in clang.

But while adding MS-style fp-model switches is a different topic (and I guess
quite an arguable one), I'm mentioning it to show the importance of abstracting
the internal compiler fp-model from external switches

The info in the meta-data is essentially a bunch of external switches which
will then be used to determine which transforms are run.

and exposing a querying interface to transformations. Transformations shouldn't
care about the particular model; they only need to know whether a particular
type of transformation is allowed.

Do you have a concrete suggestion for what should be in the metadata?

Ciao, Duncan.

Hi Dmitry,

The kinds of transforms I think can reasonably be done with the current
information are things like: x + 0.0 → x; x / constant → x * (1 / constant) if
constant and 1 / constant are normal (and not denormal) numbers.

The particular definition is not as important as the fact that this definition
exists :-) I.e. I think we need a set of transformations to be defined (most
likely as an enum, as Renato pointed out) and an interface which accepts an
“fp-model” (which is “fast”, “strict” or whatever keywords we end up with) and
the particular transformation, and returns true or false depending on whether
the definition of the fp-model allows this transformation or not. So the
transformation would ask, for example, whether reassociation is allowed.

at some point each optimization will have to decide if it is going to be applied
or not, so that’s not really the point. It seems to me that there are many many
possible optimizations, and putting them all as flags in the metadata is out of
the question. What seems reasonable to me is dividing transforms up into a few
major (and orthogonal) classes and putting flags for them in the metadata.

The decision whether to apply an optimization should be based on a strict definition of what is allowed, not on each optimization’s own interpretation of the “fast” fp-model (for example). Say, after widely adopting the “fast” fp-model in the compiler, you suddenly realize that the definition is wrong and allowing some type of transformation is a bad idea (for any reason: being incompatible with some compiler, not taking some corner cases into account, or whatever else); then you’ll have to go and fix a million places where the decision is made.

Alternatively, by defining classes of transformations and making optimizations query for particular types of transformation, you keep it under control.

Another point, important from a practical point of view, is that the fp-model is
almost always the same for all instructions in a function (or even module), and
tagging every instruction with fp-model metadata is quite a substantial waste of
resources.

I measured the resource waste and it seems fairly small.

So it makes sense to me to have a default fp-model defined for the
function or module, which can be overridden with instruction metadata.

That’s possible (I already discussed this with Chandler), but in my opinion it
is only worth doing if we see unreasonable increases in bitcode size in real code.

What is reasonable or not is defined not only by absolute numbers (0.8% or any other number). Does it make sense to increase bitcode size by 1% if it’s used only by math library writers and a couple of other people who reeeeally care about precision and performance at the same time and are knowledgeable enough to restrict precision on particular instructions only? In my experience it’s an extremely rare case when people want more than compiler flags to control fp accuracy and are ready to deal with pragmas (when they are available).

I also understand that clang generally derives its switches from GCC, and fp
precision switches are not an exception, but I’d like to point out that there’s
a far more orderly way of defining an fp precision model (IMHO, of course :-) ),
adopted by MS and the Intel Compiler (-fp-model [strict|precise|fast]). It would
be nice to have it adopted in clang.

But while adding MS-style fp-model switches is a different topic (and I guess
quite an arguable one), I’m mentioning it to show the importance of abstracting
the internal compiler fp-model from external switches

The info in the meta-data is essentially a bunch of external switches which
will then be used to determine which transforms are run.

and exposing a querying interface to transformations. Transformations shouldn’t
care about the particular model; they only need to know whether a particular
type of transformation is allowed.

Do you have a concrete suggestion for what should be in the metadata?

I would define the set of transformations, such as (I can help with a more complete list if you prefer):

  • reassociation
  • x+0.0 => x
  • x*0.0 => 0.0
  • x*1.0 => x
  • a/b => a * (1/b)
  • a*b+c => fma(a,b,c)
  • ignoring NaNs in compares, i.e. (a<b) => !(a>=b)
  • value-unsafe transformations (for aggressive fp optimizations, like a*b+a*c => a*(b+c)) and others of the kind,
    plus several aliases for the “strict”, “precise”, “fast” models (which are effectively combinations of the flags above).

So that metadata would be able to say “fast”, “fast, but no fma allowed”, “strict, but fma allowed”, i.e. the metadata should be a base level plus an optional set of adjustments from the list above.
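
Purely as an illustration of the shape of such an interface (names invented;
nothing like this exists in any patch), the query side could look like:

  namespace llvm {

  // One flag per class of transformation from the list above.
  enum FPTransformKind {
    FPT_Reassociate,    // (a+b)+c  ->  a+(b+c)
    FPT_FoldAddZero,    // x+0.0    ->  x
    FPT_FoldMulZero,    // x*0.0    ->  0.0
    FPT_FoldMulOne,     // x*1.0    ->  x
    FPT_ReciprocalDiv,  // a/b      ->  a * (1/b)
    FPT_FormFMA,        // a*b+c    ->  fma(a,b,c)
    FPT_NoNaNCompares,  // (a<b)    ->  !(a>=b)
    FPT_ValueUnsafe     // e.g. a*b+a*c -> a*(b+c)
  };

  // Whatever the metadata (plus any function/module default) resolves to:
  // "strict", "precise", "fast", or a base model with per-flag adjustments.
  class FPModel;

  // Transformations ask this instead of interpreting "fast" themselves.
  bool isFPTransformAllowed(const FPModel &Model, FPTransformKind Kind);

  } // namespace llvm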

And, again, I think this should be a function-level model, unless specified otherwise on the instruction, as that will be the case in 99.9999% of compilations.

Ciao, Duncan.

Dmitry.

I feel like this discussion is getting a bit off track…

I would define the set of transformations, such as (I can help with a more complete list if you prefer):

  • reassociation
  • x+0.0 => x
  • x*0.0 => 0.0
  • x*1.0 => x
  • a/b => a * (1/b)
  • a*b+c => fma(a,b,c)
  • ignoring NaNs in compares, i.e. (a<b) => !(a>=b)
  • value-unsafe transformations (for aggressive fp optimizations, like a*b+a*c => a*(b+c)) and others of the kind,
    plus several aliases for the “strict”, “precise”, “fast” models (which are effectively combinations of the flags above).

So that metadata would be able to say “fast”, “fast, but no fma allowed”, “strict, but fma allowed”, i.e. the metadata should be a base level plus an optional set of adjustments from the list above.

I would love to see such detailed models if we have real use cases and people interested in implementing them.

However, today we have a feature in moderately widespread use, ‘-ffast-math’. Its semantics may not be the ideal way to enable restricted, predictable optimizations of floating point operations, but they are effective for a wide range of programs today.

I think having a generic flag value which specifically is attempting to model the loose semantics of ‘-ffast-math’ is really important, and I think any more detailed framework for classifying and enabling specific optimizations should be layered on afterward. While I share your frustration with the very vague and hard to reason about semantics of ‘-ffast-math’, I think we can provide a clear enough spec to make it implementable, and we should give ourselves the freedom to implement all the optimizations within that spec which existing applications rely on for performance.

And, again, I think this should be a function-level model, unless specified otherwise on the instruction, as that will be the case in 99.9999% of compilations.

I actually lobbied Duncan to use a function default, with instruction-level overrides, but given his posts about the metadata overhead of just doing it on each instruction, I think his approach is simpler.

As he argued to me, eventually, this has to end up on the instruction in order to model inlining correctly – a function compiled with ‘-ffast-math’ might be inlined into a function compiled without it, and vice versa. Since you need this ability, it makes sense to simplify the inliner, the metadata schema, etc and just always place the data on the instructions unless there is some significant scaling problem. I think Duncan has demonstrated it scales pretty well.

For simple metadata, like “fast” in the initial proposal, it could be OK. But if more complex metadata is possible (like I’ve described), then this approach could consume more bitcode size than expected. And I’m sure there will be attempts to add fine-grained precision control; the first candidate is probably enabling/disabling FMAs.

Inlining is a valid concern, though inside a single module the fp model will be the same in the absolute majority of cases. People also tend to have consistent flags across a project, so it shouldn’t be a rare case for it to be consistent between modules either.

A function- or module-level default setting is really just an optimization, but IMHO quite a useful one. It would also simplify dumps and make it easier to understand what is going on for people who don’t want to dig into the details of fp precision problems or be distracted by additional metadata.

Just to be clear: as it’s not me who is going to implement this, I’m just trying to draw attention to the issues that we’ll eventually encounter down the road.

Dmitry.

And, again, I think this should be a function-level model, unless specified otherwise on the instruction, as that will be the case in 99.9999% of compilations.

I actually lobbied Duncan to use a function default, with instruction-level overrides, but given his posts about the metadata overhead of just doing it on each instruction, I think his approach is simpler.

As he argued to me, eventually, this has to end up on the instruction in order to model inlining correctly – a function compiled with ‘-ffast-math’ might be inlined into a function compiled without it, and vice versa. Since you need this ability, it makes sense to simplify the inliner, the metadata schema, etc and just always place the data on the instructions unless there is some significant scaling problem. I think Duncan has demonstrated it scales pretty well.

For simple metadata, like “fast” in the initial proposal, it could be OK. But if more complex metadata is possible (like I’ve described), then this approach could consume more bitcode size than expected. And I’m sure there will be attempts to add fine-grained precision control; the first candidate is probably enabling/disabling FMAs.

Inlining is a valid concern, though inside a single module the fp model will be the same in the absolute majority of cases. People also tend to have consistent flags across a project, so it shouldn’t be a rare case for it to be consistent between modules either.

A function- or module-level default setting is really just an optimization, but IMHO quite a useful one.

And I don’t disagree, I just think it is premature until we have measured an issue with the simpler form. Since we will almost certainly need the simpler form anyways, we might as well wait until the problem manifests.

The reason I don’t expect it to get worse with more complex specifications is because the actual metadata nodes are uniqued. Thus we should see many instructions all referring to the same (potentially complex) node.

It would also simplify dumps and make it easier to understand what is going on for people who don’t want to dig into the details of fp precision problems or be distracted by additional metadata.

The IR is not a normalized representation already, though. Its primary consumers and producers are libraries and machines, not humans. Debug metadata, TBAA metadata, and numerous other complexities are already present.

Just to be clear: as it’s not me who is going to implement this, I’m just trying to draw attention to the issues that we’ll eventually encounter down the road.

Yep, I’m just trying to explain my perspective on these issues. =]

Hi Dmitry,

    That's possible (I already discussed this with Chandler), but in my opinion it is
    only worth doing if we see unreasonable increases in bitcode size in real code.

What is reasonable or not is defined not only by absolute numbers (0.8% or any
other number). Does it make sense to increase bitcode size by 1% if it's used
only by math library writers and a couple of other people who reeeeally care
about precision *and* performance at the same time and are knowledgeable enough
to restrict precision on particular instructions only? In my experience it's an
extremely rare case when people want more than compiler flags to control fp
accuracy and are ready to deal with pragmas (when they are available).

there is no increase in bitcode size if you don't use this feature. If more
options are added it will hardly increase the bitcode size: there will be one
metadatum with lots of options (!0 = metadata !{ this, that, other }), and
instructions just have a reference to it. So the size increase isn't like
(number of options) * (number of instructions), it is (number of options) +
(number of instructions).

And, again, I think this should be function level model, unless specified
otherwise in the instruction, as it will be the case in 99.9999% of the
compilations.

Link-time optimization will sometimes result in "fast-math" functions being
inlined into non-fast math functions and vice-versa. This pretty much
inevitably means that per-instruction fpmath options are required. That
said, to save space, if every fp instruction in a function has the same
fpmath metadata then the metadata could be attached to the function instead.
But since (in my opinion) the size increase is mild, I don't think it is
worth the added complexity.

Ciao, Duncan.

Hi,

I would love to see such detailed models if we have real use cases and people
interested in implementing them.

However, today we have a feature in moderately widespread use, '-ffast-math'.
Its semantics may not be the ideal way to enable restricted, predictable
optimizations of floating point operations, but they are effective for a wide
range of programs today.

I think having a generic flag value which specifically is attempting to model
the *loose* semantics of '-ffast-math' is really important, and I think any more
detailed framework for classifying and enabling specific optimizations should be
layered on afterward. While I share your frustration with the very vague and
hard to reason about semantics of '-ffast-math', I think we can provide a clear
enough spec to make it implementable, and we should give ourselves the freedom
to implement all the optimizations within that spec which existing applications
rely on for performance.

I agree with Chandler. Also, don't forget that the safest way to proceed is to
start with a permissive interpretation of flags and tighten them up later. For
example, suppose we start with an fpmath accuracy of "fast" meaning: ignore NaNs,
ignore infinities, do whatever you like; and then later tighten it to mean: do
the right thing with NaNs and infinities, only introduce a bounded number of
ULPs of error. Then this is conservatively safe: existing bitcode created with
the loose semantics will be correctly optimized and codegened with the new
tight semantics (just less optimized than it used to be). However if we start
with tight semantics and then decide later that it was too tight, then we are
in trouble since existing bitcode might then undergo optimizations that the
creator of the bitcode didn't want. So I'd rather start with a quite permissive
setup which seems generally useful and allows the most important optimizations,
and worry about decomposing and tightening it later.

Given the fact that no-one was interested enough to implement any kind of
relaxed floating point mode in LLVM IR in all the years gone by, I actually
suspect that there might never be anything more than just this simple and not
very well defined 'fast-math' mode. But at least there is a clear path for
how to evolve towards a more sophisticated setup.

Ciao, Duncan.

I guess it would be user error if a strict function used the results
of a non-strict function (explicitly compiled with -ffast-math) and
then complained about loss of precision. In that case, the inlining
keeping the option per-instruction makes total sense.

Would there be a need to make fast-math less strict, i.e. to only use it
when no strict FP result needs its result? In this case, an option on
the whole function would guarantee that all inlined instructions would
be modified to strict, even if relaxed in the first place.

Just guessing for the future, I agree with you that the first
implementation should be very simple, as it is.

cheers,
--renato

Once it's implemented, there will be zealots complaining that your
"-ffast-math" is not as good as <insert-compiler-here>. But you can
kindly ask them to contribute with code.

Link-time optimization will sometimes result in “fast-math” functions being
inlined into non-fast math functions and vice-versa. This pretty much
inevitably means that per-instruction fpmath options are required.

I guess it would be user error if a strict function used the results
of a non-strict function (explicitly compiled with -ffast-math) and
then complained about loss of precision. In that case, the inlining
keeping the option per-instruction makes total sense.

It’s not a user error. The user knows his code and the accuracy of his code much better than any compiler possibly could, and may have strong reasons to specify fast-math for one function and not for another.

Would there be a need to make fast-math less strict, i.e. to only use it
when no strict FP result needs its result? In this case, an option on
the whole function would guarantee that all inlined instructions would
be modified to strict, even if relaxed in the first place.

If the user specified different fp-models for different functions on purpose, then most likely you’ll ruin performance by assuming the stricter model for the result of inlining.

Given the fact that no-one was interested enough to implement any kind of
relaxed floating point mode in LLVM IR in all the years gone by, I actually
suspect that there might never be anything more than just this simple and not
very well defined ‘fast-math’ mode. But at least there is a clear path for
how to evolve towards a more sophisticated setup.

Once it’s implemented, there will be zealots complaining that your
“-ffast-math” is not as good as <insert-compiler-here>.

While it’s certainly true, it’s no different from any other analysis/transformation. What is different is the claim that clang -ffast-math is producing less precise code than <insert-compiler-here>. And you’ll have a hard time explaining why. And it is sad that some people just expect compilers to produce faster code while keeping precision exactly the same… Even enabling FMA generation (which typically increases precision) provokes people to claim that you broke their precious code, just because the precision changed (didn’t get better or worse, just changed).

> > Given the fact that no-one was interested enough to implement any
> > kind of relaxed floating point mode in LLVM IR in all the years
> > gone by, I actually suspect that there might never be anything
> > more than just this simple and not very well defined 'fast-math'
> > mode. But at least there is a clear path for how to evolve towards
> > a more sophisticated setup.
>
> Once it's implemented, there will be zealots complaining that your
> "-ffast-math" is not as good as <insert-compiler-here>.

While it's certainly true, it's no different from any other
analysis/transformation. What *is* different is the claim that clang
-ffast-math is producing less precise code than
<insert-compiler-here>. And you'll have a hard time explaining why. And
it is sad that some people just expect compilers to produce faster
code while keeping precision exactly the same... Even enabling FMA
generation (which typically increases precision) provokes people to
claim that you broke their precious code, just because the precision
changed (didn't get better or worse, just changed).

To be fair, this can be a very serious validation issue.

-Hal