EuroLLVM Numerics issues

All: There will be a BoF talk at the EuroLLVM conference regarding Numerics, i.e. the fast-math flags (FMF) and module flags that control FP behavior and optimization.

Even if you will not be attending, please reply to this thread: we are collecting open issues and ideas for future direction in all layers of LLVM where optimizations are controlled by numerics flags. The numerics blog post is good reference material:

http://blog.llvm.org/2019/03/llvm-numerics-blog.html

p.s. (restarting this thread here).

Regards,
Michael

Thanks for putting this together Michael. I won’t be at EuroLLVM this year, but I’m very interested in the Numerics topic, and I’ll be watching this thread.

Thanks,

-Warren Ristow

SN Systems / Sony Interactive Entertainment

A few things I’ve been thinking about:

  • Is anyone working on finishing the migration to using the new fneg instruction?
  • Controls for allowing and/or mandating denormal flushing
  • Making denormal-fp-math attribute per FP type
  • FTZ flag
  • Dealing with constrained and unconstrained versions of target FP intrinsics
  • Can we define a policy or general direction for sNaN handling?
  • Relatedly, llvm.minnum/llvm.maxnum should be renamed to fmin/fmax, and a new set of minnum/maxnum intrinsics that follow the defined sNaN behavior is needed. This would give 3 complete sets of min/max intrinsics
  • Are target features/attributes allowed to change the behavior of standard operations/intrinsics?
  • Adding FP min/max to atomicrmw, and which versions are needed
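
To make the min/max point concrete, here is a rough Python model of two of the flavors under discussion. The names fmin_like and minimum_like are made up for illustration and are not LLVM APIs. The first mimics libm fmin, which is what llvm.minnum follows for quiet NaNs; the second is a NaN-propagating minimum that also orders -0.0 below +0.0. Signaling NaNs, the case the proposed third set would pin down, are not modeled here.

```python
import math

def fmin_like(a, b):
    """libm fmin style: a quiet NaN operand is treated as missing data."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)

def minimum_like(a, b):
    """NaN-propagating minimum that also treats -0.0 as less than +0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == 0.0 and b == 0.0:
        # Pick the negative zero if either operand is one.
        return a if math.copysign(1.0, a) < 0 else b
    return min(a, b)
```

The two only disagree when a NaN is involved: fmin_like(math.nan, 1.0) returns 1.0, while minimum_like(math.nan, 1.0) returns NaN.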

A few things I’ve been thinking about:

  • Is anyone working on finishing the migration to using the new fneg instruction?

What did you have in mind? I was not aware that there is work pending.

  • Controls for allowing and/or mandating denormal flushing
  • Making denormal-fp-math attribute per FP type
  • FTZ flag

This would be useful to us too.
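
For readers less familiar with denormal flushing, a minimal sketch of the behavior these controls would govern, written for IEEE double in Python. The helper names is_denormal and flush_to_zero are hypothetical; real hardware applies the flush inside the FP unit, on inputs and/or outputs depending on the mode.

```python
import math
import sys

SMALLEST_NORMAL = sys.float_info.min  # 2**-1022 for IEEE double

def is_denormal(x):
    """True for nonzero values below the smallest normal magnitude."""
    return x != 0.0 and abs(x) < SMALLEST_NORMAL

def flush_to_zero(x):
    """Replace a denormal with a zero of the same sign; leave other values alone."""
    return math.copysign(0.0, x) if is_denormal(x) else x

# With gradual underflow this difference is a nonzero denormal ...
diff = 1.5 * SMALLEST_NORMAL - SMALLEST_NORMAL
# ... but with flushing it becomes zero even though the operands differ.
flushed = flush_to_zero(diff)
```

This is why it matters whether flushing is merely allowed or actually mandated: the same source can produce different results depending on which choice the compiler and target make.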

  • Dealing with constrained and unconstrained versions of target FP intrinsics

That’s really interesting and not something that I had anticipated. It needs to be discussed…

As far as I know, fneg isn’t constant folded. The tests haven’t been migrated, and clang is still emitting fsub -0.0, x. I’m not sure what the state of the rest of the optimizations is, but I just remember the initial instruction getting added.

-Matt
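
As background on why a dedicated fneg matters, here is a small Python sketch contrasting the two ways of negating. The function names are illustrative only. fneg_bits models negation as a pure sign-bit flip, while fsub_neg models the fsub -0.0, x idiom, which is a real subtraction.

```python
import math
import struct

def fneg_bits(x):
    """Negate by flipping the IEEE-754 sign bit; no arithmetic is performed."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (result,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << 63)))
    return result

def fsub_neg(x):
    """Negate the way `fsub -0.0, x` does: an actual subtraction."""
    return -0.0 - x
```

For ordinary values, zeros and infinities the two agree in the default rounding mode. They can part ways on NaNs: a subtraction is an arithmetic operation, so it may quiet a signaling NaN and raise the invalid exception, and the sign of its NaN result is not guaranteed, whereas a sign-bit flip touches nothing else. That difference is what makes the fsub form awkward once strict FP semantics are in play.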

What did you have in mind? I was not aware that there is work pending.

As far as I know, fneg isn’t constant folded.

Ah, yes. I did know that, but forgot. It’s now on my todo list…

The tests haven’t been migrated, and clang is still emitting fsub -0.0, x. I’m not sure what the state of the rest of the optimizations is, but I just remember the initial instruction getting added.

Huh, I did not know this. I thought I did a visual inspection and saw that FNeg IR was being generated by Clang, but I’m probably wrong. Also on my todo list now.

Thanks, Matt!

Hi Michael,

Thanks for raising this topic. I am very interested, but unfortunately I won’t be at EuroLLVM. Here are some things on my mind, roughly in order of how much time I’ve spent thinking about them:

====================

Complex types

====================

There, I said it.

Oh hell yes!

"Kaylor, Andrew via llvm-dev" <llvm-dev@lists.llvm.org> writes:

====================

Masked vector FP operations

====================

We’ve resisted adding explicitly predicated operations other than load
and store in the past, but I think for vector FP operations we’re
going to need this in order to maintain strict FP semantics.

Yep, we definitely will. This is one of the reasons Simon Moll's
predication work (D57504) is so important.
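
A toy illustration of why predication matters for strict FP, using Python lists as stand-in vectors and Python's ZeroDivisionError as a stand-in for a trapping FP exception. Both function names are hypothetical. The first computes every lane and then selects, the second only touches active lanes.

```python
def div_then_select(a, b, mask, passthru):
    """Unpredicated divide followed by a select: every lane is evaluated."""
    full = [x / y for x, y in zip(a, b)]
    return [f if m else p for f, m, p in zip(full, mask, passthru)]

def masked_div(a, b, mask, passthru):
    """Predicated divide: inactive lanes are never evaluated."""
    return [x / y if m else p for x, y, m, p in zip(a, b, mask, passthru)]

a = [1.0, 2.0]
b = [2.0, 0.0]          # lane 1 would divide by zero
mask = [True, False]    # ... but lane 1 is inactive
passthru = [0.0, 0.0]
```

Here masked_div(a, b, mask, passthru) returns [0.5, 0.0], while div_then_select raises on the inactive lane. In hardware terms, that is a spurious exception from a lane the program never asked to compute.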

====================

Complex types

====================

There, I said it.

I'll echo my colleague's response.

Oh hell yes! OH HELL YES! :)

====================

Accuracy controls

====================

We have a fast math flag that lets us substitute approximations for
some math library functions. It would be nice to have a mechanism to
control the accuracy of the approximations.

Indeed. "Fast or not" is too coarse.
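
One way to picture accuracy as a dial rather than a switch: the sketch below approximates 1/sqrt(x) with a configurable number of Newton-Raphson refinements and measures the error in ULPs. Everything here (the function names, the fixed seed, the choice of x = 2) is an assumption made for illustration, not a proposal for how LLVM would expose such a control. It needs Python 3.9 or later for math.ulp.

```python
import math

def rsqrt_approx(x, steps):
    """Approximate 1/sqrt(x) for x near 2 with `steps` Newton-Raphson refinements."""
    y = 0.7  # crude fixed seed; only reasonable for x close to 2
    for _ in range(steps):
        y = y * (1.5 - 0.5 * x * y * y)
    return y

def ulp_error(approx, reference):
    """Error of `approx` measured in units in the last place of `reference`."""
    return abs(approx - reference) / math.ulp(reference)

reference = 1.0 / math.sqrt(2.0)
errors = {steps: ulp_error(rsqrt_approx(2.0, steps), reference)
          for steps in (1, 2, 4)}
```

Each extra refinement buys accuracy at the cost of latency, so a caller that only needs a few correct digits could stop early. A single fast flag cannot express that trade-off.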

====================

Per function controls

====================

Similarly, it would be nice to explicitly list which math library functions could be replaced.

I’d also like to suggest the formation of a floating point working
group to try to get more organized about driving some of these things
(particularly the constrained intrinsics) toward completion.

That's a great idea.

                         -David

Folding a couple of topics back into this thread:

<email from cameron.mcinally@nyu.edu>

I’d like to touch on a topic mentioned in the blog post. The constrained intrinsics work has hit a roadblock on how to proceed with the implementation in the backends (see D55506). Reviews and ideas in this area would be greatly appreciated (attn: target code owners).

Thanks,
Cameron

<email from venkataramanan.kumar.llvm@gmail.com>

I’d just like to point out a few things that I think are related to FP Numerics.
LLVM could do some additional transformations with “sqrt” and “division” under fast math on X86, such as rewriting 1/sqrt(x) * 1/sqrt(x) to 1/x. These are long-latency instructions, so there could be a benefit if such transforms were enabled under unsafe math.

Also are we considering doing such FP transforms on vector floating point types?

regards,
Venkat.
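
A quick Python sketch of why the 1/sqrt(x) * 1/sqrt(x) to 1/x rewrite mentioned above needs a fast-math style permission: the two forms round differently, so they are not bit-identical in general even though they stay within a few ULPs of each other. The helper names and the sample range are arbitrary choices for illustration.

```python
import math

def two_rsqrt_product(x):
    """The original expression: two square roots, two divides, one multiply."""
    return (1.0 / math.sqrt(x)) * (1.0 / math.sqrt(x))

def reciprocal(x):
    """The proposed replacement: a single divide."""
    return 1.0 / x

def relative_gap(x):
    """Relative difference between the two forms for one input."""
    return abs(two_rsqrt_product(x) - reciprocal(x)) / reciprocal(x)

samples = [1.0 + k / 64.0 for k in range(1, 200)]
worst = max(relative_gap(x) for x in samples)
mismatches = sum(1 for x in samples if two_rsqrt_product(x) != reciprocal(x))
```

Printing worst and mismatches shows how often and by how much the results differ on a given machine. The replacement is the more accurate of the two forms, but it still changes results, which is why it belongs behind a flag.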

I’m working on fneg. I started with the IRBuilder and found that some of the transformation passes use it. Updating the m_FNeg() matchers gets me farther, but the InstCombiner doesn’t know how to deal with a non-BinaryOperator.

Sorry for the delay in responding to this thread, but I’ve been out of the country and am a little behind on email.

I was working on threading #pragma STDC FENV_ACCESS down into clang’s AST, but that’s on hold because Richard Smith wants more design discussion. The method I was building on doesn’t work for templates.

The clang TreeTransform class is magic that I don’t grok yet.

Some follow-up notes from the BoF at the EuroLLVM conference:

  • Constrained intrinsics: Steve, Matt and Andrew will continue working on the newest iteration of this feature, toward an initial implementation that we can extend for use on multiple architectures. Steve, if a thread is currently active, can you chime in here about it?

  • FTZ: Extending FMF with an FTZ flag. We need clarity on whether the feature spec means that code may be optimized to flush to zero or that it is always flushed to zero.
    See thread: [RFC] Making space for a flush-to-zero flag in FastMathFlags

  • FMF caller/callee and inlining: We are looking into this internally here at Apple, at least initially, for consistency of results in shaders. I may start a thread on this if others are interested.

  • Complex numbers and Vector Masking: We need an owner for this. Would someone like to start a thread?

Also see this review for FP exceptions: https://reviews.llvm.org/D61331

Regards,
Michael