EuroLLVM Numerics issues

Michael_Berg · March 29, 2019, 5:05pm

All: There will be a BoF talk at the EuroLLVM conference regarding Numerics (FMF and module flags which control fp behavior and optimization).

Even if you are not going to be in attendance, please reply to this thread as we are collecting open issues and ideas for future direction in all layers of LLVM for which optimizations are controlled by numerics flags. Please read over the numerics blog if you like for reference material:

http://blog.llvm.org/2019/03/llvm-numerics-blog.html

p.s. (restarting this thread here).

Regards,
Michael

wristow · March 29, 2019, 9:16pm

Thanks for putting this together Michael. I won’t be at EuroLLVM this year, but I’m very interested in the Numerics topic, and I’ll be watching this thread.

Thanks,

-Warren Ristow

SN Systems / Sony Interactive Entertainment

Arsenault_Matthew · March 29, 2019, 9:51pm

A few things I’ve been thinking about:

Is anyone working on finishing the migration to using the new fneg instruction?
Controls for allowing and/or mandating denormal flushing
Making denormal-fp-math attribute per FP type
FTZ flag
Dealing with constrained and unconstrained versions of target FP intrinsics
Can we define a policy or general direction for snan handling?
Relatedly, llvm.minnum/llvm.maxnum should be renamed to fmin/fmax, and a new set of minnum/maxnum that follow the defined snan behavior are needed. This would give 3 complete sets of min/max intrinsics
Are target features/attributes allowed to change the behavior of standard operations/intrinsics?
Adding FP min/max to atomicrmw, and which versions are needed
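
To make the min/max item concrete, here is a rough Python sketch of two of the NaN policies being discussed. The helper names `minnum_like` and `minimum_like` are made up for illustration. They are meant to model the quiet-NaN behavior usually described for llvm.minnum (return the non-NaN operand) and for llvm.minimum (propagate NaN, order -0.0 below +0.0). Python floats cannot express signaling NaNs, so the snan-aware third set is not modeled.

```python
import math

def minnum_like(a: float, b: float) -> float:
    # Hypothetical helper. Quiet-NaN policy: if one operand is NaN,
    # return the other operand.
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)  # the sign of a zero result is not pinned down here

def minimum_like(a: float, b: float) -> float:
    # Hypothetical helper. NaN-propagating policy: any NaN operand gives NaN.
    if math.isnan(a) or math.isnan(b):
        return math.nan
    # Treat -0.0 as strictly less than +0.0.
    if a == 0.0 and b == 0.0:
        return a if math.copysign(1.0, a) < 0.0 else b
    return min(a, b)

print(minnum_like(math.nan, 1.0))   # 1.0
print(minimum_like(math.nan, 1.0))  # nan
print(minimum_like(0.0, -0.0))      # -0.0
```

The two helpers agree on ordinary inputs and differ only on NaN and signed-zero operands, which is exactly where the naming and policy questions above come from.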

cjm345 · April 1, 2019, 3:30pm

A few things I’ve been thinking about:

Is anyone working on finishing the migration to using the new fneg instruction?

What did you have in mind? I was not aware that there is work pending.

Controls for allowing and/or mandating denormal flushing

Making denormal-fp-math attribute per FP type

FTZ flag

This would be useful to us too.

Dealing with constrained and unconstrained versions of target FP intrinsics

That’s really interesting and not something that I had anticipated. It needs to be discussed…

arsenm · April 1, 2019, 3:52pm

As far as I know, fneg isn’t constant folded. The tests haven’t been migrated, and clang is still emitting fsub -0.0, x. I’m not sure what the state of the rest of the optimizations is, but I just remember the initial instruction getting added.
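
For anyone wondering why the legacy idiom subtracts from -0.0 rather than from 0.0, a small Python sketch (Python floats are IEEE doubles in the default rounding mode; the helper names are made up for illustration) shows the signed-zero difference. It does not try to show anything about NaN inputs or non-default rounding modes.

```python
import math

def is_negative(x: float) -> bool:
    # Inspect the sign bit, so that -0.0 and +0.0 can be told apart.
    return math.copysign(1.0, x) < 0.0

def fsub_from_neg_zero(x: float) -> float:
    # Models the "fsub -0.0, x" idiom.
    return -0.0 - x

def fsub_from_pos_zero(x: float) -> float:
    # Models the tempting but different "fsub 0.0, x".
    return 0.0 - x

for x in (0.0, -0.0, 1.5, -2.0):
    # Subtracting from -0.0 matches true negation, including for zeros.
    assert is_negative(fsub_from_neg_zero(x)) == is_negative(-x)

# Subtracting from +0.0 loses the sign for x = +0.0.
print(is_negative(-(0.0)))                # True
print(is_negative(fsub_from_pos_zero(0.0)))  # False
```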

-Matt

cjm345 · April 1, 2019, 4:10pm

What did you have in mind? I was not aware that there is work pending.

As far as I know, fneg isn’t constant folded.

Ah, yes. I did know that, but forgot. It’s now on my todo list…

The tests haven’t been migrated, and clang is still emitting fsub -0.0, x. I’m not sure what the state of the rest of the optimizations is, but I just remember the initial instruction getting added.

Huh, I did not know this. I thought I did a visual inspection and saw that FNeg IR was being generated by Clang, but I’m probably wrong. Also on my todo list now.

Thanks, Matt!

andykaylor · April 3, 2019, 1:16am

Hi Michael,

Thanks for raising this topic. I am very interested, but unfortunately I won’t be at EuroLLVM. Here are some things on my mind, roughly in order of how much time I’ve spent thinking about them:

cjm345 · April 3, 2019, 2:04pm

====================

Complex types

====================

There, I said it.

Oh hell yes!

David_A_Greene · April 3, 2019, 4:30pm

"Kaylor, Andrew via llvm-dev" <llvm-dev@lists.llvm.org> writes:

====================

Masked vector FP operations

====================

We’ve resisted adding explicitly predicated operations other than load
and store in the past, but I think for vector FP operations we’re
going to need this in order to maintain strict FP semantics.

Yep, we definitely will. This is one of the reasons Simon Moll's
predication work (D57504) is so important.
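
A toy Python sketch of why this matters for strict FP semantics, with `ZeroDivisionError` standing in for an FP exception. The function names are hypothetical and are not taken from D57504. Computing every lane and selecting afterwards still executes the operation on inactive lanes, so it can raise an exception the source program never asked for.

```python
def div_then_select(num, den, mask, passthru):
    # Unpredicated: every lane is divided, inactive lanes are discarded later.
    full = [n / d for n, d in zip(num, den)]
    return [f if m else passthru for f, m in zip(full, mask)]

def masked_div(num, den, mask, passthru):
    # Predicated: inactive lanes never execute the division at all.
    return [n / d if m else passthru for n, d, m in zip(num, den, mask)]

num = [1.0, 2.0]
den = [2.0, 0.0]
mask = [True, False]  # the second lane is inactive

print(masked_div(num, den, mask, 0.0))  # [0.5, 0.0]

try:
    div_then_select(num, den, mask, 0.0)
except ZeroDivisionError:
    print("inactive lane raised an exception")
```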

====================

Complex types

====================

There, I said it.

I'll echo my colleague's response.

Oh hell yes! OH HELL YES!

====================

Accuracy controls

====================

We have a fast math flag that lets us substitute approximations for
some math library functions. It would be nice to have a mechanism to
control the accuracy of the approximations.

Indeed. "Fast or not" is too coarse.
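
As a purely illustrative sketch of what a finer-grained knob could mean, here is a reciprocal square root in Python whose accuracy depends on a `steps` parameter. Everything here is hypothetical (the names, the crude initial guess, the Newton-Raphson refinement) and says nothing about how LLVM or any math library actually lowers these calls. It assumes a finite, positive input.

```python
import math

def approx_rsqrt(x: float, steps: int) -> float:
    # Crude power-of-two initial guess taken from the binary exponent of x.
    _, exponent = math.frexp(x)
    y = 2.0 ** (-(exponent // 2))
    # Each Newton-Raphson step roughly squares the relative error.
    for _ in range(steps):
        y = y * (1.5 - 0.5 * x * y * y)
    return y

def relative_error(x: float, steps: int) -> float:
    exact = 1.0 / math.sqrt(x)
    return abs(approx_rsqrt(x, steps) - exact) / exact

for steps in (1, 3, 5):
    print(steps, relative_error(2.0, steps))
```

A single "fast" flag can only pick one point on this curve. An accuracy control would let the user say which point is acceptable.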

====================

Per function controls

====================

Similarly, it would be nice to explicitly list which math library functions could be replaced.

I’d also like to suggest the formation of a floating point working
group to try to get more organized about driving some of these things
(particularly the constrained intrinsics) toward completion.

That's a great idea.

-David

Michael_Berg · April 4, 2019, 6:06am

Folding a couple of topics back into this thread:

I’d like to touch on a topic mentioned in the blog post. The constrained intrinsics work has hit a roadblock on how to proceed with the constrained implementation in the backends, i.e. D55506. Reviews and ideas in this area would be greatly appreciated (attn: target code owners).

Thanks,
Cameron

I’d just like to point out a few things that I think are related to FP Numerics.
LLVM could do some additional transformations involving “sqrt” and “division” under fast math on X86, such as rewriting 1/sqrt(x) * 1/sqrt(x) to 1/x. These are long-latency instructions, so the code could benefit if such transformations were enabled under unsafe math.

Also, are we considering doing such FP transforms on vector floating point types?

regards,
Venkat.

kpneal · April 11, 2019, 7:15pm

I’m working on fneg. I started with the IRBuilder and found that some of the transformation passes use it. Updating the m_FNeg() matchers gets me farther, but the InstCombiner doesn’t know how to deal with a non-BinaryOperator.

Sorry for the delay in responding to this thread, but I’ve been out of the country and am a little behind on email.

kpneal · April 11, 2019, 7:17pm

I was working on threading the #pragma FENV_ACCESS down into clang’s AST. But that’s on hold because Richard Smith wants more design discussion. The current method I was building on doesn’t work for templates.

The clang TreeTransform class is magic that I don’t grok yet.

Michael_Berg · April 30, 2019, 9:04pm

Some after-the-fact updates from the BoF at the EuroLLVM conference:

Constrained intrinsics: Steve, Matt and Andrew will continue working on the newest iteration of this feature, towards an initial implementation that we can extend to multiple architectures. Steve, if a thread is currently active, can you chime in here about it?
FTZ: Extending FMF for FTZ. We need clarity on whether the feature spec says code may be optimized to FTZ or is optimized to FTZ.
See thread: [RFC] Making space for a flush-to-zero flag in FastMathFlags
FMF caller/callee and inlining: We are looking into this internally here at Apple, at least initially, for consistency of results for shaders. I may start a thread on this if others are interested.
Complex numbers and Vector Masking: We need an owner for this. Would someone like to start a thread on this?
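
On the FTZ item above, a small Python emulation may help frame the may-versus-must question. The `flush_to_zero` helper is hypothetical and only mimics flushing a subnormal result in software; it is not how hardware or any LLVM flag works.

```python
import math
import sys

def flush_to_zero(x: float) -> float:
    # Emulate FTZ on a result: a subnormal becomes a zero of the same sign.
    if x != 0.0 and abs(x) < sys.float_info.min:
        return math.copysign(0.0, x)
    return x

a = 1.5 * sys.float_info.min  # normal
b = sys.float_info.min        # smallest positive normal double

gradual = a - b                 # subnormal, nonzero under gradual underflow
flushed = flush_to_zero(a - b)  # zero once the result is flushed

print(a != b, gradual != 0.0, flushed == 0.0)  # True True True
```

If the flag merely allows flushing, both `gradual` and `flushed` are acceptable results. If it mandates flushing, only `flushed` is, and the optimizer has to preserve that.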

Also see this review for FP Exceptions: https://reviews.llvm.org/D61331

Regards,
Michael
