Floating-point traps on x86-64

I produce a mathematical modelling library on several platforms, including iOS, macOS and Android, all of which use Clang. On x86-based platforms, the hardware can generate floating-point traps. I prefer to run testing with traps for Invalid Operation, Divide-by-Zero and Overflow active, since that finds me problem sites more quickly than working backwards from “this test case failed.”

However, I had a problem with Apple Clang 8.x, which I believe was LLVM 3.9, targeting x86-64, in that the optimiser was assuming that floating-point traps were turned off. This was shown, for example, by the way it hoisted floating-point divides above tests of the divisor that were meant to safeguard the divides.

After a long support case with Apple, they gave me some Clang command-line options for LLVM that suppressed the problem:

-mllvm -speculate-one-expensive-inst=false

-mllvm -bonus-inst-threshold=0

I appreciate that this costs some performance, and I can accept that. These options worked fine for Apple Clang 9.x, whose various versions seem to have been based on LLVM 4.x and 5.x.
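For anyone else hitting this, the full invocation looked something like the following (the -mllvm flags are the ones Apple supplied; the file names and the other flags here are illustrative, not our actual build line):

```shell
# Suppress the speculation that hoists trapping FP instructions.
# -mllvm forwards each following option to the embedded LLVM.
clang -O2 -arch x86_64 \
      -mllvm -speculate-one-expensive-inst=false \
      -mllvm -bonus-inst-threshold=0 \
      -c modeller.c -o modeller.o
```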

Now I’ve come to Apple Clang 10.0, which seems to be based on LLVM 6.0.1, and I have lots of floating-point traps again in optimised x86-64 code. It seems possible that I need some further LLVM options: does this seem plausible?

I’m not familiar with the LLVM codebase, and while I can find the source files that list the options I can use with -mllvm, I’d be guessing at which options are worth trying. Can anyone make suggestions?
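In case it is useful to anyone reproducing this, I believe the embedded LLVM will enumerate its own options via the standard cl::opt help flags, along these lines (the exact spelling may vary between releases):

```shell
# List the backend options (including hidden ones) that -mllvm accepts.
clang -x c /dev/null -c -o /dev/null -mllvm -help-list-hidden
```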

Thanks very much,

Unfortunately, the LLVM x86 backend has at least one bug that I know of which causes FPU exceptions on valid FP operations: https://bugs.llvm.org/show_bug.cgi?id=30885. You can work around that bug by using -march=haswell. If you continue to hit other exceptions, you can file LLVM bugs against them, though I’m not sure if there is anyone actively interested in fixing them. Good luck.

Hey John,

This is something we’re currently working on. See:

https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics

There’s still a lot to be done though. Short of this work, I don’t think you’ll find guarantees of trap-safety in LLVM.

Hope that helps,
Cameron

Vlad wrote:

Unfortunately the LLVM x86 backend has at least one bug that causes FPU exceptions on valid FP operations that I know of: https://bugs.llvm.org/show_bug.cgi?id=30885. You can work around that bug by using -march=haswell, if you continue to hit other exceptions you can file LLVM bugs against them, though I’m not sure if there is anyone actively interested in fixing them.

I’ll give it a try, although I don’t hold out much hope.

Cameron wrote:

This is something we’re currently working on. See:

https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics

There’s still a lot to be done though. Short of this work, I don’t think you’ll find guarantees of trap-safety in LLVM.

That seems to be saying that trap-safety will be available on those intrinsics when they’re complete.

How about expressions written with variables holding doubles and the ordinary C arithmetic operators? My chances of getting millions of lines of code that use those rewritten with intrinsics are zero; without that, I’ll be forced to turn off floating-point traps if the -march=haswell tweak does not help.

Thanks,

https://bugs.llvm.org/show_bug.cgi?id=30885. You can work around that bug by using -march=haswell

I’ll give it a try, although I don’t hold out much hope.

That was not my problem, which is not a surprise. That issue is about converting doubles to various long integer types, which the code I work on basically doesn’t do.

How about expressions written with variables holding doubles and the ordinary C arithmetic operators? My chances of getting millions of lines of code that use those rewritten with intrinsics are zero;

There’s a maybe here, in that we could plausibly get our domain-specific programming language to generate intrinsics from expressions, but since the intrinsics work isn’t finished yet, that isn’t a solution for this month.

without that, I’ll be forced to turn off floating-point traps …

If I have to turn off floating-point traps, that will cause some quality disadvantage for x86 platforms compiled with Clang vs. ones that use GCC or MSVC. ARM platforms already have that disadvantage, because most ARM processor manufacturers leave out floating-point traps. The mathematical modeller I work on is used to create, edit or store about 45% of the world’s 3D CAD data, so this is not a trivial problem.

To be clear, what I’d like is the “fpexcept.maytrap” behaviour on all floating-point arithmetic. An aspect of this that is sometimes neglected with SSE2 is that the compiler must only use paired SSE2 operations that can trap when both lanes of the SSE2 register contain values that were either produced during the evaluation or set up as safe values by the compiler. When you only need one square root, traps are active, and there’s garbage in the other lane, that garbage is negative reasonably often.

I guess I’m going to go hunting through LLVM options next to see if I can find anything that suppresses the problem. If I can’t, it seems like time to turn off those traps.

I’m hesitant to report LLVM bugs for several reasons:

  • The software is closed-source commercial code. We need to set up confidentiality agreements when we share source code with compiler manufacturers, but that’s impractical for an open-source project. So I’d have to construct examples with nothing confidential in them, which takes more time.
  • I’m not producing finished applications, but a library which is licensed to many ISVs. They need to use compatible development tools, and worry if they can’t use the same ones. So it’s much easier to support them if we use standard distributions of Clang, from Apple Xcode and the Android NDK. That means fixes will take unpredictable lengths of time to become available to me.
  • Floating-point traps don’t seem to be considered very important by the Clang team.

Thanks,

Vlad wrote:

Unfortunately the LLVM x86 backend has at least one bug that causes FPU exceptions on valid FP operations that I know of: https://bugs.llvm.org/show_bug.cgi?id=30885. You can work around that bug by using -march=haswell, if you continue to hit other exceptions you can file LLVM bugs against them, though I’m not sure if there is anyone actively interested in fixing them.

I’ll give it a try, although I don’t hold out much hope.

Cameron wrote:

This is something we’re currently working on. See:

https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics

There’s still a lot to be done though. Short of this work, I don’t think you’ll find guarantees of trap-safety in LLVM.

That seems to be saying that trap-safety will be available on those intrinsics when they’re complete.

That’s correct.

How about expressions written with variables holding doubles and the ordinary C arithmetic operators? My chances of getting millions of lines of code that use those rewritten with intrinsics are zero; without that, I’ll be forced to turn off floating-point traps if the -march=haswell tweak does not help.

Those would also be represented as constrained intrinsics. E.g. llvm.experimental.constrained.fadd. The intrinsics are used to prevent trap-unsafe optimizations throughout LLVM.
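For reference, the constrained form of a plain fadd looks like this in IR (a sketch based on the LangRef; the metadata operands carry the rounding mode and the exception behaviour, and “fpexcept.maytrap” is also accepted in the second one):

```llvm
; The optimizer must preserve this call's side effects: it may raise an
; FP exception ("fpexcept.strict") and reads the dynamic rounding mode.
%sum = call double @llvm.experimental.constrained.fadd.f64(
                      double %a, double %b,
                      metadata !"round.dynamic",
                      metadata !"fpexcept.strict")

declare double @llvm.experimental.constrained.fadd.f64(
                      double, double, metadata, metadata)
```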

There are some proposals to automatically convert normal IRBuilder calls into constrained intrinsics. E.g. D53157.

https://bugs.llvm.org/show_bug.cgi?id=30885. You can work around that bug by using -march=haswell

I’ll give it a try, although I don’t hold out much hope.

That was not my problem, which is not a surprise. That issue is about converting doubles to various long integer types, which the code I work on basically doesn’t do.

How about expressions written with variables holding doubles and the ordinary C arithmetic operators? My chances of getting millions of lines of code that use those rewritten with intrinsics are zero;

There’s a maybe here, in that we could plausibly get our domain-specific programming language to generate intrinsics from expressions, but since the intrinsics work isn’t finished yet, that isn’t a solution for this month.

The constrained intrinsics should be fairly trap-safe from IR through the SelectionDAG. The X86 backend has not been made trap-safe yet. IMO, the lion’s share of the trap-unsafe optimizations are before the backend. You might get lucky trying the constrained intrinsics now (maybe).

That said, performance at this point will be horrible. Perhaps worse than -O0.

without that, I’ll be forced to turn off floating-point traps …

If I have to turn off floating-point traps, that will cause some quality disadvantage for x86 platforms compiled with Clang vs. ones that use GCC or MSVC. ARM platforms already have that disadvantage, because most ARM processor manufacturers leave out floating-point traps. The mathematical modeller I work on is used to create, edit or store about 45% of the world’s 3D CAD data, so this is not a trivial problem.

To be clear, what I’d like is the “fpexcept.maytrap” behaviour on all floating-point arithmetic. An aspect of this that is sometimes neglected with SSE2 is that the compiler must only use paired SSE2 operations that can trap when both lanes of the SSE2 register contain values that were either produced during the evaluation or set up as safe values by the compiler. When you only need one square root, traps are active, and there’s garbage in the other lane, that garbage is negative reasonably often.

Understood. I also require something like this for my work. The constrained intrinsics implementation is a step in the right direction, but there’s a long long way to go. You may be able to use it to flush out Invalid/Overflow/Zero issues in the source, but would not be able to run production quality code with traps enabled anytime soon.

Cameron wrote:

The X86 backend has not been made trap-safe yet. IMO, the lion’s share of the trap-unsafe optimizations are before the backend.

The x86 backend will be responsible for the paired SSE2 instructions, won’t it? This code uses quite a lot of divides and square roots, and paired instructions without care for the other half of the pair are deadly there.

You might get lucky trying the constrained intrinsics now (maybe).

That said, performance at this point will be horrible. Perhaps worse than -O0.

Ouch.

I also require something like this for my work. The constrained intrinsics implementation is a step in the right direction, but there’s a long long way to go. You may be able to use it to flush out Invalid/Overflow/Zero issues in the source, but would not be able to run production quality code with traps enabled anytime soon.

The source is under continuous development, and we have testing with traps active on MSVC, GCC, Solaris and AIX. They can all manage this with production-quality code. So we’d be looking for problems created by Clang. Sadly, the spurious traps Clang introduces hide its genuine quality problems in a haystack of floating-point errors.

Cameron wrote:

The X86 backend has not been made trap-safe yet. IMO, the lion’s share of the trap-unsafe optimizations are before the backend.

The x86 backend will be responsible for the paired SSE2 instructions, won’t it? This code uses quite a lot of divides and square roots, and paired instructions without care for the other half of the pair are deadly there.

The scalar constrained intrinsics would prevent vectorization, so you hopefully wouldn’t see a vector sqrt/etc instruction with scalar input(s). I don’t think the X86 backend itself would select a vector instruction for a scalar operation (I could be wrong).

If your user/frontend is issuing a vector instruction, the user/frontend would be responsible for inserting safe values as needed (or masking on targets that support it).

You might get lucky trying the constrained intrinsics now (maybe).

That said, performance at this point will be horrible. Perhaps worse than -O0.

Ouch.

I also require something like this for my work. The constrained intrinsics implementation is a step in the right direction, but there’s a long long way to go. You may be able to use it to flush out Invalid/Overflow/Zero issues in the source, but would not be able to run production quality code with traps enabled anytime soon.

The source is under continuous development, and we have testing with traps active on MSVC, GCC, Solaris and AIX. They can all manage this with production-quality code. So we’d be looking for problems created by Clang. Sadly, the spurious traps Clang introduces hide its genuine quality problems in a haystack of floating-point errors.

Yes, I know. Only a small number of people are currently working on this. New contributors are welcome…

Cameron wrote:

The scalar constrained intrinsics would prevent vectorization, so you hopefully wouldn’t see a vector sqrt/etc instruction with scalar input(s). I don’t think the X86 backend itself would select a vector instruction for a scalar operation (I could be wrong).

I fear it would, because they’re faster.

If your user/frontend is issuing a vector instruction, the user/frontend would be responsible for inserting safe values as needed (or masking on targets that support it).

Our code is all C, with no assembler, so it has no way to issue vector instructions.

Only a small number of people are currently working on this. New contributors are welcome…

The learning curve before I could do anything useful makes it very hard to justify spending working hours on this. I frankly lack the enthusiasm for spending lots of my own time on this.