computing a conservatively rounded square of a double

I am trying to compute conservative lower and upper bounds for the
square of a double. I have set the rounding mode to FE_UPWARDS
elsewhere, so the code is

struct Interval {
  double nlo, hi;

Interval inspect_singleton_sqr(const double x) {
  Interval s;
  s.nlo = x * -x;
  s.hi = x * x;
  return s;

Both multiplies are necessary, since they round in different
directions. However, clang does not know this, assumes that x * -x =
-(x * x), and simplifies down to a single multiply:

  .quad -9223372036854775808 # double -0.000000e+00
  .quad -9223372036854775808 # double -0.000000e+00
  .globl _Z21inspect_singleton_sqrd
  .align 16, 0x90
  .type _Z21inspect_singleton_sqrd,@function
_Z21inspect_singleton_sqrd: # @_Z21inspect_singleton_sqrd
# BB#0:
  vmulsd %xmm0, %xmm0, %xmm1
  vxorpd .LCPI1_0(%rip), %xmm1, %xmm0
  .size _Z21inspect_singleton_sqrd, .Ltmp1-_Z21inspect_singleton_sqrd

I realize this is unsupported behavior, but it would be nice to still
be able to use clang to do numerical computation. Is there a way to
convince clang to not pull the multiplication outside addition? I
would like to avoid interfering with other optimizations in the
process: the problem is easy to solve by making a multiply function
and marking it never inline, but that would be terrible for
performance. I am happy to write clang specific code that turns on
only when __clang__ is defined.


Geoffrey - In the short term, if you marked this function as ", that might have the effect you desire. See As for the more general solution, I don’t know. For the rest of this, I’m speaking only as an interested observer. None of this is particularly thought through. We really should have a good solution to this. I know we don’t currently support floating point context and flags, but not being able to support standard idioms from numerical computation seems very problematic. Here’s one proposal to get the conversation started - please don’t take it too seriously. We could introduce an “fp_rounding_sensitive” annotation (flag?) on the instruction level which disables optimizations sensative to floating point rounding. This could be parsed by clang as either a function attribute or statement attribute. Alternatively, we could introduce an “fp_rounding_mode(x)” annotation with the semantics of specifying the desired rounding mode. The required work would be much the same. I’m picturing either of these being really course grained in their effect - at least to start. Something like disable all FP optimizations within their context. A toy implementation could even map the presence of such a flag to “optnone” as a starting point. We could improve incrementally over time. p.s. If we already have such a thing and I just don’t remember, please feel free to point that out. :slight_smile: Philip

An alternative way to think about fp_rounding_sensitive is related to
one of the possible solutions:

static inline double neg(const double x) {
  union { double x; uint64_t i; } u;
  u.x = x;
  u.i = u.i ^ (uint64_t(1)<<63);
  return u.x;

If I replace x * -x with x * neg(x), everything works since clang
doesn't incorrectly apply the identity. Unfortunately, it produces
significantly worse code, which I've included below. Instead of using
vxorps to do the high bit flip in an SSE register, it moves data into
(and then out of) an integer register and does the bit flip there.

In other words, my problem would be solved if my neg() function had
the same nice code generation as normal -x, but didn't participate in
the incorrect identities. I'm not sure if this can be arranged today;
I tried

   asm ("vxorps %1, %2, %0" : "=f" (nx) : "fm" (x), "fm" (y));

but it started spitting internal compiler errors at me:

    ./geode/exact/Interval.h:167:12: error: illegal "f" output constraint
          asm ("vxorps %1, %2, %0" : "=f" (nx) : "fm" (x), "fm" (y));
    fatal error: error in backend: Access past stack top!


While I did propose a patch to add this attribute to Clang, and it was approved, I never committed it because the LLVM work to fully support it has been dangling in nobody-wants-to-review-it limbo forever. And it seemed like I shouldn’t put the attribute into Clang if LLVM didn’t fully honor the contract.

I’ll redo the Clang part (Aaron Ballman has refactored the attribute-handling stuff about a hundred times since I did the original patch so it no longer applies) and hopefully get that into Clang soon.




If any of those llvm changes are still pending, let me know. I’d be happy to review.


It would be wonderful to get this in! What would the front-end
interface be (-frounding-math, #pragma, special builtins, etc.)?