Optimization of sqrt() with invalid argument

My colleague Will Schmidt recently discovered
(http://llvm.org/bugs/show_bug.cgi?id=21048) that the LLVM optimizers
are converting sqrt(n), where n is negative, into 0.0. Now, as Sanjay
pointed out, the value of sqrt(n) for negative n other than -0.0 is
undefined according to the C99 standard, so this is allowable behavior.
But I think that it's not necessarily good behavior, nonetheless.

The problem is that an unoptimized call to sqrt(n) in libm will produce
a NaN, at least in the implementations I have access to. So this
particular choice changes the behavior of optimized versus unoptimized
code, which of course is allowable, but probably to be avoided when
possible.

However, there's a de facto standard of producing NaN for these cases.
The gcc and xlc compilers will optimize the call into a constant NaN,
given the right options (-O2 -ffast-math for gcc, -O2 -qignerrno for
xlc). All of gcc, icc, and xlc produce NaN via the library call with
unoptimized code. Choosing 0.0 for Clang/LLVM introduces an unnecessary
incompatibility.

This is important in practice because of the xalanc benchmark in SPEC
CPU2006. This code initializes a variable to a NaN value by assigning
it the value sqrt(-2.01) -- don't ask me why. This variable is used to
terminate a loop, and when Clang/LLVM produces 0.0 instead of NaN, the
loop spins forever.

Are there any objections to changing the LLVM optimization behavior to
produce a constant NaN instead of 0.0 in this case?

Thanks,
Bill

From: "Bill Schmidt" <wschmidt@linux.vnet.ibm.com>
To: llvmdev@cs.uiuc.edu
Cc: "Will Schmidt" <will_schmidt@vnet.ibm.com>, "Hal Finkel" <hfinkel@anl.gov>, spatel+llvm@rotateright.com
Sent: Thursday, September 25, 2014 2:50:48 PM
Subject: Optimization of sqrt() with invalid argument

My colleague Will Schmidt recently discovered
(http://llvm.org/bugs/show_bug.cgi?id=21048) that the LLVM optimizers
are converting sqrt(n), where n is negative, into 0.0. Now, as
Sanjay
pointed out, the value of sqrt(n) for negative n other than -0.0 is
undefined according to the C99 standard, so this is allowable
behavior.
But I think that it's not necessarily good behavior, nonetheless.

The problem is that an unoptimized call to sqrt(n) in libm will
produce
a NaN, at least in the implementations I have access to. So this
particular choice changes the behavior of optimized versus
unoptimized
code, which of course is allowable, but probably to be avoided when
possible.

However, there's a de facto standard of producing NaN for these
cases.
The gcc and xlc compilers will optimize the call into a constant NaN,
given the right options (-O2 -ffast-math for gcc, -O2 -qignerrno for
xlc). All of gcc, icc, and xlc produce NaN via the library call with
unoptimized code. Choosing 0.0 for Clang/LLVM introduces an
unnecessary
incompatibility.

This is important in practice because of the xalanc benchmark in SPEC
CPU2006. This code initializes a variable to a NaN value by
assigning
it the value sqrt(-2.01) -- don't ask me why. This variable is used
to
terminate a loop, and when Clang/LLVM produces 0.0 instead of NaN,
the
loop spins forever.

Are there any objections to changing the LLVM optimization behavior
to
produce a constant NaN instead of 0.0 in this case?

I think that changing this makes sense. Returning NaN is constant with the POSIX sqrt() specification (http://pubs.opengroup.org/onlinepubs/009695399/functions/sqrt.html).

-Hal

This is important in practice because of the xalanc benchmark in SPEC
CPU2006.

No, it would be important in practice for really used legacy (i.e.
unfixable) code relying on this undefined behaviour. It's important
for gaming the system if it occurs in SPEC.

Cheers.

Tim.

We should fix it because it costs us nothing to do and conforms to IEEE-754, and there’s no good reason to maintain the current behavior.

– Steve

Bill raised another question in the bug report that I’m curious about too:

Is anyone or any bot running/posting SPEC CPU results for clang?

I know it’s part of test-suite/external, but this constant fold code has been around 5+ years. Was the bug lying dormant all this time, only visible on PPC, or something else?

We should fix it because it costs us nothing to do and conforms to IEEE-754, and there’s no good reason to maintain the current behavior.

Probably, but there are theoretical issues: 0 is easier for other
optimisations to deal with than NaN (especially after constant
propagation). It's also creating NaNs in situations where they've been
explicitly disabled (the @llvm.sqrt intrinsic only really gets created
in finite math mode). In my perfect world we'd optimise it to a
segfault.

I doubt anything bad will actually happen though, so I'm not actually
bothered one way or the other; just vaguely annoyed that compilers get
changed to accommodate broken benchmarks (well, more accurately, to
allow us to sneak a -ffast-math compile into our numbers when the
benchmark wasn't designed for it).

Cheers.

Tim.

I know it's part of test-suite/external, but this constant fold code has
been around 5+ years. Was the bug lying dormant all this time, only visible
on PPC, or something else?

My guess would be that people don't tend to run with -ffast-math (it's
got a reputation for breaking code, even ignoring this particular
issue). Without that, from what I can see the issue won't arise.

But that doesn't account for complete silence, it must happen occasionally.

Cheers.

Tim.

If this can really only happen under fast-math, I take back what I said entirely. IIRC, NaNs are explicitly not supported in that mode, so all bets are off. Zero is a perfectly reasonable result.

– Steve

>
>> I know it's part of test-suite/external, but this constant fold code has
>> been around 5+ years. Was the bug lying dormant all this time, only visible
>> on PPC, or something else?
>
> My guess would be that people don't tend to run with -ffast-math (it's
> got a reputation for breaking code, even ignoring this particular
> issue). Without that, from what I can see the issue won't arise.

If this can really only happen under fast-math, I take back what I said entirely. IIRC, NaNs are explicitly not supported in that mode, so all bets are off. Zero is a perfectly reasonable result.

The result of this is that we return 0 only under -ffast-math. In all
other cases, the sqrt call will remain and we will return NaN. So that
seems like a bothersome discrepancy given that the Posix standard
wording tends to favor NaN (though an implementation-defined value is
allowable, regardless of -ffast-math).

As mentioned earlier, a number of other compilers also generate NaN with
-ffast-math or its equivalent. I believe this is done to comply with
the Posix wording.

Regardless of feelings about playing benchmark games (with which I have
sympathy), people tend to compile many of the SPECfp benchmarks with
-ffast-math so they can, e.g., use FMA instructions in their publishes.
But this is a side issue, and I'm rather sorry I mentioned SPEC at all.
This is really an issue of internal and external consistency.

Thanks,
Bill

From: "Bill Schmidt" <wschmidt@linux.vnet.ibm.com>
To: "Stephen Canon" <scanon@apple.com>
Cc: spatel+llvm@rotateright.com, "Will Schmidt" <will_schmidt@vnet.ibm.com>, "LLVM Developers Mailing List"
<llvmdev@cs.uiuc.edu>
Sent: Friday, September 26, 2014 1:23:16 PM
Subject: Re: [LLVMdev] Optimization of sqrt() with invalid argument

> >
> >> I know it's part of test-suite/external, but this constant fold
> >> code has
> >> been around 5+ years. Was the bug lying dormant all this time,
> >> only visible
> >> on PPC, or something else?
> >
> > My guess would be that people don't tend to run with -ffast-math
> > (it's
> > got a reputation for breaking code, even ignoring this particular
> > issue). Without that, from what I can see the issue won't arise.
>
> If this can really only happen under fast-math, I take back what I
> said entirely. IIRC, NaNs are explicitly not supported in that
> mode, so all bets are off. Zero is a perfectly reasonable result.

The result of this is that we return 0 only under -ffast-math. In
all
other cases, the sqrt call will remain and we will return NaN. So
that
seems like a bothersome discrepancy given that the Posix standard
wording tends to favor NaN (though an implementation-defined value is
allowable, regardless of -ffast-math).

As mentioned earlier, a number of other compilers also generate NaN
with
-ffast-math or its equivalent. I believe this is done to comply with
the Posix wording.

Regardless of feelings about playing benchmark games (with which I
have
sympathy), people tend to compile many of the SPECfp benchmarks with
-ffast-math so they can, e.g., use FMA instructions in their
publishes.
But this is a side issue, and I'm rather sorry I mentioned SPEC at
all.
This is really an issue of internal and external consistency.

Yes, but Steve is right: -ffast-math implies -ffinite-math-only, which means that we assume no NaNs. I understand that people use -ffast-math to get vectorized reductions (just getting aggressive FMAs only requires -ffp-contract=fast), but this, unfortunately, is also a matter of internal consistency :frowning:

-Hal

> From: "Bill Schmidt" <wschmidt@linux.vnet.ibm.com>
> To: "Stephen Canon" <scanon@apple.com>
> Cc: spatel+llvm@rotateright.com, "Will Schmidt" <will_schmidt@vnet.ibm.com>, "LLVM Developers Mailing List"
> <llvmdev@cs.uiuc.edu>
> Sent: Friday, September 26, 2014 1:23:16 PM
> Subject: Re: [LLVMdev] Optimization of sqrt() with invalid argument
>
> > >
> > >> I know it's part of test-suite/external, but this constant fold
> > >> code has
> > >> been around 5+ years. Was the bug lying dormant all this time,
> > >> only visible
> > >> on PPC, or something else?
> > >
> > > My guess would be that people don't tend to run with -ffast-math
> > > (it's
> > > got a reputation for breaking code, even ignoring this particular
> > > issue). Without that, from what I can see the issue won't arise.
> >
> > If this can really only happen under fast-math, I take back what I
> > said entirely. IIRC, NaNs are explicitly not supported in that
> > mode, so all bets are off. Zero is a perfectly reasonable result.
>
> The result of this is that we return 0 only under -ffast-math. In
> all
> other cases, the sqrt call will remain and we will return NaN. So
> that
> seems like a bothersome discrepancy given that the Posix standard
> wording tends to favor NaN (though an implementation-defined value is
> allowable, regardless of -ffast-math).
>
> As mentioned earlier, a number of other compilers also generate NaN
> with
> -ffast-math or its equivalent. I believe this is done to comply with
> the Posix wording.
>
> Regardless of feelings about playing benchmark games (with which I
> have
> sympathy), people tend to compile many of the SPECfp benchmarks with
> -ffast-math so they can, e.g., use FMA instructions in their
> publishes.
> But this is a side issue, and I'm rather sorry I mentioned SPEC at
> all.
> This is really an issue of internal and external consistency.

Yes, but Steve is right: -ffast-math implies -ffinite-math-only, which means that we assume no NaNs. I understand that people use -ffast-math to get vectorized reductions (just getting aggressive FMAs only requires -ffp-contract=fast), but this, unfortunately, is also a matter of internal consistency :frowning:

Hi Hal,

Sure. Well, if the consensus is that we don't want to change this, then
so be it -- but I expect I won't be the last person to raise the issue.

At the least, we should constant-fold to NaN for the non fast-math sqrt
calls...there's no point in going through the libcall overhead when we
know the answer in advance. I can prepare a patch for that if everyone
agrees.

Thanks,
Bill

This isn’t purely a fast-math issue…ConstantFolding isn’t using enable-unsafe-fp-math to decide whether to emit the ‘0’. It’s just looking for the llvm intrinsic rather than a call to sqrt:

%0 = call double @llvm.sqrt.f64(double -2.000000e+00)

vs:

%0 = call double @sqrt(double -2.000000e+00) #2

So how about a front-end fix:

If the parameter is a negative constant, instead of converting the sqrt call into the intrinsic, just leave it as-is regardless of fast-math. I think that change would provide internal and external consistency. And I suspect this is the solution the other compilers have reached - they end up generating a HW sqrt instruction even though they could have precomputed the result.

At the least, we should constant-fold to NaN for the non fast-math sqrt
calls...there's no point in going through the libcall overhead when we
know the answer in advance. I can prepare a patch for that if everyone
agrees.

I thought we already did that, but apparently not. The only thing to
beware of there is make sure it only triggers when -fno-math-errno has
been passed. In the IR you'll be looking for a call to @sqrt (or
variant) with the "readnone" attribute.

Cheers.

Tim.

This isn't purely a fast-math issue...ConstantFolding isn't using
enable-unsafe-fp-math to decide whether to emit the '0'. It's just looking
for the llvm intrinsic rather than a call to sqrt:

Yep. As Hal said, the key option is -ffinite-math, which allows the
front-end to emit that intrinsic in the first place. We document
@llvm.sqrt's behaviour and it has no requirements to check
unsafe-fp-math before folding.

So how about a front-end fix:
If the parameter is a negative constant, instead of converting the sqrt call
into the intrinsic, just leave it as-is regardless of fast-math.

That sounds like a nasty hack to me. It covers only a very limited set
of situations where this would happen. For example, almost certainly
not:

double foo() {
  const double x = -1.0;
  return sqrt(x);
}

Possibly not even (depending on how you define "constant"):

    double foo() { return sqrt(-1.0f); }

If we're going to change it I'd prefer the active decision in the mid-end.

Cheers.

Tim.

Yeah, it’s a hack. I was just hoping to let everyone walk away a winner. :slight_smile:

One more question then:

Why return zero instead of undef?

if (V >= -0.0)
return ConstantFoldFP(sqrt, V, Ty);
else // Undefined
return Constant::getNullValue(Ty);

Surely returning UndefValue::get(Ty) would allow greater havoc to ensue? Maybe even the segfault that you suggested. :slight_smile:

I don’t think clang emits any warnings for the bogus C code…that would be the nicer thing to do.

One more question then:
Why return zero instead of undef?

Now that I don't know. I'd have written it with undef, myself. The 0.0
seems to date back to the original commit in 2007, so no help there
either.

Sorry.

Tim.

The crux is that sqrt(-1) can set errno, can't it?

Joerg

I'm pretty sure -ffast-math means you don't care about that.

First quoted sentence from Bill is about non-fast-math.

Joerg

From: "Joerg Sonnenberger" <joerg@britannica.bec.de>
To: llvmdev@cs.uiuc.edu
Sent: Friday, September 26, 2014 5:14:54 PM
Subject: Re: [LLVMdev] Optimization of sqrt() with invalid argument

> At the least, we should constant-fold to NaN for the non fast-math
> sqrt
> calls...there's no point in going through the libcall overhead when
> we
> know the answer in advance. I can prepare a patch for that if
> everyone
> agrees.

The crux is that sqrt(-1) can set errno, can't it?

-ffast-math implies -fno-math-errno, but otherwise the default is target specific. On Linux, math calls can set errno by default, for example. On Darwin, they don't. On systems that can set errno, we indeed cannot constant fold the result unless we can prove the errno modification is not observable in a well-defined way. However, because we set different attributes on the functions depending on whether errno can be set or not, I think we already handle this correctly.

-Hal