llvm.sqrt intrinsic undefined behaviour

Hi all,
I'm working on a language in which I would like all operations to be well defined (and efficient).
I want to define a sqrt function in my language, that will return NaN for arguments < 0, and NaN for a NaN argument.
As far as I know, these semantics map nicely to the SQRTPS SSE instruction, which seems to return NaN on arguments < 0.
However, the LLVM lang ref states "llvm.sqrt has undefined behavior for negative numbers other than -0.0".
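
For reference, a small C sketch of what SQRTPS does with these inputs, assuming an x86 target with SSE. The helper name `sqrt4` is made up for illustration, and on non-SSE targets it simply falls back to `sqrtf`:

```c
#include <math.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: square root of four floats at once.
 * With SSE available this compiles to a single SQRTPS; negative
 * (non-zero) lanes and NaN lanes come back as NaN, -0.0 stays -0.0. */
void sqrt4(const float in[4], float out[4]) {
#if defined(__SSE__)
    __m128 v = _mm_loadu_ps(in);          /* unaligned load of 4 floats */
    _mm_storeu_ps(out, _mm_sqrt_ps(v));   /* SQRTPS, then store */
#else
    for (int i = 0; i < 4; ++i) {
        out[i] = sqrtf(in[i]);            /* scalar fallback */
    }
#endif
}
```

Calling it on `{4.0f, -1.0f, -0.0f, NAN}` shows the behaviour I would like to rely on: no branch, and NaN for the negative and NaN lanes.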

This means that, to avoid undefined behaviour, in general I will have to add a runtime branch to avoid passing values less than zero to llvm.sqrt().
This is unfortunate since I would like to avoid inefficient, unneeded branching.

I propose changing the llvm.sqrt() LLVM intrinsic to be well defined on all inputs, and be defined to return NaN on negative inputs.

Btw, I don't particularly care about errno or anything related, as my language is not C. I realise there is some kind of issue here to do with code reordering and errno, but it would be a pity if these problems slowed down sqrt code emission for all LLVM users.

What do people think?

Thanks,
     Nick

My suggestion is to implement sqrt() like this:

y = x >= 0.0f ? llvm.sqrt(x) : NaN;

If you are worried about performance on X86, you could have the frontend
emit the llvm.x86.sse.sqrt.ps intrinsic for sqrt(), or you could add
a pattern to the X86 backend to select this sequence to a SQRTPS instruction.

-Tom

I strongly disagree with this proposal. The purpose of this general-purpose intrinsic is to expose sqrt functionality present on many of the architectures LLVM supports. If we define its edge cases, we won't be able to map it to target functionality freely on targets whose edge cases don't match that definition.

I'd recommend using (or adding, if it doesn't already exist) an X86-specific intrinsic to expose exactly the instruction you want.

-Owen

I agree the targets should be the primary focus, but a cursory search
failed to find one whose sqrt instruction(s) didn't produce NaN for
negative values; it's pretty much the only sane choice. Do they exist
(perhaps odd GPUs or something that always traps)?

If not, perhaps we could sensibly decouple the errno stuff from the
actual value produced: make no guarantees about what happens to the
environment but specify the result.

Cheers.

Tim.

> I strongly disagree with this proposal. The purpose of this
> general-purpose intrinsic is to expose sqrt functionality present on
> many of the architectures LLVM supports. If we define its edge cases,
> we won't be able to map it to target functionality freely on targets
> whose edge cases don't match that definition.

FWIW, the fact that the current intrinsic (unlike all of the other libm-like intrinsics) does not have the same semantics as the corresponding libm function means that we can only use it for autovectorization of sqrt() calls in finite-math mode. This is somewhat annoying for targets that can provide vectorized sqrt() functions more generally (whenever -fno-math-errno is in effect).

I would like to see some form of the sqrt intrinsic that is defined as the other ones are, as that will allow for autovectorization; either by changing the current definition, by adding some optional parameter, or by adding a new intrinsic.

-Hal