fptosi undefined behaviour

Hi all,

Consider the following snippet, the aim of which is to convert a
double to a signed i16, returning 0 if not exactly representable:

define i16 @foo(double) {
top:
  %1 = fptosi double %0 to i16
  %2 = sitofp i16 %1 to double
  %3 = fcmp une double %2, %0
  %4 = select i1 %3, i16 0, i16 %1
  ret i16 %4
}

Of course, if the value is out-of-range, the result of fptosi is
undefined. Nevertheless, the snippet works on x86 & x86_64, generating
what to me seems to be fairly efficient code for the task.

However it breaks on ARM, with foo(200000.0) => 3392. From what I can
tell (given my very limited knowledge of LLVM IR, assembler and ARM
architecture), the first line is returning a value out-of-range of the
i16 type.
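
One possible reading of the 3392 result, offered as an assumption rather than something confirmed in this thread: if the generated code converts to a 32-bit integer and keeps using that wide value for the round-trip comparison, the comparison succeeds and only the final narrowing to 16 bits is visible. Since 200000 - 3 * 65536 = 3392, that would produce exactly the reported value. The C sketch below models this; `model_wide_foo` is a hypothetical name, not anything emitted by LLVM.

```c
#include <stdint.h>

/* Hypothetical model of the suspected behaviour: the conversion happens at
   32-bit width and is never narrowed before the round-trip comparison. */
int16_t model_wide_foo(double d)
{
    int32_t wide = (int32_t)d;   /* assumes d fits in 32 bits, e.g. 200000.0 */
    double back = (double)wide;  /* round trip uses the wide value */
    if (back != d)
        return 0;                /* non-integers are still rejected */
    /* Only now keep the low 16 bits, using unsigned arithmetic for the mask. */
    uint16_t low = (uint16_t)((uint32_t)wide & 0xFFFFu);
    /* For low > 32767 this last conversion is implementation-defined in C. */
    return (int16_t)low;
}
```

Under this model, `model_wide_foo(200000.0)` yields 3392 while in-range integers such as 42.0 pass through unchanged.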

1) I realise this is a somewhat silly question, but is this still
acceptable "undefined behaviour"?

2) If so, is there a way to do this in an efficient manner without
relying on undefined behaviour? (i.e. I can introduce a range check
before the fptosi call, but this would add further overhead).

(for further context, this problem originally arose in the Julia issue
https://github.com/JuliaLang/julia/issues/14549)

Thanks,
Simon

1) I realise this is a somewhat silly question, but is this still
acceptable "undefined behaviour"?

Yes, it is.

2) If so, is there a way to do this in an efficient manner without
relying on undefined behaviour? (i.e. I can introduce a range check
before the fptosi call, but this would add further overhead).

You will need to add the bounds checks to the LLVM IR to get the
behavior that you want. If LLVM does not generate efficient code
for this, then you will need to teach the backends to recognize this
pattern and generate better code if it can.

-Tom
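
A minimal sketch of what such a bounds check could look like, written in C rather than LLVM IR for readability. The helper name `checked_double_to_i16` is hypothetical, and this makes no claim about how efficiently any backend compiles it.

```c
#include <stdint.h>

/* Hypothetical helper: returns d as an int16_t if d is exactly representable,
   otherwise 0. The range test is phrased so that NaN fails it. */
int16_t checked_double_to_i16(double d)
{
    if (!(d >= (double)INT16_MIN && d <= (double)INT16_MAX))
        return 0;                     /* out of range, infinite, or NaN */
    int16_t r = (int16_t)d;           /* in range, so the conversion is defined */
    return ((double)r == d) ? r : 0;  /* reject non-integers such as 1.5 */
}
```

The range test runs before the conversion, so the conversion is only ever applied to values for which its result is defined.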

I always thought these out-of-range instructions did produce an
"undef" rather than allowing fully-general undefined behaviour
(otherwise we couldn't speculate them, for a start).

If so, I think the code ought to be valid: %1 is *some* i16
bitpattern, which means %2 cannot be completely unconstrained and
should never be equal to %0.

Cheers.

Tim.
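
The argument above can be checked by brute force: if %1 is guaranteed to be some i16 bit pattern, then no possible value of it converts back to an out-of-range input. The sketch below is only an illustration of that reasoning in C; `count_i16_matches` is a hypothetical name.

```c
#include <stdint.h>

/* Counts the 16-bit signed values whose conversion to double equals d.
   A count of 0 means the round-trip comparison cannot succeed for d,
   provided the first conversion really yields some 16-bit value. */
int count_i16_matches(double d)
{
    int matches = 0;
    for (int32_t v = INT16_MIN; v <= INT16_MAX; ++v) {
        if ((double)(int16_t)v == d)
            ++matches;
    }
    return matches;
}
```

For 200000.0 the count is 0, so the select would always pick 0 if the result were merely an arbitrary i16 rather than fully undefined.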

Thanks, Tom and Tim, for your responses.

If the behaviour is truly undefined as Tom says, would it be possible
to get checked intrinsics for this?

-Simon

Resending to “everyone”:

What’s wrong with checking

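short foo(double d)
{
    double t = trunc(d);  /* needs <math.h>; SHRT_MAX and SHRT_MIN come from <limits.h> */
    if (t > SHRT_MAX || t < SHRT_MIN) return 0;
    return (short)d;
}

If you don’t care about the off-by-one difference between the two bounds, you could test fabs(trunc(d)) < SHRT_MAX instead, which avoids the separate check against the minimum.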

That’s what you want, right?

Well, you would need to switch the check the other way round, to avoid
problems with NaNs. But this is probably what we will do (and indeed,
appears to be what Swift does without -Ounchecked).
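
To illustrate the NaN point: every ordered comparison involving NaN is false, so a test that looks for out-of-range values lets NaN through, while a test that confirms the value is in range rejects it. The sketch below uses hypothetical helper names and compares d against open bounds one past each limit, which is equivalent to comparing trunc(d) against the limits themselves.

```c
#include <limits.h>

/* Check in the form first proposed: returns 1 when d is flagged as out of range. */
int flagged_out_of_range(double d)
{
    return d >= SHRT_MAX + 1.0 || d <= SHRT_MIN - 1.0;
}

/* Check switched round: returns 1 only when d is positively known to be in range. */
int confirmed_in_range(double d)
{
    return d > SHRT_MIN - 1.0 && d < SHRT_MAX + 1.0;
}
```

For NaN both functions return 0: the first fails to flag it, so the conversion would still run, whereas the second fails to confirm it, so the caller can return 0 instead.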