Converting float to int with FJCVTZS

Hi!

I'm working on a code base where a simulation needs to produce the exact same result on Aarch64 and x86_64 architectures.

This is indeed the case for the whole codebase, with one exception: converting floats to integers, specifically when the value is out of range and we're in undefined behavior territory. In that case, you notice the difference between the emitted fcvtzu instruction on aarch64 (saturating cast) and cvttss2si on x86 (wrap-around).

Now, I know undefined behavior is not the main business of LLVM, but I wonder if it would be possible to ask it to emit FJCVTZS instead, which behaves like x86 outside of the integer range. Of course, this would be an opt-in flag.

What do you think? If it's not something that would be valuable for clang, do you have any pointers on how to patch it myself?

Of course, I can just use the compiler intrinsic __builtin_arm_jcvt to trigger this behavior, but then I need to be sure to catch all the places, and be sure that everyone on the team remembers to do the same in the future.

Thanks,
Johannes

Hi Johannes,

I don’t think cvttss2si wraps around. Instead it returns 0x80000000 for large values. “If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised. If this exception is masked, the indefinite integer value (80000000H or 80000000_00000000H if operand size is 64 bits) is returned.”

Also isn’t fcvtzu an unsigned conversion while cvttss2si and FJCVTZS are signed conversions? Am I missing something?

~Craig

Oh, I think I understand now. For float to unsigned int, x86-64 uses a 64-bit cvttss2si instruction and drops the upper 32 bits, because there’s no 32-bit unsigned conversion instruction without AVX-512.

So are you asking for AArch64 to also do a 64-bit conversion and truncate the result? Replacing a 32-bit fcvtzu with a 32-bit fjcvtzs wouldn’t work, would it?
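
The "convert to 64 bits, then drop the upper 32 bits" sequence can also be written out by hand in portable C++. The following is only a minimal sketch: to_u32_via_i64 is a hypothetical helper name, not something from clang or from this thread, and it assumes the truncated input fits in int64_t (beyond that range, or for NaN, the conversion to int64_t is itself undefined behavior).

```cpp
#include <cstdint>

// Hypothetical helper (name chosen for illustration only).
// Converts through a signed 64-bit integer and keeps the low 32 bits,
// mirroring the "64-bit convert, then truncate" sequence described above.
// Assumption: the truncated value fits in int64_t; outside that range
// (or for NaN) the float -> int64_t conversion is undefined in C++.
inline uint32_t to_u32_via_i64(float value) {
    const int64_t wide = static_cast<int64_t>(value);  // truncates toward zero
    return static_cast<uint32_t>(wide);                // well-defined: modulo 2^32
}
```

With the value from the motivating example, to_u32_via_i64(4294967808.0f) yields 512 on any target, since 4294967808 is 2^32 + 512.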

LLVM already has support for UB-free float2int conversions:
https://llvm.org/docs/LangRef.html#saturating-floating-point-to-integer-conversions
Rather than trying to herd each backend to conditionally do the same thing,
I think a much more straight-forward solution would be
to expose those intrinsics as clang builtins.

Roman.
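
Until such builtins exist, the saturating behavior described in the LangRef section linked above (NaN becomes 0, out-of-range values clamp to the nearest representable integer) can be approximated in plain C++. This is a sketch under those stated semantics, not a clang API; to_u32_saturating is a hypothetical name.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper (not a clang builtin): saturating float -> uint32_t.
// Semantics assumed from the LangRef description of the saturating
// conversions: NaN -> 0, too small -> 0, too large -> UINT32_MAX.
inline uint32_t to_u32_saturating(float value) {
    if (std::isnan(value)) return 0;
    if (value <= 0.0f) return 0;                      // negatives clamp to 0
    if (value >= 4294967296.0f) {                     // 2^32 and above
        return std::numeric_limits<uint32_t>::max();
    }
    return static_cast<uint32_t>(value);              // in range: truncates toward zero
}
```

Every branch returns before an out-of-range cast can happen, so the final static_cast only ever sees values the destination type can hold.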

Hi, Craig!

Thanks for your reply. It seems I have indeed conflated some behavior here; specifically, I missed that x86_64 converts with a 64-bit cvttss2si and then truncates the result to 32 bits.

Replacing fcvtzu with fjcvtzs does indeed produce the same result as x86_64, however, it does not generalize to other conversions from floating point to integer.

So my proposed solution to always use fjcvtzs was not great. Might there be some other way to get similar behavior? I will try to use intrinsics to get the same behavior no matter the conversion, but the question remains whether it's possible to do this with compiler flags instead of changing the code.

For a motivating example, see https://godbolt.org/z/sjeE6M

#include <stdio.h>
#include <cstdint>

void cast(float value) {
  printf("uint32_t(%.2f) = %u\n", value, uint32_t(value));
}

int main() {
  cast(4294967808.);
}

// output on x86_64: uint32_t(4294967808.00) = 512
// output on aarch64: uint32_t(4294967808.00) = 4294967295

Replacing uint32_t(value) with __builtin_arm_jcvt(value) on aarch64 makes it behave like x86_64:

// output on aarch64: __builtin_arm_jcvt(4294967808.00) = 512

Hi, Roman!

Thanks for your reply. I think exposing those would be great indeed.

Also, having the possibility to pick a UB-free implementation with compiler flags would be great. Be it saturating or anything else.

To be sure, neither of these are possible today?

Johannes

I don't recall seeing any such patches on clang side, no.

Roman

Using "-fsanitize=float-cast-overflow -fsanitize-trap=float-cast-overflow" will ensure a UB-free float-to-int on all targets. (See https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html )

-Eli
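
Where trapping is not wanted, a similar range check can be written by hand and the failure handled explicitly. The sketch below is only in the spirit of that check, not a description of what the sanitizer emits; to_u32_checked is a hypothetical helper name.

```cpp
#include <cstdint>

// Hypothetical helper: range-checked float -> uint32_t conversion.
// Returns true and writes the result only when the truncated value is
// representable, i.e. -1 < value < 2^32. NaN fails both comparisons,
// so it is rejected as well. On failure, `out` is left untouched.
inline bool to_u32_checked(float value, uint32_t& out) {
    if (value > -1.0f && value < 4294967296.0f) {
        out = static_cast<uint32_t>(value);  // safe: value is in range
        return true;
    }
    return false;
}
```

A caller can then decide per call site whether an out-of-range input should saturate, wrap, or be reported as an error, which keeps the result identical across targets.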