[RFC][HIP] Use double for long double by default for both host/device

HIP device and host targets may use different long double formats. For example, the amdgpu target uses double as long double, whereas x86_64 uses an 80-bit long double type by default. Usually this does not cause problems as long as variables containing long double are not passed across the host/device boundary. However, the difference in long double size between host and device can cause subtle issues.

For example, std::max_align_t in libstdc++ is defined based on sizeof(long double), which evaluates to 8 in device compilation and 16 in host compilation. A template kernel using std::max_align_t as a template argument will get different names in device and host compilation, so the runtime cannot find the kernel. This can happen even if the HIP program does not use long double at all.
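A minimal sketch of this kind of dependence, in plain C++ rather than HIP. `AlignedKernelTag` and `print_long_double_layout` are hypothetical names made up for illustration, and using `alignof(std::max_align_t)` as a non-type template argument is just one assumed way the layout can leak into an instantiated name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for a kernel template; this is not a HIP API.
template <std::size_t Align>
struct AlignedKernelTag {
    static constexpr std::size_t value = Align;
};

// The template argument is computed from the target's type layout, so it
// becomes part of the instantiated (and therefore mangled) name. If host and
// device disagree on the layout of long double, they may disagree here too.
using LayoutDependentTag = AlignedKernelTag<alignof(std::max_align_t)>;

// Prints the values that a host and a device compilation could disagree on.
inline void print_long_double_layout() {
    // The standard requires long double to be at least as precise as double;
    // in practice it is also at least as large.
    assert(sizeof(long double) >= sizeof(double));
    std::printf("sizeof(long double)       = %zu\n", sizeof(long double));
    std::printf("alignof(std::max_align_t) = %zu\n", alignof(std::max_align_t));
    std::printf("template argument         = %zu\n", LayoutDependentTag::value);
}
```

Calling `print_long_double_layout()` from a host program built with and without -mlong-double-64 should show whether the values change on a given toolchain.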

The situation is similar to a violation of the C++ one definition rule (ODR). As HIP is a single-source C++ extension, all types should be consistent between device and host compilation; otherwise there may be undefined behaviour. This is very similar to one C++ source file defining long double as 64-bit while another defines it as 80-bit, with the two then linked together. Such ODR violations can be difficult to diagnose.

Since most HIP programs do not use long double types directly, one way to avoid this issue is to use double as long double by default for both host and device targets, i.e., to make -mlong-double-64 the default. Users who need the extended long double type can pass -Xarch_host -mlong-double-80 to clang.


Interestingly, we’re discussing a similar issue for SYCL (another single-source C++ extension with device and host compilation). In SYCL, we’ve historically disallowed the long double type entirely on the device; however, the fallout is that things like UDLs (user-defined literals) stop working.

One thing we considered is to just make it 64-bit on the device (and let the host continue to do what it is doing), and then prohibit it from being moved across the boundary. My thought was to just treat the resulting difference in kernel name as an ODR violation/IFNDR.

One concern with changing the host size with -mlong-double-X is that it makes linking to OTHER libraries that use long double a problem.

ANOTHER solution, perhaps, is to make long double a ‘storage-only’ format on the device. In that case, all math ends up being done in double precision, with values just expanded when stored and contracted when loaded.
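A rough sketch of what a ‘storage-only’ long double could look like, assuming the host format is the x87 80-bit extended layout (little-endian, 64-bit significand with an explicit integer bit, 15-bit exponent biased by 16383). `StorageOnlyLongDouble` is a hypothetical type, not anything clang provides, and the sketch only handles finite values:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical storage-only wrapper: keeps the host's in-memory layout but
// only ever computes in double. Infinities and NaNs are NOT handled here.
struct StorageOnlyLongDouble {
    unsigned char bytes[16];  // 10 payload bytes + 6 padding bytes (x86_64)

    // Contract the stored 80-bit value to double.
    double load() const {
        std::uint64_t mantissa = 0;
        for (int i = 7; i >= 0; --i)
            mantissa = (mantissa << 8) | bytes[i];
        const unsigned sign_exp = bytes[8] | (static_cast<unsigned>(bytes[9]) << 8);
        const int exponent = static_cast<int>(sign_exp & 0x7FFFu);
        // value = mantissa * 2^(exponent - 16383 - 63); the conversion of the
        // 64-bit significand to double rounds to nearest.
        const double magnitude =
            std::ldexp(static_cast<double>(mantissa), exponent - 16383 - 63);
        return (sign_exp & 0x8000u) ? -magnitude : magnitude;
    }

    // Expand a finite double into the 80-bit layout.
    void store(double v) {
        for (unsigned char& b : bytes) b = 0;
        unsigned sign_exp = std::signbit(v) ? 0x8000u : 0u;
        std::uint64_t mantissa = 0;
        if (v != 0.0) {
            int e = 0;
            const double m = std::frexp(std::fabs(v), &e);  // m in [0.5, 1)
            mantissa = static_cast<std::uint64_t>(std::ldexp(m, 64));
            sign_exp |= static_cast<unsigned>(e - 1 + 16383);
        }
        for (int i = 0; i < 8; ++i)
            bytes[i] = static_cast<unsigned char>(mantissa >> (8 * i));
        bytes[8] = static_cast<unsigned char>(sign_exp & 0xFFu);
        bytes[9] = static_cast<unsigned char>((sign_exp >> 8) & 0xFFu);
    }
};
```

The point of the design is that sizeof and alignment match the host, so struct layouts and template arguments stay consistent, while precision beyond double is silently lost on every load.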


I think this is the right solution. Host and device assume that the types and in-memory representation of objects crossing the host/device boundary are identical. Host-side compilation is also expected to follow the standard host-side rules for C++ compilation, so changing the representation of ‘long double’ will likely buy us a lot of interoperability trouble. Allowing a storage-only long double type on the GPU side should keep everything working, and would also be compatible with future GPUs if/when they grow support for an actual long double. The downside is that it may break some existing GPU-side code that assumes long double is demoted to double. I think such code would be relying on undefined behavior (AKA an implementation detail), but if necessary we could introduce an escape-hatch option to allow implicit demotion to double on the GPU side, putting the responsibility for breaking host/device type equivalency on the user who wants to rely on it.


FWIW, OpenMP offload implements long double as storage-only, so we allow it to be present but not to be “used”. This seems to be a reasonable compromise so far.

@jdoerfert, am I correct in assuming that user-defined floating-point literals such as operator ""_foo(long double) are therefore not possible in OpenMP offload code (note that the standard doesn’t support such operators with a different floating-point parameter type, per [lex.ext]p4)?
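For readers unfamiliar with the construct, a small host-only sketch of such a literal operator. The `_foo` suffix comes from the question above; the doubling body and the helper `twice_one_and_half` are made up purely for illustration:

```cpp
#include <cassert>

// A floating-point literal operator must take long double; there is no
// overload taking double or float for cooked floating literals.
constexpr long double operator""_foo(long double v) {
    return v * 2.0L;  // arbitrary illustrative body
}

// Any use of the literal passes a long double to the operator, which is
// exactly what a device that forbids operations on long double would reject.
inline double twice_one_and_half() {
    return static_cast<double>(1.5_foo);
}
```

On the host this compiles and `twice_one_and_half()` yields 3.0; in device code under a "present but not used" rule, the call to the operator would presumably be diagnosed.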

I’d assume you get an error if you try to use them, yes. We do not allow operations, calls, etc. on any unsupported types.