Compiling CUDA code fails

It’s true for V100, less so for A100. Cards like the A100/A30 that are based on the GA100 chip do indeed have the normal 1:2 fp64/fp32 hardware ratio. However, other nominally datacenter-grade cards like the A40, A10, and A16 are based on the GA102/GA107 GPU variants, and those come with 1:64 and 1:32 fp64/fp32 ratios, respectively.
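If you want to check where a particular board lands, the ratio is easy to observe empirically. Here’s a minimal microbenchmark sketch (the `fma_chain`/`time_ms` names, launch shape, and iteration count are arbitrary choices of mine, not anything from the CUDA toolkit): it times a dependent multiply-add chain instantiated for `float` and for `double`, and on a saturating launch the time ratio roughly tracks the hardware fp64/fp32 ratio.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Dependent multiply-add chain. nvcc (and clang for CUDA) contracts
// acc*x+x into FMAs by default, so a launch with enough resident warps
// is bound by the chip's fp32 or fp64 FMA throughput.
template <typename T>
__global__ void fma_chain(T *out, T x, int iters) {
  T acc = x;
  for (int i = 0; i < iters; ++i)
    acc = acc * x + x;
  out[blockIdx.x * blockDim.x + threadIdx.x] = acc;  // keep acc live
}

template <typename T>
float time_ms(T *buf, int iters) {
  cudaEvent_t beg, end;
  cudaEventCreate(&beg);
  cudaEventCreate(&end);
  fma_chain<<<1024, 256>>>(buf, T(1.0000001), iters);  // warm-up launch
  cudaEventRecord(beg);
  fma_chain<<<1024, 256>>>(buf, T(1.0000001), iters);
  cudaEventRecord(end);
  cudaEventSynchronize(end);
  float ms = 0.f;
  cudaEventElapsedTime(&ms, beg, end);
  cudaEventDestroy(beg);
  cudaEventDestroy(end);
  return ms;
}

int main() {
  void *buf;
  cudaMalloc(&buf, 1024 * 256 * sizeof(double));
  const int iters = 1 << 18;
  float f32 = time_ms(static_cast<float *>(buf), iters);
  float f64 = time_ms(static_cast<double *>(buf), iters);
  // Expect roughly 2x on GA100-class parts (1:2) and far more on GA10x.
  printf("fp64/fp32 time ratio: %.1fx\n", f64 / f32);
  cudaFree(buf);
  return 0;
}
```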

What constantly irks me about NVIDIA’s GPU nomenclature is that GA102 and GA107 have the same compute capability, but the former has only half the fp64 hardware. I guess it’s better than the situation with sm_35, where we had models with 1:3 and 1:24 ratios (K40 vs. GTX 780), but it still makes it a bit of a pain to come up with reasonable optimization trade-offs.
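The ambiguity is visible from the CUDA runtime, too. A quick sketch using the stock `cudaGetDeviceProperties` query: a GA102-based A10 and a GA107-based A16 should both report sm_86 here, so the compute capability alone won’t tell you which fp64 ratio you got.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
  int n = 0;
  cudaGetDeviceCount(&n);
  for (int i = 0; i < n; ++i) {
    cudaDeviceProp p;
    cudaGetDeviceProperties(&p, i);
    // GA102- and GA107-based boards both print sm_86; the fp64/fp32
    // ratio has to come from a name table or a microbenchmark instead.
    printf("%s: sm_%d%d, %d SMs\n", p.name, p.major, p.minor,
           p.multiProcessorCount);
  }
  return 0;
}
```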

AFAICT, __float128 is implemented as a soft-float emulation of IEEE FP (at least that’s what GCC does on x86-64, according to the GCC 4.3 release notes).

__float128 ops in both gcc and clang call library routines (libgcc / compiler-rt) to actually do the operations: Compiler Explorer
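A one-function translation unit is enough to see it. Compiling this for x86-64 with either gcc or clang (e.g. `g++ -S` and grep the assembly) shows the addition lowered to a call to `__addtf3` rather than to inline instructions:

```cpp
// fp128 '+' becomes a libcall (__addtf3 in libgcc / compiler-rt) on
// x86-64; there is no inline hardware instruction sequence for it.
__float128 add(__float128 a, __float128 b) { return a + b; }
```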

We currently do not have the standard library on the GPU. We may be able to use the same soft-float approach once we have a way to provide GPU-side libcall implementations, which @jdoerfert has proposed. See [llvm-dev] [RFC] The `implements` attribute, or how to swap functions statically but late