OpenCL conversion operations: why call?

Artoria2e5 · October 13, 2025, 4:03am

On this Godbolt instance I have a small kernel with vector size 2 from mfakto. It uses these OpenCL builtins: convert_uint2, convert_float2, as_uint2, mad24. However, when compiled for either gfx906 or nvptx, the compiler seems to set up call sequences for convert_* and mad24:

  __private float2 qf = convert_float2(mad24(q.d4, 32768u, q.d3));
  qf = qf * 32768.0f;
  
  __private uint2 qi = convert_uint2(qf*nf);

        s_add_u32 s16, s16, _Z14convert_float2Dv2_j@rel32@lo+4
        s_addc_u32 s17, s17, _Z14convert_float2Dv2_j@rel32@hi+12
        s_mov_b64 s[4:5], s[48:49]
        s_mov_b64 s[6:7], s[38:39]
        s_mov_b64 s[8:9], s[36:37]
        s_mov_b64 s[10:11], s[34:35]
        s_mov_b32 s12, s53
        s_mov_b32 s13, s52
        s_mov_b32 s14, s51
        s_mov_b32 s15, s50
        v_mov_b32_e32 v31, v59
        s_swappc_b64 s[30:31], s[16:17]
        v_mul_f32_e32 v1, 0x47000000, v1
        v_mul_f32_e32 v0, 0x47000000, v0
        v_mul_f32_e32 v0, v57, v0
        v_mul_f32_e32 v1, v56, v1
        (you get the point)

        { // callseq 1, 0
        st.param.v2.b32         [param0], {%r7, %r8};
        call.uni (retval0), _Z14convert_float2Dv2_j, (param0);
        ld.param.v2.b32         {%r9, %r10}, [retval0];
        } // callseq 1

To investigate further I added -emit-llvm. It looks like the call for as_uint2 is eliminated by the time the final IR is output, but the calls for convert_* and mad24 remain.

The question is: why is LLVM doing this? The hardware has instructions for u32/f32 conversion and LLVM is known to use them for shaders. In fact, I can get LLVM to emit them by using __builtin_convertvector:

        v_mov_b32_e32 v0, v9
        s_swappc_b64 s[30:31], s[54:55]
        v_cvt_f32_u32_e32 v0, v0
        v_cvt_f32_u32_e32 v1, v1
        s_getpc_b64 s[16:17] # this is for a later mul24, irrelevant here
        s_add_u32 s16, s16, _Z5mul24Dv2_jS_@rel32@lo+4
        s_addc_u32 s17, s17, _Z5mul24Dv2_jS_@rel32@hi+12
        s_mov_b64 s[4:5], s[48:49]
        v_mul_f32_e32 v0, 0x47000000, v0
        v_mul_f32_e32 v1, 0x47000000, v1 # end mul24 stuff
        v_mul_f32_e32 v1, v46, v1
        v_mul_f32_e32 v0, v47, v0
        v_cvt_u32_f32_e32 v46, v0
        v_cvt_u32_f32_e32 v47, v1
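
For anyone who wants to try the same thing outside a kernel, here is a minimal host-side sketch of the __builtin_convertvector variant of the two lines above. It assumes GCC (9 or newer) or Clang with the vector_size extension. The names u32x2, f32x2, mad24_u32x2, q_to_float, scaled_quotient and example are made up for illustration, and the mad24 stand-in is only my reading of the unsigned builtin (multiply the low 24 bits, then add), not the actual library implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Two-lane vectors via the GCC/Clang vector_size extension. These are
   hypothetical host-side stand-ins for OpenCL's uint2 and float2. */
typedef uint32_t u32x2 __attribute__((vector_size(8)));
typedef float    f32x2 __attribute__((vector_size(8)));

/* Assumed stand-in for unsigned mad24: multiply the low 24 bits of a and b,
   then add c, lane by lane. */
u32x2 mad24_u32x2(u32x2 a, u32x2 b, u32x2 c) {
    const u32x2 mask = {0xFFFFFFu, 0xFFFFFFu};
    return (a & mask) * (b & mask) + c;
}

/* Mirrors: qf = convert_float2(mad24(q.d4, 32768u, q.d3)); qf = qf * 32768.0f; */
f32x2 q_to_float(u32x2 d4, u32x2 d3) {
    const u32x2 k = {32768u, 32768u};
    const f32x2 scale = {32768.0f, 32768.0f};
    f32x2 qf = __builtin_convertvector(mad24_u32x2(d4, k, d3), f32x2);
    return qf * scale;
}

/* Mirrors: qi = convert_uint2(qf * nf); */
u32x2 scaled_quotient(f32x2 qf, f32x2 nf) {
    return __builtin_convertvector(qf * nf, u32x2);
}

/* Small usage example with values that are exactly representable in float. */
void example(void) {
    u32x2 d4 = {1u, 2u};
    u32x2 d3 = {3u, 4u};
    f32x2 nf = {1.0f / 32768.0f, 1.0f / 32768.0f};
    u32x2 qi = scaled_quotient(q_to_float(d4, d3), nf);
    assert(qi[0] == 32771u);  /* 1 * 32768 + 3 */
    assert(qi[1] == 65540u);  /* 2 * 32768 + 4 */
}
```

In the kernel itself the equivalent change is just swapping convert_float2(x) for __builtin_convertvector(x, float2) and likewise for convert_uint2. I have not checked whether the two agree for out-of-range or rounding edge cases, so treat that as an open assumption.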

So… no. I don’t get it. Is there some edge case that I’m not thinking about? Can I in any way assure LLVM that these edge cases will not happen?

Oh. I also tried the spirv64 target, where the output did have the desired OpExtInst %6 %1 u_mad24 .... A similar call is present in -emit-llvm, so perhaps this should really be the job of a later stage?

Or, perhaps there already is something that turns the calls into instructions after the gfx9 assembly / nvptx code is generated?

zsrkmyn · October 13, 2025, 12:47pm

as_uint2 is a macro for a built-in intrinsic that clang recognizes, whereas convert_float2 is an external function that needs to be further lowered by backend passes. They’re handled in different phases. If no optimization pass imports or lowers the call, it is kept as is.

You can check the IR that clang emits before it goes through the optimization pipeline by passing -Xclang -disable-llvm-passes; even there, as_uint2 does not appear at all.
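
To make the distinction concrete, here is a small host-side C sketch of the two kinds of operation. The function names bits_of_float and value_of_float are hypothetical. The first is an analogue of as_uint (a pure bit reinterpretation, which the frontend can express directly without any call), the second an analogue of convert_uint (a value conversion, which in OpenCL is an ordinary function declaration until something lowers it).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Analogue of as_uint: copy the 32 bits of the float unchanged. */
uint32_t bits_of_float(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Analogue of convert_uint: convert the numeric value, truncating toward zero.
   Only meaningful for values that fit in uint32_t. */
uint32_t value_of_float(float f) {
    return (uint32_t)f;
}

/* Small usage example. */
void reinterpret_vs_convert_example(void) {
    assert(bits_of_float(1.0f) == 0x3F800000u);
    assert(value_of_float(1.0f) == 1u);
    /* 32768.0f is the 0x47000000 literal in the v_mul_f32 lines above. */
    assert(bits_of_float(32768.0f) == 0x47000000u);
}
```

Because a reinterpretation has no function body to find, there is nothing left to call once the frontend has handled it, which fits as_uint2 never showing up in the IR.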

Artoria2e5 · October 13, 2025, 1:08pm

So I really should’ve put this in the “AMDGPU” category then. In any case, I should benchmark first, now that I do know how to write a version without those calls…

Artoria2e5 · October 30, 2025, 5:37am

A full dump from mfakto’s own pipeline via ROCm reveals no real calls. No idea who or what is removing them, but I guess there’s a lesson somewhere about what to trust. And a small interpretability bug.
