Hello everyone,
I am trying to implement the max PTX builtin function.
It is defined as follows:
    max.type d, a, b;
where .type can be:
    .type = { .u16, .u32, .u64,
              .s16, .s32, .s64 };
Because max comes in several types, llvm.ptx.max has to be
overloaded for i16, i32 and i64.
So I think the right way to define the intrinsic is
(as in the attached max_not_working.patch file):
    def int_ptx_max : Intrinsic<[llvm_anyint_ty],
                                [LLVMMatchType<0>, LLVMMatchType<0>],
                                [Commutative]>;
The problem is that the builtin is not recognised in the following test case:
    define ptx_device i16 @max_16(i16 %a, i16 %b) {
    entry:
      %d = call i16 @llvm.ptx.max(i16 %a, i16 %b)
      ret i16 %d
    }
    declare i16 @llvm.ptx.max(i16, i16)
Things change if I explicitly define the intrinsic for i16 only, like this:
    def int_ptx_max : Intrinsic<[llvm_i16_ty],
                                [llvm_i16_ty, llvm_i16_ty],
                                [Commutative]>;
In this case everything works, but this obviously does not extend to i32 and i64.
Am I using the llvm_anyint_ty type in the right way?
Is there another way to implement this kind of behaviour?
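One thing that may be relevant: the LLVM Language Reference says that overloaded intrinsics carry a suffix in their name for each overloaded type (for example llvm.ctpop.i32). Assuming that rule applies to llvm.ptx.max when it is defined with llvm_anyint_ty, the test case would look like the sketch below. The name llvm.ptx.max.i16 is an assumption derived from that rule, not something taken from the attached patches, and this is untested.

```llvm
; Sketch only: same test as above, but calling the type-suffixed name.
; Assumption: the overloaded intrinsic is exposed as llvm.ptx.max.<type>.
define ptx_device i16 @max_16(i16 %a, i16 %b) {
entry:
  ; The .i16 suffix names the single overloaded type (llvm_anyint_ty).
  %d = call i16 @llvm.ptx.max.i16(i16 %a, i16 %b)
  ret i16 %d
}

; The declaration has to use the same suffixed name.
declare i16 @llvm.ptx.max.i16(i16, i16)
```

If that is the case, the i32 and i64 variants would presumably be llvm.ptx.max.i32 and llvm.ptx.max.i64 with the single overloaded definition.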
Thanks,
Alberto
max_not_working.patch (1.32 KB)
max_working.patch (1.3 KB)