I’m adding an intrinsic `foo` with no return value to Clang. However, when I compile with `-O2`, the call to the intrinsic `llvm.riscv.foo` no longer appears in the generated `foo.ll`. The relevant Clang and LLVM definitions are:
```
// BuiltinsRISCV.def
TARGET_BUILTIN(__builtin_riscv_foo, "vUZiUZi*", "n", "zoo")
TARGET_BUILTIN(__builtin_riscv_readfoo, "UZiUZi", "n", "zoo")
```

```
// IntrinsicsRISCV.td
class RISCVFoo : Intrinsic<[], [llvm_i32_ty, LLVMPointerType<llvm_i32_ty>],
                           [IntrReadMem, IntrArgMemOnly, IntrHasSideEffects]>;
class RISCVReadFoo : Intrinsic<[llvm_i32_ty], [llvm_i32_ty], [IntrNoMem]>;

def int_riscv_foo : RISCVFoo;
def int_riscv_readfoo : RISCVReadFoo;
```
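For readers less familiar with the `Builtins.def` prototype strings: `"vUZiUZi*"` declares `void(uint32_t, uint32_t *)` and `"UZiUZi"` declares `uint32_t(uint32_t)`, which line up with the intrinsic signatures `[] / [llvm_i32_ty, LLVMPointerType<llvm_i32_ty>]` and `[llvm_i32_ty] / [llvm_i32_ty]`. The `"n"` attribute string marks both builtins `nothrow`, and `"zoo"` is the required target feature. The sketch below is a hypothetical helper (`decodeBuiltinTypes` and `builtinPrototype` are illustrative names, not Clang APIs) that decodes only the letters used here; it is not Clang's own parser.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, NOT Clang's parser: decodes only the Builtins.def
// type-string letters that appear in this post:
//   v = void, i = int, prefixes U = unsigned, S = signed, Z = int32_t,
//   suffix * = pointer (address-space numbers are not handled).
// The first decoded type is the return type; the rest are parameters.
std::vector<std::string> decodeBuiltinTypes(const std::string& s) {
  assert(!s.empty() && "a type string needs at least a return type");
  std::vector<std::string> types;
  std::size_t pos = 0;
  while (pos < s.size()) {
    bool isUnsigned = false, isSigned = false, isInt32 = false;
    // Consume prefix modifiers.
    for (; pos < s.size(); ++pos) {
      if (s[pos] == 'U') isUnsigned = true;
      else if (s[pos] == 'S') isSigned = true;
      else if (s[pos] == 'Z') isInt32 = true;
      else break;
    }
    if (pos == s.size())
      throw std::invalid_argument("modifier without a base type");
    const char base = s[pos++];
    std::string t;
    if (base == 'v') {
      t = "void";
    } else if (base == 'i') {
      if (isInt32)
        t = isUnsigned ? "uint32_t" : "int32_t";
      else
        t = isUnsigned ? "unsigned int" : (isSigned ? "signed int" : "int");
    } else {
      throw std::invalid_argument(std::string("unsupported type letter: ") + base);
    }
    // Consume pointer suffixes.
    while (pos < s.size() && s[pos] == '*') {
      t += " *";
      ++pos;
    }
    types.push_back(t);
  }
  return types;
}

// Renders a C-style prototype such as "uint32_t name(uint32_t)".
std::string builtinPrototype(const std::string& name, const std::string& typeStr) {
  const std::vector<std::string> t = decodeBuiltinTypes(typeStr);
  std::string out = t[0] + " " + name + "(";
  for (std::size_t i = 1; i < t.size(); ++i) {
    if (i > 1) out += ", ";
    out += t[i];
  }
  return out + ")";
}
```

For example, `builtinPrototype("__builtin_riscv_foo", "vUZiUZi*")` yields `void __builtin_riscv_foo(uint32_t, uint32_t *)`.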
I compile with:

```
clang --target=riscv32 -march=rv32imafcv -emit-llvm -S -O2 foo.cpp -o foo.ll
```
The test case (`foo.cpp`) is:

```cpp
#include <cstdio>

unsigned int data[4] = {0xffff5555, 0x12345678, 0x77777777, 0x00000001};

int main(int argc, char *argv[]) {
  unsigned int test_value;
  __builtin_riscv_foo(0, &data[0]);         /* load value into hidden registers */
  test_value = __builtin_riscv_readfoo(0);  /* read value back from hidden registers */
  if (test_value == 0xffff5555) {
    std::printf("pass\n");
  } else {
    std::printf("fail\n");
  }
  return 0;
}
```
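The test depends on `__builtin_riscv_foo` writing state that `__builtin_riscv_readfoo` later reads. A plain C++ model of that intended behaviour can help check the test's expected output on a host. This is a hypothetical sketch: the `hidden_model` namespace, the register-bank size, and the assumption that `foo(idx, p)` copies the single word `*p` into hidden register `idx` are illustrative guesses, not the real semantics of the custom instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side model of the two builtins, for reasoning about the
// test only. Assumption: a small bank of hidden registers, and foo(idx, p)
// copies the word at *p into hidden register idx.
namespace hidden_model {

constexpr std::size_t kNumHiddenRegs = 4;  // assumed size, not from the post
std::uint32_t regs[kNumHiddenRegs] = {};   // all registers start at zero

// Models __builtin_riscv_foo: reads memory through p, writes hidden state.
void foo(std::uint32_t idx, const std::uint32_t* p) {
  assert(idx < kNumHiddenRegs && p != nullptr);
  regs[idx] = *p;
}

// Models __builtin_riscv_readfoo: reads hidden state only.
std::uint32_t readfoo(std::uint32_t idx) {
  assert(idx < kNumHiddenRegs);
  return regs[idx];
}

}  // namespace hidden_model
```

In this model, calling `hidden_model::foo(0, &data[0])` and then `hidden_model::readfoo(0)` returns `0xffff5555`, so the test prints "pass"; if the `foo` call is dropped, `readfoo(0)` returns the register's previous contents (zero here) and the test prints "fail".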
The resulting `main` function in `foo.ll` is:

```llvm
define dso_local noundef i32 @main(i32 noundef %argc, i8** nocapture noundef readnone %argv) local_unnamed_addr #0 {
entry:
  %0 = tail call i32 @llvm.riscv.readfoo(i32 0)
  %cmp = icmp eq i32 %0, -43691
  %. = select i1 %cmp, i8* getelementptr inbounds ([5 x i8], [5 x i8]* @str.2, i32 0, i32 0), i8* getelementptr inbounds ([5 x i8], [5 x i8]* @str, i32 0, i32 0)
  %puts = tail call i32 @puts(i8* nonnull dereferenceable(1) %.)
  ret i32 0
}
```
As you can see, the call to `readfoo` is still there, but the call to `foo` has been eliminated. If I change the optimization level from `-O2` to `-O0`, both calls are emitted as expected:
```llvm
define dso_local noundef i32 @main(i32 noundef %argc, i8** noundef %argv) #0 {
entry:
  %retval = alloca i32, align 4
  %argc.addr = alloca i32, align 4
  %argv.addr = alloca i8**, align 4
  %test_value = alloca i32, align 4
  store i32 0, i32* %retval, align 4
  store i32 %argc, i32* %argc.addr, align 4
  store i8** %argv, i8*** %argv.addr, align 4
  call void @llvm.riscv.foo(i32 0, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @data, i32 0, i32 0))
  %0 = call i32 @llvm.riscv.readfoo(i32 0)
  ...
```
How can I stop Clang at `-O2` from removing the call to my intrinsic? I'm using Clang 14.0.6. Any suggestions would be helpful, thanks!