Is an intrinsic with no return value incompatible with -O2 optimization?

I’m adding an intrinsic ‘foo’ that returns no value to clang. However, with optimization ‘-O2’ enabled, the call to the intrinsic ‘foo’ does not appear in the file “foo.ll”. The definitions I added to clang are as follows:

// BuiltinRiscv.def
TARGET_BUILTIN(__builtin_riscv_foo, "vUZiUZi*", "n", "zoo")
TARGET_BUILTIN(__builtin_riscv_readfoo, "UZiUZi", "n", "zoo")

// Intrinsic definitions (TableGen)
class RISCVFoo: Intrinsic<[], [llvm_i32_ty, LLVMPointerType<llvm_i32_ty>],
                  [IntrReadMem, IntrArgMemOnly, IntrHasSideEffects]>;
class RISCVReadFoo: Intrinsic<[llvm_i32_ty], [llvm_i32_ty], [IntrNoMem]>;
def int_riscv_foo: RISCVFoo;
def int_riscv_readfoo: RISCVReadFoo;

The compilation command used is as follows:

clang --target=riscv32 -march=rv32imafcv -emit-llvm -S -O2 foo.cpp -o foo.ll

The source code of the test case is as follows:

#include "stdio.h"

unsigned int data[4] = {0xffff5555, 0x12345678, 0x77777777, 0x00000001};

int main(int argc, char *argv[]) {

  unsigned int test_value;

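  __builtin_riscv_foo(0, &data[0]); /* load value into hidden registers */
  test_value = __builtin_riscv_readfoo(0); /* get value from hidden registers */
  if (test_value == 0xffff5555) {
    /* message printed on match (call not shown in the post) */
  } else {
    /* message printed on mismatch (call not shown in the post) */
  }

  return 0;
}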

The main code in file “foo.ll” is as follows:

define dso_local noundef i32 @main(i32 noundef %argc, i8** nocapture noundef readnone %argv) local_unnamed_addr #0 {
  %0 = tail call i32 @llvm.riscv.readfoo(i32 0)
  %cmp = icmp eq i32 %0, -43691
  %. = select i1 %cmp, i8* getelementptr inbounds ([5 x i8], [5 x i8]* @str.2, i32 0, i32 0), i8* getelementptr inbounds ([5 x i8], [5 x i8]* @str, i32 0, i32 0)
  %puts = tail call i32 @puts(i8* nonnull dereferenceable(1) %.)
  ret i32 0
}

As you can see, the intrinsic ‘readfoo’ remains, but the call to the intrinsic ‘foo’ has been eliminated! If I change the optimization level from ‘-O2’ to ‘-O0’, everything works fine:

define dso_local noundef i32 @main(i32 noundef %argc, i8** noundef %argv) #0 {
  %retval = alloca i32, align 4
  %argc.addr = alloca i32, align 4
  %argv.addr = alloca i8**, align 4
  %test_value = alloca i32, align 4
  store i32 0, i32* %retval, align 4
  store i32 %argc, i32* %argc.addr, align 4
  store i8** %argv, i8*** %argv.addr, align 4
  call void @llvm.riscv.foo(i32 0, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @data, i32 0, i32 0))
  %0 = call i32 @llvm.riscv.readfoo(i32 0)

I don’t know how to keep clang from optimizing away my intrinsic at ‘-O2’. The clang version is 14.0.6. Any suggestions would be helpful, thanks!

IntrHasSideEffects is currently not supported in conjunction with IntrReadMem. You need to drop the IntrReadMem attribute.

This would be fixed by D137937 ([TableGen] Represent IntrHasSideEffects using inaccessiblemem read+write), but it’s currently stuck due to AMDGPU intrinsics with unclear semantics.
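
For the definition in the question, the change might look roughly like the sketch below. It assumes the attribute to drop is IntrReadMem: removing IntrHasSideEffects instead would still leave a read-only call with no result, which the optimizer is free to delete. The class and def names are the ones from the question. Whether IntrArgMemOnly is still appropriate depends on what the instruction really touches, so treat that part as an assumption as well.

```
// Sketch only: same signature as the original RISCVFoo, with IntrReadMem removed.
// Keeping IntrArgMemOnly is an assumption about the instruction's semantics.
class RISCVFoo: Intrinsic<[], [llvm_i32_ty, LLVMPointerType<llvm_i32_ty>],
                  [IntrArgMemOnly, IntrHasSideEffects]>;
def int_riscv_foo: RISCVFoo;
```

After rebuilding clang, the -O2 output in foo.ll can be checked for a call to @llvm.riscv.foo ahead of the call to @llvm.riscv.readfoo.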

Thanks! It works.