How to make an intrinsic behave like a load instruction with regard to the GVN pass?

Hi Everyone,

For obscure reasons, I am trying to add an intrinsic function that combines the GEP and load instructions into one. I also want the Global Value Numbering (GVN) pass to be able to remove redundant calls to this intrinsic. I've played with different intrinsic options (IntrReadMem, IntrArgMemOnly, …), but I can't get the example below to be optimized.

The example is as follows:

  • the intrinsic name is llvm.bpf.getelementptr.and.load.i32, and the first argument is the base pointer from which the load address is computed;
  • the intrinsic is called twice for the same address;
  • the return value of each call is passed to a dummy function, consume_int;
  • I expect that opt -opaque-pointers -passes=gvn -S -o - test.ll would replace the second call to the intrinsic with the return value of the first, but this does not happen.

The example code and the intrinsic definition are below.
I would appreciate any advice.

Thanks,
Eduard


Example:

; ModuleID = 'test.c'
source_filename = "test.c"
target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128"
target triple = "bpf"

%struct.foo = type { i32, i32 }

; Function Attrs: nounwind
define dso_local void @foofn(ptr noundef %ctx) local_unnamed_addr #0 {
entry:
  %x3 = tail call i32 (ptr, i1, i8, i8, i8, i1, ...) @llvm.bpf.getelementptr.and.load.i32(ptr elementtype(%struct.foo) %ctx, i1 false, i8 0, i8 1, i8 2, i1 true, i64 0, i32 1)
  tail call void @consume_int(i32 noundef %x3) #3
  %x14 = tail call i32 (ptr, i1, i8, i8, i8, i1, ...) @llvm.bpf.getelementptr.and.load.i32(ptr elementtype(%struct.foo) %ctx, i1 false, i8 0, i8 1, i8 2, i1 true, i64 0, i32 1)
  tail call void @consume_int(i32 noundef %x14) #3
  ret void
}

declare dso_local void @consume_int(i32 noundef) local_unnamed_addr #1

; Function Attrs: argmemonly nocallback nofree nosync nounwind readonly willreturn
declare i32 @llvm.bpf.getelementptr.and.load.i32(ptr nocapture readonly, i1 immarg, i8 immarg, i8 immarg, i8 immarg, i1 immarg, ...) #2

attributes #0 = { nounwind "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
attributes #1 = { "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
attributes #2 = { argmemonly nocallback nofree nosync nounwind readonly willreturn }
attributes #3 = { nounwind }

!llvm.module.flags = !{!0, !1}
!llvm.ident = !{!2}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 7, !"frame-pointer", i32 2}
!2 = !{!"clang version 16.0.0 (https://github.com/llvm/llvm-project.git c6347513e14c2111b8a2be454784bc2751707090)"}


Expectation:

  %x3 = tail call i32 (ptr, i1, i8, i8, i8, i1, ...)
   @llvm.bpf.getelementptr.and.load.i32(ptr elementtype(%struct.foo) %ctx,
                                        i1 false, i8 0, i8 1, i8 2, i1 true, i64 0, i32 1)
  tail call void @consume_int(i32 noundef %x3) #3
  tail call void @consume_int(i32 noundef %x3) #3

Reality: the code is not changed.


Intrinsic definition:

  def int_bpf_getelementptr_and_load : ClangBuiltin<"__builtin_bpf_getelementptr_and_load">,
              Intrinsic<[llvm_any_ty],
                        [llvm_ptr_ty,     // base ptr for getelementptr
                         llvm_i1_ty,      // volatile
                         llvm_i8_ty,      // atomic order
                         llvm_i8_ty,      // syncscope id
                         llvm_i8_ty,      // alignment
                         llvm_i1_ty,      // inbounds
                         llvm_vararg_ty], // indices for getelementptr insn
                        [IntrReadMem,
                         IntrArgMemOnly,
                         IntrNoCallback,
                         IntrNoFree,
                         IntrWillReturn,
                         IntrNoSync,
                         NoCapture <ArgIndex<0>>,
                         ReadOnly  <ArgIndex<0>>,
                         ImmArg    <ArgIndex<1>>, // volatile
                         ImmArg    <ArgIndex<2>>, // atomic order
                         ImmArg    <ArgIndex<3>>, // syncscope id
                         ImmArg    <ArgIndex<4>>, // alignment
                         ImmArg    <ArgIndex<5>>, // inbounds
                        ]>;

From a quick glance, that wouldn't be sound even for plain loads: nothing says @consume_int doesn't also have a copy of %ctx, or some other pointer to (part of) the containing object, and write through it between the two calls. Either @foofn would need more information about where %ctx comes from, or it would need to know about @consume_int's implementation.

Oh… of course, sorry, I completely missed this. Thank you for the explanation.

It works as expected when I change the @foofn definition to @foofn(ptr noalias noundef %ctx).