[RFC]: Improving FPMR Handling for FP8 Intrinsics in LLVM

Hi @nikic,
I also tried EarlyCSE with SME instructions.
With only NEON instructions, the patch D116609 ([EarlyCSE] Allow elimination of redundant writeonly calls) removes the redundancy, but once I add SME instructions it makes no difference:

define void @test_fdot16_1x2_indexed(i32 %slice.0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zn2, <vscale x 16 x i8> %zm, i64 noundef %fpm) {
entry:
  %slice = add i32 %slice.0, 7
  call void @llvm.aarch64.set.fpmr(i64 %fpm)
  call void @llvm.aarch64.sme.fp8.fdot.lane.za16.vg1x2(i32 %slice, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zn2, <vscale x 16 x i8> %zm, i32 1)
  call void @llvm.aarch64.set.fpmr(i64 %fpm)
  call void @llvm.aarch64.sme.fp8.fdot.lane.za16.vg1x2(i32 %slice, <vscale x 16 x i8> %zn2, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zm, i32 0)
  call void @llvm.aarch64.set.fpmr(i64 %fpm)
  call void @llvm.aarch64.sme.fp8.fdot.lane.za16.vg1x2(i32 %slice.0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zn2, <vscale x 16 x i8> %zm, i32 0)
  ret void
}

So I think your suggestion to describe “inaccessiblemem” more precisely, by adding additional locations in llvm/include/llvm/Support/ModRef.h (llvm/llvm-project at f46c44dbc0d225277178cf5b6646a96f591fdeaa), will also need to be implemented.
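
To make the reasoning concrete, here is a toy C++ model of the situation. It is not LLVM code: the location names (`kInaccessible`, `kFPMR`, `kZA`), the `Call` struct, and both functions are hypothetical and only for illustration. It also assumes that the SME FP8 intrinsic is modelled as writing inaccessible memory (because it updates ZA) and that set.fpmr is modelled as a write-only call to inaccessible memory. Under those assumptions, a single coarse location makes every SME call look like a clobber of FPMR, while separate locations would let the repeated writes be recognised as redundant.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical location bits; these are NOT the values in LLVM's ModRef.h.
constexpr uint8_t kInaccessible = 1 << 0; // one coarse "inaccessiblemem" bucket
constexpr uint8_t kFPMR         = 1 << 1; // assumed separate location for FPMR
constexpr uint8_t kZA           = 1 << 2; // assumed separate location for ZA

struct Call {
  bool isSetFPMR;  // true for a set.fpmr-like write-only call
  int value;       // id of the value written (only used for set.fpmr)
  uint8_t writes;  // locations the call may write
};

// Models the test above: set.fpmr(%fpm) followed by an SME FP8 dot, three times.
// With fine == false, everything lives in the single coarse location.
std::vector<Call> buildSequence(bool fine) {
  const uint8_t setWrites = fine ? kFPMR : kInaccessible;
  const uint8_t dotWrites = fine ? kZA : kInaccessible;
  std::vector<Call> calls;
  for (int i = 0; i < 3; ++i) {
    calls.push_back({true, /*value=*/0, setWrites});
    calls.push_back({false, /*value=*/0, dotWrites});
  }
  return calls;
}

// Returns the indices of set.fpmr calls that repeat the previous write with
// no intervening call that may write the same location.
std::vector<int> redundantSets(const std::vector<Call> &calls) {
  std::vector<int> redundant;
  bool haveLast = false;
  int lastValue = 0;
  uint8_t lastWrites = 0;
  for (int i = 0; i < static_cast<int>(calls.size()); ++i) {
    const Call &c = calls[i];
    if (c.isSetFPMR) {
      if (haveLast && c.value == lastValue && c.writes == lastWrites) {
        redundant.push_back(i); // same value, location untouched since
        continue;
      }
      haveLast = true;
      lastValue = c.value;
      lastWrites = c.writes;
    } else if (haveLast && (c.writes & lastWrites) != 0) {
      haveLast = false; // the earlier write may have been clobbered
    }
  }
  return redundant;
}
```

In this model, `redundantSets(buildSequence(false))` finds nothing to remove, which matches what I see with the SME test, while `redundantSets(buildSequence(true))` flags the second and third set.fpmr calls (indices 2 and 4).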