Poor optimization of memory accesses to aggregate value arguments passed indirectly (without byval)

Hello,

On a target whose ABI passes aggregate value arguments indirectly (as pointers) without byval, I notice that memory accesses to these arguments are optimized poorly. Consider this example for RISC-V:

The loads and stores of the in.p and out.p pointers remain in the critical loop, whereas I would have expected (or at least hoped) that LICM would promote these values to scalars in the loop body, hoisting the loads before the loop and sinking the stores after it. (The stores could even be eliminated entirely, since the stored values are never read afterwards.)
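The original example is not reproduced here. The following is a hypothetical C function of the same shape, assuming RV64 and the standard RISC-V calling convention, under which aggregates larger than 2×XLEN bits are passed by reference to a copy made by the caller. The names `struct buf` and `copy_n` and the field layout are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 32-byte aggregate: on RV64 it is larger than 2*XLEN
   (16 bytes), so it is passed as a pointer to a caller-made copy. */
struct buf {
    int *p;      /* cursor advanced by the loop */
    size_t len;
    size_t cap;
    size_t pad;
};

/* Copies n ints from in.p to out.p and returns their sum. The store
   through out.p may, as far as the IR shows, alias the memory holding
   in.p (both live behind plain pointer parameters), which is what keeps
   the loads/stores of in.p and out.p inside the loop. */
int copy_n(struct buf in, struct buf out, size_t n) {
    assert(n == 0 || (in.p != NULL && out.p != NULL));
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        *out.p++ = *in.p;
        sum += *in.p++;
    }
    return sum;
}
```

Because `copy_n` works on copies, advancing `in.p` and `out.p` is invisible to the caller: after the call, the caller's `p` fields are unchanged, so the final stores to those fields are dead.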

I understand that LICM does not have enough information about the value arguments in the LLVM IR to prove that there are no aliasing accesses, so it cannot perform the promotion.

As an experiment, I started from the LLVM IR that clang generates for this function and noticed that if I add noalias attributes to the %in and %out arguments (the pointers to the value arguments), the loads and stores of in.p and out.p do get promoted. Furthermore, if I add calls to the llvm.lifetime.end intrinsic at the end of the function, the stores after the loop get eliminated:
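At the source level, the noalias part of this experiment can be approximated by lowering the by-value call by hand: the callee takes pointers to the caller-made copies, and `restrict` on those parameters is what clang emits as the `noalias` attribute. This is a sketch with hypothetical names (`copy_n_lowered`, `call_copy_n`, `struct buf`); it does not model `llvm.lifetime.end`, since in C the copies simply go out of scope when `call_copy_n` returns.

```c
#include <assert.h>
#include <stddef.h>

struct buf { int *p; size_t len; size_t cap; size_t pad; };

/* Hand-lowered callee: each aggregate arrives as a pointer to a private
   copy. `restrict` is valid only because no other pointer reaches these
   copies; it lets the optimizer keep in->p and out->p in registers. */
static int copy_n_lowered(struct buf *restrict in, struct buf *restrict out,
                          size_t n) {
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        *out->p++ = *in->p;
        sum += *in->p++;
    }
    return sum;
}

/* Hand-lowered caller: makes the copies the ABI requires and passes
   their addresses. The copies die when this function returns. */
int call_copy_n(const struct buf *in, const struct buf *out, size_t n) {
    assert(in != NULL && out != NULL);
    struct buf in_tmp = *in;
    struct buf out_tmp = *out;
    return copy_n_lowered(&in_tmp, &out_tmp, n);
}
```

If clang could attach `noalias` to the pointer parameters it creates for indirect aggregates, the callee would look like `copy_n_lowered` to the optimizer without any source changes.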

So I now have the following questions:

  • Would it be sound for clang to generate a noalias attribute for pointer parameters that are used to pass an aggregate value argument indirectly? (It would seem so, since the callee receives a private copy of the value argument; see Compiler Explorer.)
  • Is it valid for a callee to call llvm.lifetime.end on a stack object allocated by its caller?
  • Has anyone explored improving the LLVM IR that clang generates for indirectly passed aggregate value arguments?

Thanks!