Multidimensional array indexing intrinsics

Linearized array addresses are an issue in loop nest transformation. To alleviate the same, a multidimensional array indexing intrinsics have been proposed recently in the llvmp-dev mailing list [1]. From the mailing thread, it looks like there is a consensus on using intrinsics for communicating dimensions [2].

While working with our own loop transformation framework, we did a similar work on using intrinsics in flang compiler. We are willing to work with the community and open source the implementation in flang compiler. Also we would like to extend the work to the backend to components like SCEV.

Any feedback/suggestions are most welcome,
thanks,
-Prashanth N R
Compiler Tree technologies

  1. http://lists.llvm.org/pipermail/llvm-dev/2019-July/134063.html
  2. http://lists.llvm.org/pipermail/llvm-dev/2019-August/134420.html

Hi Prasanth,

sharing your implementation would be awesome! Note that the flang
compiler is currently being reimplemented in a project called f18 [1]
and will use MLIR for code-generation which has an in-built
multi-dimensional types. However, your approach might still be
insightful for lowering MLIR to LLVM-IR (for further optimizations)
and frontends that do not target MLIR.

Michael

[1] https://github.com/flang-compiler/f18

Hi Prashanth,

Do you know how that would map to the mdspan proposal of C++? (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0009r9.html)
Should we split the stride from the bound, to represent sub-rectangles, or padding? (this would require 2 “size” per dimension, but it is quite common in video/image processing)

Should we require the stride to be multiple of one an other and in the decreasing order? (this could allow to encode transpose without having to move data around)

As for us, we went for an “underlying geometry” attribute/metadata, which takes a bunch of sizes.
It gives for each argument a hint on how to de-linearize the accesses (aka. the dimension’s sizes), and the address computation still uses gep.
But we still need a few assert that bound the indexes to not overflow their respective dimension’s upper bound, to avoid the gep delinearization to involve modulo.

It’s far from ideal, but it allows us to support legacy codes that were written in linearized form.
Our main issue is actually handling memcpy-like code, which have both a linearized address computation but also a linearized loop nest structure…