The idea here is to improve the performance of index calculations for 1D and 2D arrays (as those are far more common than higher-dimensional arrays). It also improves the likelihood that the loop vectorizers in MLIR and LLVM decide to use vector operations.
When a subroutine is passed an array of unknown size (see Example 1 below), the compiler uses the array descriptor to determine the stride.
This pass works on the generated MLIR code. It detects arguments that have unknown size in the first dimension and checks whether there are loops that use those arguments. If so, the loop is duplicated and surrounded by, essentially:
if (stride(arg) == sizeof(elementtype)) new_loop else old_loop;
The new loop will then use a 1D array as an alias for the 1D or 2D array, which now has a fixed stride known at translation to LLVM IR and machine code. This allows for better code in the loop and, at least sometimes, also allows the loop to be vectorized.
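As a rough illustration of the idea (not the actual pass, which operates on MLIR), the versioning can be sketched in C. The `desc2d` struct and the `add_versioned` function are hypothetical names made up for this sketch, and the descriptor layout is a simplified assumption; the real Flang descriptor carries more information. Checking the stride of both arguments before taking the fast path is also an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified array descriptor (an assumption for this sketch). */
typedef struct {
    char     *base;       /* address of the first element                  */
    ptrdiff_t extent[2];  /* element count; [0] is the fastest dimension   */
    ptrdiff_t stride[2];  /* byte stride of each dimension                 */
} desc2d;

/* Computes a(j,i) = a(j,i) + b(j,i) over an n-by-n region (0-based here).
   Returns 1 if the contiguous version ran and 0 otherwise; the return
   value exists only to make the sketch easy to check. */
int add_versioned(const desc2d *a, const desc2d *b, ptrdiff_t n)
{
    const ptrdiff_t elem = (ptrdiff_t)sizeof(double);
    assert(n <= a->extent[0] && n <= a->extent[1]);
    assert(n <= b->extent[0] && n <= b->extent[1]);

    if (a->stride[0] == elem && b->stride[0] == elem) {
        /* "New" loop: the inner stride is the compile-time constant
           sizeof(double), so each array is aliased as a flat 1D array.
           Assumes the outer byte stride is a multiple of the element size. */
        double *a1 = (double *)a->base;
        const double *b1 = (const double *)b->base;
        const ptrdiff_t lda = a->stride[1] / elem;
        const ptrdiff_t ldb = b->stride[1] / elem;
        for (ptrdiff_t i = 0; i < n; i++)
            for (ptrdiff_t j = 0; j < n; j++)
                a1[i * lda + j] += b1[i * ldb + j];
        return 1;
    }

    /* "Old" loop: every access multiplies by strides read at run time. */
    for (ptrdiff_t i = 0; i < n; i++)
        for (ptrdiff_t j = 0; j < n; j++) {
            double *pa = (double *)(a->base + i * a->stride[1] + j * a->stride[0]);
            const double *pb =
                (const double *)(b->base + i * b->stride[1] + j * b->stride[0]);
            *pa += *pb;
        }
    return 0;
}
```

In the fast path the inner loop walks consecutive `double` elements, which is the access pattern a loop vectorizer handles most easily. The fallback keeps the original behaviour for callers that pass a non-contiguous array section.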
In SPEC-2017 roms_r, I’m seeing about a 6% improvement. There’s still room for more improvement in this benchmark, but it’s definitely a step in the right direction. In an artificial benchmark, the improvement is more along the lines of 30-40%.
There’s a previous discussion on this here:
There is also a GitHub issue here:
A prototype of the code is available here:
(The build on Debian fails due to some formatting of CLOptions.td.)
Example 1:

    module func3
    contains
      subroutine func(a, b, n)
        real*8 :: a(:, :), b(:, :)
        integer :: n
        integer :: i, j
        do i = 1, n
          do j = 1, n
            a(j, i) = a(j, i) + b(j, i)
          end do
        end do
      end subroutine func
    end module func3