Transformations to aid optimizer for subroutines/functions with assumed shape arguments

Fortran supports dummy arguments with assumed shape. This permits passing arrays of different extents but enforces the rank, type and kind. Actual arguments can be strided arrays, such as an array section with a stride or a row in a 2-d array (Fortran is column-major, so a row is strided while a column is contiguous). This means that the passed array need not be contiguous (or unit-stride).
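A small sketch of the situation described above, with hypothetical names (`show`, `a`, `v`) chosen only for illustration. It passes a column, a row and a strided section to the same assumed-shape dummy and uses the Fortran 2008 intrinsic `is_contiguous` to report what the callee receives:

```fortran
program strided_actuals
  implicit none
  real :: a(4, 3), v(10)
  integer :: i

  a = reshape([(real(i), i = 1, 12)], [4, 3])
  v = [(real(i), i = 1, 10)]

  call show(a(:, 2))     ! column: contiguous in column-major storage
  call show(a(2, :))     ! row: successive elements are 4 reals apart
  call show(v(1:10:3))   ! array section with stride 3

contains

  ! x is assumed shape: rank, type and kind are fixed, the extent is not.
  subroutine show(x)
    real, intent(in) :: x(:)
    print '(a, l2, a, i0, a, *(f6.1))', 'contiguous:', is_contiguous(x), &
          '  size: ', size(x), '  values:', x
  end subroutine show

end program strided_actuals
```

The callee cannot tell at compile time which of the three cases it will get, which is why the loop code has to be written in terms of a run-time stride.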

In many cases there will be loops that operate on these assumed shape arrays passed as arguments. The code generated for the loop will have to use the stride of the array to find the successive elements. Because the stride is only known at run time, some optimisations like vectorisation can be missed.

Fortran compilers handle this in different ways:

  • Always generate code for unit-stride arrays in the callee with the assumed shape dummy. The code generated for the caller then has to create a contiguous copy whenever the actual argument being passed to the dummy is not contiguous.
  • Always create copies in the callee with the assumed shape dummy. The copy is contiguous and hence the loops operating on the copy can be optimised. There is some overhead in filling the copy and in copying the final values back to the assumed shape dummy. But if there is a hot loop or several loops operating on the same array then there are benefits. Some compilers call this repacking arrays.
  • Create two versions of the callee with the assumed shape dummy. One version assumes unit stride and the other handles a variable stride. The generated code can invoke the former if the actual arguments are contiguous and the latter for non-contiguous arrays.
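The second and third approaches can be written out by hand to show what the compiler would effectively generate. This is only a sketch under assumptions: the names `scale_by`, `scale_unit_stride`, `scale_any_stride` and `scale_repacked` are hypothetical, and a real compiler would do this transformation internally rather than in source code.

```fortran
module scale_mod
  implicit none
contains

  ! Versioning: pick a version at run time based on contiguity.
  subroutine scale_by(x, factor)
    real, intent(inout) :: x(:)
    real, intent(in)    :: factor
    if (is_contiguous(x)) then
      call scale_unit_stride(x, factor)
    else
      call scale_any_stride(x, factor)
    end if
  end subroutine scale_by

  ! Version that may assume unit stride (CONTIGUOUS is Fortran 2008).
  subroutine scale_unit_stride(x, factor)
    real, contiguous, intent(inout) :: x(:)
    real, intent(in)                :: factor
    integer :: i
    do i = 1, size(x)
      x(i) = x(i) * factor
    end do
  end subroutine scale_unit_stride

  ! Version that must honour whatever stride the descriptor carries.
  subroutine scale_any_stride(x, factor)
    real, intent(inout) :: x(:)
    real, intent(in)    :: factor
    integer :: i
    do i = 1, size(x)
      x(i) = x(i) * factor
    end do
  end subroutine scale_any_stride

  ! Repacking: copy in, run the loop on the contiguous copy, copy out.
  subroutine scale_repacked(x, factor)
    real, intent(inout) :: x(:)
    real, intent(in)    :: factor
    real, allocatable   :: tmp(:)
    integer :: i
    tmp = x                      ! copy-in to a contiguous temporary
    do i = 1, size(tmp)
      tmp(i) = tmp(i) * factor
    end do
    x = tmp                      ! copy-out of the final values
  end subroutine scale_repacked

end module scale_mod

program scale_demo
  use scale_mod
  implicit none
  real :: a(8)
  a = 1.0
  call scale_by(a, 2.0)               ! whole array: contiguous
  call scale_by(a(1:8:2), 3.0)        ! strided section
  call scale_repacked(a(2:8:2), 5.0)  ! strided section, repacked in the callee
  print *, a
end program scale_demo
```

The two loop bodies are identical in source; the difference is only in what the compiler is allowed to assume about the layout of `x` when generating code for them.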

There are possibly other ways to handle this. I wanted to check whether there would be interest in having these kinds of transformations in llvm/flang.

Note: These transformations are beneficial for benchmarks like roms_r in SPEC CPU 2017.

At least one compiler will create a contiguous temporary on the calling side if the leading dimension of the actual argument is not contiguous, but doesn’t care about later dimensions. So it is not all-or-nothing.

There are advantages and disadvantages to using a temporary array for partial or complete contiguity on the calling side vs on the called side. A temporary on the calling side might turn out to be needless if the called side doesn’t exploit it enough to cover its cost (or at all). A temporary on the called side has to be made on every call and misses the opportunity of one temporary array serving multiple dynamic calls – consider the motivating example of a subroutine forwarding its incoming assumed-shape dummy arrays to another procedure in a loop.
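The forwarding case can be sketched as follows. All names here (`inner`, `outer_forward`, `outer_hoisted`) are hypothetical, and the temporary is made by hand purely to illustrate where the copies would land:

```fortran
module forward_mod
  implicit none
contains

  ! Callee with a hot loop over an assumed-shape dummy.
  subroutine inner(x)
    real, intent(inout) :: x(:)
    integer :: i
    do i = 1, size(x)
      x(i) = x(i) + 1.0
    end do
  end subroutine inner

  ! Forwards x unchanged. If inner repacked on the called side,
  ! x would be copied in and out once per iteration, n times in total.
  subroutine outer_forward(x, n)
    real, intent(inout) :: x(:)
    integer, intent(in) :: n
    integer :: k
    do k = 1, n
      call inner(x)
    end do
  end subroutine outer_forward

  ! One contiguous temporary serves all n calls.
  subroutine outer_hoisted(x, n)
    real, intent(inout) :: x(:)
    integer, intent(in) :: n
    real, allocatable   :: tmp(:)
    integer :: k
    tmp = x
    do k = 1, n
      call inner(tmp)
    end do
    x = tmp
  end subroutine outer_hoisted

end module forward_mod

program forward_demo
  use forward_mod
  implicit none
  real :: a(6), b(6)
  a = 0.0
  b = 0.0
  call outer_forward(a(1:6:2), 3)
  call outer_hoisted(b(1:6:2), 3)
  if (any(a /= b)) error stop 'results differ'
  print *, a
end program forward_demo
```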

Compile-time multi-versioning of callees has an obvious problem with scaling to multiple arguments, independent of whether the multi-versioning applies to temporaries made for contiguity or to the SIMD optimization of loops. At least runtime checking for contiguity is cheap, especially when only the leading dimension matters.

Last, users now have more control over the contiguity requirements in an explicit interface. Fortran 2008 added a CONTIGUOUS attribute that applies to assumed-shape dummy arrays as well as to pointers. It effectively forces the calling/called side decision to the calling side, and suggests that its absence should default to checking for and dealing with noncontiguous actual arguments on the called side.
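As an illustration of the attribute, here is a minimal sketch with a hypothetical `axpy` routine. Because both array dummies are declared CONTIGUOUS, the callee may assume unit stride, and when the caller passes a row of a matrix the compiler has to arrange for a contiguous copy on the calling side:

```fortran
module axpy_mod
  implicit none
contains

  ! Both array dummies are assumed shape and CONTIGUOUS (Fortran 2008).
  subroutine axpy(a, x, y)
    real, intent(in)                :: a
    real, contiguous, intent(in)    :: x(:)
    real, contiguous, intent(inout) :: y(:)
    integer :: i
    do i = 1, size(y)
      y(i) = y(i) + a * x(i)
    end do
  end subroutine axpy

end module axpy_mod

program contiguous_demo
  use axpy_mod
  implicit none
  real :: m(4, 4), y(4)
  m = 1.0
  y = 0.0
  ! m(1, :) is a strided row, so a contiguous temporary is passed for x.
  call axpy(2.0, m(1, :), y)
  print *, y
end program contiguous_demo
```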

It would be lovely to have control over these options, including size-dependent control like -heap-arrays[n].

One place where the compiler must not make a copy is when ASYNCHRONOUS is involved, because it leads to incorrect results if the asynchronous update happens to a copy of an array. The best example of this is MPI, such as MPI_Irecv, where the output buffer can be modified after the call returns.

I’m taking a look at this - right now, just trying to figure out what is and isn’t “working reasonably” in flang when compared to classic flang and gfortran.

Goal is to come up with some sort of plan to make things better…