Motivation
I’ve investigated performance issues with assignments and found nine of them. They are filed in [Flang] Performance issues in assignments · Issue #121125 · llvm/llvm-project · GitHub.
I’m looking into #121127, which concerns array assignments for derived types. I found that array assignments (Fortran::runtime::Assign) are slower than in GFortran. I think inlining array assignments as scalar assignments in loops could be one solution. However, Assign also seems to affect performance in some other cases. Therefore, I’m thinking of improving Assign first.
Solution
My solution is composed of two parts.
- Improving assignments for array elements
  - Assign n-dim arrays in n-nested loops
- Improving assignments for components of derived types
  - Introduce “copy routines” like GFortran
For array elements
The address calculation of array elements (Descriptor::Element) is expensive. It reproduces the subscripts, calculates the offset, and returns the address. Runtime information (e.g., extents, strides) is accessed several times for each array element during those steps.
This is because the runtime doesn’t know the exact shape and layout of the arrays in advance. GFortran, on the other hand, generates n-nested loops for n-dim arrays at compile time, so reproducing subscripts isn’t necessary; the loop indices are sufficient for address calculation. Therefore, I’m considering using n-nested loops for n-dim arrays in Assign under the following conditions.
- The shapes of the arrays are the same.
- The rank of the arrays is less than 7.
  - This limit is probably not essential. I introduced it only to avoid complicating the runtime code.
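The difference between the two addressing strategies can be sketched as follows. This is only an illustration under stated assumptions: ToyDescriptor, CopyGeneric, and CopyRank2 are hypothetical names invented for this sketch, not Flang's Descriptor or Assign, and the subscripts are zero-based for simplicity. The generic path rebuilds the offset from all subscripts for every element, while the rank-2 path derives addresses directly from the loop indices.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified stand-in for a runtime array descriptor.
// This is NOT Flang's Descriptor; it carries only what the sketch needs.
struct ToyDescriptor {
  char *base;                           // address of the first element
  std::size_t elemBytes;                // size of one element in bytes
  std::vector<std::int64_t> extent;     // number of elements per dimension
  std::vector<std::int64_t> byteStride; // byte distance between neighbours
};

// Generic path: keep a subscript vector, recompute both offsets from every
// stride for each element, then advance the subscripts (first dim fastest).
void CopyGeneric(const ToyDescriptor &to, const ToyDescriptor &from) {
  const std::size_t rank = to.extent.size();
  std::int64_t total = 1;
  for (std::int64_t e : to.extent) total *= e;
  std::vector<std::int64_t> subs(rank, 0);
  for (std::int64_t n = 0; n < total; ++n) {
    std::int64_t toOff = 0, fromOff = 0;
    for (std::size_t d = 0; d < rank; ++d) {
      toOff += subs[d] * to.byteStride[d];
      fromOff += subs[d] * from.byteStride[d];
    }
    std::memmove(to.base + toOff, from.base + fromOff, to.elemBytes);
    for (std::size_t d = 0; d < rank; ++d) {
      if (++subs[d] < to.extent[d]) break;
      subs[d] = 0;
    }
  }
}

// Rank-2 path: two nested loops; the column address is hoisted out of the
// inner loop, so each element needs one multiply-add per descriptor.
void CopyRank2(const ToyDescriptor &to, const ToyDescriptor &from) {
  for (std::int64_t j = 0; j < to.extent[1]; ++j) {
    char *toCol = to.base + j * to.byteStride[1];
    const char *fromCol = from.base + j * from.byteStride[1];
    for (std::int64_t i = 0; i < to.extent[0]; ++i) {
      std::memmove(toCol + i * to.byteStride[0],
                   fromCol + i * from.byteStride[0], to.elemBytes);
    }
  }
}
```

Both functions assume that the two arrays have the same shape, which matches the first condition above. A real implementation would also need a loop nest (or a generic fallback) for each supported rank.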
In addition, some calls to Descriptor::Element can be replaced with calls to Descriptor::OffsetElement. Descriptor::ZeroBasedIndexedElement is one example.
For components
If the arrays are of derived types, the number of memory accesses for runtime information increases. GFortran hard-codes the information as the operands of instructions in many cases. In other cases, a “copy routine” bound to the vtable is used. We can see that GFortran generates a routine with a name like __d_MOD___copy_d_Mytype. (Please note that copy routines are not defined in the Fortran standard.)
I tried introducing such a routine and improved the performance in some cases. However, my idea is not fully developed at the moment, and I would appreciate your feedback.
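To illustrate why a copy routine can help, here is a minimal C++ sketch of the two ways the runtime could assign one element of a derived type. All names here (ToyComponent, ToyTypeInfo, ToyCopyFn, T1, CopyT1, AssignElement) are hypothetical and chosen for this sketch only; they are not Flang's or GFortran's actual data structures. The generic path reads a component table from memory for every element, whereas the copy routine has the offsets and sizes baked into its code.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical per-component record, as a runtime type table might hold it.
struct ToyComponent {
  std::size_t offset; // byte offset of the component inside the object
  std::size_t bytes;  // size of the component in bytes
};

// Signature assumed for a compiler-generated copy routine.
using ToyCopyFn = void (*)(void *lhs, const void *rhs);

// Hypothetical type description; not Flang's runtime type information.
struct ToyTypeInfo {
  std::vector<ToyComponent> components; // consulted on the generic path
  ToyCopyFn copy;                       // generated routine, or nullptr
};

// A small derived type with intrinsic components only.
struct T1 {
  int x;
  double y;
};

// What a generated copy routine for T1 could look like: the component
// layout is known at compile time, so no table lookup is needed.
void CopyT1(void *lhs, const void *rhs) {
  auto *l = static_cast<T1 *>(lhs);
  const auto *r = static_cast<const T1 *>(rhs);
  l->x = r->x;
  l->y = r->y;
}

// Assign one element: prefer the copy routine if the type provides one,
// otherwise walk the component table and copy each component.
void AssignElement(const ToyTypeInfo &type, void *lhs, const void *rhs) {
  if (type.copy) {
    type.copy(lhs, rhs);
    return;
  }
  for (const ToyComponent &c : type.components) {
    std::memmove(static_cast<char *>(lhs) + c.offset,
                 static_cast<const char *>(rhs) + c.offset, c.bytes);
  }
}
```

The sketch covers only intrinsic, non-allocatable components. Allocatable or pointer components, defined assignment, and finalization would need extra handling in either path.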
I’m thinking of implementing the routine as a special type-bound procedure as follows:
module m
  type t1
    integer x
  contains
    procedure, nopass :: copy => .cpy.t1
    generic :: copy_routine => copy
  end type
contains
  subroutine .cpy.t1(lhs,rhs) ! copy routine
    type(t1) :: rhs
    type(t1) :: lhs
    lhs = rhs
  end subroutine
end
The symbol of the routine is created in Semantics, and the routine is materialized in the initial lowering.
Although sequence types do not have type-bound procedures, their components are guaranteed to be contiguous in memory. Therefore, memmove can be used to assign each element.
If the derived type is declared in a module, it would be easy to generate the routine as a module procedure. Otherwise, the derived type would need an explicit interface for the type-bound procedure. I’m not familiar with derived types, but I believe such Fortran code violates the standard (see Compiler Explorer).
I’d like to ask whether copy routines could be allowed to violate the standard.
If not, is there any way to address this?