Performance problems with FORTRAN allocatable arrays

I've noticed that LLVM does a bad job of optimizing array indexing
code for FORTRAN arrays declared using the ALLOCATABLE keyword.

For example if you have something like the following:

DOUBLE PRECISION,ALLOCATABLE,DIMENSION(:,:,:,:slight_smile: :: QAV
...
ALLOCATE( QAV( -2:IMAX+2,-2:JMAX+2,-2:KMAX+2,ND) )
...
DO L = 1, 5
   DO K = K1, K2
      DO J = J1, J2
         DO I = I1, I2
            II = I + IADD
            IBD = II - IBDD
            ICD = II + IBDD

            QAV(I,J,K,L) = R6I * (2.0D0 * Q(IBD,J,K,L,N) +

                                    5.0D0 * Q( II,J,K,L,N) -
                                            Q(ICD,J,K,L,N))

         END DO
      END DO
   END DO
END DO

Most of the code needed to calculate the address of QAV(I,J,K,L)
should be hoisted out of the loop since J, K, and L are constant
inside the loop. But I'm not seeing this happening because LLVM's
alias analysis cannot distinguish between the loads of the array
dimensions for QAV and the store to QAV(I,J,K,L). I've tried all the
alias analyses available in the standard distribution, including type
based analysis and scalar evolution. But if you think about it, the
array dimensions of QAV is 'metadata' and should not alias with any
actual accesses in the program. I've compiled the same code with GCC
and it was able to hoist most of the address calculations out as
expected. GCC was able to hoist address calculations for Q also.

This is an actual piece of code in SPECCPU2006 437.leslie3d and the
loop I analyzed is in line 1630 of file tml.f. 437.leslie3d suffers
horrible performance problems because of this and similar problems.

Is there anyway to enable this optimization? Is there a way to flag in
the IR that a particular locations is array dimension meta data?

Thanks,
Wonsun

Hi Wonsun, can you please provide a testcase.

Best wishes, Duncan.

Hi Duncan,

Here is the test case:

------------------------------------- snip -------------------------------------

      MODULE LES3D_DATA
      IMPLICIT REAL*8 (A-H,O-Z)
      INTEGER IMAX, JMAX, KMAX, ND
      DOUBLE PRECISION,ALLOCATABLE,DIMENSION(:,:,:,:slight_smile: :: QAV
      DOUBLE PRECISION,ALLOCATABLE,DIMENSION(:,:,:,:,:slight_smile: :: Q
      END MODULE LES3D_DATA

      PROGRAM LES3D

      USE LES3D_DATA
      IMPLICIT REAL*8(A-H,O-Z)

      READ(5,*) IMAX, JMAX, KMAX, ND
      ALLOCATE(Q(-2:IMAX+2,-2:JMAX+2,-2:KMAX+2,ND,2),
     > QAV(-2:IMAX+2,-2:JMAX+2,-2:KMAX+2,ND))

      DO L = 1, 5
         DO K = 0, KMAX
            DO J = 0, JMAX
               DO I = 0, IMAX
                  QAV(I,J,K,L) = 2.0D0 * Q(I,J,K,L,1)
               END DO
            END DO
         END DO
      END DO

      stop

      END

------------------------------------- snip -------------------------------------

I compiled the above using the following commands:

llvm-gfortran -O3 <source name> -c -emit-llvm -o <bytecode name>

If you disassemble the bytecode, you'll see that the matrix address
calculations for QAV and Q are not hoisted out of the loop.

Thanks,
Wonsun