Motivation
I think the TSVC s314 program should be vectorized, but Flang currently does not vectorize it, and I would like to improve this. The s314 program is below:
      subroutine s314 (ntimes,ld,n,ctime,dtime,a,b,c,d,e,aa,bb,cc)
c
c     reductions
c     if to max reduction
c
      integer ntimes, ld, n, i, nl
      real a(n), b(n), c(n), d(n), e(n), aa(ld,n), bb(ld,n), cc(ld,n)
      real t1, t2, second, chksum, ctime, dtime, x
      call init(ld,n,a,b,c,d,e,aa,bb,cc,'s314 ')
      t1 = second()
      do 1 nl = 1,ntimes
      x = a(1)
      do 10 i = 2,n
         if(a(i) .gt. x) x = a(i)
   10 continue
      call dummy(ld,n,a,b,c,d,e,aa,bb,cc,x)
   1  continue
      t2 = second() - t1 - ctime - ( dtime * float(ntimes) )
      chksum = x
      call check (chksum,ntimes*n,n,t2,'s314 ')
      return
      end
Problem
The problem is that the following program, a simplified form of s314, is not vectorized.
   subroutine test (ntimes,n,a)
     integer, intent(in) :: ntimes, n
     real, dimension(n), intent(in) :: a
     integer :: i
     real :: x
     do nl = 1,ntimes
       x = a(1)
       do i = 2,n
         if(a(i) .gt. x) then
           x = a(i)
         end if
       end do
       call dummy(x)
     end do
   end subroutine
The inner loop is not vectorized because the LICM pass cannot move the store instruction (x = a(i)) out of it. x is passed by reference to the dummy subroutine, so its address might be captured there, and a captured address could be visible to another thread. LICM therefore treats replacing the conditional store with a single store after the loop as not thread-safe. Consequently, the subsequent LoopVectorize pass fails to match the pattern for a reduction computation, and vectorization is prevented.
In the C version of s314, x is not captured by the dummy function because arguments are passed by value. This allows the LICM pass to move the store instruction outside the loop, enabling vectorization by the subsequent LoopVectorize pass.
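For comparison, here is a minimal sketch of the by-value shape described above. It is not the actual TSVC source: the names s314_c, dummy, and sink are hypothetical, and dummy is only a stand-in for the benchmark's opaque call. The point is that x itself is never passed by address, so nothing outside the function can observe it between iterations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the opaque TSVC dummy(): it receives a copy of x. */
static float sink;
void dummy(float x) { sink = x; }

/* Sketch of the s314 kernel in C: the address of x is never taken. */
float s314_c(int ntimes, size_t n, const float *a) {
    assert(n >= 1);              /* a[0] is read unconditionally */
    float x = a[0];
    for (int nl = 0; nl < ntimes; nl++) {
        x = a[0];
        for (size_t i = 1; i < n; i++) {
            if (a[i] > x) x = a[i];   /* if-to-max reduction */
        }
        dummy(x);                /* pass by value: x cannot be captured */
    }
    return x;
}
```

Calling s314_c(ntimes, n, a) returns the maximum of the first n elements of a whenever ntimes is at least 1.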
In Fortran, arguments to subroutines are passed by reference by default. However, as I read the language standard, if the actual argument does not have the TARGET attribute, the callee has no defined way to store the address of the passed variable. In this case, the actual argument x does not have the TARGET attribute, so I interpret this to mean that the dummy subroutine does not store the address of x.
I think that this is equivalent to the nocapture attribute in LLVM IR. Furthermore, Fortran does not define multithreading behavior. Therefore, optimizations like moving the store instruction (x = a(i)) outside the loop should not pose any problems.
Based on these considerations, I think that this program can be safely vectorized. However, the current Flang does not perform vectorization.
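To make the intended transformation concrete, here is a source-level sketch in C of the two forms: the loop as written, with a conditional store through a pointer, and the hand-promoted form, where the maximum is kept in a local and stored once after the loop. The function names are hypothetical, and this only illustrates the shape LICM would need to produce; I have not checked what any particular compiler does with either function.

```c
#include <assert.h>
#include <stddef.h>

/* As written: the store to *x happens conditionally inside the loop. */
void max_store_in_loop(const float *a, size_t n, float *x) {
    assert(n >= 1);
    *x = a[0];
    for (size_t i = 1; i < n; i++) {
        if (a[i] > *x) *x = a[i];
    }
}

/* Hand-promoted: accumulate in a local, then store exactly once.
 * This is the plain max-reduction shape a vectorizer can recognize. */
void max_store_hoisted(const float *a, size_t n, float *x) {
    assert(n >= 1);
    float m = a[0];
    for (size_t i = 1; i < n; i++) {
        if (a[i] > m) m = a[i];
    }
    *x = m;   /* single store after the loop */
}
```

Both functions leave the same value in *x in a single-threaded program. They differ only in when memory is written, which is exactly what matters if another thread could observe x.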
Suggestion
To enable vectorization in this case, I think it would be sufficient to add the nocapture attribute to the x argument of the dummy subroutine in LLVM IR. With nocapture, the LICM pass could safely move the store instruction outside the loop, enabling vectorization. However, this optimization relies on a guarantee from the Fortran language standard, and that information is no longer available once the program has been converted to LLVM IR, so the attribute cannot be added after that point. I considered adding a corresponding attribute in FIR or MLIR, but I was unable to find a suitable one.
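As a point of comparison, C has a source-level way to state the same promise: Clang's noescape parameter attribute, which, as far as I know, Clang lowers to nocapture on the LLVM IR parameter (newer LLVM releases may print this as captures(none)). The sketch below is a by-reference analog of the test program. The names dummy_ref, max_by_ref, and last_seen and the NOESCAPE macro are hypothetical, and I have not verified that this annotation alone makes the loop vectorize.

```c
#include <assert.h>
#include <stddef.h>

/* Use Clang's noescape where available; expand to nothing elsewhere. */
#if defined(__has_attribute)
#  if __has_attribute(noescape)
#    define NOESCAPE __attribute__((noescape))
#  endif
#endif
#ifndef NOESCAPE
#  define NOESCAPE
#endif

/* Hypothetical callee: reads through x but never keeps the pointer. */
static float last_seen;
void dummy_ref(NOESCAPE const float *x) { last_seen = *x; }

/* By-reference analog of the simplified Fortran test program. */
float max_by_ref(int ntimes, size_t n, const float *a) {
    assert(n >= 1);
    float x = a[0];
    for (int nl = 0; nl < ntimes; nl++) {
        x = a[0];
        for (size_t i = 1; i < n; i++) {
            if (a[i] > x) x = a[i];
        }
        dummy_ref(&x);   /* address passed, but promised not to escape */
    }
    return x;
}
```

What I am looking for in Flang is the equivalent of this annotation, derived automatically from the absence of the TARGET attribute rather than written by hand.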
Question
- Is adding the nocapture attribute to the argument x of the dummy subroutine at the LLVM IR level a suitable approach for vectorizing s314? I don’t fully understand the conditions for “Pointer Capture”, and I’d appreciate any advice if this approach is incorrect or if there are better alternatives.
- If adding the nocapture attribute is the correct approach, are there any attributes in FIR or MLIR that are equivalent to nocapture in LLVM IR?