Applying the `nocapture` attribute to reference-passed arguments in Fortran subroutines

Motivation

I think the TSVC s314 program should be vectorized, but the Flang does not vectorize it. So I want to improve it. The s314 program is below:

      subroutine s314 (ntimes,ld,n,ctime,dtime,a,b,c,d,e,aa,bb,cc)
c
c     reductions
c     if to max reduction
c
      integer ntimes, ld, n, i, nl
      real a(n), b(n), c(n), d(n), e(n), aa(ld,n), bb(ld,n), cc(ld,n)
      real t1, t2, second, chksum, ctime, dtime, x

      call init(ld,n,a,b,c,d,e,aa,bb,cc,'s314 ')
      t1 = second()
      do 1 nl = 1,ntimes
      x = a(1)
      do 10 i = 2,n
         if(a(i) .gt. x) x = a(i)
   10 continue
      call dummy(ld,n,a,b,c,d,e,aa,bb,cc,x)
   1  continue
      t2 = second() - t1 - ctime - ( dtime * float(ntimes) )
      chksum = x
      call check (chksum,ntimes*n,n,t2,'s314 ')
      return
      end

Problem

The problem in this case is that the following program, which is a simplified form of the s314, is not vectorized.

   subroutine test (ntimes,n,a)
     integer, intent(in) :: ntimes, n
     real, dimension(n), intent(in) :: a
     integer :: i
     real :: x

     do nl = 1,ntimes
       x = a(1)
       do i = 2,n
         if(a(i) .gt. x) then
           x = a(i)
         end if
       end do
       call dummy(x)
     end do
   end subroutine

The program is not vectorized because the LICM pass cannot move the store instruction (x = a(i)) outside the inner loop due to the possibility that the actual argument x is captured by the dummy subroutine, which makes the program not thread-safe. Consequently, the subsequent LoopVectorize pass fails to match the pattern for reduction computation and vectorization is prevented.
In the C version of s314, x is not captured by the dummy function because arguments are passed by value. This allows the LICM pass to move the store instruction outside the loop, enabling vectorization by the subsequent LoopVectorize pass.

In Fortran, arguments to subroutines are passed by reference by default. However, if the actual arguments do not have the TARGET attributes, the language standard does not specify storing the address of a passed variable. In this case, the actual argument x in the program does not have the TARGET attribute, so I can interpret that the dummy subroutine does not store the address of x.

I think that this is equivalent to the nocapture attribute in LLVM IR. Furthermore, Fortran does not define multithreading behavior. Therefore, optimizations like moving the store instruction (x = a(i)) outside the loop should not pose any problems.
Based on these considerations, I think that this program can be safely vectorized. However, the current Flang does not perform vectorization.

Suggestion

To enable vectorization in this case, I think that adding the nocapture attribute to the x argument of the dummy subroutine in LLVM IR would be sufficient. With the nocapture attribute, the LICM pass could safely move the store instruction outside the loop, enabling vectorization. However, since this optimization is based on the Fortran language standard, it is not possible to add the nocapture attribute after the program has been converted to LLVM IR. I considered adding a corresponding attribute in FIR or MLIR, but I was unable to find any suitable attribute.

Question

  • Is adding the nocapture attribute to the argument x of the dummy subroutine at LLVM IR level a suitable approach for vectorizing s314? I don’t fully understand the condition for “Pointer Capture”, and I’d appreciate any advice if this approach is incorrect or if there are better alternatives.
  • If adding the nocapture attribute is the correct approach, are there any corresponding attributes in FIR or MLIR that are equivalent to nocapture in LLVM IR?

Welcome to the project, @s-watanabe314! And thank you for posting this!

I think propagating the nocapture attribute (as well as several other parameter attributes) to LLVM provides benefits. We just have to make it right.

I believe the only valid way of “capturing” a raw pointer used to represent a Fortran entity is through the pointer-target association. So I think, as long as the dummy argument does not have a TARGET attribute, the corresponding raw pointer cannot be captured by the callee such that it violates the LLVM nocapture definition. Even more, a raw pointer corresponding to !fir.ref<!fir.box<>> argument cannot be captured by the callee ever. @jeanPerier please correct me if I am wrong?

I do not think the raw pointers may be captured in the multithread execution context (the second paragraph of Pointer Capture), but @kiranchandramohan may know better.

Regarding the implementation:

  1. Please look for LLVM::LLVMDialect::getNoCaptureAttrName(). We can set the attribute for the parameters at MLIR LLVM dialect level.
  2. We can try to do it during Flang CodeGen phase, when the signatures of the functions are translated to LLVM operations/types.
  3. If we can identify whether the dummy argument is declared with TARGET attribute, then we can mark with nocapture all pointer parameters that appear from !fir.ref conversion. The pointer parameters appearing due to !fir.ref<!fir.box<>> and !fir.box<> conversions might be marked with nocapture always.
  4. The fir.runtime functions, in general, do not capture pointers, but there are exceptions (e.g. PointerAssociateScalar, PointerDeallocatePolymorphic, …).

One major question is whether we can reliably identify whether the callee’s parameters are declared with or w/o TARGET attribute. We can only do it if the procedure’s interface is explicit, but not in this case:

subroutine test(x)
  external :: callee
  real, target :: x
  call callee(x)
end subroutine test

@jeanPerier is there an existing way to distinguish these cases late during CodeGen?

Thanks for your reply, @szakharin

I’ll investigate the CodeGen process in Flang and the getNoCaptureAttrName() to understand how the nocapture attribute is applied. Thanks for the advice.

I believe that if the actual argument has a TARGET attribute, we need to check if the dummy argument has a TARGET attribute. However, I believe it’s acceptable to apply nocapture if the actual argument doesn’t have a TARGET attribute. Here’s why I think so:

  1. If the dummy argument does not have a TARGET attribute, the pointer will not be captured.
  2. If the dummy argument has a TARGET attribute, the language standard states that any pointers associated with the dummy argument
    become undefined when execution of the procedure completes. Based on this, I believe the pointer can be considered as not captured.
  3. If an attribute of the dummy argument is unknown, it will fall under either case 1 or 2, and it should be safe to assume nocapture.

Is it correct to apply nocapture if the actual argument does not have a TARGET attribute?

For a pointer to become defined in a procedure with an effective actual argument as its target, and remain a defined pointer to that target after the call, both the actual argument and the dummy argument must have either the POINTER or TARGET attributes, and the dummy argument must not be a VALUE. See 15.5.2.5 paragraphs 8-12.

The case of a TARGET effective actual argument being associated with an INTENT(IN) POINTER dummy isn’t specifically addressed in that text, but IMO should be viewed as a capture opportunity, because the dummy POINTER can be used as a target for a pointer assignment to a pointer that survives the procedure reference.

So: if VALUE, no capture; if a temporary is used, no capture; if both actual and dummy have POINTER or TARGET, possible capture.

As long as we are talking about setting the LLVM parameter attribute nocapture for a function, I think the attributes of the actual argument do not matter (unless we can find all callers). So for the initial implementation we should set nocapture only based on the information about the dummy arguments declarations, i.e. assuming that there is a call of the function that passes an actual argument with a TARGET/POINTER attribute.

@klausler, I think we should be careful with ASYNCHRONOUS dummy arguments too since Fortran 2018 (the argument address could be captured by MPI runtime for instance even if it does not have the TARGET attribute I think). Do you agree?
I am wondering about VOLATILE too, it is not lowering it yet, but I would likely not put nocpature of dummy that are VOLATILE.

Now, all these points at the Fortran level are about the data address. Many objects are actually pass via descriptor address, and in that case I do not know if there is a way in LLVM to convey the “nocpature” aspect to base address inside the descriptor.

Regarding descriptors, I think it is not possible to capture them in procedures written in Fortran as Slava mentioned, but a C programmer could do it via BIND(C). I think they should not given the standard does not really specify any lifetime of CFI descriptor address outside of the call itself. I do not know if we should be conservative about BIND(C) procedure where the implementation is not visible in the compilation unit (i.e., that may be implemented in C) or not.

Some of the flang runtime functions may also capture data address and descriptor addresses without making it explicit in the FIR interfaces (flang runtime is in C++, it does not come with a Fortran interface using TARGET attributes and al.). In general the runtime will not because it is stateless, but some runtime API have “cookies” where they could capture argument addresses (at least IO and the runtime to help lower array constructor have such cookies), so this should be double checked runtime API per runtime API.

Yes, ASYNCHRONOUS as well. VOLATILE probably not, but it’s so rarely used that it might be wise to treat it similarly.

Dummy data addresses captured in local NAMELIST groups or passed to the runtime will not survive the call.

Thanks everyone for your comments. I agree that we should base the decision on dummy arguments rather than actual arguments, as the attributes might differ for each call if we don’t know all callers.

Summarizing our discussion, here’s how we might determine whether to apply the nocapture attribute:

  • Internal Procedures:

    • Flang will not apply nocapture. The middle-end will check the procedure’s contents and apply it as needed (this is already implemented, so no changes are required).
  • External Procedures with Explicit Interface:

    • If the procedure is bind(c): possible capture
    • If any dummy arguments have pointer, target, volatile or asynchronous attributes: possible capture
    • If no dummy arguments have pointer, target, volatile or asynchronous attributes: nocapture will be applied
  • External Procedures without Explicit Interface: possible capture

Based on the above, I believe I should implement the nocapture attribute only when it’s explicitly clear that the dummy arguments in the interface block do not have pointer, target, volatile, or asynchronous attributes. Please let me know if anything is incorrect.

This condition implies that cases like TSVC s314 (mentioned earlier) cannot be vectorized. I can’t think of any other way to vectorize s314, but should I assume that this case is not vectorizable?

External Procedures without Explicit Interface: possible capture

As per Fortran 2023 15.4.2.2 (4.a), a dummy argument with pointer , target , volatile or asynchronous attribute requires the caller to use an explicit interface.

While the compiler cannot enforce it if the definition is in a different compilation unit, it is enforced by flang if the definition is inside the same file.

At least gfortran, nvfortran, and “classic flang” are also setting a clear precedent in assuming implicit interface args cannot be captured:

You can observed that the call to bug is optimized out below because both nocapture allows deducing that bar() call cannot modify x and that x.eq.1 is true.

subroutine test()
  integer :: x
  call foo(x)
  x = 1
  call bar()
  if (x.eq.1) then
    call as_expected()
  else
    call bug()
  end if
end subroutine

So, I think flang should still leverage that implicit interfaces do not capture argument addresses given the standard is very clear and it is not a brand new optimization.

As per Fortran 2023 15.4.2.2 (4.a), a dummy argument with pointer , target , volatile or asynchronous attribute requires the caller to use an explicit interface.

Based on this description, I understand that if an external procedure is implicit, its dummy arguments are considered to not have the pointer, target, volatile, or asynchronous attributes, and therefore, nocapture can be applied. Thanks for your comment!

I believe the implementation should be as follows. Is my understanding correct?

  1. If the external procedure is implicit, apply nocapture to all dummy arguments of the procedure.
  2. If the external procedure is explicit, apply nocapture to dummy arguments that do not have the pointer, target, volatile, or asynchronous attributes.

Sounds correct to me.

On top of that, you can add nocapture when passing descriptor addresses.

At what stage do you plan to add these attributes?

It’s probably obvious, but take care to ignore dummy procedures, alternate returns, and character lengths for assumed-length character arguments and the function result.

I’m a beginner with both flang and LLVM, so I don’t understand much. However, I received a comment suggesting that we might be able to add nocapture during flang CodeGen phase when replacing functions with LLVM operations. Therefore, I intend to look into the CodeGen process.
If anyone have any other good ideas or advice, please let me know.

Thank you for the additional information. I will keep these points in mind during implementation.

I forgot to mention bind(c) in the second condition. The correct condition should be: “If the external procedure is explicit, not bind(C), apply nocapture to dummy arguments that do not have the pointer, target, volatile, or asynchronous attributes.”

Codegen is not translating func.func to llvm.func directly. Maybe llvm-project/flang/lib/Optimizer/Transforms/FunctionAttr.cpp at main · llvm/llvm-project · GitHub that is already decorating func.func with LLVM attributes is a better place.

Note that we will probably need to also do it on indirect calls. In that case the attributes must be set on the call operation in LLVM IR. but last time I checked llvm.call MLIR operation was not able to carry llvm attributes. If you want to do that afterwards, that would also be a valuable contribution that can be done seperatly.

It’s probably obvious, but take care to ignore dummy procedures, alternate returns, and character lengths for assumed-length character arguments and the function result.

To clarify at the FIR level, I think you will only need to pay attention to not add “nocapture” on FIR arguments with function types:

  • when the argument has function type, it should not get nocapture because dummy procedures can always be assigned to procedure pointer, regardless of attributes.
  • alternate returns are encoded as an integer result in FIR, so nothing you need to do about.
  • character length are register arguments, so the nocapture also does not apply/is not needed in LLVM.
  • function results passed by reference: in FIR, you will not currently be able to make a difference, but that does not matter, they cannot be captured, “nocpature” will be added because they do not have func.func argument attributes (at the Fortran level, the symbol may have the TARGET attribute, but it is not set on the extra FIR arguments for it because it would be invalid to use a pointer pointing to the result after the function returned), so again, you should not need special handling at the FIR level to add the LLVM attribute here.

Thank you for your explanation. I’ve looked into where I might be able to add LLVM attributes to func.func. However, I’m concerned about determining the first condition of my implementation:

  1. If the external procedure is implicit, apply nocapture to all dummy arguments of the procedure.

After conversion to func.func, distinguishing whether the original program included an interface statement seems impossible. How can I determine this? Should I add the nocapture attribute earlier, perhaps during FIR generation?

Also, I noticed that if a subroutine with the same name is called from multiple subroutines within a single file, no error occurs even if the contents of the interface statements differ (Compiler Explorer). Where should nocapture be applied in such cases? If I add nocapture to func.func declaration, it would also apply nocapture to the call from test_2, which is undesirable. Therefore, I believe it’s necessary to add nocapture to the arguments of each fir.call or llvm.call. In other words, would the correct implementation be to add nocapture at the call, rather than at the declaration?

Thank you for summarizing this. While I’m not sure if it’s strictly correct, I will consider an implementation where nocapture is added to numerical values like fir.ref<f32> but not to procedures like fir.boxproc.

Thanks for being thorough here, that is an interesting example. Technically it is illegal (and flang emits a warning about the interface use mismatch), but I am not a fan that test_2 could be optimized in an unexpected way while it is providing an explicit interface. I tested, and classic flang is doing the optimization assuming nocapture there, but gfortran is not.

I am not a fan of adding a lot of LLVM attributes everywhere early in FIR, because it is supposed to be higher level, and ideally we would want to use some func.func concepts so thatr MLIR passes can rely on this info too. But we may need more info at the FIR level here.

We need to decide what to do for your example before I answer your other points about where to add those attributes.

@klausler, below is an extension of @s-watanabe314 example that is not 100% legal but is accepted by most compilers (flang emits a warning).

Do you think that the fact that the user did not bother giving a explicit interface in test_1 while it should have gives us the right to optimize out the else branch in test_2 ?

Classic flang seems to be doing the “bad optimization”, and as discussed above, preventing LLVM to “badly optimize” test_2 while optimizing test_1 would force us to give LLVM nocapture attributes at the call level instead of the function declaration level (which is not a huge deal, but will increase the IR size and maintenance and how we manage call interface information at the FIR level).

subroutine test_1 ()
  real :: x
  call callee(x)
  x = 1
  call bar()
  if (x.eq.1) then
   call expected()
  else
   call bug()
  end if
end subroutine

subroutine test_2 ()
  interface
    subroutine callee(x)
      real, target :: x
    end subroutine
  end interface
  real, target :: x
  call callee(x)
  x = 1
  call bar()
  if (x.eq.1) then
   call expected()
  else
   call could_happen()
  end if
 end subroutine

When one compiles that example with -pedantic, f18 reveals its knowledge of the explicit interface to callee in the warning it emits:

warning: The global subprogram 'callee' is not compatible with its local procedure declaration (incompatible procedure attributes: ImplicitInterface)

We should be consistent. If we can emit a warning about an interface, and hence “know” what the interface is, then it seems wrong to ignore that knowledge of the risk of a capture when optimizing. The compiler should heed its own warnings.

The compiler knows what the interface is, so should it determine whether each llvm.call instruction is nocapture or not?

If so, I have two questions.

The first is about the phase where nocapture attribute is added to the call. I thought I could add nocapture when converting from fir.call to llvm.call, but after conversion to MLIR, it seems impossible to determine whether the subroutine contained an interface. Therefore, I believe I need to determine whether the subroutine has an interface when the fir.call is initially generated and add the nocapture attribute at that time. In this case, both fir.call and llvm.call would require a new attribute (like arg_attrs). Is it acceptable to make these changes?

Another question, though perhaps not suitable for discussion here, concerns the current implementation of function declarations when different calls are made. For example, in the program shown earlier, the arguments of the callee declaration (func.func) do not have the fir.target attribute. It seems that the declaration is determined when callee is first called. In fact, if the order of test_1 and test_2 is reversed, the arguments of the callee declaration are given the fir.target attribute. At this point, if there are any optimizations suppressed by the fir.target attribute, incorrect optimization might occur. If this is acceptable, would an implementation that adds nocapture to the declaration at the point of the first call to callee also be acceptable?

Given Peter answer, with which I rather agree, yes we will have to decorate llvm.call instead of llvm.func in certain cases, but not all, we can start by limiting ourselves to the cases where we cannot be sure that the provided interface is correct: external procedures not defined in the current compilation unit. That is, at the FIR level, when the func.func has a body, or if it it comes from a module (which we can know from the mangling), it is OK to add the attributes on the func.func directly in the FunctionAttr pass.

For the other cases, we will need:

  1. In lowering, to set the Fortran attributes to the fir.call according to the call interface when it is a call to an external procedures not defined in the current compilation unit.
  2. In codegen, when lowering fir.call to llvm.call, to translate the Fortran attributes to LLVM nocapture as described in previous discussions.

This will also imply adding attribute to llvm.call arguments in LLVM MLIR dialect and codegen, which as mentioned above, is not supported currently (see this disccussion).