[RFC] add nsw flags to arithmetic integer operations using the option -fno-wrapv

yus3710-fj · March 11, 2024, 11:30am

As we discussed in the following topics, we found that the nsw flag which stands for “no signed wrap” helps some optimizations. On the contrary, integer overflow is prohibited by the Fortran standard so the flags should be added to arithmetic integer operations as Clang does.

My prototype implementation seems to work well but I’d like to confirm the design has no problem. Mainly, I have two concerns about the implementation.

Options to switch the behavior

There was a discussion that we should add the nsw flag under an option because we’ve not determined the impact on optimizations yet. @tblah told me that I could try -Ofast, but I’m not sure -ffast-math is suitable for this case since it’s the option for floating-point operations. Fortunately, Clang (and Gfortran) have the options for this case, -f[no-]wrapv and -f[no-]strict-overflow. I think the flags can be added to the operations when -fno-wrapv or -fstrict-overflow is specified.
FYI: Gfortran also has the option -fno-wrapv-pointer and enables -fno-wrapv and -fno-wrapv-pointer if -fstrict-overflow is specified. On the other hand, Clang doesn’t have the option -fno-wrapv-pointer and that leads both options are almost the same. I think Flang will follow Clang.
Flang doesn’t accept these options at the moment so I should introduce them into Flang at first. Note that -fno-wrapv should be enabled by default but -fwrapv will be enabled by default until we determine the impact on optimizations.

$ flang-new -O2 s243.f -Rpass=vector -c
$ flang-new -O2 s243.f -Rpass=vector -c -fno-wrapv
/path/to/TSVC/vectors/loop/s243.f:14:7: remark: vectorized loop (vectorization width: 4, interleaved count: 1) [-Rpass=loop-vectorize]
$ flang-new -O2 s243.f -Rpass=vector -c -fstrict-overflow
/path/to/TSVC/vectors/loop/s243.f:14:7: remark: vectorized loop (vectorization width: 4, interleaved count: 1) [-Rpass=loop-vectorize]
$

Coverage of the implementaion

My implementation only works in LoweringBridge which lowers the parse tree to HLFIR, so the flags aren’t added to operations in transformation passes. IIUC, there isn’t a structure passing the information of options to them so the flags would be always added to operations. I’d like to make this RFC focus on introducing the options into Flang and adding the flags using the options and ignore this issue.

How to implement

There are also some tiny concerns about implementation.
I referred to the Clang implementation for my implementation and then Flang came to add the flags to operations in simple expressions. (FYI: I implemented the feature only in the gen function for binary operations in HlfirBuilder.) However, the flags aren’t added to some operations in strucured operations such as DoLoopOp and that means I have to add flags to operations everywhere they are created though I think it’s hard. So I’m going to split the patch into small pieces and post bit by bit. (Of course, your help is welcome.)
On the other hand, I found that I can apply the implementaion for the fast-math flag to that for the nsw flag and it seems to add flags to all operations easily in lowering.
IMHO, the feature should be implemented like Clang, but I’m not sure. So, I’d like your comments.

tblah · March 11, 2024, 12:07pm

The design looks good to me. Thanks for finding more appropriate flags.

For the coverage of the implementation, I would implement this by writing a pass to add the attribute and configuring whether that pass runs based upon the command line arguments. For example see https://github.com/llvm/llvm-project/pull/74598 and [flang] Set fast math related function attributes for -Ofast/-ffast-math by asb · Pull Request #79301 · llvm/llvm-project · GitHub which add attributes to functions. This way the pass can run late to catch all of the operations created late on. In particular, loop induction variable increments and array index arithmetic are only created very late in the pipeline.

yus3710-fj · March 12, 2024, 7:04am

Thank you for your comment.
Your idea looks good to me so I’ll try it.

yus3710-fj · March 19, 2024, 5:26am

My implementation seems to work well and the flags are added to all arithmetic operations.
If there is no objection to this RFC, I’ll create PRs next week.

yus3710-fj · March 28, 2024, 2:43am

I was preparing PRs but my opinion has changed.

Though I can add the flag to all arithmetic integer operations, I’m not sure all operations should have the nsw flag when -fno-wrapv is enabled considering the meaning of the option. It doesn’t mean that some operations shouldn’t have the flag. I’d like to say that whether some operations have the flag wouldn’t depend on this option.
IIUC, The option is related to the language specification. According to 10.1.5.2.4 in the Fortran 2018 standard, the execution of any numeric intrinsic operation whose result is not defined by the arithmetic used by the processor is prohibited. However, some internal operations such as DO-variable increment aren’t referred to. Therefore, I suspect that the flag should be added only to such numeric intrinsic operations when the option is specified. In other words, I would implement as originally planned.

To add nsw flag to DO-variable increment, I would introduce another new option -flang-experimental-integer-overflow into Flang. This option doesn’t add the flag to all operations to keep consistent with -fwrapv. (In addition, it is also because it’s hard to pass the information of options to transformation passes as I said before.)
-flang-experimental-integer-overflow will be enabled by default and deleted in the future. On the other hand, -fno-wrapv will also be enabled by default but -f[no-]wrapv will remain.

I would appreciate it if you could give me some comments.

tblah · March 28, 2024, 2:09pm

Sounds good to me, thank you for your continued work on this.

yus3710-fj · May 9, 2024, 11:58am

I created the PR for -flang-experimental-integer-overflow (adding nsw to do-variable increment).
On the other hand, it was found that the patch for -fno-wrapv (adding nsw to numeric intrinsic operations) had a problem. The patch added nsw to operations which can overflow. I’m considering the solution, but it would take some more time to investigate.

yus3710-fj · June 28, 2024, 6:59am

In the last call, I told that I was thinking of using integer range analysis and adding flags to operations that are found not to overflow. However, it was found that the idea wouldn’t work well for my target loop because the loop bounds aren’t constant.
So, I would try adding the flag only to index calculations of fir.array_coor or fircg.ext_array_coor. To keep the transformation safe, I would introduce some constraints on it.

The bit sizes of the values of the array bounds are less than or equal to those of the results of index calculations
The indices of the array are linear experssions (a*i+b)

I’d like to ask whether the constraints are sufficient.

yus3710-fj · July 10, 2024, 9:39am

Thank you for your comments in the last call. I’d like to organize my idea and provide some examples.
The targets of this implementation are indicies as the input of ext_array_coor, not operations that ext_array_coor includes internally. In other words, the new pass for this would run before FIRToLLVMLowering pass.
The following is an example of codes and generated LLVM IR.

The addition generating %14 in LLVM IR can have nsw, and this implementation would add nsw to it.

As I said, I would introduce some constraints on the inputs of ext_array_coor. The following code is an example that shouldn’t have nsw. (It’s not a practical code, but it would be accepted.)

In the LLVM IR, the addition generating %34 can have nsw while the one generating %28 cannot because i+128 as signed integer might overflow. AFAIK, however, there is no way to confirm it at the moment. I suspect programmers don’t recognize that OUT_OF_RANGE has a code which can overflow. Therefore, I conservatively think all operations in this index calulation should have no nsw even though index overflow is non-conforming.
This implementation would accept only the following index patterns. (a is an array, m and n are constants, and i is a variable.)

a(m*i)
a(i+n)
a(m*i+n)

As you know, however, this idea has an obvious defect. It ignores many patterns which can have nsw actually. (e.g. a((m+i)*n)) I started with the above conditions to make the implementation as simple as possible.

tblah · July 10, 2024, 11:15am

Thank you for explaining so clearly.

It seems I misunderstood before and this will involve some pattern matching on arguments to array_coor. @jeanPerier had concerns about this.

Are the particular index patterns you are allowing motivated by real code that does vectorise with other fortran compilers?

From my perspective, the problem here is that I don’t feel qualified to judge whether the non-conforming behavior is relied upon by Fortran programs in the wild. Perhaps if it could be demonstrated that other widely-used compilers already make similar assumptions, this would give some confidence that these transformations will not break existing (although non-conforming) programs.

I have seen real performance improvements from using -fno-wrapv and so I would be very happy if we can figure out something that could be enabled by default. Thank you for your work so far!

jeanPerier · July 11, 2024, 8:13am

Thank you @yus3710-fj for taking the time to illustrate.
My small concern with IR based pattern matching was that many thing may have happened on the IR before you try to add this flag, and I mainly had CSE in mind.

For instance, in the example below with a(i+128) = merge(i, 0, out_of_range(i,1_1)), the adds from the OUT_OF_RANGE and the one from the addressing could be fused.

In this case, the fact i+128 appeared in the addressing should imply that OUT_OF_RANGE add cannot overflow, so it is safe to add nsw anyway. But one could think of cases were CSE could merge the adds with OUT_OF_RANGE reachable and the addressing unreachable in case of overflow, in which case adding nsw on the CSEed add would be incorrect. Here is an example below (CSE currently does not happen, but nothing technically prevents it).

Compiler Explorer (godbolt.org)

You can however rule out my CSE concern by checking for single usage of the add results.

yus3710-fj · July 16, 2024, 8:14am

Thank you for your comment, @tblah.

Are the particular index patterns you are allowing motivated by real code that does vectorise with other fortran compilers?

I found one of the loops in TSVC (s243) is the case. Flang doesn’t vectorize it while GFortran does. Flang can vectorize it if index calculation (i+1) has the flag.

From my perspective, the problem here is that I don’t feel qualified to judge whether the non-conforming behavior is relied upon by Fortran programs in the wild. Perhaps if it could be demonstrated that other widely-used compilers already make similar assumptions, this would give some confidence that these transformations will not break existing (although non-conforming) programs.

I apologize if I have misunderstood your concerns.

I previously mentioned that the flag would always be added to index calculations regardless of the -fno-wrapv option. However, I don’t have concrete evidence to support this claim. It would be difficult to confirm other widely-used compilers do the similar things. Therefore, I now believe that the -fno-wrapv option is necessary to control the behavior of adding nsw to index calculations.

Then, the constraints could be relaxed considering the meaning of -fno-wrapv. The option means nsw is added to operations written in source codes explicitly. Therefore, the constraint regarding bounds of arrays wouldn’t be necessary.

yus3710-fj · July 16, 2024, 8:22am

Thank you for pointing it out, @jeanPerier.

You can however rule out my CSE concern by checking for single usage of the add results.

That looks good, but I suspect that this condition might be too restrictive. An index calculation shared with some arrays wouldn’t have the flag.
I’m considering checking if the uses of the addition are only for index calculations. Additionally, I’m contemplating whether the feature should be implemented in the LoweringBridge or immediately after it to avoid transformations on the IR.

tblah · July 17, 2024, 9:32am

I’ve investigated and it looks like many other compilers do perform this optimization. I modified the loop a bit to rule out any kind of intelligent integer range analysis and they all still vectorized the loop, and so I believe they must be assuming that i+1 does not overflow. This sort of reasoning has been used before when discussing enabling other optimizations by default.

I’m happy so long as Jean’s concerns are addressed.

yus3710-fj · July 22, 2024, 9:13am

I separated do-variable increments from the targets of -fno-wrapv and introduced a new option (-flang-experimental-integer-overflow) for them. However, I found that this approach does not align with the behavior of GFortran.

I conducted an experiment with a loop that requires nsw on the do-variable increment for vectorization. Compiler Explorer
Flang vectorizes the loop when -flang-experimental-integer-overflow is specified but not otherwise. On the other hand, GFortran vectorizes the loop when -fno-wrapv is specified but not when -fwrapv is specified.
This indicates that -flang-experimental-integer-overflow should be encompassed within -fno-wrapv. In other words, -fno-wrapv can add nsw to not only “numeric operations” in source codes but also other operations in IR.

Therefore, I think the following patches are needed:

Adding nsw to index calculations.
Introducing -fno-wrapv into Flang (disabled by default).
Integrating -flang-experimental-integer-overflow into -fno-wrapv.

yus3710-fj · August 30, 2024, 4:25am

I’ve investigated the behavior of -fno-wrapv of GFortran.
(Please note that I didn’t check the source code of GFortran. I only observed the behavior with some Fortran codes.)

In conclusion, GFortran seems to add the flag to all operations except for ones in arguments of intrinsic functions for bitwise comparison (BGT, BLE, and so on).
Perhaps, constraints are not necessary but the flag in arguments of such intrinsic functions has to be dropped.

code	`-fno-wrapv` (default in GFortran)	`-fwrapv`	notes
`a(foo(i)+1)`	addr calc is optimized	addr calc isn’t optimized	Compiler Explorer
`a(3*(i+1))`	addr calc is optimized	addr calc isn’t optimized	Compiler Explorer
`n+k > n`	transformed to `k > 0`	not transformed	IIRC, Fortran standard always permits this transformation.
`BGT(n+k, n)`	not transformed	not transformed

I’d appreciate it if you could give me some comments.

(BTW, should I create the design document about nsw and integer overflow in flang/docs?)

tblah · September 3, 2024, 2:31pm

Thanks for persisting with this @yus3710-fj.

Given that you have investigated gfortran in depth, and I tried the previous code example on all of the compilers on Compiler Explorer and found they all vectorize the code (apparently assuming overflow behavior), I think this is worth proceeding with creating a prototype so that we can try this transformation with some existing kernels to double check nothing is broken.

It would be ideal if you could do a similar investigation as you did for gfortran with another fortran compiler that is used more frequently in HPC applications. Perhaps you could provide the same results for Fujitsu’s compiler? From my perspective this isn’t a requirement but it was brought up on the call yesterday.

The fact that gfortran appears to make this assumption in many different cases leads me to think that the prototype does not need all of the restrictions you outlined (in cases where they make the implementation much more difficult as @jeanPerier pointed out).

I still think this should be restricted to arithmetic used for array addressing so that we are not making risky changes where we have no expectation for improvements to code generation. The other restrictions you outlined are welcome too if they make the implementation simpler.

@jeanPerier do you have anything to add regarding your concerns about the pattern matching for the restrictions or shall we proceed to a prototype?

yus3710-fj · September 5, 2024, 5:26am

Thank you for your comment, @tblah.

I did a similar investigation using Fujitsu’s compiler and found that our compiler also assumes that integer overflow never occurs in address calculations. However, n+k > n isn’t transformed to k > 0. Therefore, as you mentioned, nsw should be added only to operations in subscripts at this point.

yus3710-fj · September 26, 2024, 12:33am

I’ve created 3 patches for -fno-wrapv.

[1/3] [flang] add nsw to operations in subscripts by yus3710-fj · Pull Request #110060 · llvm/llvm-project · GitHub
[2/3] [flang][Driver] Add support for -f[no-]wrapv and -f[no]-strict-overflow in the frontend by yus3710-fj · Pull Request #110061 · llvm/llvm-project · GitHub
[3/3] [flang] integrate the option -flang-experimental-integer-overflow into -fno-wrapv by yus3710-fj · Pull Request #110063 · llvm/llvm-project · GitHub

yus3710-fj · October 9, 2024, 2:21am

As I mentioned in another thread, I found that nsw on loop bounds could help vectorization. The language standard doesn’t mention about integer overflow of DO statement parameters, but some compilers seem to assume they never overflow. I can say so because the following loop is vectorized without loop versioning.

subroutine s115 (a,n)
  integer n, i, j
  real a(n)

  do j = 1,n
    do i = j+1, n
      a(i) = a(i) + a(j)
    end do
  end do

end subroutine

j+1 can be less than j if j+1 overflows, so the loop should be versioned but actually not.

The following is the result of my investigation. I made parameters polynomial forms and checked whether the loop was vectorized.

	gfortran	ifort	ifx	frt (Fujitsu)	notes
initial parameters	vec	vec	vec	vec	Compiler Explorer
terminal parameters	vec	no vec	no vec	vec	Compiler Explorer
incrementation parameters	vec	no vec	no vec	vec	Compiler Explorer The loop was not vectorized by LLVM even if the operation has the `nsw` flag.

Considering this, nsw on calculations of initial parameters would be allowed. On the other hand, it is controversial to add nsw to terminal parameters and incrementation parameters. IMHO, flang-new could also allow nsw on them because some other compilers (not all) do so.

We agreed to add nsw only to address calculations to keep transformations safe. However, the above argument violates it. Therefore, I’d like to ask for your opinions.

Topic		Replies	Views
[RFC] Implement `Fortran::evaluate::ExplicitAdd` to find operations written in source codes Flang	0	126	June 6, 2024
The meaning of the options -f[no-]wrapv Beginners clang	0	100	March 26, 2024
ScalarEvolution "add nsw" question LLVM Dev List Archives	6	90	April 13, 2016
[RFC] Integer overflow flags support in `arith` dialect MLIR	20	785	February 21, 2024
Do we abuse the "nsw" flag LLVM Dev List Archives	3	86	May 7, 2013

[RFC] add nsw flags to arithmetic integer operations using the option -fno-wrapv

Options to switch the behavior

Coverage of the implementaion

How to implement

Related topics