Performance analysis for TSVC

Thank you for your investigations.

There was a discussion about improving flang's address calculation for array indices here: [RFC] Changes to fircg.xarray_coor CodeGen to allow better hoisting

IIRC, the latest conclusion is that we want to try adding the no-signed-wrap (nsw) flag to these calculations so that LLVM has more freedom to rearrange them and hoist more of the work out of the loop. Unfortunately, I got pulled onto other work and haven’t had time to try this. I will get back to it, but not in the near future, so feel free to pick it up if it has a higher priority for you (but first check whether it applies to your example, because I was looking at a different benchmark).

The extra subtractions in flang’s calculations are required because Fortran arrays can have arbitrary lower bounds, and these index ranges need to be shifted to match LLVM's 0-based indexing. The hope is that, given more information, LLVM could hoist the subtractions out of loops, resulting in simpler address calculations inside the loops.
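As a rough illustration (a hypothetical Python model of the arithmetic, not flang's actual codegen), the sketch below computes byte offsets for a 1-D array `a(lb:ub)`, first subtracting the lower bound on every iteration and then with the loop-invariant part hoisted out of the loop. The function names and the `elem_size` parameter are assumptions made for the example.

```python
def offsets_naive(lb: int, ub: int, elem_size: int) -> list[int]:
    """Byte offsets for a(lb:ub), subtracting the lower bound on every iteration."""
    result = []
    for i in range(lb, ub + 1):
        # Fortran index i maps to the 0-based index (i - lb).
        result.append((i - lb) * elem_size)
    return result


def offsets_hoisted(lb: int, ub: int, elem_size: int) -> list[int]:
    """Same offsets, with the loop-invariant term computed once before the loop."""
    bias = -lb * elem_size  # hoisted: does not depend on i
    result = []
    for i in range(lb, ub + 1):
        result.append(i * elem_size + bias)
    return result


# Example: REAL :: a(-2:3) with 4-byte elements.
print(offsets_naive(-2, 3, 4))    # [0, 4, 8, 12, 16, 20]
print(offsets_hoisted(-2, 3, 4))  # [0, 4, 8, 12, 16, 20]
```

Python integers never overflow, so the two forms always agree here. With fixed-width machine integers the rewrite is only safe when the intermediate values cannot wrap, which is the kind of guarantee the nsw flag is meant to give LLVM.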

Another option would be to reorder the arithmetic that flang generates for the address calculation so that it is easier for LLVM to optimize. This was rejected because commenters felt that LLVM should be able to do this without help.
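To make the reordering idea concrete, here is a hypothetical sketch for a column-major 2-D array `a(lb1:ub1, lb2:ub2)`. The "as generated" form subtracts both lower bounds for every element; the reordered form splits the offset into a term that depends only on `i`, a term that depends only on `j`, and a constant, so only `i * elem_size` changes in the inner loop. This only demonstrates the arithmetic identity; the names are illustrative and it is not the code flang emits.

```python
def all_offsets_as_generated(lb1, ub1, lb2, ub2, elem_size):
    ext1 = ub1 - lb1 + 1  # extent of the first (fastest-varying) dimension
    # Column-major: ((i - lb1) + (j - lb2) * ext1) * elem_size, j outer, i inner.
    return [
        ((i - lb1) + (j - lb2) * ext1) * elem_size
        for j in range(lb2, ub2 + 1)
        for i in range(lb1, ub1 + 1)
    ]


def all_offsets_reordered(lb1, ub1, lb2, ub2, elem_size):
    ext1 = ub1 - lb1 + 1
    # Expanding the product gives i*e + j*ext1*e - (lb1 + lb2*ext1)*e.
    const = -(lb1 + lb2 * ext1) * elem_size  # invariant in both loops
    offsets = []
    for j in range(lb2, ub2 + 1):
        col = j * ext1 * elem_size + const  # invariant in the inner loop
        for i in range(lb1, ub1 + 1):
            offsets.append(i * elem_size + col)
    return offsets


# Example: REAL(8) :: a(0:1, 1:2)
print(all_offsets_as_generated(0, 1, 1, 2, 8))  # [0, 8, 16, 24]
print(all_offsets_reordered(0, 1, 1, 2, 8))     # [0, 8, 16, 24]
```

Both functions visit the elements in memory order; the difference is only how much of the offset has to be recomputed per inner iteration.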

There is more information about nsw (no signed wrap) in the LLVM Language Reference entries for the integer arithmetic instructions, e.g. LLVM Language Reference Manual — LLVM 19.0.0git documentation
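One place where nsw matters is widening. Assuming 32-bit index values that are sign-extended to a 64-bit address type (an assumption for this sketch, not something established in this thread), an optimizer may only rewrite `sext(a - b)` as `sext(a) - sext(b)`, which lets the invariant `sext(b)` be hoisted, if the 32-bit subtraction cannot overflow; a `sub nsw` asserts exactly that. LLVM's scalar evolution analysis, for example, can distribute a sign extension over an add carrying nsw. The Python model below emulates 32-bit two's-complement arithmetic to show that the two forms agree when there is no signed overflow and differ when there is. All helper names are hypothetical.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def wrap_i32(x: int) -> int:
    """Reduce x to a 32-bit two's-complement value (what a plain `sub i32` yields)."""
    return (x + 2**31) % 2**32 - 2**31


def sub_i32(a: int, b: int) -> tuple[int, bool]:
    """32-bit subtraction; also reports whether an nsw promise would hold."""
    exact = a - b
    return wrap_i32(exact), INT32_MIN <= exact <= INT32_MAX


def sext_of_sub(a: int, b: int) -> int:
    # sext(sub i32 a, b): subtract in 32 bits, then widen.
    # Python ints are unbounded, so widening a signed value is the identity.
    diff, _ = sub_i32(a, b)
    return diff


def sub_of_sext(a: int, b: int) -> int:
    # sub i64 (sext a), (sext b): widen first, then subtract.
    # For 32-bit inputs this 64-bit subtraction cannot overflow.
    return a - b


for a, b in [(10, 3), (INT32_MIN, 1)]:
    _, nsw_holds = sub_i32(a, b)
    print(a, b, nsw_holds, sext_of_sub(a, b) == sub_of_sext(a, b))
# 10 3 True True
# -2147483648 1 False False
```

In the first case nsw would be a valid promise and the two forms coincide; in the second the 32-bit subtraction wraps, so the rewrite would change the result. That is why LLVM needs the flag (or its own proof that no overflow occurs) before it can apply such transformations.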

Support for these flags already exists in the upstream MLIR dialects: [RFC] Integer overflow flags support in `arith` dialect
[mlir][LLVM] Add nsw and nuw flags by tblah · Pull Request #74508 · llvm/llvm-project · GitHub

I made a start here: [flang][CodeGen] add nsw to address calculations by tblah · Pull Request #74709 · llvm/llvm-project · GitHub. The main remaining work is adding nsw to the loop index calculations.
