I’m interested in Flang’s optimization and would like to join its development.
I’m trying to catch up on the status of the development, but I can’t find some of the information I’m looking for, so let me ask a few questions here.
Do you have roadmaps/milestones of the development of Flang’s optimization?
(e.g. optimizations for Fortran-specific features, array/loop optimizations)
Are there any features you’re working on?
Are there any target applications for Flang?
Where do you think optimizations such as loop vectorization should mainly be performed: in LLVM or in MLIR (FIR/HLFIR)?
Nice to see your interest in Flang’s optimization.
There has been some work and interest in optimizations for Flang, primarily driven by a few benchmark suites (SPEC, Polybench, SNAP, etc.). As far as SPEC CPU 2017 is concerned, the biggest outliers are 527.cam4, 549.fotonik3d and 548.exchange2. The cam4 issue is likely to be fixed by HLFIR (through better insertion/removal of temporary arrays for array expressions), the fotonik3d issue by passing more alias information to LLVM, and the exchange2 issue by function specialization in LLVM (⚙ D145819 [FuncSpec] Increase the maximum number of times the specializer can run.).
At the moment, vectorization transformations are performed by LLVM. If there are cases where LLVM does not have enough information to vectorize and that information is available at the FIR/HLFIR layers, then we can think about performing vectorization at the MLIR level.
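To make that aliasing point concrete, here is a hedged C analogue (my own illustration, not Flang output): with plain pointers the compiler must assume the two arrays may overlap, so the loop vectorizer has to emit a runtime overlap check or give up, whereas `restrict` supplies the no-alias guarantee that Fortran dummy arguments already carry by language rule — exactly the kind of information FIR/HLFIR could hand down to LLVM.

```c
#include <stddef.h>

/* Hedged C analogue, not Flang output: 'restrict' plays the role of the
 * no-alias guarantee Fortran gives for dummy arguments.  Without it the
 * vectorizer must assume dst and src may overlap and guard or bail out;
 * with it, vector loads/stores are unconditionally legal. */
void add_arrays(float *restrict dst, const float *restrict src, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] += src[i];
}
```

Comparing the optimizer output of this function with and without `restrict` (e.g. at `-O2`) shows the difference the extra alias information makes.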
In general, if you believe there is a transformation that will help Flang’s performance, you can write up a post on Discourse and then proceed to implement it if there are no serious reservations.
The Rust compiler team had a meeting about which optimizations should happen in the Rust compiler and which should be delegated to LLVM. The conclusion was roughly: if a transformation can benefit from knowledge of Rust semantics, do it in the Rust compiler.
As Fortran is about loops and arrays, there might be opportunities for optimizations in that area that LLVM cannot do.
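One concrete loop-and-array case, tied to the temporary-array issue mentioned for cam4: a Fortran array statement like `a(2:n) = a(1:n-1) + 1` reads its own left-hand side, so a naive lowering materializes a temporary, while a lowering that understands the overlap direction can run the loop in place. A C sketch of both strategies (my simplification, not Flang’s actual lowering; the fixed temporary size is an assumption of the sketch):

```c
#include <string.h>

/* Sketch of lowering the Fortran statement  a(2:n) = a(1:n-1) + 1 ,
 * whose RHS reads the LHS.  Naive lowering: evaluate the RHS into a
 * temporary, then copy the result back.  Not Flang's real codegen. */
void shift_with_temp(double *a, int n) {
    double tmp[64];                      /* assume n <= 64 for this sketch */
    for (int i = 0; i < n - 1; ++i)
        tmp[i] = a[i] + 1.0;
    memcpy(&a[1], tmp, (size_t)(n - 1) * sizeof(double));
}

/* Smarter lowering: iterating downward removes the need for the temporary,
 * because each source element is read before it is overwritten. */
void shift_in_place(double *a, int n) {
    for (int i = n - 1; i >= 1; --i)
        a[i] = a[i - 1] + 1.0;
}
```

Both functions compute the same result; removing the temporary and the extra copy is the kind of Fortran-semantics-aware transformation that is much easier above LLVM IR.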
At the moment, I’d like to join the implementation of alias analysis in FIR, because it is important for transformations in FIR.
I see that we’re waiting for “full restrict” in LLVM, but I don’t understand how alias analysis in FIR is progressing.
Please share any further information you have about it.
Furthermore, we at Fujitsu would like to work actively with you on Flang’s optimization, but it seems that optimization is not a high priority for Flang right now.
I’ll see you at the next Technical Call, where I’d like to discuss how we can work on this.
I’m afraid I’m not an expert in Fortran or Flang at this point, but I’m happy to collaborate with you.
MLIR-level alias analysis for FIR/HLFIR and/or a proper implementation of the MemoryEffects interface for FIR/HLFIR operations is also important for enabling more MLIR optimizations in the future.
The second part is propagating aliasing information from MLIR to LLVM IR so that the LLVM middle end and back end can do more optimizations. For this I hope to reuse “full restrict”, but for the time being we are using very naive TBAA metadata to let LLVM disambiguate data and descriptor accesses.
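For intuition on the data-vs-descriptor disambiguation, here is a hedged C sketch. The struct is a deliberate simplification of a Fortran array descriptor, not Flang’s real layout, and C strict aliasing (`long` vs. `double`) stands in for the TBAA metadata:

```c
/* Simplified stand-in for a Fortran array descriptor: extent metadata plus
 * a pointer to the payload.  This is NOT Flang's real descriptor layout. */
struct desc {
    long extent;
    double *data;
};

/* Without aliasing information, every store to d->data[i] could in principle
 * modify d->extent, forcing a reload of the loop bound on each iteration.
 * TBAA-style metadata (mimicked here by strict aliasing of long vs. double)
 * tells the compiler the store cannot touch the descriptor fields, so the
 * bound and the data pointer can stay in registers across the loop. */
double double_and_sum(struct desc *d) {
    double s = 0.0;
    for (long i = 0; i < d->extent; ++i) {
        d->data[i] *= 2.0;   /* store to the payload only */
        s += d->data[i];
    }
    return s;
}
```

Hoisting the descriptor loads out of such loops is exactly what even the naive TBAA metadata enables.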
@jeanPerier, @tblah and I have been working on enabling HLFIR, and we are now at the point where the Polyhedron, CPU2000, CPU2006 and CPU2017 benchmarks compile and pass with the test data sets (except for 628.pop2). Besides providing functional support for some Fortran features that are not (and will not be) available with FIR lowering, HLFIR is our path toward generating efficient MLIR. So the next step here is to analyse HLFIR vs. FIR performance and make sure we produce the same or faster code with HLFIR. My initial measurements show that we are far from that, so there is quite a bit of work in investigating the benchmarks and classifying the issues. A big portion of them might be resolved by improving the MLIR-level alias analysis. So you may also consider investing your time in performance analysis of the benchmarks.
Thank you for your comments, and I apologize for the late response.
I’m thinking of working on MLIR-level alias analysis for FIR/HLFIR and performance analysis.
I’m afraid I’m not familiar with FIR/HLFIR, so the first thing I need to do is study the concepts and implementation of FIR/HLFIR and understand them.
After that, I’d like to join the improvement of Flang in earnest.
I started investigating HLFIR performance, and as a first step I measured the performance of TSVC.
There was no problem in terms of alias analysis, but there was a strange performance issue: the assembly code of the innermost loop is the same as with FIR lowering, yet performance drops by 10% when HLFIR lowering is enabled.