A bit more than a year ago, I wrote a post here about Flang-new compiling MGLET. This was a big achievement.
During the last year we have experimented a lot with OpenMP offloading in MGLET. We have learnt a lot and made very good progress in developing patterns for offloading the various operations that need to be done. Our approach has been top-down: we start with the fundamental design problems, such as data access and memory management (mapping…), instead of the more traditional approach where one starts by offloading a few compute kernels to see how they perform.
While more or less every Fortran compiler out there can offload some nested DO loop structure using OpenMP, it turns out that most compilers really fall short when it comes to examples that leave the territory of a few nested DO loops in a single source file. Since every compiler tends to have its own behavioral quirks and do things a bit differently in the space of OpenMP, we have established a series of tests we can run on a compiler/platform to check its behavior. This is useful, as otherwise, debugging unknown offloading issues inside a large codebase is terrible…
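To give an idea of what the simplest kind of such a check can look like, here is a minimal sketch. It is not taken from the actual MGLET test series; the program name, array names and sizes are made up for illustration. It maps one array to the device, computes a second array in an offloaded loop, and verifies the result on the host.

```fortran
! Hypothetical smoke test, for illustration only (not part of MGLET).
program offload_smoke_test
    use omp_lib, only: omp_get_num_devices
    implicit none
    integer, parameter :: n = 1024
    real :: a(n), b(n)
    integer :: i

    a = 1.0
    b = 0.0

    ! Copy a to the device, compute b there, copy b back to the host.
    !$omp target teams distribute parallel do map(to: a) map(from: b)
    do i = 1, n
        b(i) = 2.0 * a(i) + real(i)
    end do
    !$omp end target teams distribute parallel do

    ! Verify on the host; all values involved are exactly representable.
    do i = 1, n
        if (b(i) /= 2.0 + real(i)) then
            print *, "FAIL at i =", i, " b(i) =", b(i)
            error stop 1
        end if
    end do

    print *, "PASS, devices available:", omp_get_num_devices()
end program offload_smoke_test
```

If no offload device is available, the loop normally falls back to the host, so the reported device count is worth checking before trusting a PASS. Checks for the harder cases (module variables, derived types, code spread over several source files) would follow the same pattern of offloading, copying back and comparing on the host.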
From our tests, the only compiler that can perform all the tasks we need correctly is the Intel Fortran compiler. However, in a very impressive and strong second place comes the current LLVM flang! As far as MGLET is concerned, flang is only a few minor fixes away from being the second compiler that is able to offload MGLET, which is really impressive given that much of this support has been developed over the last year or so. My personal impression is also that the compiler is very well thought through.
I believe that there are only two issues preventing us from compiling the complete code with flang and OpenMP:
- An issue preventing use of many mathematical functions (issue 147027)
- An issue using modules that refer to REAL(10) in files containing offloading code (issue 146876)
For the latter issue, I believe there is already a pull request that will resolve it, and we are eagerly waiting for it to be merged.
Lastly, I would also like to present a small wish: most compilers manage to implement some primitive write or print function that also works from an offloaded region. This makes a huge difference when debugging issues related to offloading, and it would be really handy to have in flang as well.
Finally, thanks to all of the flang developers out there for your effort. It is a pleasure to follow your progress!