We will definitely need some information at the FIR level - otherwise we will miss out on the opportunity to have more Fortran-aware alias information (this information would probably be generated during lowering from parse tree to FIR).
This approach will require more effort as we would have to design and implement FIR optimisations on top of FIR alias information. Ideally, we should work towards a design that will allow us to leverage LLVM middle-end optimisation, but which wouldn’t prevent us from implementing FIR optimisations later on (without having to re-design the aliasing info).
Could you elaborate - is there a limitation related to !noalias and !alias.scope metadata that I missed?
What’s your opinion?
TBAA works well for languages with strict type aliasing. I feel that that’s the reason why there’s little appetite to take this route (i.e. no strict type aliasing in Fortran). As for alias.scope - isn’t the work on full restrict going to introduce a more fine-grained mechanism to track the required info? I feel that we will need something like that anyway - alias.scope alone (in its current form) is not powerful enough.
With restrict, I guess that we would be assuming that everything is restrict in Fortran except for the corner cases that you listed (and probably a few more). For example, variables with the target attribute would be equivalent to regular C pointers in this sense. Would this approach be viable in your opinion?
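To make the "everything is restrict unless it is a corner case" idea concrete, here is a minimal sketch in Python. It is only a toy model: the `Dummy` class and the `treated_as_restrict` and `may_alias` helpers are hypothetical names, not part of flang or LLVM, and the rule is a deliberate simplification that looks only at the POINTER and TARGET attributes of dummy arguments. The real Fortran rules have more corner cases than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dummy:
    """A dummy argument with the two attributes this toy model looks at."""
    name: str
    pointer: bool = False
    target: bool = False


def treated_as_restrict(arg):
    # Assumption: anything without POINTER or TARGET is handled like a
    # C restrict-qualified pointer.
    return not (arg.pointer or arg.target)


def may_alias(a, b):
    # An argument trivially aliases itself.
    if a.name == b.name:
        return True
    # Assumption: two distinct arguments may alias only when neither is
    # treated as restrict, i.e. both behave like regular C pointers.
    return not treated_as_restrict(a) and not treated_as_restrict(b)


x = Dummy("x")
y = Dummy("y")
p = Dummy("p", pointer=True)
t = Dummy("t", target=True)

print(may_alias(x, y))  # plain dummies: assumed not to alias
print(may_alias(p, t))  # pointer vs. target: conservatively may alias
print(may_alias(x, t))  # one side is restrict-like: assumed not to alias
```

Whether a policy this coarse is actually viable is exactly the open question; the sketch only shows what the default would look like.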
The short answer is that neither TBAA nor restrict aliasing metadata is actually a good match for Fortran. (I’m aware of the provenance work but can’t speak to it.) These are both C-oriented alias information descriptions, of course. Fortran aliasing rules are significantly different from (one might argue diametrically opposed to) those of C.
While the C-oriented metadata can be hacked to more or less work for Fortran, it’s never been the best approach.
FIR models Fortran, obviously, and it has always been the case that aliasing information would be present in FIR. As the optimizer is the last piece to really get any serious attention in flang, that work is not completed.
Looking further down the road, it doesn’t make much sense to me to jury rig something like TBAA and shoehorn it into a 1st class Fortran compiler. I’d expect a new Fortran metadata encoding to carry the information from FIR (high-level) to LLVM IR (low-level) without loss. Obviously all of this requires some work.
I’m not as well versed in this subject as some of the other contributors in this thread, but I believe the main difference is that C and many other languages have to assume that “almost anything can alias almost anything else”. Fortran as a language requires that “nothing shall alias”.
Below is the review of a document - I thought that would appear somewhere on the LLVM website, but google doesn’t seem to know about it, which probably means it’s not actually being rendered onto a docs page. Either that, or my ability to google is worse than usual…
So for most programs in Fortran, just having a hard-coded “it doesn’t alias” should just work. It doesn’t ALWAYS, I’ve found. But that’s probably because programmers aren’t following the rules, rather than because the Fortran compiler itself has problems “following its own rules”. I have not debugged my way to find out what actually caused the problem - I just compiled the bits I cared about with “doesn’t alias” and with the more conservative setting for the remaining source files.
If Fortran’s aliasing were as simple as “nothing shall alias”, things would be simpler. It is indeed the case that standard F’77 without vendor extensions was free of dynamic aliasing, but things are more complicated today, and it is possible now for alias analysis to yield both “false positive” and “false negative” results if not done correctly. You really don’t want to ignore aliasing in a program that uses standard pointers, and you really don’t want to always assume aliasing of pointers in a compilation that’s using inline expansion of procedure references.
Standard conforming F’77 programs are not free from dynamic aliasing just because they’re F’77. Any program that never uses a feature that allows for dynamic aliasing is safe from dynamic aliasing. This happens to be vacuously true for standard F’77 programs, but that’s a consequence, not a premise.
And, as I said, this compiler doesn’t have an F’77 or other language standard level mode.
I’m glad I woke this thread up from its sleep! At least we’re having a discussion on the subject. (And as they say, the best way to get a reaction on the internet is to post something that is wrong - although I don’t think I was strictly wrong, just a bit simple - I’ve been called worse!)
My recent investigations on SPEC show that the generated code definitely benefits when LLVM understands that things aren’t aliasing. Like going from 2.6x slower to 1.2x slower (I’ll post a link to some more details in a bit, I just need to clean up and publish it)
Your general thought (though incomplete in details) heads in the right direction and is why representations really matter here and deserve consideration. An aliasing representation that constructs lists of “everything A does not alias” may be quite concise and compact in C, call it O(k) for small(-ish) k. In Fortran, it can trivially become O(n^2) on the set of variables. Conversely, a representation that constructs lists of “everything A may (or must) alias” is exactly the opposite for each language, obviously.
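The size argument above can be illustrated with a small counting sketch in Python. The `list_sizes` helper and the two scenarios are made up for illustration; it simply counts how many entries the per-variable "does not alias" lists and "may alias" lists would hold for a given set of aliasing pairs, and says nothing about how LLVM actually stores its metadata.

```python
from itertools import combinations


def list_sizes(n_vars, may_alias_pairs):
    """Return (entries in 'does not alias' lists, entries in 'may alias' lists).

    Variables are numbered 0..n_vars-1; each pair (a, b) with a != b says
    that a and b may alias. Every variable keeps its own list, so a pair
    contributes one entry to each of the two lists involved.
    """
    may = {v: set() for v in range(n_vars)}
    for a, b in may_alias_pairs:
        may[a].add(b)
        may[b].add(a)
    may_total = sum(len(s) for s in may.values())
    # Each variable has n_vars - 1 other variables; whatever is not in its
    # 'may alias' list has to be in its 'does not alias' list.
    no_total = n_vars * (n_vars - 1) - may_total
    return no_total, may_total


n = 100
exceptions = {(0, 1), (2, 3)}

# Fortran-like scenario: almost nothing aliases, only the exceptions do.
fortran_like = list(exceptions)
# C-like scenario: almost everything may alias, only the exceptions do not.
c_like = [p for p in combinations(range(n), 2) if p not in exceptions]

print(list_sizes(n, fortran_like))  # (9896, 4)
print(list_sizes(n, c_like))        # (4, 9896)
```

With 100 variables and two exceptional pairs, the "does not alias" lists hold 9896 entries in the Fortran-like case and 4 in the C-like case, and the "may alias" lists are the mirror image.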
Thanks Eric. It’s difficult at this point for me to anticipate all the details that we will be running into. I have not started any implementation. Things will get clearer as we go and some of the approaches may have to be revisited. But I need to get started with something because right now we have nothing. There will be a significant emphasis on testing and the interface is well defined and will not change. This will enable on-going refinement of the implementation.
I believe the O(n^2) nature of LLVM’s alias.scope/noalias representation shows up in the increased compilation times reported by @tmislam in . So this metadata may indeed be impractical for Fortran programs. At the same time, there is currently only alias.scope/noalias and tbaa metadata available for the purpose of representing aliasing in LLVM IR ([Scenario 2] in @tmislam’s list above). And I do not think the full restrict proposal allows reducing the amount of metadata significantly.
Unfortunately, LLVM IR metadata cannot be used to represent something like “may/must alias” because of the general rule that metadata should be discardable without affecting correctness. I believe that is why noalias/alias.scope is defined that way.
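The discardability point can be shown with a toy model of the noalias/alias.scope query, written in Python. This follows my reading of the LangRef rule (two accesses are assumed not to alias if, for some domain, the scopes in one access's alias.scope list are a subset of the scopes in the other's noalias list); the dictionary layout and the helper names `scopes_by_domain`, `proven_no_alias` and `strip_metadata` are hypothetical and do not correspond to any LLVM API.

```python
def scopes_by_domain(scopes):
    """Group (domain, scope) pairs into {domain: {scope, ...}}."""
    grouped = {}
    for domain, scope in scopes:
        grouped.setdefault(domain, set()).add(scope)
    return grouped


def _one_way_no_alias(alias_scope, noalias):
    # True if, for some domain present in alias_scope, all of its scopes
    # also appear in the other access's noalias list.
    a = scopes_by_domain(alias_scope)
    n = scopes_by_domain(noalias)
    return any(scopes <= n.get(domain, set()) for domain, scopes in a.items())


def proven_no_alias(x, y):
    """Return True only when the metadata proves x and y do not alias."""
    return (_one_way_no_alias(x["alias_scope"], y["noalias"])
            or _one_way_no_alias(y["alias_scope"], x["noalias"]))


def strip_metadata(access):
    # Models a pass that drops the metadata from an access.
    return {"alias_scope": [], "noalias": []}


load_a = {"alias_scope": [("dom", "a")], "noalias": [("dom", "b")]}
store_b = {"alias_scope": [("dom", "b")], "noalias": [("dom", "a")]}

print(proven_no_alias(load_a, store_b))                  # True
print(proven_no_alias(strip_metadata(load_a), store_b))  # False
```

Dropping the metadata only ever turns a "no alias" answer into the conservative "may alias" default, which is safe. A "must alias" or "may alias" annotation would not have that property if the default answer were "no alias".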
I don’t see why it would not be possible to design a more suitable (scalable, etc.) metadata representation for Fortran alias analysis and write a pass that understands and can analyze that metadata representation. [Left as an exercise for the reader, as they say.]