Represent Fortran alias information in LLVM IR

Hi,

We, the IBM XL Fortran compiler team, are interested in representing Fortran alias information in LLVM IR. The XL Fortran frontend emits LLVM IR that includes alias information, and LLVM compiles that IR into object files. To represent Fortran aliasing in LLVM IR, we considered both TBAA and the scoped noalias (!alias.scope/!noalias) metadata. We think the scoped noalias metadata is better suited to expressing refined alias information for Fortran. The XL Fortran frontend describes aliasing by listing, for each symbol, the other symbols it may alias. We experimented with a scheme that encodes this alias relation as noalias and alias.scope metadata in LLVM IR. The attached slides show an example, and the full .ll file for that example is also attached.
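
To make the idea concrete, here is a minimal C++ sketch of one plausible way to turn a "symbol X may alias symbols {...}" relation into scoped noalias lists. It is an assumption, not necessarily the scheme in the attached slides. It gives each symbol its own scope in a single domain. An access to symbol s gets alias.scope = {scope(s)} and noalias = {scope(t) for every t that s cannot alias}. The names Symbol, AliasRelation, buildNoAliasLists and mayAliasSymbols are hypothetical and are not part of LLVM or the XL frontend.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: every Fortran symbol gets its own scope (named after the
// symbol) in a single alias domain.
using Symbol = std::string;
// symbol -> the other symbols it may alias (as reported by the frontend)
using AliasRelation = std::map<Symbol, std::set<Symbol>>;
// symbol -> symbols whose scopes go into that symbol's !noalias list
using NoAliasLists = std::map<Symbol, std::vector<Symbol>>;

// The relation is treated as symmetric: if either side lists the other, they
// may alias. (An asymmetric input would otherwise produce unsound noalias info.)
bool mayAliasInRelation(const AliasRelation& rel, const Symbol& s, const Symbol& t) {
  auto listsOther = [&](const Symbol& x, const Symbol& y) {
    auto it = rel.find(x);
    return it != rel.end() && it->second.count(y) > 0;
  };
  return listsOther(s, t) || listsOther(t, s);
}

// For each symbol s, collect every other symbol that s cannot alias. An access
// to s would then carry !alias.scope = {scope(s)} and
// !noalias = {scope(t) for each t in noAlias[s]}.
NoAliasLists buildNoAliasLists(const std::vector<Symbol>& symbols, const AliasRelation& rel) {
  NoAliasLists noAlias;
  for (const Symbol& s : symbols) {
    std::vector<Symbol>& list = noAlias[s];
    for (const Symbol& t : symbols)
      if (t != s && !mayAliasInRelation(rel, s, t)) list.push_back(t);
  }
  return noAlias;
}

// Query under this single-domain scheme: an access to `a` and an access to `b`
// are known not to alias if scope(b) appears in a's noalias list. Because the
// lists are built from a symmetric relation, checking one direction suffices.
bool mayAliasSymbols(const NoAliasLists& noAlias, const Symbol& a, const Symbol& b) {
  for (const Symbol& t : noAlias.at(a))
    if (t == b) return false;
  return true;
}
```

In this sketch each noalias list can hold up to n-1 scopes for n symbols. The metadata therefore grows roughly quadratically in the worst case, which would fit the IR-size growth described below.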

In this experiment, the performance gain varies by workload, ranging from a few percent to 2X. However, compile time and IR size also increase.

We briefly investigated the possible causes of the longer compile time and the larger IR. For compile time, we observed:

  • Each alias query (ScopedNoAliasAAResult::mayAliasInScopes) partitions a metadata set by the domains of its elements. Pre-partitioning the metadata sets and maintaining the partitions on updates could help.
  • Intersecting noalias sets is O(n^2) because metadata elements have no ordering. Defining an order on the elements could help significantly.
  • Some optimizations, e.g. SCEV functions, do not scale well as the number of instructions they work on grows.
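
The first two bullets can be illustrated with a small C++ model of the query. It follows the decision rule of ScopedNoAliasAAResult::mayAliasInScopes as we understand it: two accesses may alias unless, for some domain in the noalias list, the access's non-empty set of alias scopes in that domain is a subset of the noalias scopes in that domain. Unlike LLVM, however, it stores each scope list pre-partitioned by domain and sorted. Each per-domain subset test then becomes a linear merge (std::includes) rather than a quadratic scan. ScopeRef, PartitionedScopes and partitionScopes are hypothetical names, not LLVM APIs. The integer IDs stand in for the element ordering the second bullet proposes to define.

```cpp
#include <algorithm>
#include <map>
#include <vector>

// Hypothetical stand-in for a scope metadata node: the domain it belongs to
// and an integer ID that gives scopes a total order.
struct ScopeRef {
  int domain;
  int id;
};

// A scope list pre-partitioned by domain; each partition is sorted and unique.
using PartitionedScopes = std::map<int, std::vector<int>>;

// Partition once (e.g. when the metadata is created or updated) instead of
// on every alias query.
PartitionedScopes partitionScopes(const std::vector<ScopeRef>& scopes) {
  PartitionedScopes parts;
  for (const ScopeRef& s : scopes) parts[s.domain].push_back(s.id);
  for (auto& entry : parts) {
    std::vector<int>& ids = entry.second;
    std::sort(ids.begin(), ids.end());
    ids.erase(std::unique(ids.begin(), ids.end()), ids.end());
  }
  return parts;
}

// Returns false (no alias) if, for some domain present in `noAlias`, the
// access's alias scopes in that domain are non-empty and all contained in the
// noalias scopes of that domain; otherwise returns true (may alias).
bool mayAliasInScopes(const PartitionedScopes& scopes, const PartitionedScopes& noAlias) {
  for (const auto& entry : noAlias) {
    auto it = scopes.find(entry.first);
    if (it == scopes.end() || it->second.empty()) continue;
    // Both ranges are sorted, so the subset test is a single linear merge.
    if (std::includes(entry.second.begin(), entry.second.end(),
                      it->second.begin(), it->second.end()))
      return false;
  }
  return true;
}
```

Keeping partitions sorted means every update, for example merging scope lists during inlining, must preserve the order. This is the maintenance cost the first bullet refers to.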

As for IR size, the noalias metadata requires a flattened set of metadata nodes. A hierarchical representation could reduce the memory footprint.
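
The C++ sketch below shows one way to picture such a hierarchical form. A noalias set lists some scopes directly and can reference shared child sets, so a large common subset is stored once rather than copied into every flattened list. ScopeSet, flatten and storedEntries are hypothetical names, and this is not an existing LLVM metadata form.

```cpp
#include <cstddef>
#include <memory>
#include <set>
#include <vector>

// Hypothetical hierarchical scope set: scope IDs listed directly plus
// references to shared sub-sets.
struct ScopeSet {
  std::vector<int> scopes;
  std::vector<std::shared_ptr<const ScopeSet>> children;
};

// Expand a hierarchical set into the flat set a consumer would see.
void flattenInto(const ScopeSet& s, std::set<int>& out) {
  out.insert(s.scopes.begin(), s.scopes.end());
  for (const auto& child : s.children) flattenInto(*child, out);
}

std::set<int> flatten(const ScopeSet& s) {
  std::set<int> out;
  flattenInto(s, out);
  return out;
}

// Count the entries (scope IDs plus child references) actually stored across
// a collection of sets, visiting each shared node once: a rough proxy for
// memory footprint.
std::size_t storedEntries(const std::vector<std::shared_ptr<const ScopeSet>>& roots) {
  std::set<const ScopeSet*> seen;
  std::vector<const ScopeSet*> work;
  for (const auto& r : roots) work.push_back(r.get());
  std::size_t total = 0;
  while (!work.empty()) {
    const ScopeSet* node = work.back();
    work.pop_back();
    if (!seen.insert(node).second) continue;  // shared node already counted
    total += node->scopes.size() + node->children.size();
    for (const auto& c : node->children) work.push_back(c.get());
  }
  return total;
}
```

The trade-off is that a query must walk the hierarchy or cache its flattened form, so any memory saving would have to be weighed against the query cost noted above.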

Given these findings, we would like to start a thread to discuss how to express Fortran aliasing in LLVM IR. Any comments, and any information about previous approaches, are welcome.

Thanks,
Kelvin Li
Tarique Islam

Fortran alias in LLVM.pdf (213 KB)

example.ll (2.59 KB)

Hi,

Any idea how the aliasing rules in Fortran compare to 'restrict' in C / '__restrict' in C++?

If they are comparable, [0] and [1] might help.

Greetings,

Jeroen Dobbelaere

[0] https://lists.llvm.org/pipermail/llvm-dev/2019-October/135672.html

[1] https://reviews.llvm.org/D68484


"restrict" was originally introduced into C to mimic the Fortran
aliasing rules. I am no expert on it, but the fellow who wrote the
"restrict" proposal still works here. I should think Jeroen's work
would be applicable to representing Fortran aliasing rules.

                     -David

Hi Jeroen,

As far as I know, some dummy arguments in Fortran have the restrict property.

Thanks for the information. We were pointed to your patch during the last Flang technical call, when we presented our work. We are still going through it and will definitely look into whether it helps with Fortran aliasing. By the way, the Flang technical call will continue discussing the Fortran alias topic next Monday (Apr 20, 2020 @ 11:30 EDT). I hope you can join and share your insight.

Thanks,
Kelvin

Hi Kelvin,

Recently, it has come up twice that we might want alias information that looks something like this:

llvm.assume(i1 true) ['noalias'(%ptrA, %ptrB)]

This is supposed to mean that, under the control condition of the llvm.assume, anything derived from %ptrA will not alias anything derived from %ptrB.
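
As a rough illustration, the proposed semantics can be modeled in C++ under a simplifying assumption: the assume's control condition holds wherever the query is made. Each pointer records the pointer it was derived from (e.g. via a GEP or a cast). Two pointers are then reported as not aliasing when their roots are the pair named in the bundle. DerivedFrom, rootOf, NoAliasAssumption and provesNoAlias are hypothetical names, and the actual semantics would be whatever the separate RFC defines.

```cpp
#include <map>
#include <string>

// Hypothetical model: each pointer name maps to the pointer it was derived
// from (e.g. via getelementptr or a cast). Root pointers have no entry.
// SSA derivation chains are acyclic, so the walk below terminates.
using DerivedFrom = std::map<std::string, std::string>;

std::string rootOf(const DerivedFrom& parent, std::string p) {
  while (true) {
    auto it = parent.find(p);
    if (it == parent.end()) return p;
    p = it->second;
  }
}

// Models llvm.assume(i1 true) ['noalias'(%ptrA, %ptrB)], assuming the
// assume's control condition holds at the query point.
struct NoAliasAssumption {
  std::string a, b;
};

// True if the assumption proves that x and y do not alias: one is derived
// from the first named pointer and the other from the second.
bool provesNoAlias(const NoAliasAssumption& na, const DerivedFrom& parent,
                   const std::string& x, const std::string& y) {
  std::string rx = rootOf(parent, x), ry = rootOf(parent, y);
  return (rx == na.a && ry == na.b) || (rx == na.b && ry == na.a);
}
```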

We will have a separate RFC for this but I was wondering if it might benefit your use case.

Let me know if you need more information. :)

Looking forward to your thoughts!

Cheers,

Johannes

Hi Johannes,

Thanks for the info; it sounds interesting. At this point, I am not sure whether it helps. Could you provide more information, e.g. motivating examples and rationale? Does the operand list after the llvm.assume call allow more than two items? And if there are many such calls, does that affect compile time?

Kelvin