This RFC aims to address challenges and concerns (e.g., disjoint development activities) in providing a functional code base for Flang via the main branch of the llvm-project.
Background
When Flang was upstreamed to llvm-project, only the front end (preprocessor, parser, and semantic analysis) went into the main repo. This retained the commit history for the project, and post hoc reviews of the upstreamed code were expected. Work on the front end has continued in llvm-project using the smaller, incremental commits that are normal LLVM practice. The middle end (lowering from parse trees to FIR, the MLIR dialect for the Fortran intermediate representation), which was not initially upstreamed, has been actively developed in a “fir-dev” fork:
https://github.com/flang-compiler/f18-llvm-project/tree/fir-dev
This codebase is heavily dependent upon MLIR, both for Fortran IR (FIR) and for a planned OpenMP dialect, and on LLVM for the final lowering. Active work across all these components, together with efforts to implement a working Fortran 77 front end on a timely schedule, has expanded the “fir-dev” capabilities and extended the codebase significantly beyond what currently exists in the main branch.
To date, the community has struggled to reach an agreement on how best to merge the critical functionality in “fir-dev” into the main branch. Unfortunately, the goals of smaller commits and timely delivery of a working Fortran front end in the main llvm-project branch have been at odds. Furthermore, as time has passed, this bifurcation has become detrimental to both Flang and established community practices. Therefore, we would like to propose the following approach to address and reduce the risks associated with the growing divergence across the code base.
Proposed Merge Strategy
The proposed strategy aims to provide a timely push of “fir-dev” capabilities into Flang’s main branch. Although not ideal, we feel it addresses schedule risks and serves the overarching goal of bringing the community back together to work on a unified codebase. The proposal is to upstream groups of functionality with a revised history. One can imagine staging a sequence of commits in git as a chain (say, a handful of commits at most). This entire chain can be pushed in one go, creating a “history”, or the group can be split up into a few individual commits.
One potential grouping of functionality could be:
- FIR dialect and code generation,
- Optimization passes, and
- Lowering to FIR.
We welcome other suggestions for these groupings.
Unlike previous proposals, we would like to encourage timeliness by avoiding any initial source code modifications beyond organizing the merging of code into these functional groups. We think there are two potential approaches to upstreaming this code to the main branch. The first would handle each group independently as a single commit or a few smaller commits. It may be difficult to completely test each of these individual commits until the overall upstream process is completed; however, we will, at the very least, make sure the code still builds.
Alternatively, we could eliminate the three functional groups and roll up all the “fir-dev” changes into a single commit. Within the broader LLVM effort, there have been cases where very large, cross-cutting changes have been committed in one shot to avoid breaking functionality. Committing the middle stages of Flang in one shot would serve a similar goal.
The primary advantage of the second (“everything at once”) approach is that it would deliver a working (F77) implementation of Flang sooner and avoid the risk of a much longer merge process and continued bifurcation of efforts across repositories.
We encourage the community’s thoughts on these two paths, any alternative approaches, etc.
A series of important steps needs to happen as part of the incorporation of fir-dev into the main repo:
- We need to make certain that all build-bot functionality is sound and doesn’t negatively impact community-wide LLVM regression checks. Our proposed approach would be to complete all this testing (by hand-running buildbots) before finally landing the fir-dev merged code upstream.
- Make certain that those who have contributed to fir-dev have their contributions maintained in the git history. This will potentially require that we temporarily disable LLVM’s commit mailer to avoid an onslaught of email traffic across the community. We need to understand how best to achieve this. One alternative, if there is no clean way to retain history, would be to start a contributors list as part of the project.
- It would be valuable to identify a small number of people from across the community (2-4 seems reasonable) to help coordinate and oversee this process – it is likely too much for a single person.
- As a final step of the proposed process, the “fir-dev” repository will be archived and all future development activities will use the llvm-project main repository.
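As one illustration of the contributors-list fallback mentioned above, here is a minimal sketch in Python. It assumes the input is the text produced by running `git shortlog -sne` on the fir-dev branch; the function names and the sample names in the usage note are hypothetical and not part of any existing Flang or LLVM tooling. It merges entries that share an email address and can also emit `Co-authored-by:` trailers, a commit-message convention that some hosting services (GitHub among them) recognize, should a rolled-up commit be chosen.

```python
import re
from collections import Counter

# One line of `git shortlog -sne` output looks like: "    42\tName <email>"
_SHORTLOG_LINE = re.compile(r"^\s*(\d+)\s+(.+?)\s+<([^>]+)>\s*$")


def parse_shortlog(text):
    """Return (commits, name, email) tuples, most commits first.

    Entries sharing an email address (case-insensitive) are merged,
    keeping the first name seen for that address.
    """
    totals = Counter()
    names = {}
    for line in text.splitlines():
        match = _SHORTLOG_LINE.match(line)
        if not match:
            continue  # skip blank or malformed lines
        count, name, email = match.groups()
        key = email.lower()
        totals[key] += int(count)
        names.setdefault(key, name)
    rows = [(count, names[email], email) for email, count in totals.items()]
    # Sort by descending commit count, then by name for a stable order.
    return sorted(rows, key=lambda row: (-row[0], row[1]))


def coauthor_trailers(contributors):
    """Format contributors as Co-authored-by trailers for a commit message."""
    return [f"Co-authored-by: {name} <{email}>" for _, name, email in contributors]
```

For example, feeding it the captured output of `git shortlog -sne` and joining the result of `coauthor_trailers` with newlines yields a block that could be appended to a commit message or saved as a contributors file. Whether either form is acceptable to the community is exactly the open question raised in the list above.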
We believe that, with this process complete and the fir-dev code successfully merged, the first functional Fortran 77 front end and middle stage(s) can be completed, providing the community with a starting point for refactoring, adding new capabilities, and exploring additional opportunities for contributing to Flang and leveraging the broader LLVM code base.
We look forward to your feedback, suggestions, additions, etc.
Thanks,
—Pat McCormick, Los Alamos National Laboratory
—Steve Scalpone, NVIDIA