Hi F18 community
We've been developing on F18 for a few months now at Arm and are finding the codebase nicely accessible and well designed. The one thing we have really noticed though is the build time memory usage. We are really concerned about this and think that this could have serious implications for the viability of F18 as an LLVM project in future. We want to surface the problem and talk it through before sinking a lot of time into investigating mitigations.
The peak memory usage for compiling one of the files in F18 - parsing.cc - is around 5GB for a debug build. This is the worst single file for peak compile-time memory usage, but many other files that use the parse tree need 2GB+ to compile. The link-time memory usage for any link with these resulting objects (so the f18 binary and any test binaries that need the parse tree) is then around 8GB. Touching check-omp-structure.cc, which is a user of the parse tree, and doing an incremental build of F18 takes well over two minutes. Doing a similar exercise in clang (touching SemaOpenMP.cpp) takes 50 seconds. Comparing -fsyntax-only builds of both files (a common usage for text editor syntax highlighting, for example) shows the difference in compile time even more starkly - 90s for F18, 10s for Clang.
These files all use parse_tree.h, which makes heavy use of std::variant and so causes a lot of template instantiations. If you load up chrome://tracing in Google Chrome and import the attached file you can see graphs of where the time goes when building parsing.cc. When zooming in (press the zoom tooltip then click and hold and move up and down with the mouse) you can see that the time in the front end is spent almost exclusively on instantiating std::variant templates, and the time in the backend is spent almost exclusively on dead code elimination, because of the huge amount of dead code generated. The heavy use of std::variant in the F18 parse tree means that building this file takes 2+ minutes and makes incremental builds around 2.5x slower than for clang.
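To make the pattern concrete, here is a minimal C++17 sketch of a variant-based node and a visitor over it. The node types (IntLiteral, Name, Negate) and the Describe function are hypothetical and much simpler than anything in parse_tree.h; the point is only that each std::visit call makes the compiler instantiate dispatch code covering every alternative, which adds up when a tree has hundreds of node types visited from many files.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical node types for illustration; NOT the real F18 parse tree classes.
struct IntLiteral { int value; };
struct Name { std::string text; };
struct Negate { int operand; };

// A sum-type node: exactly one of the alternatives is held at a time.
using Expr = std::variant<IntLiteral, Name, Negate>;

// Standard C++17 helper that builds one overload set out of several lambdas.
template <typename... Ts> struct Overloaded : Ts... { using Ts::operator()...; };
template <typename... Ts> Overloaded(Ts...) -> Overloaded<Ts...>;

// Each call to std::visit instantiates dispatch code for all alternatives of
// Expr, even if only one of them is ever reached at run time.
std::string Describe(const Expr &e) {
  assert(!e.valueless_by_exception());
  return std::visit(
      Overloaded{
          [](const IntLiteral &x) { return "int " + std::to_string(x.value); },
          [](const Name &x) { return "name " + x.text; },
          [](const Negate &x) { return "negate " + std::to_string(x.operand); },
      },
      e);
}
```

A trace in the chrome://tracing format can be produced for a file like this with Clang's -ftime-trace option. Whether the attached trace was generated that way is an assumption on my part.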
In addition, a single-threaded build of F18 consumes up to 5GB when compiling files, then 8GB when linking. On most systems, this does not leave much scope for parallel building, which makes builds of F18 slower than they could be. A parallel build of F18 on Taishan (64-core AArch64 server node) using all 64 cores takes around the same wallclock time (about 5 minutes) as it does to build clang+llvm in the same way. A large proportion of the build time of clang is linking LLVM, which F18 does not do yet. F18 will need to link to LLVM and also MLIR for codegen, so we can expect the build time of F18 to increase significantly in future. There will also be more usage of the parse tree in the code generator, so more single files with high compile-time memory usage and more high-memory links. I think we can predict that when F18(+LLVM) is complete, its build will be significantly slower than the build of Clang(+LLVM).
As an aside, the parallel F18 build on Taishan uses 36GB of RAM! The link stage of this build ends up being two parallel link processes consuming about 14GB of RAM between them.
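As a rough illustration of how peak memory caps parallelism, the sketch below computes a worst-case job count from the available RAM and the peak per-job usage. MaxParallelJobs is a hypothetical helper, not part of any build tool, and it assumes pessimistically that every job hits its peak at the same time. Real builds mix large and small compilations, so they can often run more jobs than this bound suggests.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: worst-case bound on parallel build jobs, assuming every
// job reaches peakPerJobGB simultaneously. Inputs are illustrative only.
int MaxParallelJobs(int cores, double ramGB, double peakPerJobGB) {
  assert(cores > 0 && peakPerJobGB > 0.0);
  // Number of jobs that fit in memory if all are at their peak at once.
  int byMemory = static_cast<int>(ramGB / peakPerJobGB);
  // Never more jobs than cores, and always allow at least one job.
  return std::max(1, std::min(cores, byMemory));
}
```

For a hypothetical 8-core machine with 16GB of RAM and the 5GB compile peak quoted above, this gives 3 jobs, so more than half the cores would sit idle under this assumption.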
So what could this all mean?
I'm sure we all know the general rule that slow incremental and debug build times generally hurt developer productivity and reduce the appeal of a codebase. Slow parallel builds also reduce turnaround time for CI testing, again hurting developer productivity. David Truby's recent mail about CI shows that we are limiting our choice of free CI services because of the large build resources needed to build F18.
Specifically for F18, we want it to be a fully supported LLVM subproject on a par with Clang. If enabling F18 hugely increases build time, then LLVM developers in the wider community who don't care specifically about Fortran simply won't build it. The same goes for buildbot maintainers: if it hikes up the turnaround time for a buildbot or reduces the frequency of runs, then they will not build it for us.
The memory usage is perhaps even more challenging. These large compilations and links each take up multiple GB, which will greatly reduce the level of parallelism that the LLVM project can use for a build with F18 enabled, and that will slow down the whole project build, not just the F18 part. We are also setting a really high bar for developers who want to play with F18: basically you need a very solid dev machine or a cluster node to build it sensibly. You can't hack on it offline on your laptop on the plane or train. And there will be some folk who are simply left behind - we are turning away potential contributors (for example .)
If most developers are not going to build F18 as a matter of course this has a number of potential downsides:
* Most significantly, it will mean that if LLVM's APIs, utilities or other behaviour change, F18 is less likely to be automatically considered for fixing up at the same time by the same developer. F18 will be a second-class citizen in this respect; this will increase the burden on the F18 community of doing this work and reduce F18's credentials as a fully supported LLVM project.
* It reduces the diversity of platforms/toolchains routinely used to build F18, which reduces the robustness of the top of master on these platforms.
As well as the ongoing impact, we think this is going to have a big negative impact when F18 code initially lands in LLVM. When the code lands, we can expect random folks in the community to just try to build it. What does it do for the reputation of F18 when the build crashes on their machines because they only have 8GB or 12GB of memory available?
We believe that build time is scaling roughly linearly with every parse node added to the parse tree so this problem is likely to slowly get worse as F18's language support is built out and as it expands to cover new standards. As an empirical data point, David has been setting up his CI on his fork of F18 which he forked a few weeks back. Today he rebased to the latest top of tree and had to drop from -j24 to -j16, and even this runs out of memory occasionally. The drone.io node has 96 cores, so that is a lot of potential parallelism going to waste!
We are aware that this is a difficult point to make, given that use of std::variant is key to F18's design and that there was a specific decision to prioritise the design over the resources needed. But if we can't find a way to mitigate this build-time memory usage, it could be a serious threat to the success of F18 in LLVM. Is anyone else as concerned about this as we are?
parsing.json.gz (6.31 MB)